Data Warehousing Quick Guide
8/11/2019 Data Warehousing Quick Guide
1/66
4/9/2014 Data Warehousing Quick Guide
http://www.tutorialspoint.com/dwh/dwh_quick_guide.htm
Data Warehousing - Quick Guide
Data Warehousing - Overview
The term "Data Warehouse" was first coined by Bill Inmon in 1990. He said that a data warehouse is a
subject-oriented, integrated, time-variant and nonvolatile collection of data. This collection supports
the decision-making process of analysts in an organization.
The operational database undergoes daily transactions, which cause the fre
the data on daily basis. But if in future the business executive wants to analyse the p
on any data such as product, supplier, or consumer data. In this case the analyst
data available to analyse because the previous data is updated due to transactions.
The Data Warehouses provide us generalized and consolidated data in multidimens
with generalized and consolidated view of data the Data Warehouses also provide us
Processing (OLAP) tools. These tools help us in interactive and effective ana
multidimensional space. This analysis results in data generalization and data mining
The data mining functions like association, clustering, classification and prediction can b
OLAP operations to enhance interactive mining of knowledge at multiple levels of abstr
data warehouse has now become important platform for data analysis and
processing.
Understanding Data Warehouse
The Data Warehouse is that database which is kept separate from the organiza
database.
There is no frequent updating done in data warehouse.
Data warehouse possesses consolidated historical data which help the organiza
business.
Data warehouse helps the executives to organize, understand and use their dat
decision.
Data warehouse systems available which helps in integration of diversity of app
The Data warehouse system allows analysis of consolidated historical data an
Definition
Data warehouse is a subject-oriented, integrated, time-variant and nonvolatile collection of data in
support of management's decision making process.
Why Data Warehouse Separated from Operational Data
The following are the reasons why Data Warehouses are kept separate from operation
The operational database is constructed for well known tasks and workload s
particular records, indexing etc but the data warehouse queries are often
presents the general form of data.
Operational databases support the concurrent processing of multiple transact
control and recovery mechanisms are required for operational databases to en
and consistency of database.
Operational database queries allow read and modify operations while the OLAP
read-only access of stored data.
Operational databases maintain the current data; on the other hand data wareho
historical data.
Data Warehouse Features
The key features of Data Warehouse such as Subject Oriented, Integrated, Nonv
Variant are discussed below:
Subject Oriented - The Data Warehouse is Subject Oriented because i
information around a subject rather than the organization's ongoing operations. Th
be product, customers, suppliers, sales, revenue etc. The data warehouse doe
ongoing operations; rather it focuses on modelling and analysis of data for deci
Integrated - Data Warehouse is constructed by integration of data from hetero
such as relational databases, flat files etc. This integration enhances the eff
data.
Time-Variant - The Data in Data Warehouse is identified with a particular time
in data warehouse provides information from historical point of view.
Non Volatile - Non volatile means that the previous data is not removed when n
to it. The data warehouse is kept separate from the operational database t
changes in operational database are not reflected in data warehouse.
Note: Data Warehouse does not require transaction processing, recovery and co
because it is physically stored separate from the operational database.
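The time-variant and non-volatile features can be illustrated with a small sketch. It assumes a hypothetical append-only `warehouse` list and a toy `operational` dictionary; neither name comes from the guide, and a real warehouse would use database tables rather than Python lists.

```python
from datetime import date

warehouse = []  # hypothetical append-only store of (load_date, customer, city) rows

def load_snapshot(load_date, operational_rows):
    """Append the current operational state, stamped with its load date."""
    for customer, city in operational_rows.items():
        warehouse.append((load_date, customer, city))

def history(customer):
    """Return every recorded state of one customer, oldest first."""
    return sorted((d, city) for d, c, city in warehouse if c == customer)

operational = {"alice": "Pune"}
load_snapshot(date(2014, 1, 1), operational)
operational["alice"] = "Delhi"   # the operational record is overwritten in place
load_snapshot(date(2014, 2, 1), operational)
```

After the second load, `history("alice")` still contains the Pune row: earlier data is kept when new data is added, and each row is tied to a point in time.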
Data Warehouse Applications
As discussed before Data Warehouse helps the business executives in organize,
their data for decision making. Data Warehouse serves as a sole part of a plan
"closed-loop" feedback system for enterprise management. Data Warehouse is w
following fields:
Financial services
Banking services
Consumer goods
Retail sectors
Controlled manufacturing
Data Warehouse Types
Information processing, analytical processing and data mining are the three types of data warehouse
applications that are discussed below:
Information processing - Data Warehouse allows us to process the informatio
information can be processed by means of querying, basic statistical analysis
crosstabs, tables, charts, or graphs.
Analytical Processing - Data Warehouse supports analytical processing o
stored in it. The data can be analysed by means of basic OLAP operations, inc
dice, drill down, drill up, and pivoting.
Data Mining - Data Mining supports knowledge discovery by finding the hid
associations, constructing analytical models, performing classification and
mining results can be presented using the visualization tools.
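The OLAP operations named above can be sketched in plain Python. This is only an illustration under assumptions: the `SALES` rows, the dimension names and the function names are invented for the example, and real OLAP servers implement these operations very differently.

```python
from collections import defaultdict

# Hypothetical fact rows: (quarter, item_type, city, units_sold)
SALES = [
    ("Q1", "phone", "Delhi", 10),
    ("Q1", "laptop", "Delhi", 4),
    ("Q1", "phone", "Mumbai", 7),
    ("Q2", "phone", "Delhi", 12),
    ("Q2", "laptop", "Mumbai", 5),
]
DIMS = ("quarter", "item_type", "city")

def roll_up(rows, keep):
    """Drill up: sum units over every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)

def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [r for r in rows if r[i] == value]

def dice(rows, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [r for r in rows
            if all(r[DIMS.index(d)] in vals for d, vals in allowed.items())]

def pivot(rows, row_dim, col_dim):
    """Pivot: arrange totals as a nested dict {row_value: {col_value: units}}."""
    table = defaultdict(dict)
    for (r, c), units in roll_up(rows, (row_dim, col_dim)).items():
        table[r][c] = units
    return dict(table)
```

Drill down is the reverse of drill up: calling `roll_up` with more dimensions in `keep` returns a more detailed table.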
SN | Data Warehouse (OLAP) | Operational Database (OLTP)
1 | This involves historical processing of information. | This involves day-to-day processing.
2 | OLAP systems are used by knowledge workers such as executives, managers and analysts. | OLTP systems are used by clerks, database professionals.
3 | This is used to analyse the business. | This is used to run the business.
4 | It focuses on information out. | It focuses on data in.
5 | This is based on Star Schema, Snowflake Schema and Fact Constellation Schema. | This is based on Entity Relationship model.
6 | This is subject oriented. | This is application oriented.
7 | This contains historical data. | This contains current data.
8 | This provides summarized and consolidated data. | This provides primitive and highly
9 | This provides summarized and multidimensional view of data. | This provides detailed and flat relational view of data.
10 | The number of users is in hundreds. | The number of users is in thousands.
11 | The number of records accessed is in millions. | The number of records accessed
12 | The database size is from 100GB to TB. | The database size is from 100 M
13 | These are highly flexible. | This provides high performance.
Data Warehousing - Concepts
What is Data Warehousing?
Data Warehousing is the process of constructing and using the data warehouse. The
is constructed by integrating the data from multiple heterogeneous sources. This
supports analytical reporting, structured and/or ad hoc queries and decisio
Warehousing involves data cleaning, data integration and data consolidations.
Using Data Warehouse Information
There are decision support technologies available which help to utilize the data w
technologies help the executives to use the warehouse quickly and effectively. The
data, analyse it and take the decisions based on the information in the warehouse
gathered from the warehouse can be used in any of the following domains:
Tuning production strategies - The product strategies can be well tuned by
products and managing product portfolios by comparing the sales quarterly or y
Customer Analysis - The customer analysis is done by analyzing the cu
preferences, buying time, budget cycles etc.
Operations Analysis - Data Warehousing also helps in customer relationsh
making environmental corrections. The information also allows us to analy
operations.
Integrating Heterogeneous Databases
To integrate heterogeneous databases we have the two approaches as follows:
Query Driven Approach
Update Driven Approach
Query Driven Approach
This is the traditional approach to integrate heterogeneous databases. This appro
build wrappers and integrators on the top of multiple heterogeneous databases. The
also known as mediators.
PROCESS OF QUERY DRIVEN APPROACH:
When the query is issued to a client side, a metadata dictionary translates t
queries appropriate for the individual heterogeneous site involved.
Now these queries are mapped and sent to the local query processor.
The results from heterogeneous sites are integrated into a global answer set.
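The three steps above can be sketched as a toy mediator. Everything here is hypothetical: the two in-memory sources, their field names, and the `METADATA` dictionary are made up to show the flow, not taken from any real integration tool.

```python
# Two hypothetical local sources with different schemas.
SOURCE_A = [{"prod": "pen", "qty": 3}, {"prod": "ink", "qty": 1}]
SOURCE_B = [{"item_name": "pen", "units": 5}]
SOURCES = {"A": SOURCE_A, "B": SOURCE_B}

# Metadata dictionary: maps the global field names to each site's local names.
METADATA = {
    "A": {"item": "prod", "units": "qty"},
    "B": {"item": "item_name", "units": "units"},
}

def local_query(site, item):
    """Wrapper: translate the global query into the site's field names and run it."""
    names = METADATA[site]
    return [
        {"item": row[names["item"]], "units": row[names["units"]], "site": site}
        for row in SOURCES[site]
        if row[names["item"]] == item
    ]

def global_query(item):
    """Mediator: send the translated query to every site and merge the answers."""
    answer = []
    for site in SOURCES:
        answer.extend(local_query(site, item))
    return answer
```

Note that every call to `global_query` touches every source again, which hints at why this approach becomes expensive for frequent queries.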
DISADVANTAGES
The Query Driven Approach needs complex integration and filtering processes.
This approach is very inefficient.
This approach is very expensive for frequent queries.
This approach is also very expensive for queries that require aggregations.
Update Driven Approach
We are provided with the alternative approach to traditional approach. Today's Data W
follows update driven approach rather than the traditional approach discussed earlier
approach the information from multiple heterogeneous sources is integrated in adva
a warehouse. This information is available for direct querying and analysis.
ADVANTAGES
This approach has the following advantages:
This approach provides high performance.
The data are copied, processed, integrated, annotated, summarized and
semantic data store in advance.
Query processing does not require interface with the processing at local source
Data Warehouse Tools and Utilities Functions
The following are the functions of Data Warehouse tools and Utilities:
Data Extraction - Data Extraction involves gathering the data from multipl
sources.
Data Cleaning - Data Cleaning involves finding and correcting the errors in data
Data Transformation - Data Transformation involves converting data from
warehouse format.
Data Loading - Data Loading involves sorting, summarizing, consolidating,
and building indices and partitions.
Refreshing - Refreshing involves updating from data sources to warehouse.
Note: Data Cleaning and Data Transformation are important steps in improving the q
data mining results.
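The functions listed above can be chained into a small pipeline. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the `RAW` rows, the `sales` table and the cleaning rule are assumptions chosen for illustration, and here cleaning simply drops bad rows rather than correcting them.

```python
import sqlite3

# Hypothetical raw rows from the sources: (item, amount_text, currency)
RAW = [("Pen ", "1.50", "USD"), ("pen", "2.00", "usd"), ("Ink", "n/a", "USD")]

def extract():
    """Data extraction: gather rows from the (here hard-coded) sources."""
    return list(RAW)

def clean(rows):
    """Data cleaning: drop rows whose amount is not a valid number."""
    good = []
    for item, amount, currency in rows:
        try:
            good.append((item, float(amount), currency))
        except ValueError:
            continue
    return good

def transform(rows):
    """Data transformation: trim and normalize case to match the warehouse format."""
    return [(item.strip().lower(), amount, currency.upper())
            for item, amount, currency in rows]

def load(conn, rows):
    """Data loading: insert the rows and build an index on the item column."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (item TEXT, amount REAL, currency TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_item ON sales (item)")
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(clean(extract())))
totals = conn.execute("SELECT item, SUM(amount) FROM sales GROUP BY item").fetchall()
```

Refreshing would amount to running the same pipeline again on newly arrived source rows.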
Data Warehousing - Terminologies
In this article, we will discuss some of the commonly used terms in Data Warehouse.
Data Warehouse
Data warehouse is a subject-oriented, integrated, time-variant and nonvolatile collection of data in
support of management's decision making process. Let's explore this Definition of da
Subject Oriented - The Data warehouse is subject oriented because it
information around a subject rather than the organization's ongoing operations. Th
be product, customers, suppliers, sales, revenue etc. The data warehouse doe
ongoing operations; rather it focuses on modelling and analysis of data for deci
Integrated - Data Warehouse is constructed by integration of data from hetero
such as relational databases, flat files etc. This integration enhances the eff
data.
Time-Variant - The Data in Data Warehouse is identified with a particular time
in data warehouse provides information from historical point of view.
Non Volatile - Non volatile means that the previous data is not removed when n
to it. The data warehouse is kept separate from the operational database t
changes in operational database are not reflected in data warehouse.
Metadata - Metadata is simply defined as data about data. The data that are u
other data is known as metadata. For example the index of a book serves as
contents in the book. In other words we can say that metadata is the summari
us to the detailed data.
In terms of data warehouse we can define metadata as following:
Metadata is a road map to data warehouse.
Metadata in data warehouse defines the warehouse objects.
The metadata acts as a directory. This directory helps the decision support sys
contents of data warehouse.
Metadata Repository
The Metadata Repository is an integral part of data warehouse system. The Met
contains the following metadata:
Business Metadata - This metadata has the data ownership information, bu
and changing policies.
Operational Metadata - This metadata includes currency of data and data lin
data means whether data is active, archived or purged. Lineage of data mea
migrated and transformation applied on it.
Data for mapping from operational environment to data warehouse - This m
source databases and their contents, data extraction, data partition, cleanin
rules, data refresh and purging rules.
The algorithms for summarization - This includes dimension algorithms, da
aggregation, summarizing etc.
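One way to picture a repository entry is as a small record per warehouse table. The sketch below is a loose illustration only: the class names, fields and the `repository` dictionary are hypothetical and cover just the ownership, currency and lineage items mentioned above.

```python
from dataclasses import dataclass, field
from enum import Enum

class Currency(Enum):
    """Operational metadata: whether the data is active, archived or purged."""
    ACTIVE = "active"
    ARCHIVED = "archived"
    PURGED = "purged"

@dataclass
class TableMetadata:
    name: str
    owner: str                                   # business metadata: data ownership
    currency: Currency = Currency.ACTIVE         # operational metadata: currency of data
    lineage: list = field(default_factory=list)  # history of migrations and transformations

    def record_step(self, step):
        """Append one migration or transformation step to the lineage."""
        self.lineage.append(step)

repository = {}  # the directory: table name -> metadata entry

def register(meta):
    """Add or replace the entry for one warehouse table."""
    repository[meta.name] = meta
```

A decision support tool could then look up `repository["sales"]` to find out who owns a table and how its data was derived.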
Data cube
Data cube helps us to represent the data in multiple dimensions. The data cu
dimensions and facts. The dimensions are the entities with respect to which an en
records.
Illustration of Data cube
Suppose a company wants to keep track of sales records with help of sales data
respect to time, item, branch and location. These dimensions allow to keep track of m
at which branch the items were sold. There is a table associated with each dimens
known as dimension table. This dimension table further describes the dimensions. F
dimension table may have attributes such as item_name, item_type and item_brand.
The following table represents 2-D view of Sales Data for a company with respect
location dimensions.
But here in this 2-D table we have records with respect to time and item only. The sa
are shown with respect to time and item dimensions according to type of item sold.
the sales data with one new dimension say the location dimension. The 3-D view of t
respect to time, item, and location is shown in the table below:
The above 3-D table can be represented as 3-D data cube as shown in the following f
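The relationship between the 3-D view and the 2-D view can be sketched with a dictionary keyed by (time, item, location). The cell values below are invented for the example and are not the figures from the guide's tables.

```python
from collections import defaultdict

# Hypothetical 3-D cube cells: (time, item, location) -> units sold
CUBE = {
    ("Q1", "mobile", "Delhi"): 605,
    ("Q1", "modem", "Delhi"): 825,
    ("Q1", "mobile", "Mumbai"): 400,
    ("Q2", "mobile", "Delhi"): 680,
}

def view_2d(cube, location):
    """2-D view: the time x item table for one fixed location."""
    return {(t, i): v for (t, i, loc), v in cube.items() if loc == location}

def collapse_location(cube):
    """Sum over the location dimension, leaving a time x item table."""
    out = defaultdict(int)
    for (t, i, _loc), v in cube.items():
        out[(t, i)] += v
    return dict(out)
```

Fixing a location gives one 2-D table per city, while `collapse_location` gives a single 2-D table with the location dimension summed away.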
Data mart
Data mart contains the subset of organisation-wide data. This subset of data is va
group of an organisation. In other words we can say that data mart contains only t
specific to a particular group. For example the marketing data mart may contain on
item, customers and sales. Data marts are confined to subjects.
Points to remember about data marts:
Windows-based or Unix/Linux-based servers are used to implement data marts. They are
implemented on low-cost servers.
The implementation cycle of data mart is measured in short period of time i.e
than months or years.
The life cycle of a data mart may be complex in long run if its planning an
organisation-wide.
Data marts are small in size.
Data marts are customized by department.
The source of data mart is departmentally structured data warehouse.
Data marts are flexible.
Graphical Representation of data mart.
Virtual Warehouse
The view over an operational data warehouse is known as virtual warehouse. It is easy
warehouse. Building the virtual warehouse requires excess capacity on operational d
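A virtual warehouse can be pictured as a database view. The sketch below uses Python's standard `sqlite3` module; the `orders` table and the `item_summary` view are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical operational table that day-to-day transactions keep updating.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")
conn.executemany("INSERT INTO orders (item, qty) VALUES (?, ?)",
                 [("pen", 2), ("pen", 3), ("ink", 1)])

# The "virtual warehouse": a summary view defined directly over the operational table.
conn.execute("""
    CREATE VIEW item_summary AS
    SELECT item, SUM(qty) AS total_qty FROM orders GROUP BY item
""")

before = conn.execute("SELECT * FROM item_summary ORDER BY item").fetchall()
conn.execute("INSERT INTO orders (item, qty) VALUES ('ink', 4)")
after = conn.execute("SELECT * FROM item_summary ORDER BY item").fetchall()
```

Because the view is evaluated against the operational table each time it is queried, the analysis work runs on the operational server, which is why spare capacity is needed there.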
Data Warehousing - Delivery Process
Introduction
The data warehouse is never static. It evolves as the business increases. The tod
different from the future needs. We must design the data warehouse to change co
problem is that business itself is not aware of its requirement for information in the fu
evolves its need also changes therefore the data warehouse must be designed t
changes. Hence the data warehouse systems need to be flexible.
There should be a delivery process to deliver the data warehouse. But there are ma
warehouse projects that it is very difficult to complete the task and deliverables in t
fashion demanded by waterfall method because the requirements are hardly fully un
when the requirements are completed only then the architectures, designs, and build
be completed.
Delivery Method
The delivery method is a variant of the joint application development approach, adop
data warehouse. We staged the data warehouse delivery process to minimize the r
that I will discuss does not reduce the overall delivery time-scales but ensures busi
delivered incrementally through the development process.
Note: The delivery process is broken into phases to reduce the project and delivery ris
The following diagram explains the stages in the delivery process:
IT Strategy
Data warehouses are strategic investments that require business process to gen
benefits. IT Strategy is required to procure and retain funding for the project.
Business Case
The objective of Business case is to know the projected business benefits that shou
using the data warehouse. These benefits may not be quantifiable but the projected
be clearly stated. If the data warehouse does not have a clear business case then t
to suffer from the credibility problems at some stage during the delivery process. T
warehouse project we need to understand the business case for investment.
Education and Prototyping
The organization will experiment with the concept of data analysis and educate th
value of data warehouse before determining that a data warehouse is prior solution. T
by prototyping. This prototyping activity helps in understanding the feasibility and b
warehouse. The Prototyping activity on a small scale can further the educational proce
The prototype addresses a defined technical objective.
The prototype can be thrown away after the feasibility concept has been shown.
The activity addresses a small subset of eventual data content if the data wareh
The activity timescale is non-critical.
Points to remember to produce an early release of a part of a data warehouse to
benefits.
Identify the architecture that is capable of evolving.
Focus on the business requirements and technical blueprint phases.
Limit the scope of the first build phase to the minimum that delivers business b
Understand the short term and medium term requirements of the data warehou
Business Requirements
To provide the quality deliverables we should make sure that overall requirements are
business requirements and the technical blueprint stages are required because
reasons:
If we understand the business requirements for both short and medium te
design a solution that satisfies the short term need.
This would be capable of growing to the full solution.
Things to determine in this stage are the following:
The business rule to be applied on data.
The logical model for information within the data warehouse.
The query profiles for the immediate requirement.
The source systems that provide this data.
Technical Blueprint
This phase needs to deliver an overall architecture satisfying the long term requirem
also deliver the components that must be implemented in a short term to derive any
The blueprint needs to identify the following:
The overall system architecture.
The data retention policy.
The backup and recovery strategy.
The server and data mart architecture.
The capacity plan for hardware and infrastructure.
The components of database design.
Building the version
In this stage the first production deliverable is produced.
This production deliverable is the smallest component of the data warehouse.
This smallest component adds business benefit.
History Load
This is the phase where the remainder of the required history is loaded into the data w
phase we do not add the new entities but additional physical tables would probably b
the increased data volumes.
Let's have an example. Suppose the build version phase has delivered a retail sa
warehouse with 2 months worth of history. This information will allow the user to
recent trends and address the short term issues. The user cannot identify the ann
trends. So the 2 years worth of sales history could be loaded from the archive to mak
the sales trend yearly and seasonal. Now the 40GB data is extended to 400GB.
Note: The backup and recovery procedures may become complex therefore it is re
perform this activity within separate phase.
Ad hoc Query
In this phase we configure an ad hoc query tool.
This ad hoc query tool is used to operate the data warehouse.
These tools can generate the database query.
Note: It is recommended not to use these access tools when database is b
modified.
Automation
In this phase operational management processes are fully automated. These would i
Transforming the data into a form suitable for analysis.
Monitoring query profiles and determining the appropriate aggregations to
performance.
Extracting and loading the data from different source systems.
Generating aggregations from predefined definitions within the data warehouse
Backing Up, restoring and archiving the data.
Extending Scope
In this phase the data warehouse is extended to address a new set of business re
scope can be extended in two ways:
By loading additional data into the data warehouse.
By introducing new data marts using the existing information.
Note: This phase should be performed separately since this phase involves subst
complexity.
Requirements Evolution
From the perspective of delivery process the requirement are always changeab
static. The delivery process must support this and allow these changes to be re
system.
This issue is addressed by designing the data warehouse around the use of data
processes, as opposed to the data requirements of existing queries.
The architecture is designed to change and grow to match the business needs, the
as a pseudo application development process, where the new requirements are co
the development activities. The partial deliverables are produced. These partial del
back to users and then reworked ensuring that overall system is continually upd
business needs.
Data Warehousing - System Processes
We have fixed number of operations to be applied on operational databases and we
techniques such as use normalized data, keep table small etc. These techniques
delivering a solution. But in case of decision support system we do not know what qu
need to be executed in future. Therefore techniques applied on operational database
for data warehouses.
In this chapter we'll focus on designing data warehousing solution built on the
technologies like Unix and relational databases.
Process Flow in Data Warehouse
There are four major processes that build a data warehouse. Here is the list of four pr
Extract and load data.
Cleaning and transforming the data.
Backup and Archive the data.
Managing queries & directing them to the appropriate data sources.
Extract and Load Process
The Data Extraction takes data from the source systems.
Data load takes extracted data and loads it into data warehouse.
Note: Before loading the data into data warehouse the information extracted from
must be reconstructed.
Points to remember while extract and load process:
Controlling the process
When to Initiate Extract
Loading the Data
CONTROLLING THE PROCESS
For example in a retail sales analysis data warehouse, it may be required to keep da
latest 6 months data being kept online. In this kind of scenario there is often requirem
do month-on-month comparisons for this year and last year. In this case we require
restored from the archive.
Query Management Process
This process performs the following functions:
This process manages the queries.
This process speeds up query execution.
This process directs the queries to most effective data sources.
This process should also ensure that all system sources are used in most effe
This process is also required to monitor actual query profiles.
Information in this process is used by warehouse management process to
aggregations to generate.
This process does not generally operate during regular load of information into
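Directing a query to the most effective data source and monitoring query profiles can be sketched as follows. The table catalogue, row counts and the "smallest covering table wins" rule are assumptions made for the example; the sketch also ignores dimension hierarchies (for instance, deriving a month from a day).

```python
from collections import Counter

# Hypothetical catalogue: table name -> (dimensions it can answer, approximate row count)
TABLES = {
    "sales_fact": ({"day", "item", "store"}, 1_000_000),
    "sales_by_month_item": ({"month", "item"}, 5_000),
    "sales_by_month": ({"month"}, 24),
}

query_profile = Counter()  # how often each combination of dimensions is requested

def route(dimensions):
    """Pick the smallest table whose dimensions cover the query, and log the profile."""
    dims = frozenset(dimensions)
    query_profile[dims] += 1
    candidates = [(rows, name) for name, (cols, rows) in TABLES.items() if dims <= cols]
    if not candidates:
        raise LookupError(f"no table covers {sorted(dims)}")
    return min(candidates)[1]
```

The counts collected in `query_profile` are the kind of information a warehouse management process could use to decide which aggregations are worth generating.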
Data Warehousing - Architecture
In this article, we will discuss the business analysis framework for data wareh
architecture of a data warehouse.
Business Analysis Framework
The business analysts get the information from the data warehouses to measure the
make critical adjustments in order to win over other business holders in the ma
warehouse has the following advantages for the business.
Since the data warehouse can gather the information quickly and efficiently
enhance the business productivity.
The data warehouse provides us the consistent view of customers and items
manage the customer relationship.
The data warehouse also helps in bringing cost reduction by tracking trends
long period in a consistent and reliable manner.
To design an effective and efficient data warehouse we are required to understand
business needs and construct a business analysis framework. Each person ha
regarding the design of a data warehouse. These views are as follows:
The top-down view - This view allows the selection of relevant information
warehouse.
The data source view - This view presents the information being captu
managed by operational system.
The data warehouse view - This view includes the fact tables and dime
represent the information stored inside the data warehouse.
The Business Query view- It is the view of the data from the viewpoint of the en
Three-Tier Data Warehouse Architecture
Generally the data warehouses adopt the three-tier architecture. Following are the t
warehouse architecture.
Bottom Tier - The bottom tier of the architecture is the data warehouse databa
relational database system. We use the back end tools and utilities to feed
tier. These back end tools and utilities perform the Extract, Clean, Load, and ref
Middle Tier - In the middle tier we have the OLAP Server. The OLAP Server can be
either of the following ways.
By relational OLAP (ROLAP), which is an extended relational databa
system. The ROLAP maps the operations on multidimensional data to s
operations.
By Multidimensional OLAP (MOLAP) model, which directly implements
data and operations.
Top Tier - This tier is the front-end client layer. This layer holds the query tools a
analysis tools and data mining tools.
Following diagram explains the Three-tier Architecture of Data warehouse:
Data Warehouse Models
From the perspective of data warehouse architecture we have the following data ware
Virtual Warehouse
Data mart
Enterprise Warehouse
VIRTUAL WAREHOUSE
The view over an operational data warehouse is known as a virtual warehouse. It
virtual warehouse.
Building the virtual warehouse requires excess capacity on operational databas
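The idea of a virtual warehouse as a view over operational data can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the orders table, its columns, and the sales_by_item view are hypothetical names that do not come from the guide.

```python
import sqlite3

# Hypothetical operational table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Mobile", 200.0), (2, "Modem", 50.0), (3, "Mobile", 250.0)],
)

# The "virtual warehouse" is just a view: no data is copied, so every
# query against it runs on the operational database itself.
conn.execute(
    """CREATE VIEW sales_by_item AS
       SELECT item, SUM(amount) AS total_amount
       FROM orders GROUP BY item"""
)

rows = conn.execute(
    "SELECT item, total_amount FROM sales_by_item ORDER BY item"
).fetchall()
print(rows)
```

Because the view is evaluated against the operational tables at query time, the analysis load lands on the operational server, which is why excess capacity is needed there.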
DATA MART
A data mart contains a subset of organisation-wide data.
This subset of data is valuable to a specific group of an organisation.
Note: In other words, we can say that a data mart contains only that data which is spec
group. For example, the marketing data mart may contain only data related to item
sales. Data marts are confined to subjects.
Points to remember about data marts
Windows-based or Unix/Linux-based servers are used to implement data
implemented on low cost server.
The implementation cycle of a data mart is measured in short periods of time i.e
than months or years.
The life cycle of a data mart may be complex in the long run if its planning an
organisation-wide.
Data marts are small in size.
Data marts are customized by department.
The source of a data mart is a departmentally structured data warehouse.
Data marts are flexible.
ENTERPRISE WAREHOUSE
The enterprise warehouse collects all the information on all the subjects sp
organization
This provides us with enterprise-wide data integration.
The data is integrated from operational systems and external information provid
This information can vary from a few gigabytes to hundreds of gigabytes, teraby
Load Manager
This component performs the operations required for the extract and load process.
The size and complexity of the load manager varies between specific solu
warehouse to data warehouse.
LOAD MANAGER ARCHITECTURE
The load manager performs the following functions:
Extract the data from source system.
Fast Load the extracted data into temporary data store.
Perform simple transformations into structure similar to the one in the data war
EXTRACT DATA FROM SOURCE
The data is extracted from the operational databases or the external information provi
the application programs that are used to extract data. It is supported by underlying
client program to generate SQL to be executed at a server. Open Database Connect
Database Connection (JDBC), are examples of gateway.
FAST LOAD
In order to minimize the total load window the data needs to be loaded into the
fastest possible time.
The transformations affect the speed of data processing.
It is more effective to load the data into relational database prior to applying tra
checks.
Gateway technology proves to be not suitable, since it tends not to be performan
volumes are involved.
SIMPLE TRANSFORMATIONS
While loading it may be required to perform simple transformations. After this has be
are in position to do the complex checks. Suppose we are loading the EPOS sale
need to perform the following checks:
Strip out all the columns that are not required within the warehouse.
Convert all the values to required data types.
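The two simple transformations listed above can be sketched in a few lines of Python. The column names, the REQUIRED_COLUMNS mapping, and the sample record are assumptions made for illustration; the guide does not specify the layout of the EPOS feed.

```python
# Hypothetical target layout: required column name -> required data type.
REQUIRED_COLUMNS = {"product_id": int, "quantity": int, "value": float}

def simple_transform(record):
    """Strip columns the warehouse does not need and cast the rest."""
    return {col: cast(record[col]) for col, cast in REQUIRED_COLUMNS.items()}

# A raw record as it might arrive from a source feed (all values are text).
raw = {"product_id": "30", "quantity": "5", "value": "3.67", "till_operator": "A17"}
clean = simple_transform(raw)
print(clean)
```

Keeping this step limited to dropping columns and casting types leaves the expensive consistency checks for later, after the data is already in the temporary store.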
Warehouse Manager
Warehouse manager is responsible for the warehouse management process.
The warehouse manager consists of third party system software, C programs an
The size and complexity of warehouse manager varies between specific solutio
WAREHOUSE MANAGER ARCHITECTURE
The warehouse manager includes the following:
The Controlling process
Stored procedures or C with SQL
Backup/Recovery tool
SQL Scripts
OPERATIONS PERFORMED BY WAREHOUSE MANAGER
Warehouse manager analyses the data to perform consistency and referential
Creates the indexes, business views, partition views against the base data.
Generates the new aggregations and also updates the existing aggregatio
normalizations.
Warehouse manager transforms and merges the sou
temporary store into the published data warehouse.
Backs up the data in the data warehouse.
Warehouse Manager archives the data that has reached the end of its captured
Note: Warehouse Manager also analyses query profiles to determine index and
appropriate.
Query Manager
Query Manager is responsible for directing the queries to the suitable tables.
By directing the queries to appropriate table the query request and response
up.
Query Manager is responsible for scheduling the execution of the queries pose
QUERY MANAGER ARCHITECTURE
Query Manager includes the following:
The query redirection via C tool or RDBMS.
Stored procedures.
Query Management tool.
Query Scheduling via C tool or RDBMS.
Query Scheduling via third party Software.
Detailed information
The following diagram shows the detailed information
The detailed information is not kept online rather is aggregated to the next level o
archived to the tape. The detailed information part of data warehouse keeps the detai
the starflake schema. The detailed information is loaded into the data warehouse to
aggregated data.
Note: If the detailed information is held offline to minimize the disk storage we shou
the data has been extracted, cleaned up, and transformed then into starflake sch
archived.
Summary Information
In this area of data warehouse the predefined aggregations are kept.
These aggregations are generated by warehouse manager.
This area changes on ongoing basis in order to respond to the changing query
This area of data warehouse must be treated as transient.
Points to remember about summary information.
The summary data speeds up the performance of common queries.
It increases the operational cost.
It needs to be updated whenever new data is loaded into the data warehouse.
It may not have been backed up, since it can be generated fresh from the detaile
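The transient nature of summary information can be illustrated with a small sketch: the summary is a pure function of the detailed rows, so it can be rebuilt after every load instead of being backed up. The row layout (month, item, quantity) and the build_summary helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical detailed rows: (month, item, quantity).
detail = [("2013-08", "Mobile", 5), ("2013-08", "Modem", 4), ("2013-09", "Mobile", 7)]

def build_summary(rows):
    """Regenerate the predefined aggregation (quantity per month) from detail."""
    totals = defaultdict(int)
    for month, _item, quantity in rows:
        totals[month] += quantity
    return dict(totals)

summary = build_summary(detail)

# A new load arrives, so the summary is rebuilt rather than restored.
detail.append(("2013-09", "Modem", 2))
summary = build_summary(detail)
print(summary)
```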
Data Warehousing - OLAP
Introduction
Online Analytical Processing Server (OLAP) is based on multidimensional data mo
managers, analysts to get insight into the information through fast, consistent, inte
information. In this chapter we will discuss types of OLAP, operations on
between OLAP and Statistical Databases and OLTP.
Types of OLAP Servers
We have four types of OLAP servers that are listed below.
Relational OLAP (ROLAP)
Multidimensional OLAP (MOLAP)
Hybrid OLAP (HOLAP)
Specialized SQL Servers
Relational OLAP (ROLAP)
The Relational OLAP servers are placed between relational back-end server and clie
To store and manage warehouse data the Relational OLAP uses relational or ex
DBMS.
ROLAP includes the following.
implementation of aggregation navigation logic.
optimization for each DBMS back end.
additional tools and services.
Multidimensional OLAP (MOLAP)
Multidimensional OLAP (MOLAP) uses the array-based multidimensional stor
multidimensional views of data. With multidimensional data stores, the storage utiliza
the data set is sparse. Therefore many MOLAP servers use two levels of data storag
to handle dense and sparse data sets.
Hybrid OLAP (HOLAP)
Hybrid OLAP is a combination of both ROLAP and MOLAP. It has both the h
ROLAP and faster computation of MOLAP. A HOLAP server allows storing the large
detail data. The aggregations are stored separately in the MOLAP store.
Specialized SQL Servers
Specialized SQL servers provide advanced query language and query processing
queries over star and snowflake schemas in a read-only environment.
OLAP Operations
As we know that the OLAP server is based on the multidimensional view of data henc
the OLAP operations in multidimensional data.
Here is the list of OLAP operations.
Roll-up
Drill-down
Slice and dice
Pivot (rotate)
ROLL-UP
This operation performs aggregation on a data cube in any of the following ways:
By climbing up a concept hierarchy for a dimension
By dimension reduction.
Consider the following diagram showing the roll-up operation.
The roll-up operation is performed by climbing up a concept hierarchy for the di
Initially the concept hierarchy was "street < city < province < country".
On rolling up the data is aggregated by ascending the location hierarchy from
level of country.
The data is grouped into countries rather than cities.
When roll-up operation is performed then one or more dimensions from th
removed.
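A roll-up by climbing the location hierarchy from city to country can be sketched as follows. The city-to-country mapping and the sales figures are made-up sample values, and roll_up is a hypothetical helper rather than part of any OLAP product.

```python
from collections import defaultdict

# One step of the concept hierarchy: city -> country (sample mapping).
CITY_TO_COUNTRY = {
    "Toronto": "Canada",
    "Vancouver": "Canada",
    "Chicago": "USA",
    "New York": "USA",
}

# Sample measure at the city level.
sales_by_city = {"Toronto": 395, "Vancouver": 605, "Chicago": 440, "New York": 1560}

def roll_up(cells, parent_of):
    """Aggregate each cell into its parent member in the hierarchy."""
    rolled = defaultdict(int)
    for member, value in cells.items():
        rolled[parent_of[member]] += value
    return dict(rolled)

sales_by_country = roll_up(sales_by_city, CITY_TO_COUNTRY)
print(sales_by_country)
```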
DRILL-DOWN
Drill-down operation is reverse of the roll-up. This operation is performed by either of t
By stepping down a concept hierarchy for a dimension.
By introducing a new dimension.
Consider the following diagram showing the drill-down operation:
The drill-down operation is performed by stepping down a concept hierarchy f
time.
Initially the concept hierarchy was "day < month < quarter < year."
On drill-down the time dimension is descended from the level quarter to the level o
When drill-down operation is performed then one or more dimensions from t
added.
It navigates the data from less detailed data to highly detailed data.
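Drill-down can be sketched as the reverse navigation: starting from a quarterly total and stepping down to the more detailed level beneath it. The sketch assumes the lower level is month and that the monthly figures are still stored; all values are sample data.

```python
# Sample detailed data keyed by (quarter, month).
monthly_sales = {
    ("Q1", "Jan"): 100,
    ("Q1", "Feb"): 150,
    ("Q1", "Mar"): 200,
    ("Q2", "Apr"): 120,
}

# The less detailed view: totals per quarter.
quarterly_sales = {}
for (quarter, _month), value in monthly_sales.items():
    quarterly_sales[quarter] = quarterly_sales.get(quarter, 0) + value

def drill_down(quarter):
    """Step down the time hierarchy: show the months behind one quarter."""
    return {m: v for (q, m), v in monthly_sales.items() if q == quarter}

q1_detail = drill_down("Q1")
print(quarterly_sales, q1_detail)
```

Drill-down is only possible when the finer-grained data is still available, which is one reason the detailed information has to be retained somewhere.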
SLICE
The slice operation performs selection of one dimension on a given cube and give us
Consider the following diagram showing the slice operation.
The slice operation is performed for the dimension time using the criterion time
It will form a new subcube by selecting one or more dimensions.
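A slice can be sketched on a tiny cube stored as a dictionary keyed by (time, location, item). The cube contents and the criterion value "Q1" are assumptions for illustration.

```python
# Sample cube: (time, location, item) -> units sold.
cube = {
    ("Q1", "Toronto", "Mobile"): 605,
    ("Q1", "Vancouver", "Modem"): 825,
    ("Q2", "Toronto", "Mobile"): 680,
}

def slice_cube(cube, axis, value):
    """Fix one dimension to a single value and drop it from the key."""
    return {
        key[:axis] + key[axis + 1:]: measure
        for key, measure in cube.items()
        if key[axis] == value
    }

q1_slice = slice_cube(cube, 0, "Q1")  # axis 0 is the time dimension
print(q1_slice)
```

The result has one dimension fewer than the original cube, which is what distinguishes a slice from a dice.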
DICE
The dice operation performs selection of two or more dimensions on a given cube a
subcube. Consider the following diagram showing the dice operation:
The dice operation on the cube is based on the following selection criteria that involve th
(location = "Toronto" or "Vancouver")
(time = "Q1" or "Q2")
(item = "Mobile" or "Modem").
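The selection criteria above can be applied to a small sample cube as a sketch of the dice operation. The cube cells are invented, and dice is a hypothetical helper that keeps every dimension but restricts each one to a set of allowed values.

```python
# Sample cube: (time, location, item) -> units sold.
cube = {
    ("Q1", "Toronto", "Mobile"): 605,
    ("Q2", "Vancouver", "Modem"): 825,
    ("Q3", "Toronto", "Mobile"): 680,
    ("Q1", "Chicago", "Modem"): 400,
}

def dice(cube, allowed):
    """Keep cells whose coordinate on every axis is in that axis's allowed set."""
    return {
        key: measure
        for key, measure in cube.items()
        if all(key[axis] in values for axis, values in enumerate(allowed))
    }

subcube = dice(
    cube,
    [{"Q1", "Q2"}, {"Toronto", "Vancouver"}, {"Mobile", "Modem"}],
)
print(subcube)
```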
PIVOT
The pivot operation is also known as rotation. It rotates the data axes in view in or
alternative presentation of data. Consider the following diagram showing the pivot ope
In this the item and location axes in 2-D slice are rotated.
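Rotating the item and location axes of a 2-D slice amounts to swapping rows and columns. The sketch below uses a nested dictionary with sample values; pivot is a hypothetical helper.

```python
# Sample 2-D slice: rows are locations, columns are items.
slice_2d = {
    "Toronto": {"Mobile": 605, "Modem": 825},
    "Vancouver": {"Mobile": 680, "Modem": 952},
}

def pivot(table):
    """Rotate the two axes so that columns become rows and vice versa."""
    rotated = {}
    for row, columns in table.items():
        for column, value in columns.items():
            rotated.setdefault(column, {})[row] = value
    return rotated

pivoted = pivot(slice_2d)  # rows are now items, columns are locations
print(pivoted)
```

No value changes during a pivot; only the presentation of the same cells is different.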
OLAP vs OLTP
SN | Data Warehouse (OLAP) | Operational Database (OLTP)
1 | This involves historical processing of information. | This involves day to day processi
2 | OLAP systems are used by knowledge workers such as executives, managers and analysts. | OLTP systems are used by clerk, database professionals.
3 | This is used to analyse the business. | This is used to run the business
4 | It focuses on Information out. | It focuses on Data in.
5 | This is based on Star Schema, Snowflake Schema and Fact Constellation Schema. | This is based on Entity Relations
6 | This is subject oriented. | This is application oriented.
7 | This contains historical data. | This contains current data.
8 | This provides summarized and consolidated data. | This provides primitive and highly
9 | This provides summarized and multidimensional view of data. | This provides detailed and flat relational view of data.
10 | The number of users is in hundreds. | The number of users are in thou
11 | The number of records accessed is in millions. | The number of records accessed
12 | The database size is from 100 GB to TB. | The database size is from 100 M
13 | This is highly flexible. | This provides high performance.
Data Warehousing - Relational OLA
Introduction
The Relational OLAP servers are placed between relational back-end server and clie
To store and manage warehouse data the Relational OLAP uses relational or ex
DBMS.
ROLAP includes the following.
implementation of aggregation navigation logic.
optimization for each DBMS back end.
additional tools and services.
Note: The ROLAP servers are highly scalable.
Points to remember
The ROLAP tools need to analyze large volume of data across multiple dimens
The ROLAP tools need to store and analyze highly volatile and changeable data
Relational OLAP Architecture
The ROLAP includes the following.
Database Server
ROLAP Server
Front end tool
Advantages
The ROLAP servers are highly scalable.
They can be easily used with the existing RDBMS.
Data can be stored efficiently since no zero facts can be stored.
ROLAP tools do not use pre-calculated data cubes.
DSS server of MicroStrategy adopts the ROLAP approach.
Disadvantages
Poor query performance.
Some limitations of scalability depending on the technology architecture that is
Data Warehousing - Multidimensional O
Introduction
Multidimensional OLAP (MOLAP) uses the array-based multidimensional stor
multidimensional views of data. With multidimensional data stores, the storage utiliza
the data set is sparse. Therefore many MOLAP servers use two levels of data storag
to handle dense and sparse data sets.
Points to remember:
MOLAP tools need to process information with consistent response time rega
summarizing or calculations selected.
The MOLAP tools need to avoid many of the complexities of creating a relat
store data for analysis.
The MOLAP tools need fastest possible performance.
MOLAP Server adopts two levels of storage representation to handle dense and
Denser subcubes are identified and stored as array structure.
Sparse subcubes employ compression technology.
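The two-level storage idea can be sketched as a function that picks a representation per subcube based on how full it is. Everything here is an assumption for illustration: the 0.5 density threshold, the store_subcube name, and the use of a dictionary of non-empty cells as a simple stand-in for real compression.

```python
def store_subcube(cells, shape, density_threshold=0.5):
    """Choose a dense array or a sparse mapping for a 2-D subcube.

    cells maps (row, col) -> value for the non-empty cells only.
    """
    rows, cols = shape
    density = len(cells) / (rows * cols)
    if density >= density_threshold:
        # Dense subcube: materialize every cell in an array structure.
        array = [[0] * cols for _ in range(rows)]
        for (r, c), value in cells.items():
            array[r][c] = value
        return "dense", array
    # Sparse subcube: keep only the non-empty cells.
    return "sparse", dict(cells)

dense_kind, dense_data = store_subcube({(0, 0): 5, (0, 1): 7, (1, 0): 3}, (2, 2))
sparse_kind, sparse_data = store_subcube({(0, 0): 5}, (10, 10))
print(dense_kind, dense_data, sparse_kind, sparse_data)
```

The sparse case stores one entry instead of one hundred cells, which is the storage-utilization problem the two-level scheme is meant to address.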
MOLAP Architecture
MOLAP includes the following components.
Database server
MOLAP server
Front end tool
Advantages
Here is the list of advantages of Multidimensional OLAP
MOLAP allows fastest indexing to the precomputed summarized data.
Helps the users who are connected to a network and need to analyze larger, less
Easier to use, therefore MOLAP is best suited for inexperienced users.
Disadvantages
MOLAP is not capable of containing detailed data.
The storage utilization may be low if the data set is sparse.
MOLAP vs ROLAP
SN | MOLAP | ROLAP
1 | The information retrieval is fast. | Information retrieval is comparati
2 | It uses the sparse array to store the data sets. | It uses relational table.
3 | MOLAP is best suited for inexperienced users since it is very easy to use. | ROLAP is best suited for experie
4 | The separate database for data cube. | It may not require space other tha Data warehouse.
5 | DBMS facility is weak. | DBMS facility is strong.
Data Warehousing - Schemas
Introduction
The schema is a logical description of the entire database. The schema include
description of records of all record types including all associated data-items and agg
the database the data warehouse also requires the schema. The database uses the
on the other hand the data warehouse uses the Star, Snowflake and Fact Constellatio
chapter we will discuss the schemas used in data warehouse.
Star Schema
In star schema each dimension is represented with only one dimension table.
This dimension table contains the set of attributes.
In the following diagram we have shown the sales data of a company with r
dimensions namely, time, item, branch and location.
There is a fact table at the centre. This fact table contains the keys to each of fou
The fact table also contains the attributes namely, dollars sold and units sold.
Note: Each dimension has only one dimension table and each table holds a set
example the location dimension table contains the
{location_key,street,city,province_or_state,country}. This constraint may cause data
example the "Vancouver" and "Victoria" cities are both in Canadian province of
The entries for such cities may cause data redundancy along the attributes provin
country.
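A cut-down star schema can be sketched with sqlite3 to show both the fact-to-dimension join and the redundancy mentioned in the note. Only the location dimension is created; the street values and sales figures are invented, and the other three dimensions are represented by bare key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One dimension table per dimension (only location is shown here).
    CREATE TABLE location (
        location_key INTEGER PRIMARY KEY,
        street TEXT, city TEXT, province_or_state TEXT, country TEXT
    );
    -- Fact table at the centre: one key per dimension plus the measures.
    CREATE TABLE sales (
        time_key INTEGER, item_key INTEGER, branch_key INTEGER,
        location_key INTEGER REFERENCES location(location_key),
        dollars_sold REAL, units_sold INTEGER
    );
    -- Province and country are repeated for both cities (the redundancy).
    INSERT INTO location VALUES (1, '1 Main St', 'Vancouver', 'British Columbia', 'Canada');
    INSERT INTO location VALUES (2, '2 Bay St', 'Victoria', 'British Columbia', 'Canada');
    INSERT INTO sales VALUES (1, 1, 1, 1, 100.0, 2);
    INSERT INTO sales VALUES (1, 1, 1, 2, 50.0, 1);
    """
)

rows = conn.execute(
    """SELECT l.province_or_state, SUM(s.dollars_sold), SUM(s.units_sold)
       FROM sales s JOIN location l ON s.location_key = l.location_key
       GROUP BY l.province_or_state"""
).fetchall()
print(rows)
```

A single join reaches every location attribute, which is the main convenience of the star layout; the price is the repeated province and country values.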
Snowflake Schema
In Snowflake schema some dimension tables are normalized.
The normalization splits up the data into additional tables.
Unlike Star schema the dimension tables in snowflake schema are normalize
item dimension table in star schema is normalized and split into two dimensi
item and supplier table.
Therefore now the item dimension table contains the attributes item_key,
brand, and supplier_key.
The supplier key is linked to supplier dimension table. The supplier dimensio
the attributes supplier_key, and supplier_type.
Note: Due to normalization in Snowflake schema the redundancy is reduced therefore
to maintain and save storage space.
Fact Constellation Schema
In Fact Constellation there are multiple fact tables. This schema is also
schema.
In the following diagram we have two fact tables namely, sales and shipping.
The sales fact table is same as that in star schema.
The shipping fact table has the five dimensions namely, item_key, time_key, s
location.
The shipping fact table also contains two measures namely, dollars cost and u
It is also possible for dimension tables to be shared between fact tables. For examp
location dimension tables are shared between sales and shipping fact table.
Schema Definition
The Multidimensional schema is defined using Data Mining Query Language(
primitives namely, cube definition and dimension definition can be used for d
warehouses and data marts.
SYNTAX FOR CUBE DEFINITION
define cube <cube_name> [<dimension_list>]: <measure_list>
SYNTAX FOR DIMENSION DEFINITION
define dimension <dimension_name> as (<attribute_or_dimension_list>)
The fact constellation schema that we have discussed can be defined using the Data Mining
(DMQL) as follows:
define cube sales [time, item, branch, location]:
dollars sold = sum(sales in dollars), units sold = count(*)
define dimension time as (time key, day, day of week, month, quart
define dimension item as (item key, item name, brand, type, suppli
define dimension branch as (branch key, branch name, branch type)
define dimension location as (location key, street, city, province
define cube shipping [time, item, shipper, from location, to locat
dollars cost = sum(cost in dollars), units shipped = count(*)
define dimension time as time in cube sales
define dimension item as item in cube sales
define dimension shipper as (shipper key, shipper name, location a
location in cube sales, shipper type)
define dimension from location as location in cube sales
define dimension to location as location in cube sales
Data Warehousing - Partitioning Strate
Introduction
The partitioning is done to enhance the performance and make the management
also helps in balancing the various requirements of the system. It will optimi
performance and simplify the management of data warehouse. In this we partition ea
multiple separate partitions. In this chapter we will discuss the partitioning strat
Why to Partition
Here is the list of reasons.
For easy management
To assist backup/recovery
To enhance performance
FOR EASY MANAGEMENT
The fact table in data warehouse can grow to many hundreds of gigabytes in size. Th
fact table is very hard to manage as a single entity. Therefore it needs partition.
TO ASSIST BACKUP/RECOVERY
If we have not partitioned the fact table then we have to load the complete fact
data. Partitioning allows us to load that data which is required on regular basis. This w
to load and also enhances the performance of the system.
Note: To cut down on the backup size all partitions other than the current partitions ca
only. We can then put these partitions into a state where they can not be modified.
backed up. This means that only the current partition is to be backed up.
TO ENHANCE PERFORMANCE
By partitioning the fact table into sets of data the query procedures can be enha
performance is enhanced because now the query scans the partitions that are rel
have to scan the large amount of data.
Horizontal Partitioning
There are various ways in which fact table can be partitioned. In horizontal partitioning w
mind the requirements for manageability of the data warehouse.
PARTITIONING BY TIME INTO EQUAL SEGMENTS
In this partitioning strategy the fact table is partitioned on the basis of time period
period represents a significant retention period within the business. For example if th
month to date data then it is appropriate to partition into monthly segments. W
partitioned tables by removing the data in them.
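Partitioning by time into equal monthly segments can be sketched by routing each fact row to a partition named after its month. The sales_YYYY_MM naming scheme, the partition_name helper, and the sample rows are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date

def partition_name(sale_date):
    """Map a sale date to its monthly partition, e.g. sales_2013_08."""
    return f"sales_{sale_date:%Y_%m}"

# Sample fact rows: (sales_date, product_id, quantity).
facts = [
    (date(2013, 8, 3), 30, 5),
    (date(2013, 9, 3), 35, 4),
    (date(2013, 9, 3), 40, 5),
]

partitions = defaultdict(list)
for row in facts:
    partitions[partition_name(row[0])].append(row)

print({name: len(rows) for name, rows in partitions.items()})
```

A query for one month then touches a single partition, and an old month can be emptied and reused without disturbing the others.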
PARTITIONING BY TIME INTO DIFFERENT-SIZED SEGMENTS
This kind of partition is done where the aged data is accessed infrequently.
implemented as a set of small partitions for relatively current data, larger partition for i
Following is the list of advantages.
The detailed information remains available online.
The number of physical tables is kept relatively small, which reduces the opera
This technique is suitable where the mix of data dipping recent history, and da
entire history is required.
Following is the list of disadvantages.
This technique is not useful where the partitioning profile changes on regular b
repartitioning will increase the operation cost of data warehouse.
PARTITION ON A DIFFERENT DIMENSION
The fact table can also be partitioned on basis of dimensions other than time
group, region, supplier, or any other dimensions. Let's have an example.
Suppose a market function which is structured into distinct regional departments for
state basis. If each region wants to query on information captured within its region,
be more effective to partition the fact table into regional partitions. This will cause the
up because it does not require to scan information that is not relevant.
Following is the list of advantages.
The query does not have to scan the irrelevant data, which speeds up the qu
Following is the list of disadvantages.
This technique is not appropriate where the dimensions are likely to change
worth determining that the dimension does not change in future.
If the dimension changes then the entire fact table would have to be repartitione
Note: We recommend partitioning only on the basis of time dimension unle
that the suggested dimension grouping will not change within the life of data warehou
PARTITION BY SIZE OF TABLE
When there is no clear basis for partitioning the fact table on any dimension then we
the fact table on the basis of their size. We can set the predetermined size as a c
the table exceeds the predetermined size a new table partition is created.
Following is the list of disadvantages.
This partitioning is complex to manage.
Note: This partitioning requires metadata to identify what data stored in each partition
PARTITIONING DIMENSIONS
If the dimension contains a large number of entries then it is required to partition dim
have to check the size of dimension.
Suppose a large design which changes over time. If we need to store all the variation
comparisons, that dimension may be very large. This would definitely affect the respo
ROUND ROBIN PARTITIONS
In round robin technique when the new partition is needed the old one is archived.
metadata is used to allow user access tool to refer to the correct table partition.
Following is the list of advantages.
This technique makes it easy to automate table management facilities within the
Vertical Partition
In Vertical Partitioning the data is split vertically.
The Vertical Partitioning can be performed in the following two ways.
Normalization
Row Splitting
NORMALIZATION
Normalization method is the standard relational method of database organization. I
rows are collapsed into a single row, hence reducing the space.
Table before normalization
Product_id Quantity Value sales_date Store_id Store_name Locat
30 5 3.67 3-Aug-13 16 sunny Bang
35 4 5.33 3-Sep-13 16 sunny Bang
40 5 2.50 3-Sep-13 64 san Mumb
45 7 5.66 3-Sep-13 16 sunny Bang
Table after normalization
Store_id Store_name Location R
16 sunny Bangalore W
64 san Mumbai S
Product_id Quantity Value sales_date S
30 5 3.67 3-Aug-13 1
35 4 5.33 3-Sep-13 1
40 5 2.50 3-Sep-13 6
45 7 5.66 3-Sep-13 1
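The normalization shown in the tables above can be reproduced with a short sketch: the repeated store attributes are collapsed into one row per store, and the sales rows keep only the store key. The sketch uses the column values visible in the tables and leaves out the extra store column whose header is cut off in the source.

```python
# Rows of the table before normalization:
# (product_id, quantity, value, sales_date, store_id, store_name, location)
flat = [
    (30, 5, 3.67, "3-Aug-13", 16, "sunny", "Bangalore"),
    (35, 4, 5.33, "3-Sep-13", 16, "sunny", "Bangalore"),
    (40, 5, 2.50, "3-Sep-13", 64, "san", "Mumbai"),
    (45, 7, 5.66, "3-Sep-13", 16, "sunny", "Bangalore"),
]

stores = {}  # store_id -> (store_name, location), one row per store
sales = []   # sales rows keep only the foreign key to the store

for product_id, quantity, value, sales_date, store_id, store_name, location in flat:
    stores[store_id] = (store_name, location)
    sales.append((product_id, quantity, value, sales_date, store_id))

print(stores)
print(sales)
```

Store 16 appears three times in the flat table but only once after the split, which is where the space saving comes from.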
ROW SPLITTING
The row splitting tends to leave a one-to-one map between partitions. The motive of
speed the access to large table by reducing its size.
Note: While using vertical partitioning make sure that there is no requirement to p
operations between two partitions.
Identify Key to Partition
It is very crucial to choose the right partition key. Choosing wrong partition key will lead
the fact table. Let's have an example. Suppose we want to partition the following table
Account_Txn_Table
transaction_id
account_id
transaction_type
value
transaction_date
region
branch_name
We can choose to partition on any key. The two possible keys could be
region
transaction_date
Now suppose the business is organised in 30 geographical regions and each reg
number of branches. That will give us 30 partitions, which is reasonable. This pa
enough because our requirements capture has shown that vast majority of queries ar
user's own business region.
Now suppose we partition by transaction_date instead of region. Then it means that the lates
every region will be in one partition. Now the user who wants to look at data within hi
to query across multiple partitions.
Hence it is worth determining the right partitioning key.
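The effect of the partition key on a region-restricted query can be sketched by counting how many partitions such a query has to touch under each choice. The sample transactions, the monthly date granularity, and the partitions_scanned helper are assumptions for illustration.

```python
# Sample rows of Account_Txn_Table, reduced to the two candidate keys.
transactions = [
    {"region": "north", "transaction_date": "2013-08"},
    {"region": "north", "transaction_date": "2013-09"},
    {"region": "south", "transaction_date": "2013-09"},
]

def partitions_scanned(rows, partition_key, wanted_region):
    """Return the partitions that hold at least one row for the region."""
    return {row[partition_key] for row in rows if row["region"] == wanted_region}

by_region = partitions_scanned(transactions, "region", "north")
by_date = partitions_scanned(transactions, "transaction_date", "north")
print(len(by_region), len(by_date))
```

With region as the key the query stays inside one partition; with transaction_date as the key the same query has to visit every date partition that contains rows for that region.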
Data Warehousing - Metadata Concep
What is Metadata
Metadata is simply defined as data about data. The data that are used to represent ot
as metadata. For example the index of a book serves as metadata for the contents in
words we can say that metadata is the summarized data that leads us to the detailed
data warehouse we can define metadata as following.
Metadata is a road map to data warehouse.
Metadata in data warehouse defines the warehouse objects.
The metadata acts as a directory. This directory helps the decision support sys
contents of data warehouse.
Note: In data warehouse we create metadata for the data names and definitions
warehouse. Along with this metadata additional metadata are also created for t
extracted data, the source of extracted data.
Categories of Metadata
The metadata can be broadly categorized into three categories:
Business Metadata - This metadata has the data ownership information, bu
and changing policies.
Technical Metadata- Technical metadata includes database system names,
names and sizes, data types and allowed values. Technical metadata also in
information such as primary and foreign key attributes and indices.
Operational Metadata - This metadata includes currency of data and data lin
data means whether data is active, archived or purged. Lineage of data mea
migrated and transformation applied on it.
Role of Metadata
Metadata has very important role in data warehouse. The role of metadata in ware
from the warehouse data yet it has very important role. The various roles of metad
below.
The metadata acts as a directory.
This directory helps the decision support system to locate the contents of data w
Metadata helps in decision support system for mapping of data when data are
operational environment to data warehouse environment.
Metadata helps in summarization between current detailed data and highly sum
Metadata also helps in summarization between lightly detailed data and hi
data.
Metadata are also used for query tools.
Metadata are used in reporting tools.
Metadata are used in extraction and cleansing tools.
Metadata are used in transformation tools.
Metadata also plays important role in loading functions.
Diagram to understand role of Metadata.
Metadata Repository
The Metadata Repository is an integral part of data warehouse system. The Metadata
the following metadata:
Definition of data warehouse- This includes the description of structure of data
description is defined by schema, view, hierarchies, derived data definition
locations and contents.
Business Metadata - This metadata has the data ownership information, bu
and changing policies.
Operational Metadata- This metadata includes currency of data and data lin
data means whether data is active, archived or purged. Lineage of data mea
migrated and transformation applied on it.
Data for mapping from operational environment to data warehouse- This m
source databases and their contents, data extraction, data partition cleanin
rules, data refresh and purging rules.
The algorithms for summarization - This includes dimension algorithms, da
aggregation, summarizing etc.
Challenges for Metadata Management
The importance of metadata cannot be overstated. Metadata helps in driving the ac
validates data transformation and ensures the accuracy of calculations. The metad
the consistent definition of business terms to business end users. With all these u
also has challenges for metadata management. Some of the challenges are disc
The Metadata in a big organization is scattered across the organization. T
spread in spreadsheets, databases, and applications.
The metadata could be present in text file or multimedia file. To use this dat
management solution, this data need to be correctly defined.
There are no industry wide accepted standards. The data management solut
narrow focus.
There are no easy and accepted methods of passing metadata.
Data Warehousing - Data Marting
Why Create a Data Mart
The following are the reasons to create a data mart:
To partition data in order to impose access control strategies.
To speed up queries by reducing the volume of data to be scanned.
To segment data onto different hardware platforms.
To structure data in a form suitable for a user access tool.
Note: Do not data mart for any other reason, since the operational cost of data marting could be very high. Before data marting, make sure that the data marting strategy is appropriate for your particular solution.
Steps to determine whether a data mart fits the bill
The following steps need to be followed to make data marting cost-effective:
Identify the Functional Splits
Identify User Access Tool Requirements
Identify Access Control Issues
IDENTIFY THE FUNCTIONAL SPLITS
In this step we determine whether there is a natural functional split in the organization. We look for departmental splits, and we determine whether the way in which departments use information tends to be in isolation from the rest of the organization. Let's take an example.
Suppose a retail organization where each merchant is accountable for maximizing the sales of a group of products. For this, the following information is valuable:
Sales transactions on a daily basis
Sales forecast on a weekly basis
Stock position on a daily basis
Stock movements on a daily basis
As the merchant is not interested in the products they are not dealing with, the data mart is a subset of the data dealing with the product group of interest. The following diagram shows data marting for different users.
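A minimal sketch of this kind of split is shown below, assuming the warehouse rows carry a `product_group` column. The function name, column names, and sample rows are all hypothetical; a real data mart would be populated by the warehouse load process rather than by an in-memory loop.

```python
from collections import defaultdict


def split_into_data_marts(fact_rows, split_key="product_group"):
    """Group warehouse rows into one subset (data mart) per functional split."""
    marts = defaultdict(list)
    for row in fact_rows:
        marts[row[split_key]].append(row)
    return dict(marts)


# Hypothetical daily sales rows held in the warehouse.
warehouse_sales = [
    {"product_group": "grocery", "product": "milk", "units": 120},
    {"product_group": "grocery", "product": "bread", "units": 80},
    {"product_group": "clothing", "product": "socks", "units": 15},
]

marts = split_into_data_marts(warehouse_sales)
grocery_units = sum(row["units"] for row in marts["grocery"])
```

Each merchant would then query only the subset for their own product group, which is what reduces the volume of data scanned.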
Issues in determining the functional split:
The structure of the department may change.
The products might switch from one department to another.
The merchant could query the sales trend of other products to analyse what is happening to the sales.
These are issues that need to be taken into account while determining the functional split.
Note: We need to determine the business benefits and technical feasibility of using a data mart.
IDENTIFY USER ACCESS TOOL REQUIREMENTS
For user access tools that require their own internal data structures, we need data marts to support such tools. The data in such structures is outside the control of the data warehouse but needs to be populated and updated on a regular basis.
There are some tools that can be populated directly from the source system, but some cannot. Therefore additional requirements outside the scope of the tool need to be identified for the future.
Note: In order to ensure consistency of data across all access tools, the data should not be directly populated from the data warehouse; rather, each tool must have its own data mart.
IDENTIFY ACCESS CONTROL ISSUES
There need to be privacy rules to ensure the data is accessed by authorised users only. For example, a data warehouse for a retail banking institution must ensure that all the accounts belong to the same legal entity. Privacy laws can force you to totally prevent access to information that is not owned by the specific bank.
Data marts allow us to build a complete wall by physically separating data segments within the data warehouse. To avoid possible privacy problems, the detailed data can be removed from the data warehouse. We can create a data mart for each legal entity and load it via the data warehouse, with detailed account data.
Designing Data Marts
The data marts should be designed as a smaller version of the starflake schema within the data warehouse and should match the database design of the data warehouse. This helps in maintaining control over database instances.
The summaries are data marted in the same way as they would have been designed within the data warehouse. Summary tables help to utilize all dimension data in the starflake schema.
Cost Of Data Marting
The following are the cost measures for Data marting:
Hardware and Software Cost
Network Access
Time Window Constraints
HARDWARE AND SOFTWARE COST
Although the data marts are created on the same hardware, they still require some additional hardware and software. To handle user queries, additional processing power and disk storage are needed. If the detailed data and the data mart exist within the data warehouse, then we would face additional cost to store and manage replicated data.
Note: Data marting is more expensive than aggregations; therefore it should be used as an additional strategy, not as an alternative strategy.
NETWORK ACCESS
The data mart could be at a different location from the data warehouse, so we should ensure that the LAN or WAN has the capacity to handle the data volumes being transferred within the data mart load process.
TIME WINDOW CONSTRAINTS
The extent to which the data mart loading process will eat into the available time window depends on the complexity of the transformations and the data volumes being shipped. The feasibility of the number of data marts depends on:
Network Capacity.
Time Window Available
Volume of data being transferred
Mechanisms being used to insert data into data mart
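The factors above can be combined into a back-of-the-envelope estimate. The sketch below is only an assumption-laden illustration: it considers network transfer alone, ignores transformation complexity and the insert mechanism, and uses a hypothetical `efficiency` factor for the share of nominal bandwidth that is actually usable.

```python
def transfer_hours(volume_gb, bandwidth_mbps, efficiency=0.8):
    """Estimate hours needed to ship volume_gb gigabytes over the network.

    efficiency is the assumed fraction of nominal bandwidth actually usable.
    """
    if bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth and efficiency must be positive")
    megabits = volume_gb * 8 * 1000          # 1 GB = 8000 megabits (decimal units)
    seconds = megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600


def fits_window(mart_volumes_gb, bandwidth_mbps, window_hours, efficiency=0.8):
    """Return True if all marts, loaded one after another, fit the window."""
    total = sum(transfer_hours(v, bandwidth_mbps, efficiency) for v in mart_volumes_gb)
    return total <= window_hours


one_mart = transfer_hours(90, 100)           # 90 GB over a 100 Mbit/s link
two_marts_ok = fits_window([90, 90], 100, window_hours=6)
three_marts_ok = fits_window([90, 90, 90], 100, window_hours=6)
```

With these assumed figures, one 90 GB mart takes about 2.5 hours, so two marts fit into a six-hour window while three do not.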
Data Warehousing - System Managers
Introduction
System management is a must for the successful implementation of a data warehouse. In this chapter we will discuss the most important system managers, which are the following:
System Configuration Manager
System Scheduling Manager
System Event Manager
System Database Manager
System Backup Recovery Manager
System Configuration Manager
The system configuration manager is responsible for the management of the setup and configuration of the data warehouse.
The structure of the configuration manager varies from operating system to operating system.
On Unix, the structure of the configuration manager varies from vendor to vendor.
Configuration managers have a single user interface.
The interface of the configuration manager allows us to control all aspects of the system.
Note: The most important configuration tool is the I/O manager.
System Scheduling Manager
The System Scheduling Manager is also responsible for the successful implementation of the data warehouse. The purpose of this scheduling manager is to schedule ad hoc queries. Every operating system has its own scheduler with some form of batch control mechanism. The features of the System Scheduling Manager are the following.
Work across cluster or MPP boundaries.
Deal with international time differences.
Handle job failure.
Handle multiple queries.
Support job priorities.
Restart or requeue failed jobs.
Notify the user or a process when a job is completed.
Maintain the job schedules across system outages.
Requeue jobs to other queues.
Support the stopping and starting of queues.
Log queued jobs.
Deal with interqueue processing.
Note: The above are the evaluation parameters for a good scheduler.
Some important jobs that the scheduler must be able to handle are as follows:
Daily and ad hoc query scheduling.
Execution of regular report requirements.
Data load
Data Processing
Index creation
Backup
Aggregation creation
Data transformation
Note: If the data warehouse is running on a cluster or MPP architecture, then the system scheduling manager must be capable of running across the architecture.
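To make a few of the listed features concrete (job priorities, requeueing failed jobs, and logging queued jobs), here is a small single-process sketch. The `JobScheduler` class, its method names, and the sample jobs are hypothetical; a production scheduler would also need to cover clusters, time zones, and outages, which this toy does not attempt.

```python
import heapq
import itertools


class JobScheduler:
    """Toy scheduler: lower priority number runs first; failed jobs are requeued."""

    def __init__(self, max_retries=1):
        self._queue = []
        self._order = itertools.count()   # tie-breaker keeps submission order
        self.max_retries = max_retries
        self.log = []

    def submit(self, name, func, priority=10, attempt=0):
        heapq.heappush(self._queue, (priority, next(self._order), name, func, attempt))
        self.log.append(("queued", name))

    def run_all(self):
        while self._queue:
            priority, _, name, func, attempt = heapq.heappop(self._queue)
            try:
                func()
            except Exception:
                if attempt < self.max_retries:
                    self.log.append(("requeued", name))
                    self.submit(name, func, priority, attempt + 1)
                else:
                    self.log.append(("failed", name))
            else:
                self.log.append(("completed", name))


calls = {"data_load": 0}


def flaky_data_load():
    """Hypothetical job that fails on its first run only."""
    calls["data_load"] += 1
    if calls["data_load"] == 1:
        raise RuntimeError("source system not ready")


scheduler = JobScheduler(max_retries=1)
scheduler.submit("backup", lambda: None, priority=5)
scheduler.submit("data_load", flaky_data_load, priority=1)
scheduler.run_all()
completed = [name for status, name in scheduler.log if status == "completed"]
```

The data load runs first because of its priority, fails once, is requeued, and completes on the second attempt before the backup runs.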
System Event Manager
The event manager is a kind of software. It manages the events that are defined on the data warehouse system. We cannot manage the data warehouse manually because the structure of a data warehouse is very complex. Therefore we need a tool that automatically handles all the events without intervention of the user.
Note: The event manager monitors event occurrences and deals with them. It also tracks the myriad of things that can go wrong on this complex data warehouse system.
EVENTS
What is an event? An event is an action generated by the user or the system itself. It may be noted that an event is a measurable, observable occurrence of a defined action.
The following are the common events that are required to be tracked.
Hardware failure.
Running out of space on certain key disks.
A process dying.
A process returning an error.
CPU usage exceeding an 80% threshold.
Internal contention on database serialization points.
Buffer cache hit ratios exceeding or falling below threshold.
A table reaching the maximum of its size.
Excessive memory swapping.
A table failing to extend due to lack of space.
Disk exhibiting I/O bottlenecks.
Usage of temporary or sort area reaching certain thresholds.
Any other database shared memory usage.
The most important thing about events is that they should be capable of executing on their own. Event packages define the procedures for the predefined events. The code associated with each event is known as the event handler. This code is executed whenever an event occurs.
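The relationship between an event and its handler can be sketched as a simple registry. Everything named here (`EventManager`, the `cpu_threshold_exceeded` event name, `notify_operator`) is hypothetical and only illustrates the idea that handler code runs automatically when a monitored condition occurs.

```python
class EventManager:
    """Maps event names to handler functions and runs them when events occur."""

    def __init__(self):
        self._handlers = {}

    def register(self, event_name, handler):
        self._handlers.setdefault(event_name, []).append(handler)

    def raise_event(self, event_name, **details):
        """Run every handler registered for the event; return their results."""
        return [handler(**details) for handler in self._handlers.get(event_name, [])]


def check_cpu(manager, cpu_percent, threshold=80.0):
    """Raise the hypothetical 'cpu_threshold_exceeded' event above the threshold."""
    if cpu_percent > threshold:
        return manager.raise_event("cpu_threshold_exceeded", cpu_percent=cpu_percent)
    return []


alerts = []


def notify_operator(cpu_percent):
    """Hypothetical event handler: record an alert message."""
    message = f"CPU usage at {cpu_percent:.0f}%"
    alerts.append(message)
    return message


manager = EventManager()
manager.register("cpu_threshold_exceeded", notify_operator)
quiet = check_cpu(manager, 42.0)    # below threshold, no handler runs
noisy = check_cpu(manager, 93.0)    # above threshold, handler runs
```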
System and Database Manager
System and database managers are two separate pieces of software, but they do the same job. The objective of these tools is to automate certain processes and to simplify the execution of others. The criteria for choosing the system and database manager are the ability to:
increase a user's quota.
assign and deassign roles to the users.
assign and deassign profiles to the users.
perform database space management.
monitor and report on space usage.
tidy up fragmented and unused space.
add and expand the space.
add and remove users.
manage user passwords.
manage summary or temporary tables.
assign or deassign temporary space to and from the user.
reclaim the space from old or out-of-date temporary tables.
manage error and trace logs.
browse log and trace files.
redirect error or trace information.
switch on and off error and trace logging.
perform system space management.
monitor and report on space usage.
clean up old and unused file directories.
add or expand space.
System Backup Recovery Manager
The backup and recovery tool makes it easy for operations and management staff to back up the data. It is worth noting that the system backup manager must be integrated with the schedule manager software being used. The important features that are required for the management of backups are the following.
Scheduling
Backup data tracking
Database awareness.
Backups are taken only to protect the data against loss. The following are the important points to remember.
The backup software will keep some form of database of where and when each piece of data was backed up.
The backup recovery manager must have a good front end to that database.
The backup recovery software should be database aware.
Being aware of the database, the software can then be addressed in database terms, and will not perform backups that would not be viable.
Data Warehousing - Process Managers
Data Warehouse Load Manager
This component performs the operations required for the extract and load process.
The size and complexity of the load manager varies between specific solutions, from data warehouse to data warehouse.
LOAD MANAGER ARCHITECTURE
The load manager performs the following functions.
Extract the data from the source system.
Fast load the extracted data into a temporary data store.
Perform simple transformations into a structure similar to the one in the data warehouse.
EXTRACT DATA FROM SOURCE
The data is extracted from the operational databases or the external information providers. Gateways are the application programs that are used to extract data. A gateway is supported by the underlying DBMS and allows a client program to generate SQL to be executed at a server. Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC) are examples of gateways.
FAST LOAD
In order to minimize the total load window, the data needs to be loaded into the warehouse in the fastest possible time.
Transformations affect the speed of data processing.
It is more effective to load the data into a relational database prior to applying transformations and checks.
Gateway technology proves to be unsuitable, since gateways tend not to be performant when large data volumes are involved.
SIMPLE TRANSFORMATIONS
While loading, it may be required to perform simple transformations. After this has been completed, we are in a position to do the complex checks. Suppose we are loading EPOS sales transactions; we need to perform the following checks.
Strip out all the columns that are not required within the warehouse.
Convert all the values to required data types.
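The two checks above can be sketched for a single sales row as follows. The column names, the target types, and the sample record are assumptions made for illustration; the source does not specify the layout of an EPOS transaction.

```python
from datetime import date
from decimal import Decimal

# Hypothetical target layout: column name -> converter to the required type.
WAREHOUSE_COLUMNS = {
    "store_id": int,
    "product_id": int,
    "sale_date": date.fromisoformat,
    "amount": Decimal,
}


def simple_transform(raw_row):
    """Keep only warehouse columns and convert each value to its target type."""
    return {col: convert(raw_row[col]) for col, convert in WAREHOUSE_COLUMNS.items()}


raw = {
    "store_id": "17",
    "product_id": "4021",
    "sale_date": "2014-04-09",
    "amount": "12.50",
    "till_operator": "jsmith",     # not required within the warehouse
}
clean = simple_transform(raw)
```

Driving the transformation from one mapping keeps the column list and the type conversions in a single place, so adding a warehouse column means adding one entry.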
Warehouse Manager
The warehouse manager is responsible for the warehouse management process.
The warehouse manager consists of third-party system software, C programs, and shell scripts.
The size and complexity of the warehouse manager varies between specific solutions.
WAREHOUSE MANAGER ARCHITECTURE
The warehouse manager includes the following.
The Controlling process
Stored procedures or C with SQL
Backup/Recovery tool
SQL Scripts
OPERATIONS PERFORMED BY WAREHOUSE MANAGER
The warehouse manager analyses the data to perform consistency and referential integrity checks.
Creates indexes, business views, and partition views against the base data.
Generates new aggregations and updates the existing aggregations.
Generates normalizations.
Transforms and merges the source data in the temporary store into the published data warehouse.
Backs up the data in the data warehouse.
Archives the data that has reached the end of its captured life.
Note: The warehouse manager also analyses query profiles to determine whether indexes and aggregations are appropriate.
Query Manager
The query manager is responsible for directing queries to the suitable tables.
By directing queries to the appropriate tables, the query request and response process is sped up.
The query manager is responsible for scheduling the execution of the queries posed by the user.
QUERY MANAGER ARCHITECTURE
Query Manager includes the following.
The query redirection via C tool or RDBMS.
Stored procedures.
Query Management tool.
Query Scheduling via C tool or RDBMS.
Query scheduling via third-party software.
OPERATIONS PERFORMED BY QUERY MANAGER
The query manager directs queries to the appropriate tables.
The query manager schedules the execution of the queries posed by the end user.
The query manager stores query profiles to allow the warehouse manager to determine which indexes and aggregations are appropriate.
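One way to picture query redirection is a lookup that sends each query to the smallest table able to answer it, while counting how often each table is chosen. This is only a sketch under assumptions: the catalog contents, table names, row counts, and the "smallest covering table" rule are all hypothetical and are not described in the source.

```python
from collections import Counter

# Hypothetical catalog: table name -> available columns and approximate row count.
CATALOG = {
    "sales_fact": {"columns": {"day", "store", "product", "amount"}, "rows": 500_000_000},
    "sales_by_store_day": {"columns": {"day", "store", "amount"}, "rows": 2_000_000},
    "sales_by_month": {"columns": {"month", "amount"}, "rows": 120},
}

query_profile = Counter()   # how often each table was chosen


def route_query(required_columns, catalog=CATALOG):
    """Pick the smallest table that contains every required column."""
    candidates = [
        (meta["rows"], name)
        for name, meta in catalog.items()
        if set(required_columns) <= meta["columns"]
    ]
    if not candidates:
        raise LookupError(f"no table provides {sorted(required_columns)}")
    table = min(candidates)[1]
    query_profile[table] += 1
    return table


summary_target = route_query({"store", "amount"})     # an aggregate can answer this
detail_target = route_query({"product", "amount"})    # only the detailed fact table can
```

The `query_profile` counter stands in for the stored query profiles: a table that is rarely chosen may be an aggregation that is not worth maintaining.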
Data Warehousing - Security
Introduction
The objective of a data warehouse is to allow large amounts of data to be easily accessible to users, hence allowing users to extract information about the business as a whole. But we know that there could be some security restrictions applied on the data, which can prove an obstacle to accessing the information. If the analyst has a restricted view of the data, then it is impossible to capture a complete picture of the trends within the business.
The data from each analyst can be summarised and passed on to management, where the different summaries can be aggregated. As the aggregation of summaries cannot be the same as the aggregation as a whole, it is possible to miss some information trends in the data unless someone is analysing the data as a whole.
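A small worked example shows why aggregating summaries differs from aggregating the whole. The figures and region names below are invented for illustration: two analysts each see one region and report an average, and the average of those two averages does not match the average over all sales.

```python
from statistics import mean

# Hypothetical sale amounts visible to two analysts with restricted views.
analyst_views = {
    "region_a": [10, 20, 30],
    "region_b": [100],
}

# Each analyst summarises only what they can see.
summaries = {region: mean(values) for region, values in analyst_views.items()}
mean_of_summaries = mean(list(summaries.values()))

# Someone with access to the whole data set computes the overall figure.
all_values = [v for values in analyst_views.values() for v in values]
overall_mean = mean(all_values)
```

Here the regional averages are 20 and 100, so the average of the summaries is 60, while the average over all four sales is 40.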
Requirements
Adding security will affect the performance of the data warehouse; therefore it is worth determining the security requirements as early as possible. Adding security after the data warehouse has gone live is very difficult.
During the design phase of the data warehouse, we should keep in mind what data sources may be added later and what the impact of adding those data sources would be. We should consider the following possibilities during the design phase.
Will the new data sources require new security and/or audit restrictions to be implemented?
Will new users be added who have restricted access to data that is already generally available?
This situation arises when the future users and the data sources are not well known. In such a situation, we need to use knowledge of the business and the objective of the data warehouse to determine likely requirements.
Factors to Consider for Security Requirements
The following are the parts that are affected by security, hence they are worth considering:
User Access
Data Load
Data Movement
Query Generation
USER ACCESS
We need to classify the data first and then the users by what data they can access. In other words, the users are classified according to the data they can access.
Data Classification
The