Data Warehousing
Venkataraj Jayaraj
Copyright 2007 by Tata Consultancy Services. No part of this publication may be reproduced, stored in a retrieval system, used in a spreadsheet, or transmitted
in any form or by any means electronic, mechanical, photocopying, recording, or otherwise without the permission of Tata Consultancy Services.
TCS Confidential
Confidential & Proprietary
Copyright 2008 The Nielsen Company
Normal Reporting Architecture
[Figure: a Source feeds several Reports directly]
Reports are generated directly from the source, without any transformation.
Benefits:
Represents current data
Simple and easy to design; reports are easy to generate
Drawbacks:
No historical data, so it may not be useful in the decision-making process.
Data Warehouse Architecture
[Figure: Sources (Oracle, Teradata, DB2, SQL Server) -> Staging Area -> Data Warehouse (Metadata, Raw Data, Summary Data) -> Data Mart -> Analysis, Reporting, Data Mining]
Benefits:
Better performance
Simplified report generation
Contains historical data
Drawbacks:
No current data
Administration overhead
Source:
The database from which data is extracted. Examples: Oracle, Teradata, Sybase, DB2.
Staging area:
A temporary storage area used while the data is processed.
Metadata:
Data about the data, or a description of the data.
Data mart:
A data mart is a data warehouse for a specific domain.
Data marts can be divided into two types:
Independent data mart
Dependent data mart
Independent Data Mart
[Figure: Independent Data Mart Architecture. Components shown: Sources (Oracle, Teradata, DB2, SQL Server), Staging Area, Data Mart, Data Warehouse (Metadata, Raw Data, Summary Data), and Analysis, Reporting, Data Mining]
Independent data marts extract data directly from the source databases, and the data marts are then merged into the data warehouse.
Advantages:
Maximum utilization of resources (hardware, software, manpower)
Easy maintenance
Reduced risk of failure
Disadvantages:
Very high total cost of development
Integration problems
This approach is good for: large organizations
Dependent Data Mart
[Figure: Dependent Data Mart Architecture. Components shown: Source, Staging Area, Data Warehouse (Metadata, Raw Data, Summary Data), Data Mart, and Analysis, Reporting, Data Mining]
Dependent data marts extract their data from the data warehouse.
Advantages:
Very low total cost and time of development
No integration problems
Disadvantages:
Cannot use the full resources.
This approach is good for:
Small and medium-sized organizations
New organizations
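The dependent approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table `warehouse_sales`, its columns, the `mart_<domain>` naming convention, and the sample rows are all hypothetical, and SQLite stands in for a real warehouse database.

```python
import sqlite3

# Hypothetical warehouse table; names and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_sales (domain TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("retail", "north", 120.0),
        ("retail", "south", 80.0),
        ("finance", "north", 300.0),
    ],
)


def build_dependent_mart(conn, domain):
    """Fill a domain-specific mart table from the warehouse, not from the sources."""
    if not domain.isidentifier():
        raise ValueError("domain must be a simple identifier")
    mart = f"mart_{domain}"  # assumed naming convention
    conn.execute(f"DROP TABLE IF EXISTS {mart}")
    conn.execute(f"CREATE TABLE {mart} (region TEXT, amount REAL)")
    conn.execute(
        f"INSERT INTO {mart} SELECT region, amount "
        "FROM warehouse_sales WHERE domain = ?",
        (domain,),
    )
    return mart


mart_name = build_dependent_mart(conn, "retail")
retail_rows = conn.execute(
    f"SELECT region, amount FROM {mart_name} ORDER BY region"
).fetchall()
print(retail_rows)
```

Because the mart reads only from the warehouse, the data it holds has already been integrated, which is the reason the slide lists "no integration problems" as an advantage.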
What are Data Warehouses?
Data warehouses store large volumes of data that are frequently used by decision support systems (DSS).
They are maintained separately from the organization's operational databases.
Data warehouses are relatively static, with only infrequent updates.
A data warehouse is a stand-alone repository of information, integrated from several, possibly heterogeneous, operational databases.
Data Warehousing
Is the enabling technology that facilitates improved business decision-making. It is a process, not a product.
A technique for assembling and managing a wide variety of data from multiple operational systems for decision support and analytical processing.
A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decisions.
Subject Oriented Analysis
Transactional storage (process oriented): a single Entry holds Sales Rep, Quantity Sold, Prod Number, Date, Customer Name, Product Description, Unit Price, and Mail Address.
Data warehouse storage (subject oriented): the data is organized by subject: Sales, Customers, Products.
Integration of Data
Encoding
Transactional storage: Appl. A - M, F; Appl. B - 1, 0; Appl. C - X, Y
Data warehouse storage: M, F
Unit of attributes
Transactional storage: Appl. A - pipeline cm; Appl. B - pipeline inches; Appl. C - pipeline mcf
Data warehouse storage: pipeline cm
Physical attributes
Transactional storage: Appl. A - balance dec(13,2); Appl. B - balance PIC 9(9)V99; Appl. C - balance float
Data warehouse storage: balance dec(13,2)
Data consistency
Transactional storage: Appl. A - date (Julian); Appl. B - date (yymmdd); Appl. C - date (absolute)
Data warehouse storage: date (Julian)
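A minimal sketch of the encoding and unit integration described above, in Python. The slide only lists the code pairs per application, so which code stands for which value (for example, that `1` and `X` correspond to `M`) is an assumption made here for illustration. The function names are hypothetical as well, and units such as mcf are deliberately left unsupported because the source gives no conversion rule for them.

```python
# Per-application encodings. The pairing of codes to M/F is an assumption;
# the slide only shows that Appl. A uses M/F, Appl. B uses 1/0, Appl. C uses X/Y.
GENDER_MAPS = {
    "A": {"M": "M", "F": "F"},
    "B": {"1": "M", "0": "F"},
    "C": {"X": "M", "Y": "F"},
}

CM_PER_INCH = 2.54


def integrate_gender(app, value):
    """Translate an application-specific code into the warehouse encoding (M, F)."""
    return GENDER_MAPS[app][str(value).upper()]


def pipeline_to_cm(value, unit):
    """Convert a pipeline measurement to centimetres, the warehouse unit."""
    if unit == "cm":
        return float(value)
    if unit == "inches":
        return float(value) * CM_PER_INCH
    # No conversion rule is given in the source for other units (e.g. mcf).
    raise ValueError(f"no conversion rule defined for unit {unit!r}")


print(integrate_gender("B", 1))
print(integrate_gender("C", "y"))
print(pipeline_to_cm(10, "inches"))
```

Keeping the mappings in one table per attribute makes it easy to add a new source application without touching the warehouse side.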
Volatility of Data
Transactional storage (volatile): record-by-record data manipulation (insert, access, change, delete).
Data warehouse storage (non-volatile): mass load and access of data.
Dimensional modeling can be used to structure a data warehouse. It works with two kinds of data:
1. Measurable Data (Measures)
2. Dimension Data (Dimensions)
Measurable Data
Numeric data that can be used in mathematical operations and can be summarized and aggregated. Ex: net profit
Measurable data is required to evaluate the performance of a person, object, etc. For example, the net profit of a company can be used to evaluate the company's performance.
Measurable data is analyzed from different angles, referred to as dimensions. At least two dimensions are required to evaluate a measure.
Dimension Data
An angle from which measures are evaluated is referred to as a dimension.
A dimension can be a collection of sub-dimensions, referred to as levels.
The sub-dimensions within a dimension are arranged in a hierarchical relation, which means that two sub-dimensions cannot be at the same level.
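To make the idea of evaluating one measure from different angles concrete, here is a small Python sketch. The records, the dimension names `region` and `year`, and the helper `summarize` are hypothetical and exist only for illustration; `net_profit` is the example measure used above.

```python
from collections import defaultdict

# Hypothetical records: two dimensions (region, year) and one measure (net_profit).
records = [
    {"region": "north", "year": 2006, "net_profit": 100.0},
    {"region": "north", "year": 2007, "net_profit": 150.0},
    {"region": "south", "year": 2006, "net_profit": 70.0},
    {"region": "south", "year": 2007, "net_profit": 90.0},
]


def summarize(rows, dimension, measure):
    """Aggregate (sum) a measure along a single dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


by_region = summarize(records, "region", "net_profit")
by_year = summarize(records, "year", "net_profit")
print(by_region)
print(by_year)
```

The same four records yield two different summaries, one per dimension, which is what "analyzing a measure from different angles" amounts to in practice.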
Types of schemas
Star schema
Starflake schema
Snowflake schema
Star schema
Measurable data is in the center, surrounded by the different dimensions.
Each dimension has only one level, so there is no hierarchy.
No relation should be defined between two dimensions.
The combination of measures with their related dimensions is referred to as a cube.
At the database level, a collection of measures becomes a table (referred to as a fact table).
The levels (sub-dimensions) within a dimension also become tables at the database level (referred to as dimension tables).
Database term               Data warehouse term
Schema                      Cube
-----                       Dimension
Table (dimension table)     Level
Table (fact table)          Measure
Constraint                  Relation/hierarchy
Database                    Data warehouse/data mart
Columns                     Attributes
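A star schema along these lines might look as follows at the database level. This is a sketch, not a prescribed design: the table names (`fact_sales`, `dim_product`, `dim_customer`), the columns, and the sample rows are assumptions, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: one level each, no relation between them.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);

    -- Fact table: the measures, keyed by the surrounding dimensions.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        quantity_sold INTEGER,
        unit_price REAL
    );

    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO dim_customer VALUES (10, 'Acme'), (11, 'Globex');
    INSERT INTO fact_sales VALUES (1, 10, 5, 2.0), (2, 10, 1, 10.0), (1, 11, 3, 2.0);
    """
)

# Evaluate a measure (revenue) along the product dimension.
revenue_by_product = conn.execute(
    """
    SELECT p.product_name, SUM(f.quantity_sold * f.unit_price)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.product_name
    ORDER BY p.product_name
    """
).fetchall()
print(revenue_by_product)
```

The foreign keys in `fact_sales` play the role of the "Constraint" row in the table above: they are the database-level form of the relation between the measures and each dimension.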
Starflake schema
Same as the star schema, but the cube has at least one dimension with two or more levels in a single hierarchy.
Snowflake schema
Same as the starflake schema, but the cube has at least one dimension with two or more levels under at least two hierarchies.
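As an illustration of a dimension with two levels in a single hierarchy (the starflake case as defined above), the sketch below splits a product dimension into a product level and a category level. All table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One dimension, two levels in a single hierarchy: category -> product.
    CREATE TABLE level_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE level_product (
        product_id INTEGER PRIMARY KEY,
        product_name TEXT,
        category_id INTEGER REFERENCES level_category(category_id)
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES level_product(product_id),
        quantity_sold INTEGER
    );

    INSERT INTO level_category VALUES (1, 'Tools'), (2, 'Toys');
    INSERT INTO level_product VALUES (1, 'Hammer', 1), (2, 'Saw', 1), (3, 'Kite', 2);
    INSERT INTO fact_sales VALUES (1, 4), (2, 6), (3, 5);
    """
)

# Roll the measure up from the product level to the category level.
quantity_by_category = conn.execute(
    """
    SELECT c.category_name, SUM(f.quantity_sold)
    FROM fact_sales AS f
    JOIN level_product AS p ON p.product_id = f.product_id
    JOIN level_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
    """
).fetchall()
print(quantity_by_category)
```

Each level becomes its own table, and the hierarchy is expressed by the reference from the lower level (`level_product`) to the higher one (`level_category`).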
ETL
Extract, Transform, and Load (ETL) is a process in data warehousing that involves extracting data from outside sources, transforming it to fit business needs (which can include quality levels), and ultimately loading it into the end target, i.e. the data warehouse.
An ETL process can be created using almost any programming language, but creating one from scratch is quite complex. ETL tools are available to help in the creation of ETL processes.
A good ETL tool must be able to communicate with the many different relational databases and read the various file formats used throughout an organization.
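The three steps can be sketched by hand in Python to show what an ETL tool automates. This is a toy example under stated assumptions: the CSV text stands in for an outside source, the `sales` table for the warehouse, and the quality rule (skip rows without an amount) is invented for illustration.

```python
import csv
import io
import sqlite3

# Stand-in for an outside source; a real process would read files or databases.
SOURCE_CSV = "customer,amount\n alice ,10.5\nbob,20\ncarol,\n"


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean names, convert amounts, apply an assumed quality rule."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # assumed quality rule: skip rows without an amount
        clean.append((row["customer"].strip().title(), float(amount)))
    return clean


def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute("SELECT customer, amount FROM sales ORDER BY customer").fetchall()
print(loaded)
```

Even this small version has to deal with stray whitespace and missing values, which hints at why building a full ETL process from scratch becomes complex.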
Some of the ETL Tools available in the Market are:
Ab Initio
Apatar
BusinessObjects Data Integrator
Clover.ETL
DMExpress
Data Junction
Data Transformation Services
IBM WebSphere DataStage
Informatica
LogiXML
Pentaho
Pervasive Data Integrator
RODIN Data Asset Management
SQL Server Integration Services
Scriptella
Sprog
Sunopsis
Talend Open Studio