ETL Testing

Data Warehouse Testing

Description: Learn the basics.


Agenda
- What is an Operational Database/System?
- Introduction to Data Warehouse
- Data Warehouse Architecture
- Data Extraction, Data Transformation, Data Loading & Data Mining
- Characteristics of a Data Warehouse
- What is OLTP?
- What is OLAP?
- Difference between OLTP & OLAP
- What is DSS?

Operational Database/System

An operational database, as the name implies, is the database that is currently in active use, capturing real-time data and supplying data for real-time computations and other analytical processes.

The operational database is the source of data for the data warehouse. It contains detailed data used to run the day-to-day operations of the business.

The data continually changes as updates are made and reflects the current value of the last transaction. An operational database contains enterprise data that is up to date and modifiable.

In an enterprise data management system, an operational database can be seen as the counterpart of a decision support database, which contains non-modifiable data extracted for the purpose of statistical analysis.

For example, an operational database is the one used for taking orders and fulfilling them in a store, whether it is a traditional store or an online store.

An operational database is used for keeping track of payments and inventory. It takes information and amounts from credit cards, and accountants use the operational database because it must balance to the last penny.
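The order-taking and inventory example above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions, not a production design: the `inventory` and `orders` tables and the `place_order` helper are hypothetical, and an in-memory SQLite database stands in for a real operational system. It shows two OLTP traits from the text: current data is updated in place, and the order and the stock change succeed or fail together. Amounts are stored as integer cents so the books can balance to the last penny.

```python
import sqlite3

# In-memory database standing in for a store's operational (OLTP) system.
# Table and column names are hypothetical, chosen for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (product_id INTEGER PRIMARY KEY,
                        qty INTEGER NOT NULL CHECK (qty >= 0));
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id INTEGER,
                     qty INTEGER, amount_cents INTEGER);
INSERT INTO inventory VALUES (1, 10);
""")

def place_order(conn, product_id, qty, amount_cents):
    """Record an order and decrement stock in one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO orders (product_id, qty, amount_cents) VALUES (?, ?, ?)",
                (product_id, qty, amount_cents),
            )
            # The CHECK constraint rejects this update if stock would go negative,
            # which also undoes the INSERT above.
            conn.execute(
                "UPDATE inventory SET qty = qty - ? WHERE product_id = ?",
                (qty, product_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

print(place_order(conn, 1, 3, 2997))    # True: stock goes from 10 to 7
print(place_order(conn, 1, 50, 49950))  # False: stock would go negative, rolled back
print(conn.execute("SELECT qty FROM inventory WHERE product_id = 1").fetchone()[0])  # 7
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1
```

Keeping both statements inside one transaction is what lets the operational data stay consistent even when many users place orders concurrently.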

Introduction to Data Warehouse

A data warehouse is a storage area where all of an organization's information or data is stored and managed in a manner that allows all users in the organization to use that data in their decision-making process.

It is an information warehouse which:
- Collects data from various operational data sources.
- Integrates data into a logical business model.
- Stores data in an understandable and easily accessible way.
- Delivers information to decision makers across the organization through various reports and analysis tools.

Ultimately, a data warehouse is itself a database.

[Figure: High-level data warehouse architecture]

[Figure: Data warehouse architecture]

[Figure: Data flow diagram]

Data Extraction: Data from different source systems is converted into one consolidated data warehouse format that is ready for transformation processing.
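As a rough sketch of extraction, assume two hypothetical source systems: a CRM that exports CSV and a web shop that exports JSON with differently named fields. Each source gets its own reader that maps its fields onto one common staging format. The field names, the sample data and the staging layout are all made up for illustration; only the Python standard library is used.

```python
import csv
import io
import json

# Two hypothetical source systems exporting customer records in different formats.
CRM_CSV = "cust_id,full_name,gender\n101,Asha Rao,Female\n102,John Smith,Male\n"
WEB_JSON = '[{"id": "W-7", "name": "Li Wei", "sex": null}]'

def extract_crm(text):
    """Read the CSV export and map its columns onto the common staging format."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"source": "crm", "customer_id": row["cust_id"],
               "name": row["full_name"], "gender": row["gender"]}

def extract_web(text):
    """Read the JSON export and map its differently named fields onto the same format."""
    for rec in json.loads(text):
        yield {"source": "web", "customer_id": rec["id"],
               "name": rec["name"], "gender": rec["sex"]}

# Consolidated staging rows, ready for the transformation step.
staging = list(extract_crm(CRM_CSV)) + list(extract_web(WEB_JSON))
for row in staging:
    print(row)
```

Note that extraction only unifies the layout; values such as the `None` gender from the web source are left for the transformation step to clean.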

Data Transformation: Transforming the data may involve the following tasks:
- Applying business rules (for example, calculating new measures and dimensions)
- Cleaning (for example, mapping NULL to 0, or "Male" to "M" and "Female" to "F")
- Filtering (for example, selecting only certain columns to load)
- Splitting a column into multiple columns, and vice versa
- Joining together data from multiple sources (for example, lookup, merge)
- Transposing rows and columns
- Applying any kind of simple or complex data validation (for example, if the first three columns in a row are empty, then reject the row from processing)
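A minimal sketch of several of these transformation tasks, applied to hypothetical rows: it cleans values (a NULL quantity becomes 0, "Male"/"Female" become "M"/"F"), drops a column, splits a name column, joins a region from a lookup table, derives a new measure, and rejects rows whose first three columns are empty. Column names, the lookup table and the sample data are assumptions; real ETL tools express these steps in their own ways, and transposing is not shown.

```python
from decimal import Decimal

# Hypothetical source rows; column order matters for the validation rule.
COLUMNS = ["order_id", "customer_name", "gender", "store_id", "qty", "unit_price", "internal_note"]
ROWS = [
    ["1", "Asha Rao", "Female", "S1", "2", "5.00", "x"],
    ["2", "John Smith", "Male", "S2", None, "3.50", "y"],  # NULL quantity
    ["", "", "", "S1", "1", "9.99", "z"],                  # first three columns empty
]
STORE_REGION = {"S1": "South", "S2": "North"}  # lookup data from a second source
GENDER_CODES = {"Male": "M", "Female": "F"}

def transform(rows):
    """Return (clean rows ready to load, rejected raw rows)."""
    loaded, rejected = [], []
    for values in rows:
        # Validation: reject the row if its first three columns are all empty.
        if all(not v for v in values[:3]):
            rejected.append(values)
            continue
        rec = dict(zip(COLUMNS, values))
        # Cleaning: NULL quantity becomes 0; gender labels become one-letter codes.
        qty = int(rec["qty"]) if rec["qty"] is not None else 0
        gender = GENDER_CODES.get(rec["gender"], rec["gender"])
        # Splitting: one name column becomes first and last name.
        first, _, last = rec["customer_name"].partition(" ")
        # Joining: look up the store's region in the second source.
        region = STORE_REGION.get(rec["store_id"], "Unknown")
        # Business rule: derive a new measure (Decimal avoids float rounding on money).
        amount = qty * Decimal(rec["unit_price"])
        # Filtering: only the listed columns are kept; internal_note is dropped.
        loaded.append({"order_id": int(rec["order_id"]), "first_name": first,
                       "last_name": last, "gender": gender, "region": region,
                       "qty": qty, "amount": amount})
    return loaded, rejected

good, bad = transform(ROWS)
for row in good:
    print(row)
print(len(bad), "row(s) rejected")  # 1 row(s) rejected
```

Returning rejected rows separately, rather than silently dropping them, keeps them available for the reconciliation checks that ETL testing relies on.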

Data Loading: Loading the data into the data warehouse. End users can then directly access data derived from several source systems through the data warehouse. OLAP (Online Analytical Processing) tools are used extensively by organizations to discover valuable business trends in data marts and data warehouses.

Data Mining: Data mining, the extraction of hidden predictive information from large databases, is the process of analyzing data from different perspectives and summarizing it into useful information. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. They allow users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

Characteristics of a Data Warehouse

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decisions.

Subject-oriented Data:

In operational systems, data is stored by individual application. Data sets have to provide data for specific applications so that those applications can perform their specific functions efficiently. Therefore, the data sets for each application need to be organized around that specific application.

In a data warehouse, data is stored not by operational application but by business subject. Business subjects differ from enterprise to enterprise, and they are critical to the enterprise.
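One common way to organize a warehouse by business subject is a star schema: one dimension table per subject and a fact table that references them. The sketch below assumes Customer, Product, Location and Time dimensions and a Sales fact table; all table and column names and the sample rows are hypothetical, and an in-memory SQLite database stands in for a real warehouse. It also illustrates the loading step, where already-transformed rows are bulk-inserted.

```python
import sqlite3

# Hypothetical subject-oriented layout: one table per business subject
# (customer, product, location, time) plus a sales fact table referencing them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, gender TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE dim_time     (time_key     INTEGER PRIMARY KEY, day TEXT, month INTEGER, year INTEGER);
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer,
    product_key  INTEGER REFERENCES dim_product,
    location_key INTEGER REFERENCES dim_location,
    time_key     INTEGER REFERENCES dim_time,
    qty INTEGER, amount REAL
);
""")

# Load step: bulk-insert already-transformed rows into each subject table.
with conn:
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                     [(1, "Asha Rao", "F"), (2, "John Smith", "M")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(10, "Tea", "Beverages"), (11, "Rice", "Grocery")])
    conn.executemany("INSERT INTO dim_location VALUES (?, ?, ?)",
                     [(100, "Chennai", "South"), (101, "Delhi", "North")])
    conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?, ?)",
                     [(20240105, "2024-01-05", 1, 2024)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?, ?)",
                     [(1, 10, 100, 20240105, 2, 10.0), (2, 11, 101, 20240105, 1, 3.5)])

# A subject-oriented question: sales by region, independent of any one application.
for region, total in conn.execute("""
        SELECT l.region, SUM(f.amount) FROM fact_sales f
        JOIN dim_location l USING (location_key)
        GROUP BY l.region ORDER BY l.region"""):
    print(region, total)
```

The dimension tables are deliberately de-normalized (a region is stored on each location row), which trades some redundancy for simpler analytical joins.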

Characteristics of a Data Warehouse: Subject Oriented
- Operational: application oriented; data is dependent on the application.
- Data warehouse: subject oriented; data is stored subject-wise (for example Customer, Products, Location, Time, Sales), independent of the application.
A data warehouse stores data subject-wise.

Integrated Data:

All the relevant data from various applications must be pulled together for proper decision making. The data in the data warehouse comes from several operational systems. Source data resides in different databases, files and data segments.

Data inconsistencies are removed, and the source data goes through transformation, consolidation and integration before it is stored in the data warehouse.

Characteristics of a Data Warehouse: Integrated
- Operational: departmental; data is integrated only within a department.
- Data warehouse: integrated; data from departments such as Finance, Sales and Procurement is integrated across the enterprise.
A data warehouse stores one version of truth.

Non-volatile Data:

Data in the data warehouse is primarily for query and analysis and is not intended to run the day-to-day business. It is not as volatile as the data in an operational database.

Time-variant Data:

All data in the data warehouse is identified with a particular time period. The time-variant nature of the data in a data warehouse:
- Allows for analysis of the past
- Relates information to the present
- Enables forecasts for the future
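To illustrate the non-volatile and time-variant characteristics, here is a hypothetical sketch contrasting an operational store, which overwrites the current value, with a warehouse-style history that only appends dated snapshots. The `record_stock` helper and the data are assumptions made for illustration; a real warehouse would typically load snapshots in periodic batches rather than on every operational update.

```python
from datetime import date

# Operational view: only the current value per product, overwritten on every update.
current_stock = {}
# Warehouse view: append-only snapshots, each tagged with the date it describes.
stock_history = []

def record_stock(product, qty, as_of):
    current_stock[product] = qty                 # operational: replace in place
    stock_history.append((as_of, product, qty))  # warehouse: insert, never update

record_stock("Tea", 40, date(2024, 1, 31))
record_stock("Tea", 25, date(2024, 2, 29))
record_stock("Tea", 55, date(2024, 3, 31))

print(current_stock["Tea"])  # 55: the operational view only knows "now"
# Time-variant analysis: every past value is still available, ordered by date.
for as_of, product, qty in sorted(stock_history):
    print(as_of.isoformat(), product, qty)
```

Because history is never overwritten, questions about the past (and trends used for forecasting) can still be answered after the operational value has changed.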

Characteristics of a Data Warehouse: Non-volatile
- Operational: records are inserted, changed, replaced and deleted.
- Data warehouse: data is loaded and then accessed read-only.
A data warehouse is relatively static in nature.

Characteristics of a Data Warehouse: Time Variant
- Operational: current-value data; time horizon of 60-90 days.
- Data warehouse: snapshot data; time horizon of 5-10 years. The data warehouse stores historical data.
A data warehouse typically spans across time.

What is an OLTP System?

- OLTP (Online Transaction Processing) is a class of programs that facilitates and manages transaction-oriented applications, typically for data entry and retrieval.
- The term also refers to computer processing in which the computer responds immediately to user requests.
- It is designed to handle processing for large numbers of concurrent users.
- Applications that use OLTP include electronic banking, e-commerce and order processing.

OLTP

OLAP

OLAP stands for Online Analytical Processing. OLAP has been growing in popularity due to increasing data volumes and the recognition of the business value of analytics. Until the mid-nineties, performing OLAP analysis was an extremely costly process, mainly restricted to larger organizations. OLAP allows business users to slice and dice data at will. Normally, data in an organization is distributed across multiple data sources that are incompatible with each other.

Part of the OLAP implementation process involves extracting data from the various data repositories and making it compatible. OLAP is designed to give an overview analysis of what happened. OLAP provides a historical view of data; although useful on its own, OLAP analysis becomes truly powerful when combined with predictive analysis from data mining.

What is an OLAP System?
- It is an approach to quickly answering analytical queries that are dimensional in nature.
- Databases configured for OLAP employ a multidimensional data model, allowing complex, analytical and ad-hoc queries with rapid execution times.
- Applications that use OLAP include business reporting for sales, budgeting and forecasting, and financial reporting.
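To make "slice and dice" and dimensional queries a little more concrete, the following pure-Python sketch answers a question of the form "top selling products, region-wise, over the last two years" against a small hypothetical fact list. It slices on the time dimension, then aggregates units by region and product. In practice an OLAP server or a SQL `GROUP BY` would do this work; the function name, tuple layout and data are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, units_sold).
SALES = [
    (2023, "South", "Tea", 120), (2023, "South", "Rice", 90),
    (2024, "South", "Tea", 80),  (2024, "South", "Soap", 150),
    (2023, "North", "Rice", 200), (2024, "North", "Tea", 60),
    (2022, "North", "Soap", 500),  # outside the two-year window
]

def top_products_by_region(rows, years, n):
    """Slice on the time dimension, aggregate by region and product, keep the top n."""
    totals = defaultdict(int)
    for year, region, product, units in rows:
        if year in years:                       # slice: restrict the time dimension
            totals[(region, product)] += units  # roll up to region x product
    by_region = defaultdict(list)
    for (region, product), units in totals.items():
        by_region[region].append((units, product))
    # Sort each region's products by units, highest first, and keep n of them.
    return {region: [p for _, p in sorted(items, reverse=True)[:n]]
            for region, items in sorted(by_region.items())}

print(top_products_by_region(SALES, years={2023, 2024}, n=5))
# {'North': ['Rice', 'Tea'], 'South': ['Tea', 'Soap', 'Rice']}
```

Changing the `years` set or grouping by a different dimension gives a different "slice" of the same facts without touching the underlying data.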

OLAP Tools

MicroStrategy, Cognos, Business Objects and SSAS

Data Warehouse (OLAP)
- A subject-oriented, integrated, non-volatile, time-variant data store containing detailed and aggregate corporate data
- Data is stored for a longer duration of time
- Typical question: "Can I see the top five selling products, region-wise, in the last 2 years?"
- It is a read-only database: data is always inserted but not modified
- Data from multiple OLTP sources is integrated across the enterprise

Operational Systems (OLTP)
- An application-oriented, departmental, volatile, current-valued data store containing only detailed raw corporate data
- Data is stored only for the current period; old data is either archived or moved to the data warehouse
- Typical question: "Which products are to be shipped to the customer as per the order?"
- Identical queries may give different results at different times
- Data is only for the department and is used in the specific application

An Example
- OLTP (current/recent information): "Is this medicine available in stock?"
- OLAP (historical information): "Has the incidence of tuberculosis increased in the last 5 years in the Southern region?"

OLTP vs. OLAP

Source of data
- OLTP: OLTP systems are the original source of the data (operational data).
- OLAP: OLAP data comes from the various OLTP databases (consolidation).

Purpose of data
- OLTP: To control and run fundamental business tasks.
- OLAP: To help with planning, problem solving and decision support.

Inserts and updates
- OLTP: Short and fast inserts and updates initiated by end users.
- OLAP: Periodic long-running batch jobs refresh the data.

Queries
- OLTP: Relatively standardized and simple queries returning relatively few records.
- OLAP: Often complex queries involving aggregations.

Processing speed
- OLTP: Typically very fast.
- OLAP: Depends on the amount of data involved; batch data refreshes and complex queries may take many hours. Indexes are used to improve query speed.

Database design
- OLTP: Highly normalized with many tables.
- OLAP: Typically de-normalized with fewer tables; uses star or snowflake schemas.

Backup and recovery
- OLTP: Operational data is critical to run the business; data loss is likely to entail significant monetary loss and legal liability.
- OLAP: Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method.

What is a Decision Support System?
- Enterprises are recognizing information as a strategic part of their business.
- Data is looked upon as an asset, used to optimize business processes and deliver benefits to the bottom line.
- Enterprises want to gain insight from their data to make more tactical decisions: what, where, when and how?

Q & A

Thank You