Database Management System Prepared by Dr. Ahmed El-Ragal Reviewed & Presented By Mr. Mahmoud Rafeek...

Post on 05-Jan-2016

213 views 1 download

Transcript of Database Management System Prepared by Dr. Ahmed El-Ragal Reviewed & Presented By Mr. Mahmoud Rafeek...

Database Management System

Prepared by Dr. Ahmed El-Ragal

Reviewed & Presented By Mr. Mahmoud Rafeek Alfarra

College Of Science & Technology- Khan younis Information Technology & Computer Science Dep.

Part 8Data Warehousing

Info sources for Info sources for organizationsorganizations

April 20, 2023Data Warehouse2

HR

Financial/ Accounting

ERP

CRM and

eCRM

Internet

Procurement CallCenter

Inventory

Islands of Islands of informationinformation

Applications reporting Applications reporting systems?systems?

April 20, 2023 Data Warehouse 3

Ad-Ad-hochoc

Standard ReportsStandard Reports

ParameterizedParameterizedReportsReports

Data WarehousingData Warehousing

April 20, 2023Data Warehouse4

EnterpriseEnterpriseData Data

WarehouseWarehouse

HR

Financial/ Accounting

ERP

CRM and

eCRM

Internet

Procurement CallCenter

Inventory

April 20, 2023Data Warehouse5

A Simple Definition A data warehouse is a collection of data created to support decision- making applications.

April 20, 2023Data Warehouse6

Another Definition

Data warehousing is the entire process of data extraction, transformation, and loading of data to the warehouse and the access of the data by end users and applications.

April 20, 2023Data Warehouse7

Data Warehouse-- DefinedData Warehouse-- DefinedCollection of Data in Support of Management Reporting Needs and Decision-Making Processes.

Organized by subject areas (known as data marts) and structured for query. Integrated across subject areas. Associated with a point in time, such as pay period, fiscal year, semester. Query only, that is, the data does not change.

April 20, 2023Data Warehouse8

A Data Warehouse A Data Warehouse Designer’sDesigner’s

View of the BusinessView of the Business

MARKET

P R O D U C T

TIM

E

The threeThe threekey wordskey wordsbecome the become the dimensionsdimensionsof the cube.of the cube.

The points The points inside the inside the cube store the cube store the measurementmeasurementssof the of the business –business –a combination a combination of Product, of Product, Market,Market,and Time.and Time.These points These points arearethe business the business facts.facts.

April 20, 2023Data Warehouse9

Goals of the Data WarehouseGoals of the Data Warehouse

Provide Access to various scattered Data.

Include Business Descriptions of Data.

Promote Consistency Across Subject Areas.

Assure Data Quality.

Support User Directed Analysis.

April 20, 2023Data Warehouse10

Runseverynight

ExternalExternalDataData

OperationalOperationalDataData

ExtractExtractTransformTransform

LoadLoad

The Data Loading ProcessThe Data Loading Process

April 20, 2023Data Warehouse11

ExternalExternalDataData

ExtractExtractTransformTransform

LoadLoad

Ready for AccessReady for Accessand Queryand Query

OperationalOperationalDataData

DataDataWarehouseWarehouse

BusinessBusinessDescriptionsDescriptions

The Data Loading ProcessThe Data Loading Process

Characteristics of a Data Characteristics of a Data WarehouseWarehouse

April 20, 2023Data Warehouse12

Subject oriented – organized based on use (on subject not on application).

Integrated – inconsistencies removed.Nonvolatile – stored in read-only format

(historical ).Time variant – data are normally time series.Summarized – in decision-usable format.Large volume – data sets are quite large.Non normalized – often redundant.Metadata – data about data are stored.Data sources – comes from nonintegrated

sources.

Data Warehousing Data Warehousing ArchitectureArchitecture

April 20, 2023Data Warehouse13

Data MartsData Marts

ExtractExtractTransformTransformLoadLoad

External Sources

Operational db systems

ServeServe

OLAP serversOLAP servers

Analysis

Query/ Reporting

Data Mining

April 20, 2023Data Warehouse14

Data Warehouse vs. Data MartData Warehouse vs. Data Mart

Source: adapted from Strange (1997).

Data ExtractionData Extraction

April 20, 2023Data Warehouse15

Often performed by COBOL routines (not recommended because of high program maintenance and no automatically generated meta data).

Sometimes source data is copied to the target database using the replication capabilities of standard RDMS (not recommended because of “dirty data” in the source systems).

Increasing performed by specialized ETL software.

Reasons for “Dirty” DataReasons for “Dirty” Data

April 20, 2023Data Warehouse16

Dummy Values. Absence of Data. Multipurpose Fields. Cryptic Data. Contradicting Data. Inappropriate Use of Address Lines. Violation of Business Rules. Reused Primary Keys. Non-Unique Identifiers. Data Integration Problems.

Data CleansingData Cleansing

April 20, 2023Data Warehouse17

Source systems contain “dirty data” that must be cleansed.

ETL software contains rudimentary data cleansing capabilities.

Specialized data cleansing software is often used.

Leading data cleansing vendors include Vality (Integrity), Harte-Hanks (Trillium), and Firstlogic (i.d.Centric)

Steps in Data CleansingSteps in Data Cleansing

April 20, 2023Data Warehouse18

Parsing

Correcting

Standardizing

Matching

Consolidating

ParsingParsing

April 20, 2023Data Warehouse19

Parsing locates and identifies individual data elements in the source files and then isolates these data elements in the target files.

Examples include parsing the first, middle, and last name; street number and street name; and city and state.

CorrectingCorrecting

April 20, 2023Data Warehouse20

Corrects parsed individual data components using sophisticated data algorithms and secondary data sources.

Example include replacing an incomplete address and adding a zip code.

StandardizingStandardizing

April 20, 2023Data Warehouse21

Standardizing applies conversion routines to transform data into its preferred (and consistent) format using both standard and custom business rules.

Examples include adding a pre name, replacing a nickname, and using a preferred telephone.

MatchingMatching

April 20, 2023Data Warehouse22

Searching and matching records within and across the parsed, corrected and standardized data based on predefined business rules to eliminate duplications.

Examples include identifying similar names and addresses.

ConsolidatingConsolidating

April 20, 2023Data Warehouse23

Analyzing and identifying relationships between matched records and consolidating/merging them into ONE representation.

Data StagingData Staging

April 20, 2023Data Warehouse24

Often used as an interim step between data extraction and later steps.

Accumulates data from asynchronous sources using flat files, FTP sessions, or other processes.

At a predefined cutoff time, data in the staging file is transformed and loaded to the warehouse.

There is usually no end user access to the staging file.

An operational data store may be used for data staging.

Data TransformationData Transformation

April 20, 2023Data Warehouse25

Transforms the data in accordance with the business rules and standards that have been established.

Example include: format changes, deduplication, splitting up fields, replacement of codes, derived values, and aggregates.

Building The WarehouseBuilding The WarehouseTransforming Data

April 20, 2023Data Warehouse26

TransformTransformChangeChange

CombineCombine

CalculateCalculate

buyer_namebuyer_nameBarr, AdamChai, SeanO’Melia, Erin...

reg_idreg_id246...

total_salestotal_sales17.6052.808.82

...

buyer_namebuyer_nameBarr, AdamChai, SeanO’Melia, Erin...

reg_idreg_id246...

total_salestotal_sales17.6052.808.82

...

buyer_namebuyer_nameBarr, AdamChai, SeanO’Melia, Erin...

priceprice.55

1.10.99...

qtyqty32489...

buyer_namebuyer_nameBarr, AdamChai, SeanO’Melia, Erin...

reg_idreg_idIIIVVI...

total_salestotal_sales17.6052.808.82

...

buyer_namebuyer_nameBarr, AdamChai, SeanO’Melia, Erin...

priceprice.55

1.10.99...

qtyqty32489...

total_salestotal_sales17.6052.808.82

...

buyer_firstbuyer_firstAdamSeanErin

...

buyer_lastbuyer_lastBarrChai

O’Melia...

reg_idreg_id246...

total_salestotal_sales17.6052.808.82

...

Data LoadingData Loading

April 20, 2023Data Warehouse27

Data are physically moved to the data warehouse.

The loading takes place within a “load window”.

The trend is to near real time updates of the data warehouse as the warehouse is increasingly used for decisional activities that affect operations.

Meta DataMeta Data

April 20, 2023Data Warehouse28

Data about data.Needed by both information technology

personnel and users.IT personnel need to know data sources

and targets; database, table and column names; refresh schedules; data usage measures; etc.

Users need to know entity/attribute definitions; reports/query tools available; report distribution information; help desk contact information, etc.

Reviewed By Mr. Mahmoud Rafeek Alfarra