Database Management System Prepared by Dr. Ahmed El-Ragal Reviewed & Presented By Mr. Mahmoud Rafeek Alfarra College Of Science & Technology- Khan younis Information Technology & Computer Science Dep. Part 8 Data Warehousing


Database Management System

Prepared by Dr. Ahmed El-Ragal

Reviewed & Presented By Mr. Mahmoud Rafeek Alfarra

College Of Science & Technology- Khan younis Information Technology & Computer Science Dep.

Part 8: Data Warehousing


Info sources for organizations

April 20, 2023

HR

Financial/Accounting

ERP

CRM and eCRM

Internet

Procurement

Call Center

Inventory

Islands of information


Applications reporting systems?

Ad-hoc

Standard Reports

Parameterized Reports


Data Warehousing

Enterprise Data Warehouse

HR

Financial/Accounting

ERP

CRM and eCRM

Internet

Procurement

Call Center

Inventory


A Simple Definition

A data warehouse is a collection of data created to support decision-making applications.


Another Definition

Data warehousing is the entire process of extracting, transforming, and loading data into the warehouse, together with the access of that data by end users and applications.


Data Warehouse: Defined

A collection of data in support of management reporting needs and decision-making processes.

Organized by subject areas (known as data marts) and structured for query.

Integrated across subject areas.

Associated with a point in time, such as pay period, fiscal year, semester.

Query only, that is, the data does not change.


A Data Warehouse Designer's View of the Business

[Figure: a cube whose three axes are labeled MARKET, PRODUCT, and TIME.]

The three key words become the dimensions of the cube.

The points inside the cube store the measurements of the business, a combination of Product, Market, and Time. These points are the business facts.
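A minimal sketch of the cube idea, assuming made-up sales figures (the product, market, and period names and the amounts are illustrative and do not come from the slides). Each fact is stored under a (product, market, time) key, and fixing one or more dimensions gives a slice of the cube:

```python
# Hypothetical facts: (product, market, time) -> sales amount.
facts = {
    ("Widget", "East", "2023-Q1"): 120.0,
    ("Widget", "West", "2023-Q1"): 80.0,
    ("Gadget", "East", "2023-Q1"): 50.0,
    ("Widget", "East", "2023-Q2"): 150.0,
}


def slice_total(facts, product=None, market=None, time=None):
    """Sum the measurements of every cell that matches the fixed dimensions.

    A dimension left as None is not constrained.
    """
    total = 0.0
    for (p, m, t), value in facts.items():
        if product is not None and p != product:
            continue
        if market is not None and m != market:
            continue
        if time is not None and t != time:
            continue
        total += value
    return total


# All Widget sales, across every market and period.
widget_total = slice_total(facts, product="Widget")
# Sales in the East market during the first quarter, across products.
east_q1_total = slice_total(facts, market="East", time="2023-Q1")
```

A real warehouse would usually keep the facts in a fact table with foreign keys to dimension tables; the dictionary here only illustrates the three-dimensional addressing.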


Goals of the Data Warehouse

Provide Access to various scattered Data.

Include Business Descriptions of Data.

Promote Consistency Across Subject Areas.

Assure Data Quality.

Support User Directed Analysis.


The Data Loading Process

[Figure: External Data and Operational Data → Extract, Transform, Load. The process runs every night.]


The Data Loading Process

[Figure: External Data and Operational Data → Extract, Transform, Load → Data Warehouse with Business Descriptions → Ready for Access and Query.]
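The extract, transform, load flow in the figure can be sketched as three small functions. This is only an illustration under stated assumptions: the row layout (`customer`, `region`, `amount`) and the transformation rules are hypothetical, and the "warehouse" is a plain in-memory list.

```python
def extract(operational_rows, external_rows):
    """Gather raw rows from both kinds of source into one list."""
    return list(operational_rows) + list(external_rows)


def transform(rows):
    """Apply simple, assumed rules: drop rows without an amount,
    trim customer names, and upper-case region codes."""
    cleaned = []
    for row in rows:
        if row.get("amount") is None:
            continue
        cleaned.append({
            "customer": row["customer"].strip(),
            "region": row["region"].upper(),
            "amount": float(row["amount"]),
        })
    return cleaned


def load(warehouse, rows):
    """Append the transformed rows to the warehouse; return how many were loaded."""
    warehouse.extend(rows)
    return len(rows)


warehouse = []
operational = [{"customer": " Adam Barr ", "region": "east", "amount": "17.60"}]
external = [{"customer": "Sean Chai", "region": "west", "amount": None}]

# One nightly run: extract from both sources, transform, then load.
loaded = load(warehouse, transform(extract(operational, external)))
```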


Characteristics of a Data Warehouse

Subject oriented – organized based on use (on subject not on application).

Integrated – inconsistencies removed.

Nonvolatile – stored in read-only format (historical).

Time variant – data are normally time series.

Summarized – in decision-usable format.

Large volume – data sets are quite large.

Non-normalized – often redundant.

Metadata – data about data are stored.

Data sources – comes from nonintegrated sources.


Data Warehousing Architecture

[Figure: External Sources and Operational db systems → Extract, Transform, Load → Data Marts → Serve → OLAP servers → Analysis, Query/Reporting, Data Mining.]


Data Warehouse vs. Data Mart

Source: adapted from Strange (1997).


Data Extraction

Often performed by COBOL routines (not recommended because of high program maintenance and no automatically generated metadata).

Sometimes source data is copied to the target database using the replication capabilities of a standard RDBMS (not recommended because of “dirty data” in the source systems).

Increasingly performed by specialized ETL software.


Reasons for “Dirty” Data

Dummy Values.

Absence of Data.

Multipurpose Fields.

Cryptic Data.

Contradicting Data.

Inappropriate Use of Address Lines.

Violation of Business Rules.

Reused Primary Keys.

Non-Unique Identifiers.

Data Integration Problems.
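A few of these symptoms can be detected mechanically. The sketch below checks for three of them (dummy values, absence of data, and non-unique identifiers); the field names and the set of dummy values are assumptions made for illustration, not part of the slides.

```python
from collections import Counter

# Assumed placeholder values that users type when they have no real data.
DUMMY_VALUES = {"999-99-9999", "N/A", "UNKNOWN"}


def find_dirty_rows(rows, key="id", required=("name", "phone")):
    """Return {row_index: [problem, ...]} for rows showing a dirty-data symptom."""
    id_counts = Counter(row.get(key) for row in rows)
    problems = {}
    for index, row in enumerate(rows):
        found = []
        for field in required:
            value = row.get(field)
            if value is None or value == "":
                found.append(f"absent {field}")
            elif value in DUMMY_VALUES:
                found.append(f"dummy {field}")
        if id_counts[row.get(key)] > 1:
            found.append("non-unique id")
        if found:
            problems[index] = found
    return problems


rows = [
    {"id": 1, "name": "Adam Barr", "phone": "555-0100"},
    {"id": 2, "name": "", "phone": "N/A"},
    {"id": 2, "name": "Erin O'Melia", "phone": "555-0102"},
]
report = find_dirty_rows(rows)
```

The other reasons on the list (cryptic data, multipurpose fields, violated business rules) generally need domain knowledge and cannot be caught by generic checks like these.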


Data Cleansing

Source systems contain “dirty data” that must be cleansed.

ETL software contains rudimentary data cleansing capabilities.

Specialized data cleansing software is often used.

Leading data cleansing vendors include Vality (Integrity), Harte-Hanks (Trillium), and Firstlogic (i.d.Centric).


Steps in Data Cleansing

Parsing

Correcting

Standardizing

Matching

Consolidating


Parsing

Parsing locates and identifies individual data elements in the source files and then isolates these data elements in the target files.

Examples include parsing the first, middle, and last name; street number and street name; and city and state.
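A minimal sketch of the three parsing examples, assuming simple Western-style inputs such as "First Middle Last", "123 Main Street", and "City, State". The function names are hypothetical, and real parsers handle many more layouts.

```python
import re


def parse_name(full_name):
    """Split 'First [Middle ...] Last' into parts (assumes at least two words)."""
    parts = full_name.split()
    return {
        "first": parts[0],
        "middle": " ".join(parts[1:-1]),
        "last": parts[-1],
    }


def parse_street(line):
    """Split '123 Main Street' into street number and street name."""
    match = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
    if match is None:
        # No leading number: keep the whole line as the street name.
        return {"street_number": "", "street_name": line.strip()}
    return {"street_number": match.group(1), "street_name": match.group(2)}


def parse_city_state(text):
    """Split 'City, State' at the first comma."""
    city, _, state = text.partition(",")
    return {"city": city.strip(), "state": state.strip()}


name = parse_name("Adam J Barr")
street = parse_street("123 Main Street")
place = parse_city_state("Springfield, IL")
```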


Correcting

Correcting fixes the parsed individual data components using sophisticated data algorithms and secondary data sources.

Examples include replacing an incomplete address and adding a zip code.
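As a sketch of the zip code example, the function below fills in a missing zip code from a lookup table that stands in for a secondary data source. The table contents, the city name, and the field names are invented for illustration.

```python
# Hypothetical reference table standing in for a secondary data source.
ZIP_BY_CITY_STATE = {("Exampleville", "XX"): "00000"}


def correct_address(record, zip_lookup):
    """Return a copy of the record with a missing zip code filled in,
    when the lookup table knows the city/state pair."""
    corrected = dict(record)
    if not corrected.get("zip"):
        key = (corrected.get("city"), corrected.get("state"))
        if key in zip_lookup:
            corrected["zip"] = zip_lookup[key]
    return corrected


incomplete = {"city": "Exampleville", "state": "XX", "zip": ""}
fixed = correct_address(incomplete, ZIP_BY_CITY_STATE)
```

The input record is left unchanged, so the original value stays available for auditing.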


Standardizing

Standardizing applies conversion routines to transform data into its preferred (and consistent) format using both standard and custom business rules.

Examples include adding a pre name (a name prefix), replacing a nickname, and using a preferred telephone.
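A small sketch of two such conversion routines. The nickname table and the NNN-NNN-NNNN telephone format are assumed business rules chosen for illustration, not rules given in the slides.

```python
import re

# Assumed custom business rule: nickname -> preferred given name.
NICKNAMES = {"Bill": "William", "Bob": "Robert"}


def standardize_name(first_name):
    """Replace a known nickname with its preferred form; leave other names as-is."""
    return NICKNAMES.get(first_name, first_name)


def standardize_phone(raw):
    """Render a 10-digit number as NNN-NNN-NNNN; return the input unchanged otherwise."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 10:
        return raw
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


preferred_name = standardize_name("Bill")
preferred_phone = standardize_phone("(555) 010-0123")
```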


Matching

Matching searches for and matches records within and across the parsed, corrected, and standardized data, based on predefined business rules, to eliminate duplicates.

Examples include identifying similar names and addresses.
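One simple way to identify similar names and addresses is string similarity. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the 0.85 threshold and the record layout are assumptions, and commercial matching tools use more elaborate rules.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_matches(records, threshold=0.85):
    """Return index pairs whose name-plus-address strings are at least
    `threshold` similar. Compares every pair, so it suits small inputs only."""
    pairs = []
    for (i, first), (j, second) in combinations(enumerate(records), 2):
        key_first = f"{first['name']} {first['address']}"
        key_second = f"{second['name']} {second['address']}"
        if similarity(key_first, key_second) >= threshold:
            pairs.append((i, j))
    return pairs


records = [
    {"name": "Adam Barr", "address": "12 Main St"},
    {"name": "Adam Bar", "address": "12 Main St"},
    {"name": "Sean Chai", "address": "9 Oak Ave"},
]
candidate_pairs = find_matches(records)
```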


Consolidating

Consolidating analyzes and identifies relationships between matched records and merges them into ONE representation.
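A minimal sketch of merging matched records into one representation. The survivorship rule used here (the first non-empty value seen for each field wins) is an assumption; real projects choose rules per field, for example most recent or most trusted source.

```python
def consolidate(matched_records):
    """Merge matched records into one record.

    For each field, the first non-empty value wins. Fields that are empty
    in every record are left out of the result.
    """
    merged = {}
    for record in matched_records:
        for field, value in record.items():
            if value in (None, ""):
                continue
            if field not in merged:
                merged[field] = value
    return merged


duplicates = [
    {"name": "Adam Barr", "phone": "", "zip": "00000"},
    {"name": "Adam Bar", "phone": "555-0100", "zip": ""},
]
golden_record = consolidate(duplicates)
```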


Data Staging

Often used as an interim step between data extraction and later steps.

Accumulates data from asynchronous sources using flat files, FTP sessions, or other processes.

At a predefined cutoff time, data in the staging file is transformed and loaded to the warehouse.

There is usually no end user access to the staging file.

An operational data store may be used for data staging.
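The staging behavior described above can be sketched as a small class that accumulates rows as they arrive and, at a cutoff time, transforms and loads whatever has arrived so far. The class name, method names, and row layout are hypothetical, and an in-memory list stands in for the staging file.

```python
from datetime import datetime


class StagingArea:
    """Holds rows from asynchronous sources until a cutoff-driven load."""

    def __init__(self):
        self._pending = []  # list of (arrival time, source name, row)

    def accumulate(self, source, rows, arrived_at):
        """Record rows from one source together with their arrival time."""
        for row in rows:
            self._pending.append((arrived_at, source, row))

    def flush(self, cutoff, transform, warehouse):
        """Transform and load rows that arrived at or before the cutoff.

        Rows that arrived later stay staged for the next run.
        Returns the number of rows loaded.
        """
        ready = [row for arrived, _, row in self._pending if arrived <= cutoff]
        self._pending = [item for item in self._pending if item[0] > cutoff]
        warehouse.extend(transform(row) for row in ready)
        return len(ready)

    def pending_count(self):
        return len(self._pending)


stage = StagingArea()
stage.accumulate("pos", [{"amount": "1.5"}], datetime(2023, 4, 20, 18, 0))
stage.accumulate("web", [{"amount": "2"}], datetime(2023, 4, 20, 23, 30))

warehouse = []
loaded = stage.flush(
    cutoff=datetime(2023, 4, 20, 22, 0),
    transform=lambda row: {"amount": float(row["amount"])},
    warehouse=warehouse,
)
```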


Data Transformation

Transforms the data in accordance with the business rules and standards that have been established.

Examples include format changes, deduplication, splitting up fields, replacement of codes, derived values, and aggregates.


Building The Warehouse: Transforming Data

Transform: Change, Combine, Calculate

[Figure: example tables illustrating the three kinds of transformation. The distinct tables recoverable from the slide are listed below.]

buyer_name      reg_id  total_sales
Barr, Adam      2       17.60
Chai, Sean      4       52.80
O’Melia, Erin   6       8.82
...

buyer_name      reg_id  total_sales
Barr, Adam      II      17.60
Chai, Sean      IV      52.80
O’Melia, Erin   VI      8.82
...

buyer_first  buyer_last  reg_id  total_sales
Adam         Barr        2       17.60
Sean         Chai        4       52.80
Erin         O’Melia     6       8.82
...

buyer_name      price  qty
Barr, Adam      .55    32
Chai, Sean      1.10   48
O’Melia, Erin   .99    9
...

buyer_name      price  qty  total_sales
Barr, Adam      .55    32   17.60
Chai, Sean      1.10   48   52.80
O’Melia, Erin   .99    9    8.82
...
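The slide's example values can be run through a short sketch that replaces a code (reg_id 2, 4, 6 becomes II, IV, VI), splits buyer_name into first and last name, and derives total_sales as price times qty. Putting reg_id, price, and qty in one input row is an assumption made to keep the example compact; the slide shows them in separate tables. Note that 0.99 × 9 is 8.91, not the 8.82 printed on the slide for the third row, so the computed value differs from the slide there.

```python
from decimal import Decimal

# Code replacement taken from the slide's reg_id columns.
ROMAN_REGION = {2: "II", 4: "IV", 6: "VI"}


def split_buyer_name(buyer_name):
    """Split 'Last, First' into (first, last)."""
    last, _, first = buyer_name.partition(",")
    return first.strip(), last.strip()


def transform_row(row):
    """Apply the code replacement, the field split, and the derived value."""
    first, last = split_buyer_name(row["buyer_name"])
    return {
        "buyer_first": first,
        "buyer_last": last,
        "reg_id": ROMAN_REGION[row["reg_id"]],
        # Decimal avoids binary floating-point rounding in money arithmetic.
        "total_sales": Decimal(row["price"]) * row["qty"],
    }


source_rows = [
    {"buyer_name": "Barr, Adam", "reg_id": 2, "price": "0.55", "qty": 32},
    {"buyer_name": "Chai, Sean", "reg_id": 4, "price": "1.10", "qty": 48},
    {"buyer_name": "O’Melia, Erin", "reg_id": 6, "price": "0.99", "qty": 9},
]
transformed = [transform_row(row) for row in source_rows]
```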


Data Loading

Data are physically moved to the data warehouse.

The loading takes place within a “load window”.

The trend is toward near-real-time updates of the data warehouse, as the warehouse is increasingly used for decisional activities that affect operations.


Meta Data

Data about data.

Needed by both information technology personnel and users.

IT personnel need to know data sources and targets; database, table and column names; refresh schedules; data usage measures; etc.

Users need to know entity/attribute definitions; reports/query tools available; report distribution information; help desk contact information, etc.
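As an illustration only, one metadata entry could carry both the technical details IT personnel need and the business definition users need. The class, its fields, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMetadata:
    source: str            # where the data comes from (IT view)
    target_table: str      # warehouse table name (IT view)
    target_column: str     # warehouse column name (IT view)
    refresh_schedule: str  # when the column is reloaded (IT view)
    definition: str        # business meaning of the attribute (user view)


def user_glossary(entries):
    """Map 'table.column' to its business definition for end users."""
    return {
        f"{entry.target_table}.{entry.target_column}": entry.definition
        for entry in entries
    }


catalog = [
    ColumnMetadata(
        source="orders system (hypothetical)",
        target_table="sales_fact",
        target_column="total_sales",
        refresh_schedule="nightly",
        definition="Price multiplied by quantity for one buyer.",
    ),
]
glossary = user_glossary(catalog)
```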


Reviewed By Mr. Mahmoud Rafeek Alfarra