CISB594 – Business Intelligence


Page 1: CISB594 – Business Intelligence

CISB594 – Business Intelligence

Data Warehousing Part I

Page 2: CISB594 – Business Intelligence

Reference

• Materials used in this presentation are extracted mainly from the following texts, unless stated otherwise.

Page 3: CISB594 – Business Intelligence

Objectives

At the end of this lecture, you should be able to:
• Understand the basic definitions and concepts of data warehouses
• Understand how a data warehouse differs from a database
• Describe the characteristics of a data warehouse
• Describe the data warehouse process overview
• Describe the different types of data warehouse architectures

Page 4: CISB594 – Business Intelligence

Data Warehouse

• A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format

• “The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time”

• A data warehouse is a repository of an organization's electronically stored data, designed to facilitate reporting and analysis. (Wikipedia)

• In your own words?

Page 5: CISB594 – Business Intelligence

Characteristics of data warehousing

Main
• Subject oriented - Data are organized by detailed subject, containing only information relevant for decision support, unlike operational databases, which are product oriented
• Integrated - Data from different sources must be placed into a consistent format; to do so, the warehouse must deal with naming conflicts and discrepancies
• Time variant (time series) - Maintains historical data. Data for analysis from multiple sources contain multiple time points
• Nonvolatile - After data are entered into a data warehouse, users cannot change or update the data.
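
The four main characteristics can be illustrated with a small sketch. The two source systems, their field names, and the `integrate` helper are hypothetical and chosen only for illustration; they are not taken from the lecture texts.

```python
from dataclasses import dataclass
from datetime import date


# frozen=True makes each row read-only after creation (nonvolatile).
@dataclass(frozen=True)
class SalesFact:
    customer_id: str   # subject oriented: organized around customer sales
    amount_usd: float  # one agreed unit for every source
    as_of: date        # time variant: every row belongs to a moment in time


def integrate(record: dict, as_of: date) -> SalesFact:
    """Map differently named source fields onto one consistent format."""
    customer = record.get("cust_id") or record.get("CustomerNo")
    if "amount_cents" in record:
        amount = record["amount_cents"] / 100
    else:
        amount = float(record["amount_usd"])
    return SalesFact(str(customer), amount, as_of)


# Two hypothetical operational systems with naming and unit discrepancies.
billing = {"cust_id": "C1", "amount_cents": 1250}
web_shop = {"CustomerNo": "C1", "amount_usd": "20.00"}

warehouse = []  # rows are only ever appended, never edited
warehouse.append(integrate(billing, date(2024, 1, 31)))
warehouse.append(integrate(web_shop, date(2024, 2, 29)))

# Historical view of one customer across both sources.
history = [(f.as_of.isoformat(), f.amount_usd)
           for f in warehouse if f.customer_id == "C1"]
print(history)
```

Trying to assign to a field of a stored `SalesFact` raises an error, which mirrors the rule that users cannot change warehouse data once it is loaded.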

Page 6: CISB594 – Business Intelligence

Characteristics of data warehousing

Additional
• Relational/multidimensional
• Client/server
• Real-time
• Include metadata

Page 7: CISB594 – Business Intelligence

How is a data warehouse different from a database?

• Technically a data warehouse is a database, with certain characteristics to facilitate its role in decision support.

• However, it is an “integrated, time-variant, nonvolatile, subject-oriented repository of detail and summary data used for decision support and business analytics within an organization.” These characteristics are not necessarily true of databases in general.

• As a practical matter most databases are highly normalized, in part to avoid update anomalies.

• Data warehouses are often denormalized for performance reasons. This is acceptable because their content is never updated, just added to. (Historical data are static)
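
The contrast between a normalized operational schema and a denormalized warehouse table can be sketched with Python's built-in `sqlite3` module. The table and column names (`customer`, `orders`, `sales_fact`) are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized operational schema: each customer's region is stored once,
# so an update touches a single row (no update anomalies).
cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, region TEXT)")
cur.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
cur.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "North"), (2, "South")])
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 100.0), (11, 1, 50.0), (12, 2, 70.0)],
)

# Denormalized warehouse table: the region is repeated on every row,
# so analytical queries need no join.
cur.execute(
    """
    CREATE TABLE sales_fact AS
    SELECT o.id AS order_id, c.region AS region, o.amount AS amount
    FROM orders o JOIN customer c ON c.id = o.customer_id
    """
)

totals = dict(cur.execute("SELECT region, SUM(amount) FROM sales_fact GROUP BY region"))
print(totals)
```

The redundancy in `sales_fact` would be risky in an operational system, but it is harmless here because historical rows are only added, never updated.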

Page 8: CISB594 – Business Intelligence

Data Warehousing - Concept

Data mart
– Smaller and focuses on a particular subject or department.
– It is a subset of a data warehouse (a departmental data warehouse)
– A data mart is a smaller DW designed around one problem, organizational function, topic, or other focus area.

Can be a dependent data mart
– A subset that is created directly from a data warehouse
– Ensures that the end user is viewing the same version of the data that are accessed by all other data warehouse users

Or an independent data mart
– A small data warehouse designed for a strategic business unit or a department
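
A dependent data mart can be sketched as a view defined directly over the warehouse. This is one possible design, not the only one (a mart is often a physically separate copy); the table `edw_sales` and the view `marketing_mart` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A tiny stand-in for the enterprise data warehouse.
conn.execute("CREATE TABLE edw_sales (dept TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO edw_sales VALUES (?, ?, ?)",
    [("marketing", "A", 10.0), ("finance", "B", 20.0), ("marketing", "C", 5.0)],
)

# Dependent data mart: a departmental subset created directly from the warehouse.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT product, amount FROM edw_sales WHERE dept = 'marketing'"
)
mart_rows = conn.execute(
    "SELECT product, amount FROM marketing_mart ORDER BY product"
).fetchall()

# A later warehouse load is visible in the mart immediately, so mart users
# see the same version of the data as every other warehouse user.
conn.execute("INSERT INTO edw_sales VALUES ('marketing', 'D', 1.0)")
mart_count_after_load = conn.execute(
    "SELECT COUNT(*) FROM marketing_mart"
).fetchone()[0]
print(mart_rows, mart_count_after_load)
```

An independent data mart, by contrast, would be loaded from the source systems on its own and could drift from the warehouse's version of the data.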

Page 9: CISB594 – Business Intelligence

Data Warehousing - Concept

• Operational data stores (ODS)
– A type of database often used as an interim area for a data warehouse, especially for customer information files.
– Used for short-term decisions rather than medium- and long-term ones
– Similar to short-term memory: stores only recent information
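
The "short-term memory" behaviour of an ODS can be sketched as a store that drops anything older than a retention window. The 30-day window, the record layout, and the `refresh_ods` helper are assumptions for illustration; the slides give no specific figure.

```python
from datetime import date, timedelta

RETENTION = timedelta(days=30)  # assumed window, not from the lecture


def refresh_ods(ods, new_rows, today):
    """Add new rows and keep only those inside the retention window."""
    cutoff = today - RETENTION
    return [row for row in ods + new_rows if row["updated"] >= cutoff]


ods = [{"customer": "C1", "updated": date(2024, 1, 1)}]
ods = refresh_ods(
    ods,
    [{"customer": "C2", "updated": date(2024, 3, 1)}],
    today=date(2024, 3, 10),
)
print([row["customer"] for row in ods])
```

A data warehouse would keep the older record for historical analysis, whereas the ODS holds only what is needed for current, short-term decisions.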

Page 10: CISB594 – Business Intelligence

Data Warehousing - Concept

• Oper marts
– An operational data mart. An oper mart is a small-scale data mart typically used by a single department or functional area in an organization

Page 11: CISB594 – Business Intelligence

Data Warehousing - Concept

• Enterprise data warehouse (EDW)
– A large-scale data warehouse used across the enterprise for decision support
– Used to provide data for many types of DSS, including CRM, supply chain management, BPM, KMS, etc.

• Metadata
– Data about data. In a data warehouse, metadata describe the contents of a data warehouse and the manner of its use
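
As a small illustration of "data about data", the sketch below reads the column names and types of a table from SQLite's catalog and pairs them with hand-written descriptions. The table `sales_fact` and the `business_metadata` dictionary are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (order_id INTEGER, region TEXT, amount REAL)")

# Technical metadata: PRAGMA table_info returns one row per column as
# (cid, name, type, notnull, dflt_value, pk).
technical_metadata = [
    (row[1], row[2]) for row in conn.execute("PRAGMA table_info(sales_fact)")
]

# Business metadata: descriptions of content and use, maintained by people.
business_metadata = {
    "order_id": "Identifier of the order in the source system",
    "region": "Sales region of the customer",
    "amount": "Order value in the reporting currency",
}

for name, col_type in technical_metadata:
    print(f"{name} ({col_type}): {business_metadata[name]}")
```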

Page 12: CISB594 – Business Intelligence

Data Warehousing Process Overview

• Organizations continuously collect data, information, and knowledge at an increasingly accelerated rate and store them in computerized systems

• The number of users needing to access the information continues to increase as a result of improved reliability and availability of network access, especially the Internet

• Creating a data warehouse involves the following:
– Data are imported from various internal and external sources
– Data are cleansed and organized to suit the organization’s needs
– Data marts can be loaded for a specific department or area (alternatively, data marts are created first and later integrated into the EDW)

Page 13: CISB594 – Business Intelligence

Data Warehousing Process Overview

The data warehousing process consists of the following steps:
1. Data are imported from various internal and external sources
2. Data are cleansed and organized consistently with the organization’s needs
3a. Data are loaded into the enterprise data warehouse
4a. If desired, data marts are created as subsets of the EDW
or
3b. Data are loaded into data marts
4b. The data marts are consolidated into the EDW
5. Analyses are performed as needed
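
The steps above can be sketched as a tiny pipeline following the 3a/4a path (warehouse first, marts second). All function names and sample records are hypothetical; a real process would read from operational systems and use an ETL tool.

```python
def extract():
    """Step 1: import data from internal and external sources."""
    internal = [{"dept": "Sales", "amount": "100"}, {"dept": "hr ", "amount": "40"}]
    external = [{"dept": "SALES", "amount": "25"}, {"dept": "sales", "amount": None}]
    return internal + external


def cleanse(rows):
    """Step 2: drop incomplete rows and standardize names and types."""
    clean = []
    for row in rows:
        if row["amount"] is None:
            continue  # incomplete record, rejected during cleansing
        clean.append({"dept": row["dept"].strip().lower(),
                      "amount": float(row["amount"])})
    return clean


def load_edw(rows):
    """Step 3a: load the cleansed rows into the enterprise data warehouse."""
    return list(rows)


def build_mart(edw, dept):
    """Step 4a: create a data mart as a subset of the EDW."""
    return [row for row in edw if row["dept"] == dept]


edw = load_edw(cleanse(extract()))
sales_mart = build_mart(edw, "sales")

# Step 5: analyses are performed as needed.
total_sales = sum(row["amount"] for row in sales_mart)
print(len(edw), total_sales)
```

The 3b/4b path would reverse the last two stages: load each mart first, then consolidate the marts into the EDW.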

Page 14: CISB594 – Business Intelligence

Data Warehousing - Process Overview

The major components of a data warehousing process
• Data sources. Data are sourced from operational systems and possibly from external data sources.
• Data extraction. Data are extracted using custom-written or commercial software called ETL.
• Data loading. Data are loaded into a staging area, where they are transformed and cleansed. The data are then ready to load into the data warehouse.
• Comprehensive database. This is the EDW that supports decision analysis by providing relevant summarized and detailed information.
• Metadata. Metadata are maintained for access by IT personnel and users. Metadata include rules for organizing data summaries that are easy to index and search.
• Middleware tools. Middleware tools enable access to the data warehouse from a variety of front-end applications.

Page 15: CISB594 – Business Intelligence

[Figure: Data Warehousing - Process Overview]

Page 16: CISB594 – Business Intelligence

Data Warehousing Architectures

• There are several basic architectures for data warehousing
• To distinguish the architectures, a data warehouse is divided into three parts:
– The data warehouse itself
– Data acquisition (back-end) software, which extracts data from legacy systems and external sources, consolidates them, and loads them into the data warehouse
– Client (front-end) software, which allows users to access and analyze data from the warehouse

Page 17: CISB594 – Business Intelligence

Three-tier DW architecture

• 1st tier: operational systems contain data and software for data acquisition
• 2nd tier: the data warehouse
• 3rd tier: DSS/BI/BA engines
• Data from the data warehouse are processed and deposited in a multidimensional database and organized for easy analysis and presentation
• Advantage: separation of the functions of the data warehouse, which eliminates resource constraints and makes it easy to create data marts

Page 18: CISB594 – Business Intelligence

Data Warehousing Architectures

[Figure: Tier 1: Client workstation; Tier 2: Application server; Tier 3: Database server]

Page 19: CISB594 – Business Intelligence

Two-tier DW architecture

• 1st tier: operational systems contain data and software for data acquisition (i.e., the server)
• 2nd tier: DSS/BI/BA engines and the data
• DSS engines run on the same hardware platform as the data warehouse, hence more economical
• Advantage: economical
• Disadvantage: performance problems for large data warehouses with data-intensive applications for decision support

Page 20: CISB594 – Business Intelligence

Data Warehousing Architectures

[Figure: Tier 1: Client workstation; Tier 2: Application & database server]

Page 21: CISB594 – Business Intelligence

Web-based DW architecture

• Data warehousing and the Internet are two key technologies that offer important solutions for managing corporate data
• The integration of these two produced Web-based data warehousing
• On the client side, the user needs an Internet connection and a Web browser with a GUI
• The Internet/Intranet/Extranet is the communication medium between client and servers
• On the server side, a Web server is used to manage the flow of information between client and server
• Advantages: ease of access, platform independence, lower cost
• Disadvantages: server capacity must be planned carefully; page loading speed

Pages 22-29: CISB594 – Business Intelligence

[Figures: Data Warehousing Architectures]

Page 30: CISB594 – Business Intelligence

Data Warehousing Architectures

• Issues to consider when deciding which architecture to use:

– Which database management system (DBMS) should be used? Most data warehouses are built using an RDBMS (Oracle, SQL Server, and DB2 are commonly used). Each supports client/server and Web-based architectures.

– Will parallel processing and/or partitioning be used? Parallel processing enables multiple CPUs to process data warehouse requests simultaneously and provides scalability. Partitioning splits large tables into smaller tables for access efficiency.
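
Partitioning and parallel processing can be sketched together in a few lines. This is only an illustration of the idea: the sample rows are invented, and Python threads stand in for the multiple CPUs that a real DBMS would manage internally.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# One "large" table of (year, amount) rows; the values are made up.
rows = [(2022, 10.0), (2022, 5.0), (2023, 7.5), (2024, 2.5), (2024, 4.0)]

# Partitioning: split the table into smaller per-year tables.
partitions = defaultdict(list)
for year, amount in rows:
    partitions[year].append(amount)


def partition_total(item):
    """Aggregate a single partition independently of the others."""
    year, amounts = item
    return year, sum(amounts)


# Parallel processing: each partition is handed to a separate worker.
with ThreadPoolExecutor(max_workers=3) as pool:
    totals = dict(pool.map(partition_total, partitions.items()))

print(totals)
```

Because each partition can be scanned on its own, a query restricted to one year touches only that year's smaller table, and a query over all years can be divided among workers.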

Page 31: CISB594 – Business Intelligence

Data Warehousing Architectures

• Issues to consider when deciding which architecture to use:

– Will data migration tools be used to load the data warehouse?

– What tools will be used to support data retrieval and analysis?

Page 32: CISB594 – Business Intelligence

Data Warehousing Architectures

Ten factors that potentially affect the architecture selection decision:
1. Information interdependence between organizational units
2. Upper management’s information needs
3. Urgency of need for a data warehouse
4. Nature of end-user tasks
5. Constraints on resources
6. Strategic view of the data warehouse prior to implementation
7. Compatibility with existing systems
8. Perceived ability of the in-house IT staff
9. Technical issues
10. Social/political factors

Page 33: CISB594 – Business Intelligence

Now ask if ..

You are able to:
• Understand the basic definitions and concepts of data warehouses
• Understand how a data warehouse differs from a database
• Describe the characteristics of a data warehouse
• Describe the data warehouse process overview
• Describe the different types of data warehouse architectures