Rusty Barnett

24
Data Warehousing De-Mystified Presented to the Annual Georgia Oracle Users Group Conference on March 20, 2006 By Rusty Barnett

description

 

Transcript of Rusty Barnett

Page 1: Rusty Barnett

Data Warehousing De-Mystified

Presented to the AnnualGeorgia Oracle Users Group

Conference on March 20, 2006 By Rusty Barnett

Page 2: Rusty Barnett

Presented by Rusty Barnett 2

Data Warehousing De-Mystified

What do people mean when they call a grouping of data a “Data Warehouse”?

1) Large Data Volume, Loaded using an ETL Process (Extract, Transform, Load)

2) Data is Time-sensitive and Subject Oriented3) Enterprise Data Warehouse, Data Marts, ODS

(Operational Data Stores), DSS, EIS4) Dimensional Model, Third-Normal Form

Page 3: Rusty Barnett

Presented by Rusty Barnett 3

Data Warehousing De-Mystified

What do people mean when they refer to “Data Warehousing”?

1) Dimensional Data Modeling

2) ETL processing and administrating

3) Business Intelligence Reporting

4) Database managing and administrating

… As it pertains to a Data Warehouse

Page 4: Rusty Barnett

Presented by Rusty Barnett 4

Data Warehousing De-Mystified

What is a “Data Warehouse”?1) "A data warehouse is a subject-oriented,

integrated, time-variant and non-volatile collection of data in support of management's decision making process“. Bill Inmon, 1990.

2) "A data warehouse is the queryable presentation resource for an enterprise’s data". Ralph Kimball, page 19, "The Data Warehouse Lifecycle Toolkit“.

Page 5: Rusty Barnett

Presented by Rusty Barnett 5

Data Warehousing De-Mystified

Definitions of Data Warehouse on the Web:• A data warehouse is a database geared towards the business

intelligence requirements of an organization. www.oranz.co.uk/glossary_text.htm

• An information infrastructure that enables businesses to access and analyze detailed data and trends.www.adobe.com/products/vdp/glossary.html

• A collection of integrated, subject-oriented databases designed to support the DSS function. The data warehouse contains atomic data and lightly summarized data.it.csumb.edu/departments/data/glossary.html

• A data warehouse is a central repository for all or significant parts of the data that an enterprise's various business systems collect.www.pdacortex.com/glossary.htm

Page 6: Rusty Barnett

Presented by Rusty Barnett 6

Data Warehousing De-Mystified

Definitions of Data Warehouse on the Web Continued:• The Data Warehouse is a central repository of data that provides

the MIT community with integrated, up-to-date data from various administrative systems. web.mit.edu/sapr3/docs/webdocs/glossary/glCD.html

• An information repository from which queries & analysis are made.www.pcai.com/web/glossary/pcai_d_f_glossary.html

• This vast database stores information like a data repository, but goes a step further, allowing users to access data to perform research-oriented analyses.www.payorid.com/glossary.asp

• A collection of databases combined with a flexible data extraction system.www.genpromag.com/Glossary~LETTER~D.html

Page 7: Rusty Barnett

Presented by Rusty Barnett 7

Data Warehousing De-Mystified

Definitions of Data Warehouse on the Web Continued :• Datawarehouse (database) A generic term for a system for storing,

retrieving and managing large amounts of any type of data. www2.themanualpage.org/glossary/glo_d.php3

• The department or entity charged with collecting organization-wide data, verifying its accuracy, and analyzing, managing, and distributing it throughout the organization. In organizations without a data warehouse, each department may collect, analyze, manage, and distribute the data it needs for its operations.www2.uta.edu/ssw/trainasfa/glossary.htm

• A repository made up of databases of data extracted from a variety of sources, with a view to analysis to reveal additional information.www.gbc.t-online.hu/english/bszotare2.htm

Page 8: Rusty Barnett

Presented by Rusty Barnett 8

Data Warehousing De-Mystified

Definitions of Data Warehouse on the Web Continued :• Central repository of data extracted from various sources. The

current CMSU DW is a ‘read-only’ system with extracted data from SIS+, AFINS, HRS and Enrollment Management.www.cmsu.edu/x18299.xml

• A collection of data and information from various source systems.www.gov.bc.ca/prem/popt/service_plans/srv_pln/pssg/appen_a.htm

• A subject-oriented non-volatile collection of data used to support strategic decision making. The warehouse is the central point of data integration for business intelligence. It is the source of data for data marts within an enterprise and delivers a common view of enterprise data.publib.boulder.ibm.com/tividd/td/TEDW/GC32-0744-01/en_US/HTML/insmst342.htm

Page 9: Rusty Barnett

Presented by Rusty Barnett 9

Data Warehousing De-Mystified

Definitions of Data Warehouse on the Web Continued :• A logical collection of information, gathered from many different

operational databases, that supports business analysis activities and decision-making taskswww.321site.com/greg/courses/mis1/glossary.htm

• A data warehouse is, primarily, a record of an enterprise's past transactional and operational information, stored in a database designed to favor efficient data analysis and reporting (especially OLAP). Data warehousing is not meant for current, "live" data. en.wikipedia.org/wiki/Data_warehouse

Page 10: Rusty Barnett

Presented by Rusty Barnett 10

Data Warehousing De-Mystified

Definitions of Data Warehouse on the Web Continued :• A very large repository of data comprising nearly all of a company’s

information.www.mosaictec.com/storage/storage_terms.htm

• A large database capable of storing all the information possessed by a large organization.www.jqjacobs.net/edu/cis105/concepts/CIS105_concepts_13.html

• A database used for storing historical data, which is used for data analysis.docs.rinet.ru/O8/glossary.htm

• One or more data stores originating from prime authoritative data sources by an auditable replication process.www.data-core.com/glossary-of-terms.htm

Page 11: Rusty Barnett

Presented by Rusty Barnett 11

Data Warehousing De-Mystified

Goal of Data Warehousing:

A System that displays Large Volumes of Data as Business INFORMATION!

Page 12: Rusty Barnett

Presented by Rusty Barnett 12

Data Warehousing De-Mystified

Terms:Data Warehouse ---The term Data Warehouse was coined

by Bill Inmon in 1990 Enterprise Data Warehouse --- Accumulation of data from

all significant areas of an EnterpriseData Mart --- A subset of a data warehouse, for use by a

functional area (department) on a subjectDimensional Modeling --- A design technique that seeks to

present the data in a framework that’s intuitive, allows for high-performance access, and resistance to change

ERD --- Entity Relational Diagram

Page 13: Rusty Barnett

Presented by Rusty Barnett 13

Data Warehousing De-Mystified

Terms:Schema --- A collection of database objects, including

tables, views, indexes, and synonymsStar Schema --- A method of designing tables with a Fact

Table in the middle and Dimension Tables around it; a standard technique for designing and building multi-dimensional databases

Snowflake Schema --- Star Schema with child DimensionsStar Query --- A join between a Fact table and a number of

Dimension tables, where the Dimension tables are not joined to each other, only to the Fact table

Page 14: Rusty Barnett

Presented by Rusty Barnett 14

Data Warehousing De-Mystified

Terms:Fact Table --- Primary table in Dimensional Model that is

meant to contain measurements of the business, whose most useful columns are numeric and additive

Dimension Table --- One of the set of companion tables to a Fact Table, whose columns are usually textual and are used for constraining, grouping, ordering within queries

Grain --- The level of the data, like Weekly or YearlyMeasures --- The numeric columns in a Fact TableAudit Columns --- Columns specifying change (who,when)

Page 15: Rusty Barnett

Presented by Rusty Barnett 15

Data Warehousing De-Mystified

Terms:Conformance --- A Dimension is said to be conformed if it

can be used by more than one Fact Table and/or Data Mart and/or subject area, i.e., “shareable”

Associative (Intersection) Table --- A table that is used to associate a value from one table to a value in another table in order to reduce many-to-many relationships between tables

Staging Tables --- Tables used to “stage” data along the way toward loading data, especially useful in ETL

ETL --- Extract, Transform, and Load; also a Verb, meaning to load data into one system from data in another

Page 16: Rusty Barnett

Presented by Rusty Barnett 16

Data Warehousing De-Mystified

Data Warehousing Roles:Business Users --- People with knowledge of the business

and needs for reportingReport Writers --- People who write reports (Developers);

tools for writing reportsData Modelers --- People who create data models from

Business RequirementsETL Developers --- People who write / develop ETL

processesDBA --- People who manage and support databases

Page 17: Rusty Barnett

Presented by Rusty Barnett 17

Data Warehousing De-Mystified

Tips for Data Modelers:1) Understanding the capabilities of the ETL tool,

the RDBMS, and the Reporting tool, is just as important as understanding the needs of the business and the users requirements

2) All Data Warehouse primary keys (PK) should be single-column surrogate keys (sequence)

3) Keep Referential Integrity in the DW4) Use Associative tables to handle M:M5) Order columns in tables by their “fill” factor

Page 18: Rusty Barnett

Presented by Rusty Barnett 18

Data Warehousing De-Mystified

Tips for ETL Developers:1) Embrace the use of multiple Staging Tables2) Primarily INSERT data in each ETL process step,

especially into the Staging Tables3) Invalidate Indexes (other than Primary Keys, Unique

Keys, and Foreign Keys) before loading/updating massive amounts of data. After the data is loaded/updated, then rebuild these Indexes

4) Be careful in usage of Oracle ROWNUM5) Test new features, like 10g’s MERGE enhancements,

Pipeline Functions, Grouped Table Outer Joins, Conditional Update/Insert/Delete statements, etc.

Page 19: Rusty Barnett

Presented by Rusty Barnett 19

Data Warehousing De-Mystified

Tips for Report Developers:1) Avoid using Hints in your SQL2) Consider using Materialized Views for reporting on a

common set of pre-processed data 3) Be very careful (in fact, be hesitant) in using normal

Views in Data Warehouses 4) Consider using the SQL clauses “intersect”, “minus”,

and “union [all]” in set processing5) “JOINs are expensive” --- NOT, I/O is expensive 6) Test new features, like 10g’s Connect_by, MView and

MERGE enhancements, Pipeline Functions, Grouped Table Outer Joins, Time Series & Interrow Calc’s, etc.

Page 20: Rusty Barnett

Presented by Rusty Barnett 20

Data Warehousing De-Mystified

Tips for DBA’s:1) Use as much of each Data Block as possible

(PCTFREE, PCTUSED) 2) Set optimizer_index_cost_adj to lower than 100 3) Use ARCHIVELOG mode & RMAN for backups 4) Collect Statistics in best way for your Oracle Release 5) Test Benefits of Parallel Query and Query Re-write 6) Test new features of RDBMS --- 10g features like BFT,

new Flashback, Security, Partitioning, and RMAN features, Improved VLDB support, etc.

Page 21: Rusty Barnett

Presented by Rusty Barnett 21

Data Warehousing De-Mystified

Current Debates in Data Warehousing:1) Fresh Data versus Live Data2) Dimensional Model versus Third-Normal Form3) Is a Snowflake schema bad for performance?4) Time spent to “physicalize” the Data Model?5) Who is responsible for Data Model?6) Manage DB via Data Model? (OWB)7) When to partition and when not to?8) Acceptance of Level 3 Changing Dimensions

Page 22: Rusty Barnett

Presented by Rusty Barnett 22

Data Warehousing De-Mystified

Current Standards DW Tools:Database --- Oracle 9i / 10g

ERD Data Modeling --- ERwin (Oracle Designer)

ETL --- Informatica (OWB)

Report Writing --- Oracle Discoverer, MicroStrategy, Business Objects, Cognos

Page 23: Rusty Barnett

Presented by Rusty Barnett 23

Data Warehousing De-Mystified

Questions

And

Answer

Session

Page 24: Rusty Barnett

Presented by Rusty Barnett 24

Data Warehousing De-Mystified

The End