Data Warehousing: Kimball Model vs Inmon Model

10
Allan Peda Enterprise Application Integration Y52.3250 Prof. Aldrich Wright Allan Peda Week 3, Kimball vs Inmon Data Warehouse Designs Page 1 of 10

Transcript of Data Warehousing: Kimball Model vs Inmon Model

Page 1: Data Warehousing: Kimball Model vs Inmon Model

Allan Peda

Enterprise Application Integration

Y52.3250Prof. Aldrich Wright

Allan PedaWeek 3, Kimball vs Inmon Data Warehouse Designs

Due Date: February 17, 2007, 10PM

Page 1 of 7

Page 2: Data Warehousing: Kimball Model vs Inmon Model

Allan Peda

AssignmentDescribe a Business Problem or set of business problems that the your company

is trying to solve through a data warehouse or data mart. Compare and contrast

the Kimball and Inmon approaches to data warehouse. Pick the one that best

solves your business problem and why.

Problem StatementThe chosen business problem for this data warehouse is the determination of

enrollment trends and subject interest within a university. This information

derived from these data is of tremendous value as it will be used to develop long

term growth plans for the university including campus expansion, departmental

funding levels and potential areas for enhanced research and development

funding. The initial focal point of this from a business unit perspective will be

student registration data and library collection access records. This will define

initial data feeds for ETL processing.

Problem Resolution

The chosen design is the relational (Inmon) database design as it is considered

to be quicker to develop, supports gradual expansion over time, is robust, flexible

and scalable. These attributes are considered more important than speed of

implementation or raw performance. Kimballs multidimensional system (also

known as BUS for reasons discussed below) is considered more rigid because

well defined conforming dimension and fact tables must be selected at the outset

of system development. The fact that the BUS architecture is modeled on the

analytical requirements of the end users creates difficulties when these

requirements change.

Page 2 of 7

Page 3: Data Warehousing: Kimball Model vs Inmon Model

Allan Peda

Relational Multidimensional

High Performance

Considered Quicker To Implement

Gradual Development Supported

Robust

Flexible

Scalable Performance

A more extensive discussion of each system is developed below.

Common Attributes among all data warehouse designs

No matter which design paradigm is used to approach the development of a data

warehouse design, the basic definition of a data warehouse remains the same. It

is a subject oriented, integrated, nonvolatile and time variant collection of data in

support of management's decisions. As such it can be viewed as one type of

read intensive online analytical processing (OLAP) system which must integrate

a large amount of data derived from multiple business units. These data must be

consistent and summarized at the appropriate level of granularity, consequently

an Extract, Transform and Load (ETL) step is necessary in drawing data from the

operational (OLTP) systems.1 Both designs emphasize the importance of the

ETL step in developing cleansed data which is loaded from a common Integrated

Data Store. There must be no direct dependence on legacy or operational data.

The Multidimensional (Kimball) Approach

As noted earlier, the Kimball design is developed along the requirements of the

end user. As such the data must be denormalized and refactored using a well

defined set of end user requirements. The underlying logical architecture is not

1 This step is sometimes altered slightly with data transformation done primarily within the database. This is referred to as Extract, Load and Transform (ELT).

http://www.b-eye-network.com/blogs/linstedt/archives/2005/05/elt_and_etl_can.php

Page 3 of 7

ALP, 02/16/08,
Inmon: building the DW p 364
ALP, 02/16/08,
Inmon Building the DW, p 371
Page 4: Data Warehousing: Kimball Model vs Inmon Model

Allan Peda

relational, instead it is multidimensional (MOLAP, often referred to as a

multidimensional hypercube). It should be noted that star schemas can be

developed using tables within an RDBMS. In such cases it is possible to reduce

data redundancy and lower storage requirements by reintroducing some level of

normalization to the data. Such controlled reintroduction of normalization is

known as snowflaking of a star schema. Physical MOLAP database systems (as

opposed to the logical schema) are typically proprietary commercial off the shelf

(COTS) products, and as such there is little publicly information regarding the

actual implementation of such systems.

Kimball data warehouses are comprised of multiple subject oriented data marts

which each center on a few core fact tables. These tables contain data which are

easily aggregated. Surrounding these fact tables are multiple dimension tables,

which contain the associated characteristics of these facts (time being an almost

universal attribute for example). These dimension tables are considered to be the

entry points used to access and aggregate the associated facts. They are used

for filtering, grouping and labeling.

In order for queries to determine what correlations between facts and attributes, it

is desirable for dimension and fact tables to be shared among several subject

areas. The careful design and selection of this set of conforming fact and

dimension tables is referred to by Kimball as the BUS architecture.

Page 4 of 7

Page 5: Data Warehousing: Kimball Model vs Inmon Model

Allan Peda

The Relational (Inmon) Approach

In contrast with the architecture developed by Ralph Kimball, Bill Inmon's design

is based on a more flexible relational design, which is shaped by the enterprise

data model. The relational design is the more abstract and more flexible one,

however this flexibility is achieved at a cost, and it is generally accepted that

relational model is not optimal in terms of reporting performance requirements.

Data which are spread among several parent-child relationships will require

multiple lookups in order to fetch each record. In order to obtain acceptable

performance levels, the relational data warehouse is often systematically

denormalized in places. The following techniques are recommended by Inmon:

1. Tables may be physically merged (possibly by creating materialized views during load step).

2. Data may be refactored so they are no longer atomic; this may be accomplished via embedded arrays.

Page 5 of 7

Illustration 1: Data flow in the Kimball Data Warehouse Architecture (from The Data Warehouse Toolkit, Kimball 2nd edition,2003)

Page 6: Data Warehousing: Kimball Model vs Inmon Model

Allan Peda

3. Data which has a low probability of access may be placed in a separate tables.

4. Redundant data may be deliberately introduced to reduce the need to lookup other tables.

5. Derived data may be placed within the database. This is a special instance of data redundancy. This may be accomplished via regularly scheduled batch processes or by use of a creative index during the data loading step.

6. Most noteworthy perhaps is the fact that Inmon states that it is “patently incorrect” to enforce referential integrity within a data warehouse. As such any parent-child integrity within a data warehouse is referred to by Inmon as a relationship “artifact”.

Inmon's design also relies on star (and snowflake) joins, however these design

techniques are restricted to the summarized data presented at the data

warehouse level. In Ralph Inmon's data warehouse designs, data marts are not

linked together via a data bus of conforming fact and dimension tables, instead

the RDBMS system upstream of the data marts is considered the warehouse

proper, and it is there where relations are derived allowing queries across subject

areas (contrast his to the use of conforming dimensions and facts within Kimball's

architectures). As noted earlier Kimball's logical architectures naturally map

directly to a physical multidimensional online analytical processing (OLAP)

database system known as MOLAP, while Bill Inmon's designs are inherently

relational OLAP (ROLAP). Combinations, or hybrids of both systems are known

as HOLAP.

Page 6 of 7

Page 7: Data Warehousing: Kimball Model vs Inmon Model

Allan Peda

References

1. Inmon, W.H., Building The Data Warehouse, 4th edition. Indianapolis, Indiana: Wiley Publications, 2005.

2. Kimball, R and Ross, M. The Data Warehouse Toolkit, 2nd edition. Indianapolis, Indiana: Wiley Publications, 2003.

3. Website for The Kimball Grouphttp://www.kimballgroup.com/

4. Website for Bill Inmons Corporate Information Factory (CIF) http://www.inmoncif.com/home/

Page 7 of 7

Illustration 2: Data flow for the Inmon Relational Data Warehouse