CIS 465 - Data Warehousing, Version 6.0 - 04/18/2000

Can your database answer questions like these?

• What is the cost of staff to break into a new line of business?

• What are the travel routes of my competition’s inventory?

• At what velocity is my competitor moving toward a common goal?

• How will a transaction on a certain date be affected by currency exchange rates?

• Is a foreign labor source likely to produce a higher quality product?

• Which 20% of the causes create 80% of the problems?

Can your database answer questions like these?

• By product and location, how can we regain a lost customer base?

• Which skill and staff levels are most likely to accept the voluntary layoff package?

Data is Difficult to Manage

• The amount of data increases exponentially; past data must be kept for a long time; new data are added rapidly.

• Data are scattered throughout organizations; collected by many individuals; using different methods and devices.

• Only small portions of an organization’s data are relevant for specific decisions.

• An ever-increasing amount of external data needs to be considered in making organizational decisions.

Data is Difficult to Manage - contd.

• Raw data may be stored in different computing systems, formats, and human and computer languages.

• Legal requirements relating to data differ among countries and change frequently.

• Selecting data management tools can be difficult because of the large number of tools available.

• Data security, quality, and integrity are critical, yet easily jeopardized.

Data and Knowledge Management

• Businesses do not run on data. They run on information and their knowledge of how to put that information to use successfully.

Some Information Concepts

• Data: Unorganized facts and figures. (raw material)

• Information: Data that has been processed into a form that is meaningful to the recipient and is of real or perceived value in current or prospective actions or decisions.

• Information:
– adds to a representation
– corrects or confirms previous information
– has “surprise” value in that it tells us something we did not know, or could not predict.
– What is a “finished product” to one may be “raw materials” to someone else.

Definitions: Information vs. Knowledge

• Knowledge: a combination of instincts, ideas, rules, and procedures that guide actions and decisions.

• Helping to provide the best available knowledge for decision-making is another role of information systems.

Relationship Between Data, Information, and Knowledge

• The difference between data and information is easy to remember.

• It is often cited as the reason why systems that collect large amounts of information fail to meet management’s information needs.

• There are many methods of converting data into information for decision making.

• Managers take action based on information about a current situation plus their accumulated knowledge. Actions taken feed the process of accumulating more knowledge (experience).

• Example: How do medical students become competent physicians?

Relationship Between Data, Information, and Knowledge (diagram not included in transcript)

Attributes of Quality Information

• Timeliness
• Completeness
• Conciseness
• Relevance
• Accuracy
• Precision
• Appropriateness of Form

Special Characteristics of Information

• Usefulness - depends on combination of quality, accessibility, and presentation.

• One person’s information may be another person’s noise.

• Soft data may be as important as hard data.

• Ownership of information may be hard to maintain.

• More information is not always better (information overload).

• Politics can often hide or distort information.

Sources of Data

• Internal Data
– Data about people, products, services, processes.
– Often stored in corporate databases (e.g. Sales or HR).
– Some data may be disparate in different regions, but accessible by networks.

• Personal Data
– Individuals document expertise by creating personal data - subjective estimates. Some is kept in heads, or mental models.
– Can be stored on PCs, or available on the Web.

• External Data
– Many sources.

Some Sources of Business External Data

• Federal Publications
– Survey of Current Business
– Monthly Labor Review
– Federal Reserve Bulletin
– Employment and Earnings
– Commerce Business Daily
– Census Bureau

• Other
– International Monetary Fund
– Moody’s
– Standard & Poor’s
– Advertising Age
– Dialog and Lexis/Nexis

• Other - contd.
– ABI/Inform
– Annual Editor & Publisher Market Guide
– Thomas Register On-line

• Indexes
– Encyclopedia of Business Information Sources

What is a data warehouse?

• A data warehouse is a pool of data organized in a format that enables users to interpret data and convert it into useful information to gain knowledge from this interpretation.

• It is a single place that contains complete and consistent data from multiple sources.

• Data warehousing is the act of a business person extracting business value from the data stored in the data warehouse.

Collecting Raw Data

• Is not easy:
– collect in the field
– elicit from people
– collect manually, electronically, or by sensors.

• Data collection technology has not kept pace with advances of data storage technology.

• Data collection from external sources is not easy either.

• Bottom Line: Garbage IN, Garbage OUT - GIGO.

• Data Quality is an Important Issue.

Data Quality Issues

• Intrinsic DQ:
– accuracy, objectivity, believability, reputation.

• Accessibility DQ:
– accessibility and access security.

• Contextual DQ:
– relevancy, value-added, timeliness, completeness, amount of data.

• Representation DQ:
– interpretability, ease of understanding, concise representation, consistent representation.

Why Data Warehousing?

• Managers do not make decisions that are “good” or “bad”; they make decisions on the basis of good or bad information.

• Management information =
– a. the right information
– b. in the right form
– c. at the right time.

• Most transaction-based information systems have difficulty delivering this information.

Why Data Warehousing - 2

• Not the right information:
– Data is not easily accessible.
– Meaning is subtly (or significantly) different from the question context.
– Information is presented with too much or too little detail, covers the wrong time spans, or is in the wrong intervals.

Why Data Warehousing - 3

• Not the right time:
– Getting this information may require the efforts of highly skilled professionals who are not generally available at the whim of business managers.
– Data comes from a variety of different systems which are resident on a variety of different technology platforms.

Why Data Warehousing - 4

• Not the right format:
– If data is extracted, merged, and converted into meaningful information, often it is not in a usable format.
– Users will want it loadable into a particular PC tool or spreadsheet with which they are familiar.
– Printouts weighing 10 pounds are not in the right format.
– A diskette with a COBOL file description is not in the right format.

The Dilemma for Corporate IT

• How to control scarce IT resources consumed by insatiable user demand for ad-hoc reports.

• Each ad-hoc report generated by IT and analyzed by the user generates three more reports to further illuminate the insights gleaned in the first.

• Often the extract programs have few reusable components.

• The user is on a “voyage of discovery in a sea of data”.

The Response of Corporate IT

• New methodologies: Align the IT systems with the business goals and requirements.

• These techniques concentrate on business process requirements, not decision support requirements.

• Transaction systems must be rigorously specified in advance. They are an intersection between the organization and the customer.

• These systems should not be a “voyage of discovery” for either.

Transaction Systems vs. Analytical Support Systems

• Transaction Systems:
– Insert an order for 300 baseballs.
– Update this passenger’s airline reservation.
– Close out accounts payable records for this vendor.
– What is the current checking account balance for this customer?

• Analytical Support Systems:
– Did the sales promotion last quarter do better than the same promotion last year?
– Is the five-day moving average for this security leading or trailing actual prices?
– Which product line sells best in middle-America and how does this correlate to demographic data?
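
The contrast between the two kinds of work can be sketched in a few lines of Python using the standard-library sqlite3 module. This is only an illustration under assumptions: the `orders` table, its columns, and its rows are invented here and do not come from the slides. Transaction-style statements touch one known record; the analytical-style query scans and aggregates many records.

```python
import sqlite3

# Hypothetical in-memory table; schema and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, product TEXT, quarter TEXT, quantity INTEGER)"
)

# Transaction-style work: insert or update one specific record at a time.
conn.execute("INSERT INTO orders VALUES (1, 'baseball', '1999Q4', 300)")
conn.execute("INSERT INTO orders VALUES (2, 'baseball', '2000Q1', 450)")
conn.execute("INSERT INTO orders VALUES (3, 'glove', '2000Q1', 120)")
conn.execute("UPDATE orders SET quantity = 320 WHERE order_id = 1")

# Transaction-style read: look up a single record by its key.
single_order = conn.execute(
    "SELECT quantity FROM orders WHERE order_id = 1"
).fetchone()[0]

# Analytical-style read: aggregate over many records to compare periods.
totals_by_quarter = dict(conn.execute(
    "SELECT quarter, SUM(quantity) FROM orders GROUP BY quarter ORDER BY quarter"
).fetchall())

print(single_order)
print(totals_by_quarter)
```

In a real installation the two workloads would usually run against separate databases; a single in-memory table is used here only to keep the sketch short.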

Analytical Processing

• Analytical Processing today includes what in the past have been called:

– DSS (Decision Support Systems)
– EIS (Executive Information Systems)
– ESS (Executive Support Systems)

• It is an evolution of “End-User Computing”

• Placing strategic data access in the hands of decision makers aids productivity and enables them to be better decision makers.

Key Difference: OLTP vs. OLAP

• OLTP: processing specific functions.

• OLAP: providing flexibility for undetermined analysis.

A Multidimensional database

Data for Decision Support

• The data must be integrated - requires data from many separate internal corporate databases.

• The data must be enriched - through integration with other external data.

• The data must be available - and not constrained by machine resources.

Sources of Data

• Internal Data:
– Financial Systems
– Logistics Systems
– Sales Systems
– Production Systems
– Personnel Systems
– Billing Systems
– Information Systems

• External Data Needs:
– to recognize opportunities
– to detect threats
– to identify synergies

Sources of Data - 2

• External Data Categories
– Competitor Data
– Economic Data
– Industry Data
– Credit Data
– Commodity Data
– Econometric Data
– Psychometric Data
– Meteorological Data
– Demographic Data
– Sales & Marketing Data

Operational Control vs. Operational Strategy

• Data is a source not just of operational control, but of operational strategy.

• Operational strategy is an attempt to describe the need, in a competitive and turbulent market, to continually innovate and re-align strategy with time scales too short to be comprehended by strategic planning in the conventional corporate sense.

Comparison of Control and Strategy Data:

• Operational Data:
– short-lived, rapidly changing
– requires record-level access
– repetitive standard transactions and access patterns
– updated in real-time
– event-driven; process generates data

• Strategic Data:
– long-living, static
– data aggregated into sets (which is why warehouse data is friendly to RDBMS).
– ad-hoc queries with some periodic reporting
– updated periodically with mass loads
– data-driven; data governs process.

Information Requirements by Management Level

(Source: Gorry and Scott Morton)

Characteristics of Information | Operational Control | Management Control | Strategic Planning
Source | Largely internal | | External
Scope | Well defined, narrow | | Very wide
Level of Aggregation | Detailed | | Aggregate
Time Horizon | Historical | | Future
Currency | Highly current | | Quite old
Required Accuracy | High | | Low
Frequency of Use | Very frequent | | Infrequent

Dimensional Modeling

• Dimensional Modeling gives us a way to visualize data.

• The CEO’s perspective:
– “We sell products in various markets, and we measure our performance over time.”

• From the data warehouse designer’s perspective, we hear three dimensions:
– We sell Products
– in various Markets
– and measure performance over Time.

Dimensional Modeling - contd.

• Management may be interested in examining sales figures in a certain city by product, by time period, by salesperson, and by store.

• The more dimensions involved, the more difficult it is to represent in a single table or graph.

• The ability to add and modify the dimensions used in a table or graph is often known as “slicing and dicing” the data.
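
As a rough illustration of slicing and dicing, the Python sketch below totals a handful of invented sales records under different choices of dimensions. The records, the `summarize` helper, and the dimension names are assumptions made for this example, not part of the course material.

```python
from collections import defaultdict

# Hypothetical sales records: (city, product, period, salesperson, amount).
sales = [
    ("Chicago", "widget", "Q1", "Ann", 100),
    ("Chicago", "widget", "Q2", "Bob", 150),
    ("Chicago", "gadget", "Q1", "Ann", 80),
    ("Denver", "widget", "Q1", "Cal", 60),
]
FIELDS = ("city", "product", "period", "salesperson")


def summarize(records, by, where=None):
    """Total the amount grouped by the dimensions in `by`, filtered by `where`."""
    where = where or {}
    totals = defaultdict(int)
    for record in records:
        row = dict(zip(FIELDS, record[:4]))
        if all(row[dim] == value for dim, value in where.items()):
            totals[tuple(row[dim] for dim in by)] += record[4]
    return dict(totals)


# Fix one dimension (city) and view the totals by product.
chicago_by_product = summarize(sales, by=["product"], where={"city": "Chicago"})

# Change the dimensions used in the view: product by time period.
chicago_by_product_period = summarize(
    sales, by=["product", "period"], where={"city": "Chicago"}
)

print(chicago_by_product)
print(chicago_by_product_period)
```

Each added dimension lengthens the grouping key, which is one way to see why views with many dimensions become hard to show in a single table or graph.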

Dimensional Model of the Business

[Figure: axes labeled Product, Market, and Time]

Data Dependencies Model of a Business

[Figure: entity diagram with the entities SalesRep, SalesDistrict, SalesRegion, SalesDivision, ProductGroup, ProductLine, Product, Contact, Ship To, Shipper, ShipType, District Credit, Contract, Customer, ContractType, SalesOrder, CustomerLocation, ContactLocation, OrderItem]

Transaction Processing

• The Relational Model was full of promises for equal access to data.

• In the early 1980s the relational model was a dream. Typical transaction rates were one per second.

• Today the SABRE system typically processes 4,000 transactions per second, with peak bursts of 13,000 per second.

• OLTP - (On-line Transaction Processing) The point is to get data “in” to the database.

Segregating Operational and Warehouse Data

• In the past, data administrators were constantly told to build shared, normalized, and non-redundant corporate databases.

• Early attempts at data warehousing tried to share the data with transaction-based systems. This resulted in LONG response times for complex queries.

• The idea today is to keep the two separate.

• Separate databases, and perhaps separate DBMS products and processor platforms, are used.

• Controlled and practical redundancy is better than out-of-control theoretical purity.

Fundamental Obstacles With Traditional Systems

• Systems Integration - Disintegration grew slowly from islands of automation.
– ownership, planning, economic, organizational development issues all contribute.

• Hardware Architecture

• Inconsistent Data

• Data Pollution:
– Bad Application Design (semantic and syntactical differences).
– Ownership
– Data Entry Conventions

The Data Warehouse

• Active, tactical, and current events flow from the operational systems to the data warehouse to become static, strategic, and historical data.

• The data warehouse becomes a “middle ground” where a large number of disparate and incompatible “legacy systems” are tied to an equally diverse collection of end-user workstations.

• Legacy systems usually comprise a hodge-podge of assorted hardware, software, and operational systems accumulated over many decades; they are by nature incompatible with one another and unique to each organization.

Practical Facts About the Warehouse

• The chances are remote that any single vendor will be able to develop a product that can interface with all “legacy systems” painlessly and “seamlessly” and at the same time, combine data from modern platforms and data external to the organization.

• Instead, warehouse product vendors develop specialized capabilities to work with various environments.

Typical Dimensional Model

• Sales Fact: Time_key, product_key, store_key, dollars_sold, units_sold, dollars_cost

• Time Dimension: Time_key, day-of-week, month, quarter, year, holiday_flag

• Product Dimension: Product_key, description, brand, category

• Store Dimension: Store_key, store_name, address, floor_plan_type
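
One way to picture this model as tables is the sketch below, which creates the four tables in an in-memory SQLite database from Python. The column names follow the slide (with `day-of-week` written as `day_of_week`); the lowercase table names, the column types, and the foreign-key references are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (
    time_key INTEGER PRIMARY KEY,
    day_of_week TEXT, month TEXT, quarter TEXT, year INTEGER, holiday_flag INTEGER
);
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,
    description TEXT, brand TEXT, category TEXT
);
CREATE TABLE store_dim (
    store_key INTEGER PRIMARY KEY,
    store_name TEXT, address TEXT, floor_plan_type TEXT
);
-- Each fact row sits at the intersection of one time, product, and store member.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    store_key INTEGER REFERENCES store_dim(store_key),
    dollars_sold REAL, units_sold INTEGER, dollars_cost REAL
);
""")

# List the tables that were created.
table_names = sorted(
    row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
)
print(table_names)
```

The fact table carries only keys and numeric measurements, while the descriptive text lives in the dimension tables.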

Fact Table

• Fact Table is where numerical measurements of the business are stored.

• Each measurement is taken at the intersection of all the dimensions.

• The “best” facts are numeric, continuously valued and additive.

• Every query made against the fact table may use hundreds of thousands of individual records to construct an answer set.
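
To see why additive facts are convenient, the short Python sketch below sums a measurement along two different dimensions and checks that both roll up to the same grand total. The fact rows and the `total_by` helper are invented for illustration.

```python
# Hypothetical fact rows: (time_key, product_key, store_key, dollars_sold, units_sold).
fact_rows = [
    (1, 10, 100, 50.0, 5),
    (1, 11, 100, 30.0, 2),
    (2, 10, 101, 20.0, 2),
]


def total_by(rows, key_index, measure_index):
    """Sum one measure, grouped by the key found at position `key_index`."""
    totals = {}
    for row in rows:
        totals[row[key_index]] = totals.get(row[key_index], 0) + row[measure_index]
    return totals


dollars_by_time = total_by(fact_rows, 0, 3)
dollars_by_product = total_by(fact_rows, 1, 3)

# An additive fact gives the same grand total whichever dimension it is summed over.
print(sum(dollars_by_time.values()), sum(dollars_by_product.values()))
```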

Dimension Tables

• Dimension tables are where textual descriptions of the business are stored.

• Each textual description helps to describe a member of the dimension.

• Example: each member in the product dimension is a specific product. The product dimension database has many attributes to describe the product. A key role of the dimension table attribute is to serve as the source of constraints in a query.

Example

Brand | Dollar Sales | Unit Sales
Axon | 780 | 263
Framis | 1044 | 509
Widget | 213 | 444
Zapper | 95 | 39

Example Query

• Find all product brands that were sold in the first quarter of 1995 and present the total dollar sales as well as the number of units.

• Brand is a collection of individual products.

• To construct:
– A. Drag attribute brand from product dimension. Place as Row Header.
– B. Drag Dollar Sales and Units Sold from the Fact Table, and place to the right of the Brand row header.
– C. Specify row constraint “1st Q 1995” on the quarter attribute in the Time Dimension Table.
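
A query tool following these steps would generate something close to the SQL in the sketch below, run here through Python's sqlite3 module. The trimmed-down tables, their names, and the sample rows are assumptions made for the example, and the quarter constraint is expressed as `quarter = 'Q1'` together with `year = 1995`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, description TEXT, brand TEXT);
CREATE TABLE sales_fact (time_key INTEGER, product_key INTEGER,
                         dollars_sold REAL, units_sold INTEGER);

-- Hypothetical sample data.
INSERT INTO time_dim VALUES (1, 'Q1', 1995), (2, 'Q2', 1995);
INSERT INTO product_dim VALUES (1, 'Axon small', 'Axon'),
                               (2, 'Axon large', 'Axon'),
                               (3, 'Framis basic', 'Framis');
INSERT INTO sales_fact VALUES (1, 1, 100.0, 10), (1, 2, 50.0, 5),
                              (1, 3, 70.0, 7), (2, 1, 999.0, 99);
""")

# Brand is the row header, the two facts are summed, and the time
# dimension supplies the constraint.
result = conn.execute("""
    SELECT p.brand, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_key = f.product_key
    JOIN time_dim AS t ON t.time_key = f.time_key
    WHERE t.quarter = 'Q1' AND t.year = 1995
    GROUP BY p.brand
    ORDER BY p.brand
""").fetchall()

for brand, dollars, units in result:
    print(brand, dollars, units)
```

Because a brand is a collection of individual products, the GROUP BY clause is what rolls the product-level fact rows up to one line per brand.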

Multidimensionality

• Examples of Dimensions:
– products, salespeople, market segments, business units, geographical locations, distribution channels, country, industry

• Examples of Facts or Measures:
– money, sales volume, head count, inventory, profit, actual vs. forecast.

• Examples of Time:
– daily, weekly, monthly, quarterly, yearly

Components of a Data Warehouse - 1

• Acquisition - The first component handles acquisition of data from legacy systems and outside sources.

• Data is identified, copied, formatted, and prepared for loading into a warehouse.

• Vendors provide tools for extraction and preparation.
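
The acquisition step can be pictured with the small Python sketch below, which copies records from two invented "legacy" extracts and formats them into one consistent layout ready for loading. The field names, formats, and helper functions are hypothetical; real extraction tools deal with far messier sources.

```python
from datetime import datetime

# Two hypothetical legacy extracts that encode the same kind of fact differently.
billing_rows = [{"cust": "ACME", "amt": "1,200.50", "date": "04/18/2000"}]
sales_rows = [{"customer_name": "acme", "amount_cents": 99900, "sold_on": "2000-04-17"}]


def from_billing(row):
    """Format a billing-system record into the common staging layout."""
    return {
        "customer": row["cust"].strip().upper(),
        "amount": float(row["amt"].replace(",", "")),
        "date": datetime.strptime(row["date"], "%m/%d/%Y").date().isoformat(),
    }


def from_sales(row):
    """Format a sales-system record into the common staging layout."""
    return {
        "customer": row["customer_name"].strip().upper(),
        "amount": row["amount_cents"] / 100,
        "date": row["sold_on"],
    }


# Staged records share one layout and are ready to be loaded into the warehouse.
staged = [from_billing(r) for r in billing_rows] + [from_sales(r) for r in sales_rows]

for record in staged:
    print(record)
```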

Components of a Data Warehouse - 2

• Storage Area - The second component is the storage area managed by relational databases, multi-dimensional databases, specialized hardware - symmetric multiprocessor (SMP) or massively parallel processors (MPP) machines - or by software.

• The storage component holds the data so that many different data mining, executive information, and decision support systems can make use of it effectively.

Components of a Data Warehouse - 3

• Access - The third component of the warehouse is the access area.

• Different end-user PCs and workstations draw data from the warehouse with the help of multi-dimensional analysis tools, neural networks, data discovery tools, or analysis tools.

• These “smart” data-mining tools are the driving force behind the data warehouse concept.

• What good is it to store all the information without some way to understand it in new and different ways?

Data Warehouse Access Tools

• Intelligent Agents and Agencies - tools that work and think for the user.

• Query Facilities and Managed Query environments.

• Statistical Analysis - One of the biggest surprises in the data warehousing marketplace is the resurgence of interest in traditional statistical analysis, and the concomitant resurrection of the popularity of products like SAS and SPSS.

Data Warehouse Access Tools - 2

• Data Discovery - A large class of tools formerly classified as decision support, artificial intelligence and expert systems. They now make use of neural networks, fuzzy logic, decision trees, and other tools from advanced mathematics to allow a user to “sift” through massive amounts of raw data to “discover” new, interesting, insightful, and in many cases useful things about the organization, its operations, and its markets.

• Currently there are nearly 60 different data discovery tools/products on the market.

Data Warehouse Access Tools - 3

• OLAP - On-line Analytical Processing often uses multi-dimensional spreadsheet tools allowing users to look at information from many different angles.

• Users are able to “slice and dice” reports and to look at the same kinds of information at different levels at the same time.

• A typical OLAP application might allow a product manager to view sales figures for a given product at the national level, see them broken down by division, drill down to see territories within a division, check sales numbers for each store within a territory, and then compare them against sales of stores from another territory.
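
The drill-down path described above can be sketched in Python as repeated totals at successively lower levels of a hierarchy. The store-level figures, the level names, and the `drill` helper are assumptions made for this illustration only.

```python
# Hypothetical store-level sales for one product: (division, territory, store, sales).
store_sales = [
    ("East", "NY", "Store 1", 120),
    ("East", "NY", "Store 2", 80),
    ("East", "NJ", "Store 3", 50),
    ("West", "CA", "Store 4", 200),
]
LEVELS = ("division", "territory", "store")


def drill(rows, level, **fixed):
    """Total sales at `level`, restricted to the parent members given in `fixed`."""
    depth = LEVELS.index(level)
    totals = {}
    for row in rows:
        named = dict(zip(LEVELS, row[:3]))
        if all(named[key] == value for key, value in fixed.items()):
            totals[row[depth]] = totals.get(row[depth], 0) + row[3]
    return totals


national = sum(row[3] for row in store_sales)          # national level
by_division = drill(store_sales, "division")           # broken down by division
east_territories = drill(store_sales, "territory", division="East")
ny_stores = drill(store_sales, "store", division="East", territory="NY")

print(national, by_division, east_territories, ny_stores)
```

Comparing against another territory is then just a second call with a different `territory` value.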

Data Warehouse Access Tools - 4

• Data Visualization - These tools turn ugly, boring numbers into exciting visual presentations.

• These tools bring graphical representation to new heights. Example: Geographical information systems turn data about stores, individuals, or anything else into compelling, easy to understand, dynamic maps.

• PC-based Geographic Information systems have the ability to display spatial occurrences and the relationship between and among geographically specific variables.

Developing the Data Warehouse

• The most expensive warehousing ventures involve major new hardware acquisitions and significant investments in training, analysis, and systems development costs.

• Typical startup projects allocate 60% of the budget to hardware and software for creating a powerful storage component, and 30% to data mining and acquisition tools.

• Budgeting for Systems Analysis and Development puts 50% of the budget on acquisition capabilities, 30% on funding user solutions, and 20% on creating databases in the storage component.

Developing the Data Warehouse

• Clarify what you want to do with the Warehouse - How Will It be Used.

• Scrutinize the offerings of vendors and systems integrators. Make sure you understand which functions they provide, and which you must build.

• Most successful projects start as small, tightly defined tactical systems to solve pressing business needs, and develop into larger systems over time.

DW Summary: Key Concepts

• The DW is a “collection of integrated, subject-oriented databases designed to support the decision support function where each unit of data is non-volatile and relevant to some moment in time” (W.H. Inmon, 1992).

• Implicit Assumptions:
– physically separate from operational systems
– hold aggregated data and transactional (atomic) data for management separate from those used for OLTP.

DW Summary: Characteristics

• Subject-orientation
• integrated
• non-volatile (i.e. not updated)
• time variant (kept for long periods, for forecasting and trend analysis)
• summarized
• large volume
• not normalized
• metadata
• data sources

Data Mining