MPIII Database Technologies Relational Concepts Data Warehouses & Marts Queries, OLAP, Data Mining.

MPIII Database Technologies

Relational Concepts

Data Warehouses & Marts

Queries, OLAP, Data Mining

Terms/Examples

• Database – a collection of related data. Usually organized according to topics: e.g. customer info, products, transactions

• Database Management System (DBMS) – a program for creating & managing databases; e.g. Oracle, MS-Access, Sybase

DBMS - the program. Manages interaction with databases.

database - the collection of data. Created and defined to meet the needs of the organization.

Client - makes requests of the DBMS server

Server - responds to client requests

[Diagram: the client sends a request to the server; the server returns a response.]

A Simple Database

• File/Table – Customers

• Field/Column – 5 shown: CUSTID, FIRST, LAST, CITY, STATE

• Record/Row – 5 shown: one for each customer

CUSTID  FIRST    LAST        CITY           STATE  …
2001    John     Gallaugher  Newton         MA     …
2002    Abby     Johnson     Boston         MA     …
2003    Warren   Buffet      Omaha          NE     …
2004    Peter    Lynch       Rockport       MA     …
2005    Charles  Schwab      San Francisco  CA     …
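As a minimal sketch, the Customers table above could be built and queried with Python's built-in sqlite3 module. The column types, the lowercase names, and the in-memory database are assumptions made for illustration; the slide does not tie this example to a particular DBMS.

```python
import sqlite3

# In-memory database; the extra columns hinted at by "…" on the slide are omitted.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("          # file/table
    " custid INTEGER PRIMARY KEY,"      # each name below is a field/column
    " first TEXT, last TEXT, city TEXT, state TEXT)"
)

# Each tuple is one record/row.
rows = [
    (2001, "John", "Gallaugher", "Newton", "MA"),
    (2002, "Abby", "Johnson", "Boston", "MA"),
    (2003, "Warren", "Buffet", "Omaha", "NE"),
    (2004, "Peter", "Lynch", "Rockport", "MA"),
    (2005, "Charles", "Schwab", "San Francisco", "CA"),
]
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?)", rows)

# Ask the DBMS for the records whose STATE field is 'MA'.
ma_customers = conn.execute(
    "SELECT custid, last FROM customers WHERE state = 'MA' ORDER BY custid"
).fetchall()
print(ma_customers)
```

The program plays the client role: it sends a request (the SELECT) and receives a response (the matching rows).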

A More Complex Example

FIRST    LAST        CITY           STATE  BUY/SELL  STOCK  SHARES  PRICE    DATE      TIME
John     Gallaugher  Newton         MA     Buy       MSFT   1000    90 1/4   12/24/96  12:01 PM
John     Gallaugher  Newton         MA     Buy       INTC   2400    80 1/8   7/3/97    10:51 AM
John     Gallaugher  Newton         MA     Sell      IBM    3000    114 3/8  7/1/97    9:03 AM
Abby     Johnson     Boston         MA     Sell      IBM    3000    110 1/8  6/30/97   4:53 PM
Abby     Johnson     Boston         MA     Sell      INTC   2000    94 7/8   8/30/97   3:15 PM
Warren   Buffet      Omaha          NE     Buy       INTC   1500    90 3/8   7/2/97    11:27 AM
Warren   Buffet      Omaha          NE     Buy       IBM    1700    101 7/8  1/4/97    2:02 PM
Warren   Buffet      Omaha          NE     Sell      AAPL   1900    18 1/2   2/14/97   5:00 PM
Peter    Lynch       Rockport       MA     Buy       AAPL   2000    19       2/14/97   5:30 PM
Peter    Lynch       Rockport       MA     Sell      AAPL   10000   21 7/8   3/15/97   11:44 AM
Charles  Schwab      San Francisco  CA     Buy       MSFT   4500    101 1/8  1/15/97   12:38 AM
Charles  Schwab      San Francisco  CA     Buy       INTC   17000   80 1/8   7/2/97    4:53 PM

• Entry & Maintenance is complicated – redundant data exists, increases chance of error, complicates updates/changes, takes up space

Normalize Data - Remove Redundancy

Customer Table (One)

CUSTID  FIRST    LAST        CITY           STATE
2001    John     Gallaugher  Newton         MA
2002    Abby     Johnson     Boston         MA
2003    Warren   Buffet      Omaha          NE
2004    Peter    Lynch       Rockport       MA
2005    Charles  Schwab      San Francisco  CA

Transaction Table (Many)

CUSTID  BUY/SELL  STOCK  SHARES  PRICE    DATE      TIME
2001    Buy       MSFT   1000    90 1/4   12/24/96  12:01 PM
2001    Buy       INTC   2400    80 1/8   7/3/97    10:51 AM
2001    Sell      IBM    3000    114 3/8  7/1/97    9:03 AM
2002    Sell      IBM    3000    110 1/8  6/30/97   4:53 PM
2002    Sell      INTC   2000    94 7/8   8/30/97   3:15 PM
2003    Buy       INTC   1500    90 3/8   7/2/97    11:27 AM
2003    Buy       IBM    1700    101 7/8  1/4/97    2:02 PM
2003    Sell      AAPL   1900    18 1/2   2/14/97   5:00 PM
2004    Buy       AAPL   2000    19       2/14/97   5:30 PM
2004    Sell      AAPL   10000   21 7/8   3/15/97   11:44 AM
2005    Buy       MSFT   4500    101 1/8  1/15/97   12:38 AM
2005    Buy       INTC   17000   80 1/8   7/2/97    4:53 PM

One customer can have many transactions; the two tables are linked by CUSTID.
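The split into a Customer table and a Transaction table can be sketched in a few lines of Python. This is an illustration only: it uses three abbreviated rows from the flat table, and treating the combination of name, city, and state as the customer's identity is an assumption, not something the slides specify.

```python
# A few rows of the flat table: customer details are repeated on every trade.
flat_rows = [
    ("John", "Gallaugher", "Newton", "MA", "Buy", "MSFT", 1000),
    ("John", "Gallaugher", "Newton", "MA", "Buy", "INTC", 2400),
    ("Abby", "Johnson", "Boston", "MA", "Sell", "IBM", 3000),
]

customers = {}      # (first, last, city, state) -> custid   (the "one" side)
transactions = []   # (custid, buy_sell, stock, shares)      (the "many" side)
next_id = 2001      # starting ID chosen to match the slide's numbering

for first, last, city, state, buy_sell, stock, shares in flat_rows:
    key = (first, last, city, state)
    if key not in customers:          # store each customer's details only once
        customers[key] = next_id
        next_id += 1
    # The trade keeps only the CUSTID key, not the repeated customer details.
    transactions.append((customers[key], buy_sell, stock, shares))

print(customers)
print(transactions)
```

After the split, a change of city for a customer touches one customer record rather than every trade that customer ever made.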

Key Terms

• Relational DBMS

– manages databases as a collection of files/tables in which all data relationships are represented by common values in related tables (referred to as keys).

– a relational system has the flexibility to take multiple files and generate a new file from the records that meet the matching criteria (join).

• SQL - Structured Query Language

– Most popular relational database standard. Includes a language for creating & manipulating data.
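A join on a common key value might look like the following sketch, which uses Python's built-in sqlite3 module. The table layout is trimmed to a few columns and rows from the earlier slides, and the lowercase column names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (custid INTEGER PRIMARY KEY, first TEXT, last TEXT);
CREATE TABLE transactions (
    custid INTEGER REFERENCES customers(custid),  -- the common value (key)
    buy_sell TEXT, stock TEXT, shares INTEGER
);
INSERT INTO customers VALUES (2001, 'John', 'Gallaugher'), (2002, 'Abby', 'Johnson');
INSERT INTO transactions VALUES
    (2001, 'Buy', 'MSFT', 1000),
    (2001, 'Buy', 'INTC', 2400),
    (2002, 'Sell', 'IBM', 3000);
""")

# Join: build a new result set from the records of both tables
# whose CUSTID values match, then keep only the IBM trades.
joined = conn.execute("""
    SELECT c.last, t.buy_sell, t.stock, t.shares
    FROM customers AS c
    JOIN transactions AS t ON t.custid = c.custid
    WHERE t.stock = 'IBM'
""").fetchall()
print(joined)
```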

Now With More Data

Customer Table (One)

CUSTID  FIRST    LAST        CITY           STATE
2001    John     Gallaugher  Newton         MA
2002    Abby     Johnson     Boston         MA
2003    Warren   Buffet      Omaha          NE
2004    Peter    Lynch       Rockport       MA
2005    Charles  Schwab      San Francisco  CA

Broker Table (One)

BROKID  FIRST    LAST    …
B001    Ivan     Boesky  …
B002    Dennis   Levine  …
B003    Michael  Milken  …

Transaction Table (Many)

CUSTID  BROKID  BUY/SELL  STOCK  SHARES  PRICE    DATE      TIME
2001    B003    Buy       MSFT   1000    90 1/4   12/24/96  12:01 PM
2001    B001    Buy       INTC   2400    80 1/8   7/3/97    10:51 AM
2001    B003    Sell      IBM    3000    114 3/8  7/1/97    9:03 AM
2002    B001    Sell      IBM    3000    110 1/8  6/30/97   4:53 PM
2002    B003    Sell      INTC   2000    94 7/8   8/30/97   3:15 PM
…       …       …         …      …       …        …         …

Meta-Data

• Data that describes the characteristics of stored data

• Enterprise Data Model

– consistent, cross-functional, shareable meta-data model

– standardization increases flexibility & use (data to info)

– facilitates the creation of data warehouses

Customer Table

Col. Name  Length  Type  …
CUSTID     4       Char  …
FIRST      10      Char  …
LAST       15      Char  …
CITY       15      Char  …
STATE      2       Char  …
…          …       …     …

Transaction Table

Col. Name  Length  Type   …
CUSTID     4       Char   …
BROKID     4       Char
BUY/SELL   1       Bool   …
STOCK      4       Char   …
SHARES     8       Num    …
PRICE      6.2     Money  …
…          …       …      …

Broker Table

Col. Name  Length  Type  …
BROKID     4       Char
FIRST      10      Char  …
LAST       15      Char  …
…          …       …     …

Relationships: Customer (1) to Transaction (m); Broker (1) to Transaction (m).
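Most DBMSs store this kind of meta-data themselves and let you query it. The sketch below is one way to see that using SQLite through Python's sqlite3 module; the choice of SQLite is an assumption, and note that SQLite records the declared type text (such as CHAR(4)) without enforcing the length.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    " custid CHAR(4), first CHAR(10), last CHAR(15), city CHAR(15), state CHAR(2))"
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
metadata = [
    (name, col_type)
    for _, name, col_type, *_ in conn.execute("PRAGMA table_info(customers)")
]
for name, col_type in metadata:
    print(name, col_type)
```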

Management Levels of IS

[Pyramid diagram, top to bottom:]

DSS - Strategic Planning

MIS - Management Control

TPS - Operational Control

Warehouses & Marts

• Data Warehouse – a database designed to support decision-making in an organization. It is batch-updated and structured for fast online queries and exploration. Data warehouses may aggregate enormous amounts of data from many different operational systems.

• Data Mart – a database focused on addressing the concerns of a specific problem or business unit (e.g. Marketing, Engineering). Size doesn’t define data marts, but they tend to be smaller than data warehouses.

Data Warehouses & Data Marts

[Diagram. Components: TPS & other operational systems; 3rd party data; Data Warehouse; Data Mart (Marketing); Data Mart (Engineering). Legend: one symbol marks operational clients, the other marks query, OLAP, mining, etc.]

Differing System Demands

[Two charts, one for Managerial Systems and one for Operational Systems, each plotting network traffic & processor demands against time.]

Transform Data from TPS to Warehouse

• Consolidate data – e.g. from multiple TPS around the country/world

• “Scrub” the data – keep definitions consistent (e.g. translate part numbers/product names if they differ per country)

• Calculate fields (decrease processor load)

• Summarize fields (decrease processor load)

• De-normalize data (ease of use)
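The first four steps above can be sketched as a small transformation routine. Everything concrete here is hypothetical: the two regional extracts, the part names, the prices, and the `transform` helper exist only to illustrate the idea. De-normalization is not shown.

```python
from collections import defaultdict

# Hypothetical extracts from two regional TPS; part names differ per country.
us_sales = [{"part": "WIDGET-A", "units": 10, "unit_price": 2.5}]
uk_sales = [
    {"part": "Widget A (UK)", "units": 4, "unit_price": 2.5},
    {"part": "Widget A (UK)", "units": 6, "unit_price": 2.5},
]

# Scrub: translate local part names to one consistent definition.
PART_NAMES = {"Widget A (UK)": "WIDGET-A"}

def transform(*sources):
    """Consolidate, scrub, calculate, and summarize sales rows by part."""
    totals = defaultdict(lambda: {"units": 0, "revenue": 0.0})
    for source in sources:                                   # consolidate
        for row in source:
            part = PART_NAMES.get(row["part"], row["part"])  # scrub
            revenue = row["units"] * row["unit_price"]       # calculated field
            totals[part]["units"] += row["units"]            # summarize
            totals[part]["revenue"] += revenue
    return dict(totals)

warehouse = transform(us_sales, uk_sales)
print(warehouse)
```

Doing the calculation and summary once, at load time, is what spares the warehouse from repeating that work on every query.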

Calculated Fields

Customer    Date     Stock  Shares  Price   Total
Gallaugher  3/25/98  INTC   1000    76 1/2  $76,500
Johnson     3/26/98  AAPL   2500    23 1/4  $58,125
Buffet      3/27/98  MSFT   3000    84      $252,000

Customer Service Application:
– Customer support person
– TPS - focuses on customer info
– Total is calculated on the fly

Database Query Application:
– Marketing manager
– Aggregate reporting of business intelligence
– Total calculated in advance
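The Total column is simply shares multiplied by price. As a quick check of the figures in the table, the sketch below parses the fractional quotes exactly with Python's fractions module; the `parse_price` helper is a hypothetical name introduced for this example.

```python
from fractions import Fraction

def parse_price(text):
    """Convert a quote such as '76 1/2' or '84' to an exact Fraction."""
    return sum(Fraction(part) for part in text.split())

trades = [
    ("Gallaugher", 1000, "76 1/2"),
    ("Johnson", 2500, "23 1/4"),
    ("Buffet", 3000, "84"),
]

# Total = shares x price, computed once and stored (the warehouse approach)
# rather than recomputed for every query.
totals = {name: shares * parse_price(price) for name, shares, price in trades}
for name, total in totals.items():
    print(name, f"${int(total):,}")
```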

Query Tools & OLAP

• Query Tools – user-led discovery. Can return individual records or summaries. Requests are formulated in advance (e.g. “show me all delinquent accounts in the northeast region during Q1”).

• OLAP - Online Analytical Processing – user-led discovery. Data is explored via “drill down” into the data by selecting variables to summarize on. Results are usually reported in a cross-tab report or graph (e.g. “show me a tabular breakdown of sales by business unit, product type, and year”).
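A cross-tab and a drill-down can be illustrated with plain Python. The sales facts below are hypothetical, and the `crosstab` helper is a name introduced for this sketch; a real OLAP tool would run the same kind of summarization against the warehouse.

```python
from collections import defaultdict

# Hypothetical sales facts: (business_unit, product_type, year, sales)
facts = [
    ("East", "Hardware", 1997, 100),
    ("East", "Software", 1997, 50),
    ("East", "Hardware", 1998, 120),
    ("West", "Hardware", 1997, 80),
    ("West", "Software", 1998, 70),
]

def crosstab(rows, row_dim, col_dim):
    """Sum sales by two chosen dimensions (0=unit, 1=product type, 2=year)."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_dim]][row[col_dim]] += row[3]
    return {r: dict(cols) for r, cols in table.items()}

# Summarize on business unit (rows) by year (columns).
by_unit_year = crosstab(facts, 0, 2)

# "Drill down" into one business unit: product type by year for East only.
east_detail = crosstab([f for f in facts if f[0] == "East"], 1, 2)

print(by_unit_year)
print(east_detail)
```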

OLAP

• Online Analytical Processing. (The slide shows an example of cross-tab results; the figure is not reproduced here.) Dimensions in the example:

1. business unit

2. product type

3. year

Data Mining

• automated information discovery process, uncovers important patterns in existing data

– can use neural networks or other approaches.

– Requires ‘clean’, reliable, consistent data. Historical data must reflect the current environment.

• e.g. “What are the characteristics that identify when we are likely to lose a customer?”

Data Mining Uses

• Market Segmentation - e.g. Dayton Hudson

• Direct Marketing - e.g. Chase

• Market basket analysis - e.g. Wal-Mart

• Customer Churn - e.g. Fleet Bank

• Fraud Detection - e.g. Bank of America

• Cost Reduction Prospecting - e.g. Merck-Medco
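To make one of these uses concrete, market basket analysis looks for items that are bought together. The sketch below computes two common measures, support and confidence, over a handful of hypothetical baskets; the items and the helper names are invented for illustration and do not come from any of the companies listed.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Count every unordered pair of items that appears in the same basket.
    pair_counts.update(frozenset(p) for p in combinations(sorted(basket), 2))

def support(a, b):
    """Share of all baskets that contain both a and b."""
    return pair_counts[frozenset((a, b))] / len(baskets)

def confidence(a, b):
    """Share of the baskets containing a that also contain b."""
    return pair_counts[frozenset((a, b))] / item_counts[a]

print(support("diapers", "beer"), confidence("diapers", "beer"))
```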

Stupid Data-Miner Tricks

• Ad-Hoc Theories – when an oddity jumps out of the data, it’s tempting to develop a theory for it. Sometimes findings are just statistical flukes.

• Using Too Many Variables – the more factors considered, the more likely a relationship will be found - valid or not.

• Not Taking No for an Answer – it’s OK to stop looking if you can’t find anything. There are no silver bullets.
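The "Using Too Many Variables" point can be quantified with a little arithmetic. Assuming each variable is tested independently at a 5% significance level (both assumptions are ours, not the slide's), the chance of at least one spurious finding among k variables is 1 - (1 - 0.05)^k:

```python
def chance_of_false_find(num_variables, alpha=0.05):
    """P(at least one spurious 'significant' result) across independent tests."""
    return 1 - (1 - alpha) ** num_variables

# The probability climbs quickly as more variables are thrown at the data.
for k in (1, 10, 50):
    print(k, round(chance_of_false_find(k), 3))
```

Under these assumptions, testing ten unrelated variables already gives roughly a 40% chance of a fluke that looks like a pattern.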

MPIII

Internal & External Integration

Enterprise Resource Planning (ERP)

Challenges Facing IS Depts.

• Y2K & Legacy Systems

• Globalization (euro, currency issues)

• Rapid Technology Advancement – e.g. Client/Server & Internet

• IS Staffing & Retention

• Changing Organizational Structures – e.g. Owens Corning

• Tighter Integration with Buyers & Suppliers

Legacy Systems

Many firms have limited to no integration across

• geographic areas

• functional areas (v-chain)

• products, plants, & business units

[Value chain diagram, with Suppliers on one side and Buyers on the other:]

Infrastructure: general mgmt, planning, finance, IS
HRM: recruiting, hiring, training, and development
Tech. Development: R&D
Procurement

Inbound logistics | Operations | Outbound logistics | Marketing & Sales | Service

External Integration

• EDI - Electronic Data Interchange

– uses standard formats to pass data between disparate systems

– US format - ANSI X12, European format - UN/EDIFACT

• Cost Savings

– paper order = $50 - $70

– EDI order = $2.50 (VANs / private networks)

– I-EDI order = less than $1 (Internet)

• XML - eXtensible Markup Language

– tagging language for the web
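To show what "tagging" means in practice, here is a small sketch that builds and reads back an order document with Python's xml.etree.ElementTree. The element and attribute names (purchaseOrder, buyer, item, sku, quantity) are made up for this example and do not follow any particular industry schema.

```python
import xml.etree.ElementTree as ET

# Sender: describe the order with self-identifying tags.
order = ET.Element("purchaseOrder", number="1001")
ET.SubElement(order, "buyer").text = "Example Buyer Inc."
item = ET.SubElement(order, "item", sku="WIDGET-A")
ET.SubElement(item, "quantity").text = "10"

xml_text = ET.tostring(order, encoding="unicode")
print(xml_text)

# Receiver: a different system finds the data by tag name
# rather than by its position in a fixed-format record.
parsed = ET.fromstring(xml_text)
quantity = int(parsed.find("item/quantity").text)
print(parsed.get("number"), quantity)
```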

What is ERP?

• ERP - Enterprise Resource Planning Software

– sometimes called Enterprise Applications, Enterprise Packages, Enterprise Suites, or Enterprise Systems

– connects all of the information which flows through a company to a single integrated set of systems

– implemented in modules which can be integrated (all at once or at a later date) e.g. Financials, Logistics, HR

– may work with a wide variety of databases, hardware, and operating systems

• Leading Vendors – SAP, Oracle, J.D. Edwards, Baan, PeopleSoft

ERP in Action

[Diagram labels: Sales, Inventory, Production, Staffing, Purchasing, Order Tracking, Planning]

Source: BusinessWeek Int’l, 1997

The Benefits

• Internal & external integration – squeeze out waste & enable strategies

• Standard software enables

– inter-organizational systems (easier if buyers & suppliers use the same system, e.g. petrochem. ind.)

– broad selection of add-on packages (e.g. data warehouses, etc.)

• Package upgrading and new technology development are handled by the vendor

• Speed of deployment

The Risks

• Staff retention (e.g. Grace case)

• Tied to a single vendor

• Flexibility limited by options offered by the vendor

– may inappropriately force generic processes

– may inappropriately force structure

• Complexity - particularly regarding mapping and standardizing processes across the organization.

Make vs. Buy

Factor           Question                                                Make  Buy
Comp. Adv.       Will the proposed system offer proprietary comp. adv.?  Yes   No
Security         Is the process or data highly confidential?             Yes   No
IT Competency    Is IT a core competency?                                Yes   No
Tech. Skill      Does the firm have sufficient expertise with tech.?     Yes   No
Suitability/Fit  Is a suitable partner/package available?                No    Yes
Cost/Benefit     Is the package cheaper than in-house dev.?              No    Yes
Time             Is there sufficient time to develop the system?         Yes   No

Adapted from Applegate et al., p. 61.
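The Make vs. Buy questions can be read as a simple checklist. The sketch below tallies how many answers point each way. The equal weighting of the seven factors and the `tally` helper are assumptions made purely for illustration; nothing in the source says the factors should be counted equally, and in practice a single factor may dominate.

```python
# For each factor, the answer (True = Yes) that points toward "Make".
FAVORS_MAKE = {
    "comp_adv": True,           # proprietary competitive advantage?
    "security": True,           # process or data highly confidential?
    "it_competency": True,      # is IT a core competency?
    "tech_skill": True,         # sufficient expertise with the technology?
    "suitable_package": False,  # suitable partner/package available?
    "package_cheaper": False,   # package cheaper than in-house development?
    "sufficient_time": True,    # enough time to develop the system?
}

def tally(answers):
    """Count how many answered factors point toward Make vs. Buy."""
    make = sum(1 for factor, yes in answers.items() if yes == FAVORS_MAKE[factor])
    return {"make": make, "buy": len(answers) - make}

# Hypothetical firm: skilled staff, but no strategic reason to build.
example = tally({
    "comp_adv": False,
    "security": False,
    "it_competency": False,
    "tech_skill": True,
    "suitable_package": True,
    "package_cheaper": True,
    "sufficient_time": False,
})
print(example)
```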

Successful Deployment of ERP

• Business Case

– benchmark, cost justify (e.g. unplug mainframes)

• Leadership

– from the highest levels (e.g. success at Owens Corning, failure at Westinghouse)

• Staffing

– largely from business, not IT (users know the process)

– ‘compensation handcuffs’ (e.g. end of deployment bonuses, training payback agreements)

– experienced consultants - check refs., clients

• Execute with proven methodologies