Appendix A. An Introduction to Microsoft's OLE DB for Data Mining



April 13, 2023 Data Mining: Concepts and Techniques


Data Mining: Concepts and Techniques

— Slides for Textbook —

— Appendix A —

©Jiawei Han and Micheline Kamber

Slides contributed by Jian Pei ([email protected])

Department of Computer Science

University of Illinois at Urbana-Champaign

www.cs.uiuc.edu/~hanj


Appendix A: An Introduction to Microsoft’s OLE DB for Data Mining

Introduction

Overview and design philosophy

Basic components

Data set components

Data mining models

Operations on data model

Concluding remarks


Why OLE DB for Data Mining?

Industry standard is critical for data mining development, usage, interoperability, and exchange

OLE DB for DM is a natural evolution from OLE DB and OLE DB for OLAP

Building mining applications over relational databases is nontrivial

  Need different customized data mining algorithms and methods

  Significant work on the part of application builders

Goal: ease the burden of developing mining applications in large relational databases


Motivation of OLE DB for DM

Facilitate deployment of data mining models

  Generating data mining models

  Store, maintain and refresh models as data is updated

  Programmatically use the model on other data sets

  Browse models

Enable enterprise application developers to participate in building data mining solutions


Features of OLE DB for DM

Independent of provider or software

Not specialized to any specific mining model

Structured to cater to all well-known mining models

Part of upcoming release of Microsoft SQL Server 2000


Overview

Core relational engine exposes OLE DB in a language-based API

Analysis server exposes OLE DB OLAP and OLE DB DM

Maintain SQL metaphor

Reuse existing notions

[Figure: architecture diagram with the labels RDB engine, OLE DB, Analysis Server, OLE DB OLAP/DM, and Data mining applications]


Key Operations to Support Data Mining Models

Define a mining model

  Attributes to be predicted

  Attributes to be used for prediction

  Algorithm used to build the model

Populate a mining model from training data

Predict attributes for new data

Browse a mining model for reporting and visualization


DMM As Analogous to A Table in SQL

Create a data mining model object

  CREATE MINING MODEL [model_name]

Insert training data into the model and train it

  INSERT INTO [model_name]

Use the data mining model

  SELECT relation_name.[id], [model_name].[predict_attr]

  Consult DMM content in order to make predictions and browse statistics obtained by the model

Use DELETE to empty/reset the model

Predictions on datasets: prediction join between a model and a data set (tables)

Deploy DMM by just writing SQL queries!


Two Basic Components

Cases/caseset: input data

  A table or nested tables (for hierarchical data)

Data mining model (DMM): a special type of table

  A caseset is associated with a DMM and meta-info while creating a DMM

  Save mining algorithm and resulting abstraction instead of data itself

  Fundamental operations: CREATE, INSERT INTO, PREDICTION JOIN, SELECT, DELETE FROM, and DROP
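The idea that a DMM behaves like a table that stores a learned abstraction rather than the cases themselves can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `ToyMiningModel`, its method names, and the majority-vote "algorithm" are hypothetical and are not part of OLE DB for DM.

```python
from collections import Counter, defaultdict


class ToyMiningModel:
    """Hypothetical stand-in for a DMM: keeps a learned summary, not the cases."""

    def __init__(self, source, target):
        # Loosely corresponds to CREATE MINING MODEL: declare the columns.
        self.source = source
        self.target = target
        self.counts = defaultdict(Counter)

    def insert_into(self, cases):
        # Loosely corresponds to INSERT INTO: consume cases and update the summary.
        for case in cases:
            self.counts[case[self.source]][case[self.target]] += 1

    def predict(self, case):
        # Loosely corresponds to a prediction for a single new case.
        seen = self.counts.get(case[self.source])
        if not seen:
            return None
        return seen.most_common(1)[0][0]

    def delete(self):
        # Loosely corresponds to DELETE FROM: reset the model, keep its definition.
        self.counts.clear()


model = ToyMiningModel(source="gender", target="age_group")
model.insert_into([
    {"gender": "Male", "age_group": "30-39"},
    {"gender": "Male", "age_group": "30-39"},
    {"gender": "Male", "age_group": "20-29"},
    {"gender": "Female", "age_group": "20-29"},
])
print(model.predict({"gender": "Male"}))
model.delete()
print(model.predict({"gender": "Male"}))
```

After `insert_into`, the object holds only per-value counts, which mirrors the point that the model saves the resulting abstraction instead of the data itself.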


Flattened Representation of Caseset

Source tables:

Customers: Customer ID, Gender, Hair Color, Age, Age Prob

Car Ownership: Customer ID, Car, Car Prob

Product Purchases: Customer ID, Product Name, Quantity, Product Type

Flattened caseset:

CID  Gend  Hair   Age  Age prob  Prod  Quan  Type  Car  Car prob
1    Male  Black  35   100%      TV    1     Elec  Car  100%
1    Male  Black  35   100%      VCR   1     Elec  Car  100%
1    Male  Black  35   100%      Ham   6     Food  Car  100%
1    Male  Black  35   100%      TV    1     Elec  Van  50%
1    Male  Black  35   100%      VCR   1     Elec  Van  50%
1    Male  Black  35   100%      Ham   6     Food  Van  50%

Problem: Lots of replication!
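The replication can be reproduced with a short Python sketch that joins the three source tables for customer 1. The data values come from the slide; the dictionary keys and the function name `flatten` are illustrative choices, not part of OLE DB for DM.

```python
from itertools import product

# Toy rows copied from the slide's example (customer 1 only).
customers = [{"cid": 1, "gender": "Male", "hair": "Black", "age": 35, "age_prob": 1.0}]
purchases = [
    {"cid": 1, "prod": "TV", "quan": 1, "type": "Elec"},
    {"cid": 1, "prod": "VCR", "quan": 1, "type": "Elec"},
    {"cid": 1, "prod": "Ham", "quan": 6, "type": "Food"},
]
cars = [
    {"cid": 1, "car": "Car", "car_prob": 1.0},
    {"cid": 1, "car": "Van", "car_prob": 0.5},
]


def flatten(customers, purchases, cars):
    """Join each customer with every (purchase, car) pair that shares its ID.

    Customers without purchases or without cars produce no rows here,
    as in an inner join.
    """
    rows = []
    for c in customers:
        own_purchases = [p for p in purchases if p["cid"] == c["cid"]]
        own_cars = [k for k in cars if k["cid"] == c["cid"]]
        for p, k in product(own_purchases, own_cars):
            rows.append({**c, **p, **k})
    return rows


flat = flatten(customers, purchases, cars)
print(len(flat))  # 3 purchases x 2 cars = 6 rows for a single customer
```

The customer's own attributes are repeated in every one of the six rows, and the row count grows multiplicatively with each additional child table.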


Logical Nested Table Representation of Caseset

Use Data Shaping Service to generate a hierarchical rowset

  Part of Microsoft Data Access Components (MDAC) products

CID  Gend  Hair   Age  Age prob  Product Purchases   Car Ownership
                                 Prod  Quan  Type    Car  Car prob
1    Male  Black  35   100%      TV    1     Elec    Car  100%
                                 VCR   1     Elec    Van  50%
                                 Ham   6     Food
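A nested caseset can be pictured as one record per customer with child rows attached as lists. The Python sketch below builds that shape from the slide's example data; it only mimics the logical result of a hierarchical rowset and does not use the Data Shaping Service. The key names and the function `nest` are assumptions made for illustration.

```python
customers = [{"cid": 1, "gender": "Male", "hair": "Black", "age": 35, "age_prob": 1.0}]
purchases = [
    {"cid": 1, "prod": "TV", "quan": 1, "type": "Elec"},
    {"cid": 1, "prod": "VCR", "quan": 1, "type": "Elec"},
    {"cid": 1, "prod": "Ham", "quan": 6, "type": "Food"},
]
cars = [
    {"cid": 1, "car": "Car", "car_prob": 1.0},
    {"cid": 1, "car": "Van", "car_prob": 0.5},
]


def children_of(rows, cid):
    """Return the rows that belong to one customer, without the join key."""
    return [{k: v for k, v in r.items() if k != "cid"} for r in rows if r["cid"] == cid]


def nest(customers, purchases, cars):
    """Build one case per customer; child tables become nested lists."""
    cases = []
    for c in customers:
        case = dict(c)
        case["product_purchases"] = children_of(purchases, c["cid"])
        case["car_ownership"] = children_of(cars, c["cid"])
        cases.append(case)
    return cases


cases = nest(customers, purchases, cars)
print(len(cases), len(cases[0]["product_purchases"]), len(cases[0]["car_ownership"]))
```

Each customer attribute now appears once, and the two child tables stay independent of each other instead of being multiplied together.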


More About Nested Table

Not necessary for the storage subsystem to support nested records

Cases are only instantiated as nested rowsets prior to training/predicting data mining models

Same physical data may be used to generate different casesets


Defining A Data Mining Model

The name of the model

The algorithm and parameters

The columns of caseset and the relationships among columns

“Source columns” and “prediction columns”


Example

CREATE MINING MODEL [Age Prediction]            %Name of model
(
  [Customer ID] LONG KEY,                       %source column
  [Gender] TEXT DISCRETE,                       %source column
  [Age] DOUBLE DISCRETIZED() PREDICT,           %prediction column
  [Product Purchases] TABLE                     %source column
  (
    [Product Name] TEXT KEY,                    %source column
    [Quantity] DOUBLE NORMAL CONTINUOUS,        %source column
    [Product Type] TEXT DISCRETE RELATED TO [Product Name]   %source column
  )
)
USING [Decision_Trees_101]                      %Mining algorithm used


Column Specifiers

KEY

ATTRIBUTE

RELATION (RELATED TO clause)

QUALIFIER (OF clause)

  PROBABILITY: [0, 1]

  VARIANCE

  SUPPORT

  PROBABILITY-VARIANCE

  ORDER

TABLE


Attribute Types

DISCRETE

ORDERED

CYCLICAL

CONTINUOUS

DISCRETIZED

SEQUENCE_TIME


Populating A DMM

Use INSERT INTO statement

Consuming a case using the data mining model

Use SHAPE statement to create the nested table from the input data


Example: Populating a DMM

INSERT INTO [Age Prediction]
(
  [Customer ID], [Gender], [Age],
  [Product Purchases](SKIP, [Product Name], [Quantity], [Product Type])
)
SHAPE
  {SELECT [Customer ID], [Gender], [Age] FROM Customers ORDER BY [Customer ID]}
APPEND
  ({SELECT [CustID], [Product Name], [Quantity], [Product Type] FROM Sales ORDER BY [CustID]}
   RELATE [Customer ID] TO [CustID])
  AS [Product Purchases]


Using Data Model to Predict

Prediction join

  Prediction on dataset D using DMM M

  Different from equi-join

DMM: a “truth table”

SELECT statement associated with PREDICTION JOIN specifies values extracted from DMM
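One way to see the difference from an equi-join is to compare a lookup against stored rows with the evaluation of a model on each case. The Python sketch below is a loose illustration only: `toy_age_model` and its rules are invented stand-ins for a trained DMM, and the function names are hypothetical rather than taken from OLE DB for DM.

```python
def toy_age_model(gender, products):
    """Hypothetical rules standing in for a trained model; not from the slides."""
    if "VCR" in products:
        return "30-39"
    return "20-29" if gender == "Female" else "40-49"


def prediction_join(model, cases):
    """Pair every input case with the model's prediction for it."""
    return [(c["cid"], model(c["gender"], c["products"])) for c in cases]


def equi_join(stored_rows, cases):
    """Ordinary join: only cases whose inputs match a stored row exactly."""
    out = []
    for c in cases:
        for r in stored_rows:
            if r["gender"] == c["gender"] and r["products"] == c["products"]:
                out.append((c["cid"], r["age"]))
    return out


stored_rows = [{"gender": "Male", "products": {"TV", "VCR"}, "age": "30-39"}]
new_cases = [
    {"cid": 1, "gender": "Male", "products": {"TV", "VCR"}},
    {"cid": 2, "gender": "Female", "products": {"Ham"}},
]

print(prediction_join(toy_age_model, new_cases))  # one result per input case
print(equi_join(stored_rows, new_cases))          # only the exact match survives
```

The model answers for every combination of input values, which is the sense in which a DMM can be thought of as a truth table, whereas the equi-join returns nothing for inputs that were never stored.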


Example: Using a DMM in Prediction

SELECT t.[Customer ID], [Age Prediction].[Age]
FROM [Age Prediction]
PREDICTION JOIN
(
  SHAPE
    {SELECT [Customer ID], [Gender] FROM Customers ORDER BY [Customer ID]}
  APPEND
    ({SELECT [CustID], [Product Name], [Quantity] FROM Sales ORDER BY [CustID]}
     RELATE [Customer ID] TO [CustID])
    AS [Product Purchases]
) AS t
ON [Age Prediction].[Gender] = t.[Gender] AND
   [Age Prediction].[Product Purchases].[Product Name] = t.[Product Purchases].[Product Name] AND
   [Age Prediction].[Product Purchases].[Quantity] = t.[Product Purchases].[Quantity]


Browsing DMM

What is in a DMM?

Rules, formulas, trees, etc.

Browsing DMM

Visualization


Concluding Remarks

OLE DB for DM integrates data mining and database systems

  A good standard for mining application builders

How can we be involved?

  Provide association/sequential pattern mining modules for OLE DB for DM?

  Design more concrete language primitives?

References

  http://www.microsoft.com/data.oledb/dm.html


www.cs.uiuc.edu/~hanj

Thank you!!!