Integration of data mining results into multi-dimensional data models

ENTER 2015 Research Track

Volker Meyer (a), Wolfram Höpken (a), Matthias Fuchs (b), Maria Lexhagen (b)

(a) University of Applied Sciences Ravensburg-Weingarten, Weingarten, Germany, {name.surname}@hs-weingarten.de

(b) Mid-Sweden University, Östersund, Sweden, {name.surname}@miun.se

Content

• Introduction

• State of the art

• Concepts for integrating DM results into MDM

• Conclusion

Motivation

• Business intelligence and data mining in tourism

– The amount of available information has increased dramatically

• e.g. web servers store tourists’ website navigation, databases store transaction and survey data, etc.

– BI and DM methods are used to mine information about tourists’ travel motives, service expectations, channel use, conversion rates or booking trends (Pyo et al., 2002; Wong et al., 2006)

• Business-IT gap

– DM tools demand extensive knowledge of the DM process and of the individual techniques (e.g. decision trees, association rules)

– Results can be unintelligible without the technical knowledge needed to read them

Crucial business-relevant information is available, but the users who need it are not able to decode it

Objective

• Objective

– Present DM results in a way that is understandable and manageable for business users

• Approach

– Integrate knowledge generated by DM techniques directly into the data warehouse structures from which the underlying data stem

– Make DM results (e.g. decision trees or association rules) accessible through well-established analysis techniques such as online analytical processing (OLAP)

Integration concepts and data warehouse structures are presented for major data mining techniques: frequent itemsets, decision trees, and clustering

Integrating DM results into databases

• Extending the database standard SQL (by new data types and database operations)

– Inductive query language by SINDBAD project (Kramer et al., 2006)

– Mining Association Rule Extension (Meo et al., 1998)

– Mining Structured Query Language (MSQL) (Imielinski, 1999)

– Data Mining Query Language (DMQL) (Han et al., 1996)

• Integrating DM results without extending database standard

– ADReM-Group (http://adrem.ua.ac.be/adrem) or Fromont et al. (2007)

– Standard conformance (standard tools and analysis approaches)

– Suitable for integrating DM results into existing data warehouse structures

Multi-dimensional data models (MDM)

• Fundamental concept of MDM

– Separation between

• Performance indicators (facts), e.g. turnover or number of persons

• Context/dimensions, e.g. time, date, customer, or product

– Typically represented as star schema

• MDM has become well established in data warehousing

– Effective support of complex queries and OLAP analyses

– Better understandability for end users

– Crucial in tourism due to complex data structures (Höpken et al., 2013)

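Example star schema (DD = degenerate dimension, F = fact):

Booking (fact table): BookingNo (DD), Turnover (F), NoPersons (F)

DimProduct: ProdDescription, ProdCategory

DimCustomer: CusName, CusAge, CusGender, CusOrigin

DimTime: DayTime, Minutes, Hours

DimDate: DayInWeek, Weekend, Week, Month, Year, Season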
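
As a rough illustration, the example star schema could be set up in SQLite as sketched below, together with one OLAP-style roll-up. The surrogate keys (ProductId, CustomerId, ...), the foreign-key names (FKProduct, ...), the column types and all sample rows are assumptions made for this sketch; the slide only lists the descriptive attributes.

```python
import sqlite3

# Surrogate keys and FK column names are assumptions, not taken from the slide.
SCHEMA = """
CREATE TABLE DimProduct  (ProductId INTEGER PRIMARY KEY, ProdDescription TEXT,
                          ProdCategory TEXT);
CREATE TABLE DimCustomer (CustomerId INTEGER PRIMARY KEY, CusName TEXT,
                          CusAge INTEGER, CusGender TEXT, CusOrigin TEXT);
CREATE TABLE DimTime     (TimeId INTEGER PRIMARY KEY, DayTime TEXT,
                          Minutes INTEGER, Hours INTEGER);
CREATE TABLE DimDate     (DateId INTEGER PRIMARY KEY, DayInWeek TEXT,
                          Weekend INTEGER, Week INTEGER, Month INTEGER,
                          Year INTEGER, Season TEXT);
CREATE TABLE Booking (
    BookingNo  INTEGER PRIMARY KEY,                        -- degenerate dimension (DD)
    FKProduct  INTEGER REFERENCES DimProduct(ProductId),
    FKCustomer INTEGER REFERENCES DimCustomer(CustomerId),
    FKTime     INTEGER REFERENCES DimTime(TimeId),
    FKDate     INTEGER REFERENCES DimDate(DateId),
    Turnover   REAL,                                       -- fact (F)
    NoPersons  INTEGER                                     -- fact (F)
);
"""

# Made-up sample rows, for illustration only.
SAMPLE = """
INSERT INTO DimProduct  VALUES (1, 'Hotel Alpha', 'Hotel'), (2, 'Ski pass', 'Activity');
INSERT INTO DimCustomer VALUES (1, 'A', 67, 'f', 'Sweden'), (2, 'B', 31, 'm', 'Germany');
INSERT INTO DimTime     VALUES (1, 'morning', 30, 9);
INSERT INTO DimDate     VALUES (1, 'Sat', 1, 2, 1, 2015, 'winter'),
                               (2, 'Tue', 0, 28, 7, 2015, 'summer');
INSERT INTO Booking     VALUES (1, 1, 1, 1, 1, 500.0, 2),
                               (2, 2, 2, 1, 1, 100.0, 1),
                               (3, 1, 2, 1, 2, 300.0, 2);
"""


def turnover_by_season_and_category(conn):
    """Roll up the Turnover fact along the date and product dimensions."""
    return conn.execute("""
        SELECT d.Season, p.ProdCategory, SUM(b.Turnover)
        FROM Booking b
        JOIN DimDate d    ON d.DateId = b.FKDate
        JOIN DimProduct p ON p.ProductId = b.FKProduct
        GROUP BY d.Season, p.ProdCategory
        ORDER BY d.Season, p.ProdCategory
    """).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA + SAMPLE)
rollup = turnover_by_season_and_category(conn)
```

The fact table carries only keys and measures, so any combination of dimension attributes can serve as a grouping level in the same way.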

Integrating DM results into MDM

• Extending the MDM by additional facts and dimensions/attributes

– Complexity strongly depends on the concrete DM model

– Cluster membership can simply be represented as an additional attribute

– Decision trees or association rules need a more complex fact/dimension structure

• Current status

– Simple approaches for market baskets (i.e. frequent itemsets) exist (Kimball & Ross, 2002)

– Comprehensive approach for all DM models still missing

Frequent itemsets

• Frequent itemset = attribute values that often co-occur

• Approach to store frequent itemsets

– Reuse original data structures

• Store co-occurring attribute values in artificial entries within the original star schema

– Add a frequent itemset table referencing the artificial entries in the original star schema

• Example: "old" and "Swedish" customers
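
One possible reading of this storage approach is sketched below for the "old" and "Swedish" example. It is a sketch under stated assumptions, not the authors' schema: the IsArtificial marker column, the FrequentItemset table layout, the use of NULL for attributes that are not part of the itemset, the age group stored in CusAge and the support value 0.25 are all hypothetical.

```python
import sqlite3

# Attributes that may take part in an itemset in this sketch.
ITEM_COLUMNS = ("CusAge", "CusGender", "CusOrigin")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimCustomer (
        CustomerId   INTEGER PRIMARY KEY,
        CusName      TEXT,
        CusAge       TEXT,      -- assumed to hold an age group such as 'old'
        CusGender    TEXT,
        CusOrigin    TEXT,
        IsArtificial INTEGER NOT NULL DEFAULT 0   -- hypothetical marker column
    );
    CREATE TABLE FrequentItemset (                -- hypothetical layout
        ItemsetId  INTEGER PRIMARY KEY,
        Support    REAL,
        FKCustomer INTEGER REFERENCES DimCustomer(CustomerId)
    );
""")


def store_itemset(conn, items, support):
    """Store an itemset as an artificial DimCustomer row plus a referencing row.

    Attributes that are not part of the itemset stay NULL in the artificial row.
    """
    unknown = set(items) - set(ITEM_COLUMNS)
    if not items or unknown:
        raise ValueError(f"unsupported itemset attributes: {sorted(unknown)}")
    values = [items.get(column) for column in ITEM_COLUMNS]
    artificial_id = conn.execute(
        "INSERT INTO DimCustomer (CusAge, CusGender, CusOrigin, IsArtificial) "
        "VALUES (?, ?, ?, 1)",
        values,
    ).lastrowid
    return conn.execute(
        "INSERT INTO FrequentItemset (Support, FKCustomer) VALUES (?, ?)",
        (support, artificial_id),
    ).lastrowid


# Illustrative support value; a real value would come from the mining run.
itemset_id = store_itemset(conn, {"CusAge": "old", "CusOrigin": "Sweden"}, support=0.25)

stored = conn.execute("""
    SELECT c.CusAge, c.CusOrigin, c.CusGender, f.Support
    FROM FrequentItemset f
    JOIN DimCustomer c ON c.CustomerId = f.FKCustomer
    WHERE f.ItemsetId = ?
""", (itemset_id,)).fetchone()
```

Because the itemset lives in the same dimension table as real customers, ordinary queries have to filter on the marker column to keep artificial rows out of regular reports.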

Frequent itemsets and OLAP analyses

• Overall revenue per frequent itemset

– Frequent itemsets used as new analysis dimension

• Identifying most valuable frequent itemsets (which is not possible in typical data mining tools)

• Drill-through by frequent itemsets

– Looking at single bookings belonging to (i.e. supporting) a frequent itemset

– Example: frequent itemset "old customers booking a hotel" with detailed booking data: season, customer age, origin, sex and booking price

Clustering

• Clustering = grouping similar records into homogeneous clusters

• Approach for storing clusters within a multi-dimensional structure

– Cluster centroids (i.e. calculated cluster centers) stored as artificial entries in original star schema

– Cluster table stores characteristics of each cluster of a cluster model and points to the cluster centroid as an artificial entry in the star schema

– In the original fact table the cluster membership is stored for each original data entry (attribute FKCluster pointing to the cluster table)

• Example: Customer clusters
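
A minimal sketch of this structure is shown below. It assumes the cluster assignments were produced beforehand by some external DM tool and only shows how they could be persisted: the centroid as an artificial DimCustomer row, a Cluster table pointing to it, and FKCluster in the fact table. Apart from FKCluster, all names (IsArtificial, Size, FKCentroid), the numeric CusAge and the sample data are assumptions.

```python
import sqlite3
from statistics import mean

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimCustomer (CustomerId INTEGER PRIMARY KEY, CusName TEXT,
                              CusAge REAL, IsArtificial INTEGER DEFAULT 0);
    CREATE TABLE Cluster (ClusterId INTEGER PRIMARY KEY, Size INTEGER,
                          FKCentroid INTEGER REFERENCES DimCustomer(CustomerId));
    CREATE TABLE Booking (BookingNo INTEGER PRIMARY KEY,
                          FKCustomer INTEGER REFERENCES DimCustomer(CustomerId),
                          Turnover REAL,
                          FKCluster INTEGER REFERENCES Cluster(ClusterId));
    INSERT INTO DimCustomer (CustomerId, CusName, CusAge)
        VALUES (1, 'A', 24), (2, 'B', 28), (3, 'C', 66), (4, 'D', 70);
    INSERT INTO Booking (BookingNo, FKCustomer, Turnover)
        VALUES (1, 1, 100.0), (2, 2, 150.0), (3, 3, 400.0), (4, 4, 500.0);
""")

# Made-up output of an external clustering run: BookingNo -> cluster label.
assignments = {1: "young", 2: "young", 3: "senior", 4: "senior"}


def store_clusters(conn, assignments):
    """Persist centroids, cluster rows and per-booking cluster membership."""
    members = {}
    for booking_no, label in assignments.items():
        members.setdefault(label, []).append(booking_no)

    cluster_ids = {}
    for label, booking_nos in members.items():
        marks = ",".join("?" * len(booking_nos))
        ages = [row[0] for row in conn.execute(
            "SELECT c.CusAge FROM Booking b "
            "JOIN DimCustomer c ON c.CustomerId = b.FKCustomer "
            f"WHERE b.BookingNo IN ({marks})", booking_nos)]
        # Centroid = mean attribute value, stored as an artificial dimension row.
        centroid_id = conn.execute(
            "INSERT INTO DimCustomer (CusName, CusAge, IsArtificial) VALUES (?, ?, 1)",
            (f"centroid:{label}", mean(ages)),
        ).lastrowid
        cluster_id = conn.execute(
            "INSERT INTO Cluster (Size, FKCentroid) VALUES (?, ?)",
            (len(booking_nos), centroid_id),
        ).lastrowid
        conn.executemany(
            "UPDATE Booking SET FKCluster = ? WHERE BookingNo = ?",
            [(cluster_id, booking_no) for booking_no in booking_nos],
        )
        cluster_ids[label] = cluster_id
    return cluster_ids


cluster_ids = store_clusters(conn, assignments)
clusters = conn.execute("""
    SELECT k.ClusterId, k.Size, c.CusAge
    FROM Cluster k JOIN DimCustomer c ON c.CustomerId = k.FKCentroid
    ORDER BY k.ClusterId
""").fetchall()
membership = conn.execute(
    "SELECT BookingNo, FKCluster FROM Booking ORDER BY BookingNo").fetchall()
```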

Cluster models and OLAP analyses

• Revenue per customer cluster

– Clusters are used as a new dimension for data analyses

– New characteristics of clusters can be calculated, e.g. sum of booking price, grouped by any other dimension characteristic, e.g. season
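
With the cluster key in the fact table, such an analysis reduces to an ordinary GROUP BY. The sketch below sums turnover per cluster and season; the reduced schema, the Label column and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimDate (DateId INTEGER PRIMARY KEY, Season TEXT);
    CREATE TABLE Cluster (ClusterId INTEGER PRIMARY KEY, Label TEXT);
    CREATE TABLE Booking (BookingNo INTEGER PRIMARY KEY,
                          FKDate INTEGER REFERENCES DimDate(DateId),
                          FKCluster INTEGER REFERENCES Cluster(ClusterId),
                          Turnover REAL);
    INSERT INTO DimDate VALUES (1, 'summer'), (2, 'winter');
    INSERT INTO Cluster VALUES (1, 'young'), (2, 'senior');
    INSERT INTO Booking VALUES (1, 1, 1, 100.0), (2, 2, 1, 150.0), (3, 2, 2, 400.0),
                               (4, 2, 2, 500.0), (5, 1, 2, 50.0);
""")


def revenue_by_cluster_and_season(conn):
    """Treat the cluster as one more dimension next to the date dimension."""
    return conn.execute("""
        SELECT k.Label, d.Season, SUM(b.Turnover)
        FROM Booking b
        JOIN Cluster k ON k.ClusterId = b.FKCluster
        JOIN DimDate d ON d.DateId = b.FKDate
        GROUP BY k.ClusterId, k.Label, d.Season
        ORDER BY k.Label, d.Season
    """).fetchall()


revenue = revenue_by_cluster_and_season(conn)
```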

Decision trees

• Decision tree

– Separating data records into predefined classes based on a series of decisions

• Storing a decision tree in a multi-dimensional structure

– Each node is represented by a decision rule

• booking = short-term -> valuable = yes

• booking = long-term & type apartment = yes -> valuable = yes

– Decision rules stored by

• Reusing original star schema to specify attribute values of the rule

• Specific table specifying rule characteristics and referencing the artificial entry in the original structure

• Example decision rule: booking = long-term & type apartment = yes -> valuable = yes
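
The two example rules could be stored roughly as sketched below. Since the slides do not say which dimension holds the rule attributes, the sketch invents a hypothetical DimBookingProfile dimension with the columns BookingHorizon and TypeApartment; the DecisionRule table layout and the IsArtificial marker are likewise assumptions.

```python
import sqlite3

# Hypothetical dimension attributes that a rule condition may refer to.
CONDITION_COLUMNS = ("BookingHorizon", "TypeApartment")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimBookingProfile (          -- hypothetical dimension
        ProfileId      INTEGER PRIMARY KEY,
        BookingHorizon TEXT,
        TypeApartment  TEXT,
        IsArtificial   INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE DecisionRule (               -- hypothetical rule table
        RuleId          INTEGER PRIMARY KEY,
        TargetAttribute TEXT,
        PredictedClass  TEXT,
        FKProfile       INTEGER REFERENCES DimBookingProfile(ProfileId)
    );
""")


def store_rule(conn, conditions, target, predicted):
    """Store the rule's attribute values as an artificial dimension row."""
    unknown = set(conditions) - set(CONDITION_COLUMNS)
    if unknown:
        raise ValueError(f"unknown condition attributes: {sorted(unknown)}")
    # Attributes the rule does not test stay NULL.
    values = [conditions.get(column) for column in CONDITION_COLUMNS]
    profile_id = conn.execute(
        "INSERT INTO DimBookingProfile (BookingHorizon, TypeApartment, IsArtificial) "
        "VALUES (?, ?, 1)",
        values,
    ).lastrowid
    return conn.execute(
        "INSERT INTO DecisionRule (TargetAttribute, PredictedClass, FKProfile) "
        "VALUES (?, ?, ?)",
        (target, predicted, profile_id),
    ).lastrowid


def render_rule(conn, rule_id):
    """Read a stored rule back into 'conditions -> class' notation."""
    row = conn.execute("""
        SELECT p.BookingHorizon, p.TypeApartment, r.TargetAttribute, r.PredictedClass
        FROM DecisionRule r
        JOIN DimBookingProfile p ON p.ProfileId = r.FKProfile
        WHERE r.RuleId = ?
    """, (rule_id,)).fetchone()
    *values, target, predicted = row
    body = " & ".join(f"{column} = {value}"
                      for column, value in zip(CONDITION_COLUMNS, values)
                      if value is not None)
    return f"{body} -> {target} = {predicted}"


short_term = store_rule(conn, {"BookingHorizon": "short-term"}, "valuable", "yes")
long_term_apartment = store_rule(
    conn, {"BookingHorizon": "long-term", "TypeApartment": "yes"}, "valuable", "yes")
rendered = [render_rule(conn, short_term), render_rule(conn, long_term_apartment)]
```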

Decision trees and OLAP analyses

– Decision tree nodes are used as a new dimension for data analyses

– New characteristics of decision tree nodes can be calculated, e.g. sum of booking price (based on any fact of the fact table)

– Decision trees are used to narrow down the analysis to interesting subgroups (i.e. nodes with a high accuracy)
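
Both uses can be sketched as a single query: turnover is aggregated per decision tree node, and an accuracy threshold restricts the result to the interesting nodes. As before, the DimBookingProfile dimension, the Accuracy column, the NULL-as-wildcard matching rule, the third rule and all values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimBookingProfile (ProfileId INTEGER PRIMARY KEY, BookingHorizon TEXT,
                                    TypeApartment TEXT, IsArtificial INTEGER DEFAULT 0);
    CREATE TABLE Booking (BookingNo INTEGER PRIMARY KEY,
                          FKProfile INTEGER REFERENCES DimBookingProfile(ProfileId),
                          Turnover REAL);
    CREATE TABLE DecisionRule (RuleId INTEGER PRIMARY KEY, PredictedClass TEXT,
                               Accuracy REAL,
                               FKProfile INTEGER REFERENCES DimBookingProfile(ProfileId));

    -- Real profiles, followed by one artificial row per leaf node (rule).
    INSERT INTO DimBookingProfile VALUES
        (1, 'short-term', 'no', 0), (2, 'long-term', 'yes', 0), (3, 'long-term', 'no', 0),
        (100, 'short-term', NULL, 1), (101, 'long-term', 'yes', 1), (102, 'long-term', 'no', 1);
    INSERT INTO Booking VALUES (1, 1, 200.0), (2, 2, 700.0), (3, 3, 100.0),
                               (4, 2, 300.0), (5, 1, 50.0);
    INSERT INTO DecisionRule VALUES (1, 'yes', 0.90, 100), (2, 'yes', 0.80, 101),
                                    (3, 'no', 0.55, 102);
""")


def turnover_per_node(conn, min_accuracy=0.0):
    """Sum Turnover per decision tree node, keeping nodes above the threshold."""
    return conn.execute("""
        SELECT r.RuleId, r.Accuracy, SUM(b.Turnover)
        FROM DecisionRule r
        JOIN DimBookingProfile a ON a.ProfileId = r.FKProfile
        JOIN DimBookingProfile p
          ON p.IsArtificial = 0
         AND (a.BookingHorizon IS NULL OR p.BookingHorizon = a.BookingHorizon)
         AND (a.TypeApartment IS NULL OR p.TypeApartment = a.TypeApartment)
        JOIN Booking b ON b.FKProfile = p.ProfileId
        WHERE r.Accuracy >= ?
        GROUP BY r.RuleId, r.Accuracy
        ORDER BY r.RuleId
    """, (min_accuracy,)).fetchall()


all_nodes = turnover_per_node(conn)
interesting_nodes = turnover_per_node(conn, min_accuracy=0.75)
```

Unlike frequent itemsets, the leaf nodes of a tree partition the data, so each booking falls into exactly one node and the per-node sums add up to the overall turnover.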

Benefit of presented approach

• Advantages of integrating data mining results into the original multi-dimensional data structure

– Ordinary OLAP queries can be used to analyse data mining results (like frequent itemsets, cluster models or decision trees)

– Data mining results complement existing information and enhance the explanatory power of analyses by constituting a new dimension

• E.g. calculate overall turnover of frequent itemsets, decision tree nodes or clusters

• E.g. filter bookings by a specific frequent itemset (only looking at bookings from old and Swedish customers)

Conclusion & Outlook

• BI & data mining in tourism

– Multi-dimensional data warehouse structures are an important concept for tourism (destinations)

– All data mining techniques are heavily used in tourism

• Novel approach for integrating data mining models into underlying multi-dimensional data structures

– Frequent itemsets, association rules, clustering, decision trees

– Complement existing information and enrich OLAP analyses

• Future activities

– Automatic transformation of data mining results into multi-dimensional structures to support broader evaluation

– Evaluate user acceptance of new analysis possibilities