Integration of data mining results into multi-dimensional data models
ENTER 2015 Research Track Slide Number 1
Volker Meyer (a)
Wolfram Höpken (a)
Matthias Fuchs (b)
Maria Lexhagen (b)
(a) University of Applied Sciences Ravensburg-Weingarten, Weingarten, Germany
{name.surname}@hs-weingarten.de
(b) Mid-Sweden University, Östersund, Sweden
{name.surname}@miun.se
Content
• Introduction
• State of the art
• Concepts for integrating DM results into MDM
• Conclusion
Motivation
• Business intelligence and data mining in tourism
– Amount of available information dramatically increased
• e.g. web servers store tourists’ website navigation, databases store transaction and survey data, etc.
– Methods of BI and DM used to mine information about tourists’ travel motives, service expectations, channel use, conversion rates or booking trends (Pyo et al., 2002; Wong et al., 2006)
• Business-IT gap
– DM tools demand extensive knowledge of the DM process and of the individual techniques (e.g. decision trees, association rules)
– Results can be unintelligible without the technical knowledge needed to read them
Crucial business-relevant information is available, but the users who need it are not able to decode it
Objective
• Objective
– Present DM results in a way that is understandable and manageable for business users
• Approach
– Integrate knowledge generated by DM techniques directly into the data warehouse structures from which the underlying data stem
– Make DM results (e.g. decision trees or association rules) accessible through well-established analysis techniques, such as online analytical processing (OLAP)
Integration concepts and data warehouse structures are presented for major data mining techniques, such as frequent itemsets, decision trees, and clustering
Integrating DM results into databases
• Extending the database standard SQL (by new data types and database operations)
– Inductive query language by SINDBAD project (Kramer et al., 2006)
– Mining Association Rule Extension (Meo et al., 1998)
– Mining Structured Query Language (MSQL) (Imielinski, 1999)
– Data Mining Query Language (DMQL) (Han et al., 1996)
• Integrating DM results without extending database standard
– ADReM-Group (http://adrem.ua.ac.be/adrem) or Fromont et al. (2007)
– Standard conformance (standard tools and analysis approaches)
– Suitable for integrating DM results into existing data warehouse structures
Multi-dimensional data models (MDM)
• Fundamental concept of MDM
– Separation between
• Performance indicators (facts), e.g. turnover or number of persons
• Context/dimensions, e.g. time, date, customer, or product
– Typically represented as star schema
• MDM became popular for data warehousing
– Effective support of complex queries and OLAP analyses
– Better understandability for end users
– Crucial in tourism due to complex data structures (Höpken et al., 2013)
Star schema example (figure):
Booking (fact table): BookingNo (DD), Turnover (F), NoPersons (F)
DimProduct: ProdDescription, ProdCategory
DimCustomer: CusName, CusAge, CusGender, CusOrigin
DimTime: DayTime, Minutes, Hours
DimDate: DayInWeek, Weekend, Week, Month, Year, Season
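The separation of facts and dimensions can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the surrogate keys (CusId, ProdId, DateId), the foreign key names (FKCustomer, FKProduct, FKDate) and all sample rows are made up, and DimTime is left out for brevity.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DimCustomer (CusId INTEGER PRIMARY KEY, CusName TEXT, CusAge INTEGER,
                          CusGender TEXT, CusOrigin TEXT);
CREATE TABLE DimProduct  (ProdId INTEGER PRIMARY KEY, ProdDescription TEXT,
                          ProdCategory TEXT);
CREATE TABLE DimDate     (DateId INTEGER PRIMARY KEY, Month INTEGER, Year INTEGER,
                          Season TEXT);
-- Fact table: measures plus one foreign key per dimension (key names are assumed).
CREATE TABLE Booking     (BookingNo INTEGER PRIMARY KEY, Turnover REAL, NoPersons INTEGER,
                          FKCustomer INTEGER REFERENCES DimCustomer(CusId),
                          FKProduct  INTEGER REFERENCES DimProduct(ProdId),
                          FKDate     INTEGER REFERENCES DimDate(DateId));
""")

# Made-up sample data.
con.executemany("INSERT INTO DimCustomer VALUES (?,?,?,?,?)",
                [(1, "Anna", 67, "f", "Sweden"), (2, "Ben", 34, "m", "Germany")])
con.executemany("INSERT INTO DimProduct VALUES (?,?,?)",
                [(1, "Hotel Alpina", "hotel"), (2, "Ski pass", "activity")])
con.executemany("INSERT INTO DimDate VALUES (?,?,?,?)",
                [(1, 1, 2015, "winter"), (2, 7, 2015, "summer")])
con.executemany("INSERT INTO Booking VALUES (?,?,?,?,?,?)",
                [(100, 400.0, 2, 1, 1, 1), (101, 120.0, 2, 1, 2, 1),
                 (102, 300.0, 1, 2, 1, 2), (103, 250.0, 3, 2, 1, 1)])

# OLAP-style roll-up: one fact (Turnover) aggregated along two dimensions.
turnover_by_season_and_category = con.execute("""
    SELECT d.Season, p.ProdCategory, SUM(b.Turnover)
    FROM Booking b
    JOIN DimDate d    ON d.DateId = b.FKDate
    JOIN DimProduct p ON p.ProdId = b.FKProduct
    GROUP BY d.Season, p.ProdCategory
    ORDER BY d.Season, p.ProdCategory
""").fetchall()

for row in turnover_by_season_and_category:
    print(row)
```

Every analysis question then becomes a join from the fact table to the dimensions of interest followed by a GROUP BY.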
Integrating DM results into MDM
• Extending the MDM by additional facts and dimensions/attributes
– Complexity strongly depends on the concrete DM model
– Cluster membership can simply be represented as an additional attribute
– Decision trees or association rules need a more complex fact/dimension structure
• Current status
– Simple approaches for market baskets (i.e. frequent itemsets) exist (Kimball & Ross, 2002)
– Comprehensive approach for all DM models still missing
Frequent itemsets
• Frequent itemset = attribute values that often co-occur
• Approach to store frequent itemsets
– Reuse original data structures
• Store co-occurring attribute values in artificial entries within original star schema
– Add frequent itemset table referencing the artificial entries in the original star schema
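A minimal sketch of this storage idea, not the authors' actual schema. Brute-force pair counting stands in for a real frequent itemset miner, and the names CusAgeGroup, IsArtificial, FrequentItemset, Support and FKBooking, as well as the sample bookings and the support threshold, are assumptions made for illustration.

```python
import sqlite3
from collections import Counter
from itertools import combinations

# Made-up bookings: (BookingNo, discretised customer attributes, Turnover).
bookings = [
    (1, {"CusAgeGroup": "old",   "CusOrigin": "Sweden"},  400.0),
    (2, {"CusAgeGroup": "old",   "CusOrigin": "Sweden"},  250.0),
    (3, {"CusAgeGroup": "young", "CusOrigin": "Germany"}, 300.0),
    (4, {"CusAgeGroup": "old",   "CusOrigin": "Germany"}, 120.0),
]
MIN_SUPPORT = 0.5  # assumed threshold: share of bookings containing the itemset

# Count how often each pair of attribute values co-occurs within one booking.
pair_counts = Counter()
for _, attrs, _ in bookings:
    pair_counts.update(combinations(sorted(attrs.items()), 2))
frequent = {pair: n / len(bookings) for pair, n in pair_counts.items()
            if n / len(bookings) >= MIN_SUPPORT}

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DimCustomer (CusId INTEGER PRIMARY KEY, CusAgeGroup TEXT, CusOrigin TEXT,
                          IsArtificial INTEGER DEFAULT 0);
CREATE TABLE Booking (BookingNo INTEGER PRIMARY KEY, Turnover REAL,
                      FKCustomer INTEGER REFERENCES DimCustomer(CusId),
                      IsArtificial INTEGER DEFAULT 0);
CREATE TABLE FrequentItemset (ItemsetId INTEGER PRIMARY KEY, Support REAL,
                              FKBooking INTEGER REFERENCES Booking(BookingNo));
""")

for pair, support in frequent.items():
    values = dict(pair)  # attributes that are not part of the itemset stay NULL
    # Artificial dimension entry carrying the co-occurring attribute values.
    cus_id = con.execute(
        "INSERT INTO DimCustomer (CusAgeGroup, CusOrigin, IsArtificial) VALUES (?,?,1)",
        (values.get("CusAgeGroup"), values.get("CusOrigin"))).lastrowid
    # Artificial fact entry that ties the dimension entries of the itemset together.
    booking_no = con.execute(
        "INSERT INTO Booking (Turnover, FKCustomer, IsArtificial) VALUES (NULL,?,1)",
        (cus_id,)).lastrowid
    # The itemset table only holds itemset characteristics and the reference.
    con.execute("INSERT INTO FrequentItemset (Support, FKBooking) VALUES (?,?)",
                (support, booking_no))

stored = con.execute("""
    SELECT f.Support, c.CusAgeGroup, c.CusOrigin
    FROM FrequentItemset f
    JOIN Booking b     ON b.BookingNo = f.FKBooking
    JOIN DimCustomer c ON c.CusId = b.FKCustomer
""").fetchall()
print(stored)
```

The sketch only writes the artificial entries; in a real warehouse the original bookings would already sit in the same Booking and DimCustomer tables.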
Frequent itemsets
• Example: "old" and "Swedish" customers
Frequent itemsets and OLAP analyses
• Overall revenue per frequent itemset
– Frequent itemsets used as new analysis dimension
• Identifying most valuable frequent itemsets (which is not possible in typical data mining tools)
Frequent itemsets and OLAP analyses
• Drill-through by frequent itemsets
– Looking at single bookings belonging to (i.e. supporting) a frequent itemset
– Example: Frequent itemset "old customers booking a hotel" with detailed information (booking data, season, customer age, origin, sex and booking price)
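Both analyses, revenue per frequent itemset and drill-through to the supporting bookings, can be sketched as plain SQL over such a schema. This is a hedged illustration: the table layout (IsArtificial, Label, FKBooking, CusAgeGroup), the matching rule (a booking supports an itemset if it agrees with every non-NULL attribute of the artificial entry) and the sample rows are assumptions, not taken from the slides.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DimCustomer (CusId INTEGER PRIMARY KEY, CusAgeGroup TEXT, CusOrigin TEXT,
                          IsArtificial INTEGER);
CREATE TABLE Booking (BookingNo INTEGER PRIMARY KEY, Turnover REAL, FKCustomer INTEGER,
                      IsArtificial INTEGER);
CREATE TABLE FrequentItemset (ItemsetId INTEGER PRIMARY KEY, Label TEXT, FKBooking INTEGER);
-- Rows 90/91 and 900/901 are the artificial entries describing the itemsets.
INSERT INTO DimCustomer VALUES (1,'old','Sweden',0), (2,'young','Germany',0),
                               (3,'old','Germany',0),
                               (90,'old','Sweden',1), (91,'old',NULL,1);
INSERT INTO Booking VALUES (1,400.0,1,0), (2,250.0,1,0), (3,300.0,2,0), (4,120.0,3,0),
                           (900,NULL,90,1), (901,NULL,91,1);
INSERT INTO FrequentItemset VALUES (1,'old & Swedish',900), (2,'old',901);
""")

# Shared join: itemset -> artificial entries (ab, ac) -> real bookings (b, c)
# whose attribute values match every non-NULL value of the artificial entry.
SUPPORTING = """
    FROM FrequentItemset f
    JOIN Booking ab     ON ab.BookingNo = f.FKBooking
    JOIN DimCustomer ac ON ac.CusId = ab.FKCustomer
    JOIN DimCustomer c  ON c.IsArtificial = 0
                       AND (ac.CusAgeGroup IS NULL OR ac.CusAgeGroup = c.CusAgeGroup)
                       AND (ac.CusOrigin   IS NULL OR ac.CusOrigin   = c.CusOrigin)
    JOIN Booking b      ON b.FKCustomer = c.CusId AND b.IsArtificial = 0
"""

# Frequent itemsets used as an analysis dimension: overall revenue per itemset.
revenue_per_itemset = con.execute(
    "SELECT f.Label, SUM(b.Turnover)" + SUPPORTING +
    "GROUP BY f.Label ORDER BY SUM(b.Turnover) DESC").fetchall()

# Drill-through: the single bookings supporting one itemset.
drill_through = con.execute(
    "SELECT b.BookingNo, c.CusAgeGroup, c.CusOrigin, b.Turnover" + SUPPORTING +
    "WHERE f.Label = ? ORDER BY b.BookingNo", ("old & Swedish",)).fetchall()

print(revenue_per_itemset)
print(drill_through)
```

A booking can support several itemsets at once, so the per-itemset totals overlap and should not be added up.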
Clustering
• Clustering = grouping similar records into homogeneous clusters
• Approach for storing clusters within a multi-dimensional structure
– Cluster centroids (i.e. calculated cluster centers) stored as artificial entries in original star schema
– Cluster table stores characteristics of each cluster of a cluster model and points to cluster centroid as artificial entry in star schema
– In the original fact table the cluster membership is stored for each original data entry (attribute FKCluster pointing to the cluster table)
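The three storage elements (centroid as artificial entry, cluster table, FKCluster in the fact table) can be sketched as follows. The cluster labels are assumed to come from an arbitrary clustering tool; the columns IsArtificial, ClusterModel, Label, Size and FKCentroid, the model name and the sample values are hypothetical.

```python
import sqlite3
from statistics import mean

# Made-up bookings: (CusAge, Turnover, cluster label assigned by some clustering tool).
rows = [(66, 400.0, "A"), (70, 500.0, "A"), (25, 100.0, "B"), (31, 140.0, "B")]

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DimCustomer (CusId INTEGER PRIMARY KEY, CusAge REAL,
                          IsArtificial INTEGER DEFAULT 0);
CREATE TABLE Cluster (ClusterId INTEGER PRIMARY KEY, ClusterModel TEXT, Label TEXT,
                      Size INTEGER, FKCentroid INTEGER);  -- FKCentroid -> Booking
CREATE TABLE Booking (BookingNo INTEGER PRIMARY KEY, Turnover REAL, FKCustomer INTEGER,
                      FKCluster INTEGER,                   -- FKCluster -> Cluster
                      IsArtificial INTEGER DEFAULT 0);
""")

def add_booking(age, turnover, fk_cluster, artificial):
    """Insert a customer row plus a fact row and return the new BookingNo."""
    cus_id = con.execute("INSERT INTO DimCustomer (CusAge, IsArtificial) VALUES (?,?)",
                         (age, artificial)).lastrowid
    return con.execute(
        "INSERT INTO Booking (Turnover, FKCustomer, FKCluster, IsArtificial) "
        "VALUES (?,?,?,?)", (turnover, cus_id, fk_cluster, artificial)).lastrowid

for label in sorted({r[2] for r in rows}):
    members = [r for r in rows if r[2] == label]
    # Centroid = mean of each numeric attribute, stored as an artificial entry.
    centroid_no = add_booking(mean(m[0] for m in members),
                              mean(m[1] for m in members), None, 1)
    # Cluster table: cluster characteristics plus pointer to the centroid entry.
    cluster_id = con.execute(
        "INSERT INTO Cluster (ClusterModel, Label, Size, FKCentroid) VALUES (?,?,?,?)",
        ("customer-model-1", label, len(members), centroid_no)).lastrowid
    # Original entries carry their cluster membership in FKCluster.
    for age, turnover, _ in members:
        add_booking(age, turnover, cluster_id, 0)

centroids = con.execute("""
    SELECT cl.Label, cl.Size, c.CusAge, b.Turnover
    FROM Cluster cl
    JOIN Booking b     ON b.BookingNo = cl.FKCentroid
    JOIN DimCustomer c ON c.CusId = b.FKCustomer
    ORDER BY cl.Label
""").fetchall()
members_per_cluster = con.execute("""
    SELECT FKCluster, COUNT(*) FROM Booking
    WHERE IsArtificial = 0 GROUP BY FKCluster ORDER BY FKCluster
""").fetchall()
print(centroids, members_per_cluster)
```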
Clustering
• Example: Customer clusters
Cluster models and OLAP analyses
• Revenue per customer cluster
– Clusters are used as new dimension for data analyses
– New characteristics of clusters can be calculated, e.g. sum of booking price, grouped by any other dimension characteristic, e.g. season
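Once the membership is stored in the fact table, the cluster behaves like any other dimension. A small sketch, assuming a hypothetical FKCluster column, a BookingPrice fact, generic cluster labels and made-up rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DimDate (DateId INTEGER PRIMARY KEY, Season TEXT);
CREATE TABLE Cluster (ClusterId INTEGER PRIMARY KEY, Label TEXT);
CREATE TABLE Booking (BookingNo INTEGER PRIMARY KEY, BookingPrice REAL,
                      FKDate INTEGER, FKCluster INTEGER);
INSERT INTO DimDate VALUES (1,'winter'), (2,'summer');
INSERT INTO Cluster VALUES (1,'cluster 1'), (2,'cluster 2');
INSERT INTO Booking VALUES (1,400.0,1,1), (2,500.0,2,1), (3,100.0,1,2),
                           (4,140.0,1,2), (5,260.0,2,1);
""")

# Sum of booking price per cluster, additionally grouped by season.
revenue_per_cluster_and_season = con.execute("""
    SELECT cl.Label, d.Season, SUM(b.BookingPrice), COUNT(*)
    FROM Booking b
    JOIN Cluster cl ON cl.ClusterId = b.FKCluster
    JOIN DimDate d  ON d.DateId = b.FKDate
    GROUP BY cl.Label, d.Season
    ORDER BY cl.Label, d.Season
""").fetchall()

for row in revenue_per_cluster_and_season:
    print(row)
```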
Decision trees
• Decision tree
– Separating data records into predefined classes based on a series of decisions
Decision trees
• Storing a decision tree in a multi-dimensional structure
– Each node is represented by a decision rule
• booking = short-term -> valuable = yes
• booking = long-term & type apartment = yes -> valuable = yes
– Decision rules stored by
• Reusing original star schema to specify attribute values of the rule
• Specific table specifying rule characteristics and referencing the artificial entry in the original structure
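A sketch of how a tree could be flattened into rules and stored this way. The nested-dict tree format, the extra "no" branch, the restriction to leaf nodes, and the names BookingType, TypeApartment, IsArtificial, DecisionRule, PredictedValuable and FKBooking are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical tree covering the two rules above; the "no" branch is invented.
tree = {"split": "BookingType", "children": {
    "short-term": {"predict": "yes"},
    "long-term": {"split": "TypeApartment", "children": {
        "yes": {"predict": "yes"},
        "no": {"predict": "no"}}}}}

def rules(node, conditions=()):
    """Yield (conditions, predicted class) for every leaf node of the tree."""
    if "predict" in node:
        yield dict(conditions), node["predict"]
        return
    for value, child in node["children"].items():
        yield from rules(child, conditions + ((node["split"], value),))

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DimProduct (ProdId INTEGER PRIMARY KEY, TypeApartment TEXT,
                         IsArtificial INTEGER DEFAULT 0);
CREATE TABLE Booking (BookingNo INTEGER PRIMARY KEY, BookingType TEXT, Turnover REAL,
                      FKProduct INTEGER, IsArtificial INTEGER DEFAULT 0);
CREATE TABLE DecisionRule (RuleId INTEGER PRIMARY KEY, PredictedValuable TEXT,
                           FKBooking INTEGER);
""")

for conditions, predicted in rules(tree):
    # Attribute values of the rule go into artificial star schema entries;
    # attributes the rule does not test stay NULL.
    prod_id = con.execute(
        "INSERT INTO DimProduct (TypeApartment, IsArtificial) VALUES (?,1)",
        (conditions.get("TypeApartment"),)).lastrowid
    booking_no = con.execute(
        "INSERT INTO Booking (BookingType, FKProduct, IsArtificial) VALUES (?,?,1)",
        (conditions.get("BookingType"), prod_id)).lastrowid
    # The rule table holds the rule characteristics and references the entry.
    con.execute("INSERT INTO DecisionRule (PredictedValuable, FKBooking) VALUES (?,?)",
                (predicted, booking_no))

stored_rules = con.execute("""
    SELECT b.BookingType, p.TypeApartment, r.PredictedValuable
    FROM DecisionRule r
    JOIN Booking b    ON b.BookingNo = r.FKBooking
    JOIN DimProduct p ON p.ProdId = b.FKProduct
    ORDER BY r.RuleId
""").fetchall()
print(stored_rules)
```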
Decision trees
• Example decision rule: booking = long-term & type apartment = yes -> valuable = yes
Decision trees and OLAP analyses
– Decision tree nodes are used as new dimension for data analyses
– New characteristics of decision tree nodes can be calculated, e.g. sum of booking price (based on any fact of the fact table)
Decision trees and OLAP analyses
– Decision trees are used to narrow down the analysis to interesting subgroups (i.e. nodes with a high accuracy)
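Narrowing an analysis to high-accuracy nodes can be sketched as a HAVING filter. For brevity this sketch assumes that each booking already carries a reference to its tree node (a hypothetical FKRule column) and that accuracy is simply the share of a node's bookings whose actual class equals the predicted one; the sample rows are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DecisionRule (RuleId INTEGER PRIMARY KEY, Label TEXT, PredictedValuable TEXT);
CREATE TABLE Booking (BookingNo INTEGER PRIMARY KEY, BookingPrice REAL, Valuable TEXT,
                      FKRule INTEGER);
INSERT INTO DecisionRule VALUES
    (1, 'booking = short-term', 'yes'),
    (2, 'booking = long-term & type apartment = yes', 'yes');
INSERT INTO Booking VALUES (1,400.0,'yes',1), (2,150.0,'no',1), (3,900.0,'yes',2),
                           (4,700.0,'yes',2), (5,300.0,'yes',1), (6,100.0,'no',1);
""")

def node_revenue(min_accuracy):
    """Sum of booking price and accuracy per node, keeping only accurate nodes."""
    return con.execute("""
        SELECT r.Label, SUM(b.BookingPrice), AVG(b.Valuable = r.PredictedValuable)
        FROM Booking b
        JOIN DecisionRule r ON r.RuleId = b.FKRule
        GROUP BY r.RuleId, r.Label
        HAVING AVG(b.Valuable = r.PredictedValuable) >= ?
        ORDER BY 3 DESC
    """, (min_accuracy,)).fetchall()

all_nodes = node_revenue(0.0)        # every node, most accurate first
accurate_nodes = node_revenue(0.75)  # only nodes with at least 75 % accuracy
print(all_nodes)
print(accurate_nodes)
```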
Benefit of presented approach
• Advantages of integrating data mining results into original multi-dimensional data structure
– Ordinary OLAP queries can be used to analyse data mining results (like frequent itemsets, cluster models or decision trees)
– Data mining results complement existing information and enhance explanatory power of analyses by constituting a new dimension
• E.g. calculate overall turnover of frequent itemsets, decision tree nodes or clusters
• E.g. filter bookings by a specific frequent itemset (only looking at bookings from old and Swedish customers)
Conclusion & Outlook
• BI & data mining in tourism
– Multi-dimensional data warehouse structures important concept for tourism (destinations)
– All data mining techniques heavily used in tourism
• Novel approach for integrating data mining models into underlying multi-dimensional data structures
– Frequent itemsets, association rules, clustering, decision trees
– Complement existing information and enrich OLAP analyses
• Future activities
– Automatic transformation of data mining results into multi-dimensional structures to support broader evaluation
– Evaluate user acceptance of new analysis possibilities