Transforming and leveraging OLAP queries
Patrick Marcel, Université François Rabelais Tours
Laboratoire d'Informatique
SAP-BO, 06.22.2010
2
Outline
• Short CV
• Personalizing OLAP queries
• Recommending OLAP queries
• Summarizing OLAP queries
• Perspectives
3
About me
• PhD « Multidimensional data(base) manipulations and rule-based languages », defended 1998, LISI (now LIRIS), INSA Lyon
– Supervisors: J. Kouloumdjian and M. S. Hacid
• Maître de Conférences (associate professor), UFRT, Computer Science Department
• Head of the Master's program in Information Systems and Decision Making
• Semester off (September 2010 – January 2011)
4
About me (cont'd)
• Member of the DB & NLP team (4 PR, 8 MCF)
– NLP
– XML and web technology
– Data mining and OLAP
• Recent activities
– Pattern-based global models (PhD Eynollah Khanjari 2009)
– Summarizing and visualizing large sets of association rules (PhD Marie Ndiaye 2010)
– Collaborative exploration of data warehouses (PhD Elsa Negre 2009)
5
Personalizing OLAP queries
• PhD of Hassina Mouloudi (2007)
• Main publications
– ACM DOLAP 2005
– BDA 2006
– Hassina's dissertation (in French)
• Prototype
– Mobile application for querying a cube with query personalization
– Mondrian, Oracle, Tomcat, Axis
6
Motivation
SELECT CROSSJOIN({City.Tours, City.Orleans}, {Category.Members}) ON ROWS,
       {2003, 2004, 2005, 2006} ON COLUMNS
FROM SalesCube
WHERE (Measures.quantity)

Visualization depends on the user's profile

                  2003  2004  2005  2006
Tours    Drink      77    54    55    33
         Food       89    61    30    41
Orleans  Drink      25    50    49    32
         Food       33    44    59    27

Tours    2003  2004  2005  2006
Food       77    54    55    33
Drink      89    61    30    41
Cloth      56    30    32    60
Shoes      45    50    32    51
7
The problem
• Given
– An MDX query q
– User preferences P
– A visualization constraint v
• Find a preferred query q'
– Included in q
– Nearest to q satisfying v
– The most interesting w.r.t. P
8
Example of preferred query
SELECT CROSSJOIN({City.Tours}, {Category.Food, Category.Drink}) ON ROWS,
       {Year.2005} ON COLUMNS
FROM SalesCube
WHERE (Measures.quantity)

<

SELECT CROSSJOIN({City.Tours}, {Year.2006}) ON ROWS,
       {Category.Drink} ON COLUMNS
FROM SalesCube
WHERE (Measures.quantity)

Since the user profile contains
  Location < Product, Product < Time
  2005 < 2006, food < drink
Indeed:
  (2005, Food, Tours, quantity) < (2006, Drink, Tours, quantity)
  (2005, Drink, Tours, quantity) < (2006, Drink, Tours, quantity)
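The two comparisons above can be reproduced with a small Python sketch. It assumes one possible reading of the profile, namely that facts are compared lexicographically, starting with the most important dimension (Time, then Product, then Location). The slides do not state the exact comparison rule, so this reading, the names `DIMENSION_ORDER`, `MEMBER_ORDER`, `preference_key` and `less_preferred`, and the omission of the measure are all assumptions made for illustration.

```python
# Assumed encoding of the profile from this slide (illustrative names).
# Location < Product < Time, listed here from most to least important.
DIMENSION_ORDER = ["Time", "Product", "Location"]

# Members of each dimension, from least to most preferred.
MEMBER_ORDER = {
    "Time": [2005, 2006],          # 2005 < 2006
    "Product": ["Food", "Drink"],  # food < drink
    "Location": ["Tours"],         # no preference between cities is stated
}


def preference_key(fact):
    """Rank of each coordinate, ordered from most to least important dimension."""
    return tuple(MEMBER_ORDER[d].index(fact[d]) for d in DIMENSION_ORDER)


def less_preferred(a, b):
    """True if fact a is strictly less preferred than fact b (lexicographic reading)."""
    return preference_key(a) < preference_key(b)


food_2005 = {"Time": 2005, "Product": "Food", "Location": "Tours"}
drink_2005 = {"Time": 2005, "Product": "Drink", "Location": "Tours"}
drink_2006 = {"Time": 2006, "Product": "Drink", "Location": "Tours"}

print(less_preferred(food_2005, drink_2006))   # True
print(less_preferred(drink_2005, drink_2006))  # True
```

Under this reading, every fact of the first query is less preferred than the fact of the second query, which is consistent with the ordering of the two queries shown on the slide.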
9
Personalizing
[Figure: architecture diagram with the components User query, Result, User profile, Dimension tables, Fact table, Query processor, Personalization engine]
10
Personalizing OLAP queries
• Context
– Dimension tables in main memory
– No access to the fact table
• Principle
– Compute sets of positions in the resulting crosstab
• Largest possible
• Visualizable w.r.t. the visualization constraint
• Corresponding to the preferred facts
– Compute the structures of the crosstabs
11
Example of personalization (1)
The query:
SELECT CROSSJOIN({City.Tours, City.Orleans}, {Category.Members}) ON ROWS,
       {2003, 2004, 2005, 2006} ON COLUMNS
FROM SalesCube
WHERE (Measures.quantity)

Preferences:
  Time < Location and Product < Location
  2002 < 2003 < 2004 < 2005 < 2006
  Electronics < shoes < cloth < food < drink
  Quantity < price
Constraint: 2 axes, no more than 4 positions on each axis
12
Example of personalization (2)
Step 1: the most preferred facts

                 2006
Drink  Orleans
       Tours
13
Example of personalization (3)
Step 2: the second most preferred facts

                 2006  2005
Drink  Orleans
       Tours
Food   Orleans
       Tours
14
Example of personalization (4)
Step 3: the next most preferred facts

                 2006  2005  2004
Drink  Orleans
       Tours
Food   Orleans
       Tours

                 Drink  Food  Cloth
Tours    2005
         2006
Orleans  2005
         2006

But the selected facts have to satisfy the visualization constraint
15
Example of personalization (5)
Finally, one of the constructed queries is

SELECT CROSSJOIN({City.Tours, City.Orleans}, {Category.Food, Category.Drink}) ON ROWS,
       {2003, 2004, 2005, 2006} ON COLUMNS
FROM SalesCube
WHERE (Measures.quantity)

                  2003  2004  2005  2006
Tours    Drink      77    54    55    33
         Food       89    61    30    41
Orleans  Drink      25    50    49    32
         Food       33    44    59    27
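The walk-through above can be approximated by a short Python sketch. It is a greedy simplification, not the algorithm of the prototype: on each axis it visits dimensions from the most to the least preferred and keeps as many of the most preferred members as the visualization constraint still allows. The function `personalize`, the dictionaries `axes`, `dim_priority` and `member_rank`, and the list of categories taken from the Motivation slide are all assumptions made for illustration.

```python
def personalize(axes, dim_priority, member_rank, max_positions):
    """Greedy sketch: trim each axis so it has at most max_positions positions."""
    result = {}
    for axis, dims in axes.items():
        used = 1  # number of positions already implied on this axis
        kept = {}
        # Visit the most preferred dimension first.
        for dim in sorted(dims, key=lambda d: dim_priority[d], reverse=True):
            ranks = member_rank.get(dim, {})
            # Most preferred members first; ties keep the order of the query.
            members = sorted(dims[dim], key=lambda m: ranks.get(m, 0), reverse=True)
            room = max(1, max_positions // used)
            kept[dim] = members[:room]
            used *= len(kept[dim])
        result[axis] = kept
    return result


# The query of example (1): cities x categories on rows, years on columns.
axes = {
    "ROWS": {
        "Location": ["Tours", "Orleans"],
        "Product": ["Food", "Drink", "Cloth", "Shoes"],
    },
    "COLUMNS": {"Time": [2003, 2004, 2005, 2006]},
}
# Time < Location and Product < Location.
dim_priority = {"Time": 0, "Product": 0, "Location": 1}
# Higher rank means more preferred; no preference is stated between cities.
member_rank = {
    "Time": {2002: 0, 2003: 1, 2004: 2, 2005: 3, 2006: 4},
    "Product": {"Electronics": 0, "Shoes": 1, "Cloth": 2, "Food": 3, "Drink": 4},
}

preferred = personalize(axes, dim_priority, member_rank, max_positions=4)
print(preferred["ROWS"])     # {'Location': ['Tours', 'Orleans'], 'Product': ['Drink', 'Food']}
print(preferred["COLUMNS"])  # {'Time': [2006, 2005, 2004, 2003]}
```

With at most 4 positions per axis, both cities are kept first, which leaves room for only two categories (Drink and Food), while the four years fit on the columns. This matches the members of the constructed query shown above. The sketch returns a single structure, whereas the slides suggest that several crosstab structures can be constructed.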
16
Prototype
17
Speedup
18
Recommending OLAP queries
• PhD of Elsa Negre (2009)
• Main publications
– ACM DOLAP 2008
– DaWaK 2009
– ACM DOLAP 2009
– Int. Journal of DW and Mining
• Prototype
– Various methods for OLAP query recommendation
– Mondrian, MySQL
19
Context and principle
20
Distances
• Between positions in the cube
– Hamming
– Based on shortest path
• Between queries
– Based on differences in dimension
– Hausdorff
• Between sessions
– Based on the subsequence
– Edit distance
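Three of the distances listed above can be sketched in Python, one per level. The sketch assumes that a cube position is a tuple of members, that a query is a non-empty set of positions, and that a session is a list of queries. The edit distance uses unit costs, which is an assumption, since the slides do not give the cost model. The function names are illustrative.

```python
def hamming(c1, c2):
    """Number of coordinates on which two cube positions differ."""
    return sum(1 for a, b in zip(c1, c2) if a != b)


def hausdorff(q1, q2, dist=hamming):
    """Hausdorff distance between two non-empty sets of positions."""
    def directed(xs, ys):
        # Farthest point of xs from its nearest neighbour in ys.
        return max(min(dist(x, y) for y in ys) for x in xs)
    return max(directed(q1, q2), directed(q2, q1))


def edit_distance(s1, s2):
    """Levenshtein distance between two sequences of queries (unit costs)."""
    prev = list(range(len(s2) + 1))
    for i, x in enumerate(s1, 1):
        cur = [i]
        for j, y in enumerate(s2, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute x by y
        prev = cur
    return prev[-1]


q1 = frozenset({("Tours", "Drink", 2006), ("Tours", "Food", 2006)})
q2 = frozenset({("Tours", "Drink", 2006), ("Orleans", "Drink", 2005)})

print(hamming(("Tours", "Drink", 2006), ("Orleans", "Drink", 2005)))  # 2
print(hausdorff(q1, q2))                                              # 2
print(edit_distance([q1, q2], [q1]))                                  # 1
```

A natural variation, not shown here, is to use the distance between queries as the substitution cost in the edit distance instead of a 0/1 cost.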
21
Experiments
• Cube
– Foodmart (Mondrian sample cube)
• Session generator
– Max 100 cells per MDX query
– 25-50 sessions
– 20-50 queries/session
– Log of 150-25000 queries
– 1-20 queries/current session
22
Efficiency
• Shortest path
• Hausdorff distance
• Edit distance
23
Effectiveness
• 10-fold cross-validation
– 1 query set = 10 equally sized subsets
• 9 for the log
• 1 for the current sessions
• For the current sessions
– Remove the last query
– Check how often this last query is recommended
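The protocol above can be sketched in Python as follows. Sessions are assumed to be lists of queries, the folds are built round-robin so that they are as equal in size as possible, and `recommend` is a hypothetical callable standing in for any recommendation method: it takes the log and the truncated current session and returns one query. None of these names come from the prototype.

```python
def ten_fold_splits(sessions, k=10):
    """Yield (log, current) pairs: k-1 folds form the log, 1 fold the current sessions."""
    folds = [sessions[i::k] for i in range(k)]
    for i in range(k):
        log = [s for j, fold in enumerate(folds) if j != i for s in fold]
        # Remove the last query of each current session and keep it as the expected answer.
        current = [(s[:-1], s[-1]) for s in folds[i] if len(s) > 1]
        yield log, current


def hit_rate(sessions, recommend, k=10):
    """Fraction of current sessions whose removed last query is recommended."""
    hits = total = 0
    for log, current in ten_fold_splits(sessions, k):
        for prefix, expected in current:
            total += 1
            hits += recommend(log, prefix) == expected
    return hits / total if total else 0.0


# Toy data: 20 sessions of 3 queries each, queries are plain strings.
sessions = [[f"q{i}_{j}" for j in range(3)] for i in range(20)]
splits = list(ten_fold_splits(sessions))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 18 2
```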
24
Effectiveness
E = members of the expected query
R = members of the recommended query
Intersect = E ∩ R
Precision = |Intersect| / |R|
Recall = |Intersect| / |E|
F-measure = 2 * Precision * Recall / (Precision + Recall)
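As a quick illustration of these measures, here is a minimal Python sketch that treats the expected and recommended queries as sets of members. The function name `evaluate` and the sample members are made up for the example. Returning 0 when a denominator is 0 is a convention chosen here, not something stated on the slide.

```python
def evaluate(expected, recommended):
    """Return (precision, recall, f_measure) for two sets of members."""
    intersect = expected & recommended
    precision = len(intersect) / len(recommended) if recommended else 0.0
    recall = len(intersect) / len(expected) if expected else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


expected = {"Tours", "Drink", "2006"}
recommended = {"Tours", "Drink", "2005", "Food"}
precision, recall, f_measure = evaluate(expected, recommended)
print(precision)  # 0.5
```

Here 2 of the 4 recommended members are expected (precision 1/2) and 2 of the 3 expected members are recommended (recall 2/3), which gives an F-measure of 4/7.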
25
Query recommendation for discovery-driven analysis?
[Figure: speech bubbles "Hm, this looks strange to me..." and "interesting..."]
31
Processing the log
1: Consider all sessions
2: Consider all queries
3: Consider all difference pairs
4: Detect their drilldown pairs
5: Detect their exception pairs
6: Consider only the most general pairs having drilldown pairs or exception pairs
36
Recommending
1: Detect difference pairs
2: Specialize a most general pair in the log?
3: Suggest the most general queries...
4: ... then drilldown queries
5: ... then exception queries
37
Prototype
• Java, Mondrian OLAP engine & Sarawagi's icube
• Preliminary tests show that for a small log (a few hundred queries), recommendation time does not exceed 50 ms
38
Conclusion: so far...
[Figure: speech bubble "Hm, this looks strange to me..."]
Ongoing work with IRSA (a French social security health examination center) to analyze over 500,000 health care examination questionnaires
39
Summarizing OLAP queries
• Master's thesis of Julien Aligon (in progress)
• Problem: viewpoints on former sessions?
– By summarizing the log
• Summarize a sequence of queries by a sequence of queries
– By browsing/querying the summary
• Experiments on healthcare data
• Related publication
– EDA 2007, 2010
40
Perspectives
• Project STIC-AmSud PQUERY: preference models for personalized queries
• Forthcoming work with M. Golfarelli (U. Bologna)
– Preference mining to dynamically add preferences to an MDX query
• Contributions to a collaborative query management system for OLAP