Transforming and leveraging OLAP queries

Patrick Marcel

Université François Rabelais Tours

Laboratoire d'Informatique

SAP-BO, 06.22.2010


Outline

• Short CV
• Personalizing OLAP queries
• Recommending OLAP queries
• Summarizing OLAP queries
• Perspectives


About me

• PhD «multidimensional data(base) manipulations and rule based languages», defended 1998, LISI (now LIRIS), INSA Lyon
• Supervisors: J. Kouloumdjian and MS Hacid
• Maître de Conférences (associate professor), UFRT, Computer Science Department
• Head of the Master's program in Information systems and decision making
• Semester off (September 2010 – January 2011)


About me (cont'd)

Member of the DB & NLP team (4 PR, 8 MCF)
– NLP
– XML and web technology
– Data mining and OLAP

Recent activities
• Pattern based global models (PhD Eynollah Khanjari 2009)
• Summarizing and visualizing large sets of association rules (PhD Marie Ndiaye 2010)
• Collaborative exploration of data warehouses (PhD Elsa Negre 2009)


Personalizing OLAP queries

PhD Hassina Mouloudi (2007)

Main publications
– ACM DOLAP 2005
– BDA 2006
– Hassina's dissertation (in French)

Prototype
– Mobile application for querying a cube with query personalization
– Mondrian, Oracle, Tomcat, Axis


Motivation

SELECT
  CROSSJOIN({City.Tours, City.Orleans}, {Category.Members}) ON ROWS,
  {2003, 2004, 2005, 2006} ON COLUMNS
FROM SalesCube
WHERE (Measures.quantity)

Visualization depends on the user's profile

                 2003  2004  2005  2006
Tours    Drink     77    54    55    33
         Food      89    61    30    41
Orleans  Drink     25    50    49    32
         Food      33    44    59    27

Tours   2003  2004  2005  2006
Food      77    54    55    33
Drink     89    61    30    41
Cloth     56    30    32    60
Shoes     45    50    32    51


The problem

• Given

– An MDX query q

– User preferences P

– A visualization constraint v

• Find a preferred query q'

– Included in q

– Nearest to q satisfying v

– The most interesting w.r.t. P


Example of preferred query

SELECT
  CROSSJOIN({City.Tours}, {Category.Food, Category.Drink}) ON ROWS,
  {Year.2005} ON COLUMNS
FROM SalesCube
WHERE (Measures.quantity)

<

SELECT
  CROSSJOIN({City.Tours}, {Year.2006}) ON ROWS,
  {Category.Drink} ON COLUMNS
FROM SalesCube
WHERE (Measures.quantity)

Since the user profile contains
  Location < Product, Product < Time
  2005 < 2006, food < drink

Indeed:
  (2005, Food, Tours, quantity) < (2006, Drink, Tours, quantity)
  (2005, Drink, Tours, quantity) < (2006, Drink, Tours, quantity)
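The fact comparisons on this slide can be reproduced with a small Python sketch. It is only one possible reading of the profile: it assumes, which the slides do not state, that facts are compared lexicographically, starting with the most preferred dimension. All names (DIMENSION_PRIORITY, MEMBER_RANK, less_preferred) are hypothetical.

```python
# Hypothetical encoding of the profile shown on the slide.
# Dimensions listed from most to least preferred (Location < Product < Time).
DIMENSION_PRIORITY = ["Time", "Product", "Location"]

# Rank of each member inside its dimension; a higher rank is more preferred
# (2005 < 2006, food < drink). Tours is the only location in the example.
MEMBER_RANK = {
    "Time": {"2005": 0, "2006": 1},
    "Product": {"Food": 0, "Drink": 1},
    "Location": {"Tours": 0},
}


def fact_key(fact):
    """Map a fact (dict dimension -> member) to a tuple of ranks.

    Assumption: lexicographic comparison, most preferred dimension first.
    """
    return tuple(MEMBER_RANK[dim][fact[dim]] for dim in DIMENSION_PRIORITY)


def less_preferred(f1, f2):
    """True if fact f1 is strictly less preferred than fact f2."""
    return fact_key(f1) < fact_key(f2)


food_2005 = {"Time": "2005", "Product": "Food", "Location": "Tours"}
drink_2005 = {"Time": "2005", "Product": "Drink", "Location": "Tours"}
drink_2006 = {"Time": "2006", "Product": "Drink", "Location": "Tours"}

print(less_preferred(food_2005, drink_2006))
print(less_preferred(drink_2005, drink_2006))
```

Under this reading, every fact of the first query is less preferred than the fact of the second query, which is why the second query is preferred.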


Personalizing

[Architecture diagram: user query, user profile, personalization engine, query processor, dimension tables, fact table, result]


Personalizing OLAP queries

• Context
– Dimension tables in main memory
– No access to the fact table

• Principle
– Compute sets of positions in the resulting crosstab
  • Largest possible
  • Visualizable w.r.t. the visualization constraint
  • Corresponding to the preferred facts
– Compute the structures of the crosstabs


Example of personalization (1)

The query:

SELECT
  CROSSJOIN({City.Tours, City.Orleans}, {Category.Members}) ON ROWS,
  {2003, 2004, 2005, 2006} ON COLUMNS
FROM SalesCube
WHERE (Measures.quantity)

Preferences:
  Time < Location and Product < Location
  2002 < 2003 < 2004 < 2005 < 2006
  Electronics < shoes < cloth < food < drink
  Quantity < price

Constraint: 2 axes, no more than 4 positions on each axis


Example of personalization (2)

Step 1: the most preferred facts

                2006
Drink  Orleans
       Tours


Example of personalization (3)

Step 2: the second most preferred facts

                2006
Drink  Orleans
       Tours

                2006  2005
Drink  Orleans
       Tours
Food   Orleans
       Tours


Example of personalization (4)

Step 3: the next most preferred facts. But the selected facts have to satisfy the visualization constraint.

                2006
Drink  Orleans
       Tours

                2006  2005
Drink  Orleans
       Tours
Food   Orleans
       Tours

                2006  2005  2004
Drink  Orleans
       Tours
Food   Orleans
       Tours

                Drink  Food  Cloth
Tours    2005
         2006
Orleans  2005
         2006


Example of personalization (5)

Finally, one of the constructed queries is

SELECT
  CROSSJOIN({City.Tours, City.Orleans}, {Category.Food, Category.Drink}) ON ROWS,
  {2003, 2004, 2005, 2006} ON COLUMNS
FROM SalesCube
WHERE (Measures.quantity)

                 2003  2004  2005  2006
Tours    Drink     77    54    55    33
         Food      89    61    30    41
Orleans  Drink     25    50    49    32
         Food      33    44    59    27
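The visualization constraint of this example (2 axes, no more than 4 positions on each axis) can be checked with a short Python sketch. It assumes that the number of positions on an axis is the product of the sizes of the crossjoined member sets, and that Category.Members expands to the five categories named in the preferences. The function names and the data layout are hypothetical, not taken from the prototype.

```python
from math import prod


def axis_positions(axis):
    """Positions on an axis: product of the sizes of its crossjoined member sets."""
    return prod(len(members) for members in axis)


def satisfies(axes, max_axes=2, max_positions=4):
    """Check the constraint: at most max_axes axes, at most max_positions per axis."""
    return len(axes) <= max_axes and all(
        axis_positions(axis) <= max_positions for axis in axes
    )


# Assumption: Category.Members = the five categories listed in the preferences.
all_categories = {"Electronics", "Shoes", "Cloth", "Food", "Drink"}
years = {"2003", "2004", "2005", "2006"}

# Each query is a list of axes; each axis is a list of member sets.
original_query = [
    [{"Tours", "Orleans"}, all_categories],    # rows: 2 x 5 = 10 positions
    [years],                                   # columns: 4 positions
]
personalized_query = [
    [{"Tours", "Orleans"}, {"Food", "Drink"}], # rows: 2 x 2 = 4 positions
    [years],                                   # columns: 4 positions
]

print(satisfies(original_query))
print(satisfies(personalized_query))
```

With these assumptions the original query has too many row positions, while the constructed query fits the constraint exactly.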


Prototype


Speedup


Recommending OLAP queries

PhD Elsa Negre (2009)

Main publications
– ACM DOLAP 2008
– DaWaK 2009
– ACM DOLAP 2009
– Int. Journal of Data Warehousing and Mining

Prototype
– Various methods for OLAP query recommendation
– Mondrian, MySQL


Context and principle


Distances

• Between positions in the cube

– Hamming

– Based on shortest path

• Between queries

– Based on differences in dimension

– Hausdorff

• Between sessions

– Based on the subsequence

– Edit distance
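Three of the distances listed above can be sketched in Python. This is an illustration under assumptions, not the prototype's code: a position is modelled as a tuple of members (one per dimension), a query as a frozenset of positions, and a session as a list of queries. The edit distance uses unit costs and treats two queries as equal only when they contain the same positions. All function names are hypothetical.

```python
def hamming(p, q):
    """Number of dimensions on which two positions differ."""
    if len(p) != len(q):
        raise ValueError("positions must have the same number of dimensions")
    return sum(a != b for a, b in zip(p, q))


def hausdorff(query_a, query_b, dist=hamming):
    """Hausdorff distance between two non-empty sets of positions."""
    def directed(xs, ys):
        # Largest distance from a position of xs to its nearest position in ys.
        return max(min(dist(x, y) for y in ys) for x in xs)

    return max(directed(query_a, query_b), directed(query_b, query_a))


def edit_distance(session_a, session_b):
    """Levenshtein distance between two sessions (sequences of queries)."""
    prev = list(range(len(session_b) + 1))
    for i, qa in enumerate(session_a, 1):
        cur = [i]
        for j, qb in enumerate(session_b, 1):
            cur.append(min(
                prev[j] + 1,               # delete qa
                cur[j - 1] + 1,            # insert qb
                prev[j - 1] + (qa != qb),  # substitute, free if queries are equal
            ))
        prev = cur
    return prev[-1]


q1 = frozenset({("Tours", "Drink", "2005"), ("Tours", "Food", "2005")})
q2 = frozenset({("Tours", "Drink", "2006")})

print(hamming(("Tours", "Drink", "2005"), ("Tours", "Drink", "2006")))
print(hausdorff(q1, q2))
print(edit_distance([q1, q2], [q1]))
```

A distance between positions plugs into the distance between queries, so a shortest-path distance could replace hamming through the dist parameter.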


Experiments

• Cube

– Foodmart (Mondrian sample cube)

• Session generator

– Max 100 cells per MDX query

– 25-50 sessions

– 20-50 queries/session

– Log of 150-25000 queries

– 1-20 queries/current session


Efficiency

• Shortest path

• Hausdorff distance

• Edit distance


Effectiveness

• 10-fold cross-validation
– 1 query set = 10 equally sized subsets
  • 9 for the log
  • 1 for the current sessions

• For the current sessions
– Remove the last query
– Check how often this last query is recommended
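The evaluation protocol above can be sketched as follows. This is a minimal illustration, assuming that the split is done on sessions and that a recommender is a function taking the log and a truncated current session and returning one query. Both the function name ten_fold_hit_rate and the recommender signature are hypothetical.

```python
def ten_fold_hit_rate(sessions, recommend, k=10):
    """Fraction of current sessions whose removed last query is recommended.

    sessions  -- list of sessions, each a list of queries
    recommend -- hypothetical callable (log, prefix) -> recommended query
    """
    # Split the sessions into k subsets of (nearly) equal size.
    folds = [sessions[i::k] for i in range(k)]
    hits = trials = 0
    for i, current_sessions in enumerate(folds):
        # The other k - 1 subsets play the role of the log.
        log = [s for j, fold in enumerate(folds) if j != i for s in fold]
        for session in current_sessions:
            if len(session) < 2:
                continue  # nothing left once the last query is removed
            prefix, expected = session[:-1], session[-1]
            hits += recommend(log, prefix) == expected
            trials += 1
    return hits / trials if trials else 0.0


# Toy data: queries are integers, each session is a run of three consecutive ones.
toy_sessions = [[n, n + 1, n + 2] for n in range(20)]
print(ten_fold_hit_rate(toy_sessions, lambda log, prefix: prefix[-1] + 1))
```

The toy recommender ignores the log and always guesses the next integer, so it only serves to exercise the protocol.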


Effectiveness

E = members of the expected query

R = members of the recommended query

Intersect = E ∩ R

Precision = |Intersect| / |R|

Recall = |Intersect| / |E|

F-measure = 2 * precision * recall / (precision + recall)
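The three measures can be computed directly on sets of members. The sketch below is illustrative: the function name effectiveness and the example member sets are made up, and returning 0 when a denominator is empty is a convention chosen here, not something stated on the slide.

```python
def effectiveness(expected, recommended):
    """Return (precision, recall, f_measure) for two collections of members."""
    e, r = set(expected), set(recommended)
    intersect = len(e & r)
    precision = intersect / len(r) if r else 0.0
    recall = intersect / len(e) if e else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up example: 2 of the 3 recommended members are expected.
expected_members = {"Tours", "Orleans", "Food", "Drink"}
recommended_members = {"Tours", "Food", "Cloth"}
print(effectiveness(expected_members, recommended_members))
```

Here precision is 2/3, recall is 1/2 and the F-measure is 4/7.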


Query recommendation for discovery-driven analysis?

"Hm, this looks strange to me..."

"Interesting..."


Processing the log

1: consider all sessions
2: consider all queries
3: consider all difference pairs
4: detect their drilldown pairs
5: detect their exception pairs
6: consider only the most general pairs having drilldown pairs or exception pairs


Recommending

1: detect difference pairs
2: specialize a most general pair in the log?
3: suggest the most general queries...
4: ... then drilldown queries
5: ... then exception queries


Prototype

Java, Mondrian OLAP engine & Sarawagi's icube

Preliminary tests show that, for a small log (a few hundred queries), recommendation time does not exceed 50 ms


Conclusion: so far...

"Hm, this looks strange to me..."

Ongoing work with IRSA (a French social security health examination center) to analyze over 500,000 health care examination questionnaires


Summarizing OLAP queries

Master's thesis Julien Aligon (in progress)

Problem: viewpoints on former sessions?
– By summarizing the log
  • Summarize a sequence of queries by a sequence of queries
– By browsing/querying the summary

Experiments on healthcare data

Related publication
– EDA 2007, 2010


Perspectives

Project STIC-AmSud PQUERY: preference models for personalized queries

Forthcoming work with M. Golfarelli (U. Bologna)

– Preference mining to dynamically add preferences to an MDX query

Contributions to a collaborative query management system for OLAP