Clickstream analysis - data collection, preprocessing and mining using the LISp-Miner system

Effective placement of on-line advertisements

Tomáš Kliegr, 7.3.2007

A case study approach

Methodology – CRISP-DM

I. Data collection

• Data are collected on the server application layer

• No demands on the tracked website

Comparison with log-file based approaches

• Works with all browsers with enabled cookies

• Automatic robot filtering

• Storage efficiency

• Easy to integrate & safe to operate

II. Data preprocessing

Problem: collected click streams have varying lengths.

Goal: create a higher-level abstraction of the visitor.

This phase creates a fixed-length visitor profile in a two-step process:

• Segment procedure: classifies pages into a domain-specific taxonomy on several levels of granularity.

• Merge procedure: extracts important and characteristic information from the visitor's clickstream.

Assigning pages to categories

Inputs:

• Prespecified taxonomy (tuples ProductID – category, tuples URL pattern – category)

• Visited pages (URL addresses stored in a database)

Processing: SQL Server stored procedure Segment

Output: pages classified on several levels of granularity

Segment procedure

• Classifies pages into a domain specific taxonomy on several levels of granularity.

• Assigns Time on page and Score to each page in visitor’s clickstream

• Score expresses absolute weight of a particular page in user’s click stream.

S = (ln(o) + 1) * t

o – order of the page in the user's clickstream

t – time on page
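The score formula can be sketched in a few lines of Python. This is only an illustration: it assumes that the page order o is counted from 1 (so the first page gets S = t) and that time on page is measured in seconds; the function name and the sample times are hypothetical.

```python
import math


def page_score(order: int, time_on_page: float) -> float:
    """Score S = (ln(o) + 1) * t for a single page view.

    Assumption: `order` starts at 1 for the first page in the clickstream.
    """
    if order < 1:
        raise ValueError("order must be >= 1")
    return (math.log(order) + 1.0) * time_on_page


# Hypothetical times on page (seconds) for a three-page visit.
times = [30.0, 12.0, 45.0]
scores = [page_score(o, t) for o, t in enumerate(times, start=1)]
```

With equal time on page, pages visited later in the clickstream receive a higher score, because ln(o) grows with the order of the page.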

Segment – Example output

Page: www.poznani.cz/hiking-alps/

General category (Cat): Search

Extended category (ECat): Catalogue

Topic: Alps
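A classification of this kind can be sketched as a lookup of URL patterns in a taxonomy. The sketch below is not the SQL Server procedure from the case study; it is a Python illustration in which the regular expressions, the `PageClass` type and the `segment` function are all hypothetical. Only the first taxonomy entry reproduces the example output above.

```python
import re
from typing import NamedTuple, Optional


class PageClass(NamedTuple):
    cat: str               # general category (Cat)
    ecat: str              # extended category (ECat)
    topic: Optional[str]   # topic, if the page has one


# Hypothetical taxonomy of (URL pattern, classification) tuples.
TAXONOMY = [
    (re.compile(r"/hiking-alps/"), PageClass("Search", "Catalogue", "Alps")),
    # Assumed pattern for a general information page.
    (re.compile(r"/insurance/"), PageClass("General information", "Insurance", None)),
]


def segment(url: str) -> Optional[PageClass]:
    """Return the classification of the first matching pattern, or None."""
    for pattern, page_class in TAXONOMY:
        if pattern.search(url):
            return page_class
    return None


result = segment("www.poznani.cz/hiking-alps/")
```

Pages that match no pattern return `None` here; how the original procedure treats unclassified pages is not stated in the slides.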

Merge procedure

This procedure creates the visitor profile:

• Basic attributes (6): Total time on web, Number of displayed pages, Day of week, Hour of day, Referring domain (constituted by URL and Cat attributes).

• Important points on the path (12): Entry page, Exit page, Conversion page (each described by Page name, Cat, ECat and S).

• Attributes conceptualizing the path (11): Range of interest, Most favourite topic (Topic, S), Search total (S) and Search analytically (Fulltext (S), Extended search (S), Catalogue search (S)), General information pages total (S) and analytically (Discounts (S), Insurance (S), About (S)).
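How a variable-length clickstream collapses into a fixed set of attributes can be sketched as follows. The sketch covers only a handful of the listed attributes and rests on assumptions: the Most favourite topic is taken to be the topic with the highest summed score (the slides do not spell out the heuristic), and the `PageView` type, the `merge` function and the sample visit are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PageView:
    page: str
    cat: str
    topic: Optional[str]
    time_on_page: float
    score: float


def merge(views: List[PageView]) -> dict:
    """Collapse one visitor's clickstream into a fixed-length profile."""
    if not views:
        raise ValueError("empty clickstream")
    topic_scores: Dict[str, float] = defaultdict(float)
    search_total = 0.0
    for view in views:
        if view.topic is not None:
            topic_scores[view.topic] += view.score
        if view.cat == "Search":
            search_total += view.score
    # Assumed heuristic: the favourite topic has the highest summed score.
    favourite = max(topic_scores, key=topic_scores.get) if topic_scores else None
    return {
        "total_time": sum(view.time_on_page for view in views),
        "pages_displayed": len(views),
        "entry_page": views[0].page,
        "exit_page": views[-1].page,
        "most_favourite_topic": favourite,
        "search_total": search_total,
    }


# Hypothetical three-page visit; scores are illustrative values.
visit = [
    PageView("/", "Home", None, 10.0, 10.0),
    PageView("/hiking-alps/", "Search", "Alps", 20.0, 33.9),
    PageView("/insurance/", "General information", None, 5.0, 10.5),
]
profile = merge(visit)
```

Whatever the length of the visit, the resulting dictionary always has the same keys, which is what makes the profiles usable as rows of a data matrix.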

Merge – example output

III. Data mining

• Association Rules are the most frequently used approach [Facci, Lanza]

• LISp-Miner system - 4ft-Miner, SD4ft-Miner

• Sample task: From which referring class of websites do most converted visitors come?

Choosing the right quantifier

• LISp-Miner offers a range of quantifiers

• Founded implication

– Support: a (absolute) or a/(a+b+c+d) (relative)

– Confidence: a/(a+b)

– Problem: tight dependencies are rarely found and rarely required in clickstream data

• Above average quantifier

“Among objects satisfying Ant there are at least 100*p per cent more objects satisfying Suc than there are objects satisfying Suc in the whole data matrix.” (LISp-Miner Help)

Illustration

Ant/Suc         Conversion   Not(Conversion)
Partner webs    7            63
Not(PW)         7            693

[Share of objects satisfying Suc among objects satisfying Ant] = 7/70 = 0.1

[Share of objects satisfying Suc in the entire data matrix] = 14/770 = 0.018

Confidence threshold: at most 7/(7+63) = 0.1

AAI threshold: at most 0.1/(14/770) = 5.5 (5.555 if the base rate is first rounded to 0.018)
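The arithmetic of this illustration can be checked with a short Python sketch over the four-fold table (a, b, c, d). The function name is hypothetical. The value `max_p` follows one reading of the quoted LISp-Miner Help sentence, namely a/(a+b) >= (1+p) * (a+c)/(a+b+c+d); treat that formalization as an assumption of this sketch.

```python
def four_fold_measures(a: int, b: int, c: int, d: int) -> dict:
    """Basic measures of a four-fold table.

    a: Ant and Suc, b: Ant and not Suc, c: not Ant and Suc, d: neither.
    """
    n = a + b + c + d
    if a + b == 0 or a + c == 0:
        raise ValueError("antecedent and succedent must each occur at least once")
    confidence = a / (a + b)       # share of Suc among objects satisfying Ant
    base_rate = (a + c) / n        # share of Suc in the whole data matrix
    ratio = confidence / base_rate
    return {
        "support": a / n,
        "confidence": confidence,
        "base_rate": base_rate,
        "ratio": ratio,
        # Largest p for which the rule still holds under the assumed reading.
        "max_p": ratio - 1.0,
    }


# Partner webs vs. conversion, as in the table above.
measures = four_fold_measures(a=7, b=63, c=7, d=693)
```

For these frequencies the confidence is 0.1 and the ratio to the base rate is about 5.5, so a founded implication with a confidence threshold above 0.1 would miss the rule, while the above average quantifier still reports it.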

LISp-Miner demonstration

SD4ft-Miner

• Mines for patterns of the form φ ≈ ψ / (α, β, γ)

• This SD4ft-pattern means that the subsets given by Boolean attributes α, β differ in what concerns the relation of Boolean attributes φ, ψ when condition γ is satisfied.

• Our question: which groups of customers α, β (i.e. depending on where they come from) differ remarkably, and under what condition γ, in the probability of conversion?

• We express “the conversion condition” by setting only the succedent (ψ) and we leave the antecedent (φ) unset.
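The idea of comparing two subsets of visitors on the same succedent can be sketched in Python. This is not how SD4ft-Miner is implemented; it is a minimal illustration in which the antecedent is left unset, so the confidence inside a subset is simply the share of its rows that satisfy the succedent. The function names and the eight sample rows are made up for the example.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


def confidence(rows: List[Row], subset: Predicate, succedent: Predicate) -> float:
    """Share of rows in the subset that satisfy the succedent."""
    members = [row for row in rows if subset(row)]
    if not members:
        raise ValueError("subset is empty")
    return sum(1 for row in members if succedent(row)) / len(members)


def confidence_difference(
    rows: List[Row],
    first: Predicate,
    second: Predicate,
    succedent: Predicate,
    condition: Predicate = lambda row: True,
) -> float:
    """Difference of confidences between two subsets under a condition."""
    scoped = [row for row in rows if condition(row)]
    return confidence(scoped, first, succedent) - confidence(scoped, second, succedent)


# Made-up visitor profiles, for illustration only.
rows = (
    [{"referrer": "Partner webs", "converted": True}] * 2
    + [{"referrer": "Partner webs", "converted": False}] * 2
    + [{"referrer": "Own webs", "converted": True}] * 1
    + [{"referrer": "Own webs", "converted": False}] * 3
)
diff = confidence_difference(
    rows,
    first=lambda row: row["referrer"] == "Partner webs",
    second=lambda row: row["referrer"] == "Own webs",
    succedent=lambda row: row["converted"] is True,
)
```

A mining procedure would enumerate many candidate subsets and conditions and keep the pairs whose difference exceeds a chosen threshold; the sketch evaluates a single pair.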

4ft-Miner vs SD4ft-Miner

• 4ft-Miner: above average quantifier

• SD4ft-Miner: negative gace type for the 2nd subset

The increase in the conversion rate reported by SD4ft-Miner is more suitable for our purposes, because the 2nd set is disjoint from the 1st set. The conversion rate for partner webs is 78% higher than the average for other referrers:

Conf1/Conf2 = 0.132/0.074 = 1.784
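The 78% figure follows directly from the two confidences. A small Python check, with a hypothetical helper name:

```python
def relative_increase(conf_first: float, conf_second: float) -> float:
    """Relative increase of the first confidence over the second."""
    if conf_second <= 0:
        raise ValueError("second confidence must be positive")
    return conf_first / conf_second - 1.0


conf_partner_webs = 0.132    # conversion rate for partner webs (from the slide)
conf_other_referrers = 0.074  # average conversion rate for other referrers

ratio = conf_partner_webs / conf_other_referrers
increase = relative_increase(conf_partner_webs, conf_other_referrers)
```

The ratio is about 1.784, so the increase is about 0.78, i.e. roughly 78%.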

Solution to Task 1

From which referring class of websites do most converted visitors come?

[Bar chart: conversion rate (axis from 0 to 0.15) by referring class: Fulltexts, Catalogues, No referrer, Other, Partner webs, Own webs]

SD4ft-Miner – cont.

• With the output sorted by the difference of confidence values, the first rule says: the conversion rate for visitors coming from partner websites is 13.2%, while the conversion rate for visitors coming from the company’s own websites is only 4.9%.

Review

• The goals of the second run of the CRISP-DM cycle are to:

– Extend the available information by logging user actions

– Improve the heuristics for the Most favourite topic

– Involve page texts

– Move to a new development platform: Ferda boxes

References

• Rauch, J., Šimůnek, M.: An Alternative Approach to Mining Association Rules. In: Foundations of Data Mining and Knowledge Discovery. Berlin 2005

• Rauch, J., et al: Mining for Patterns Based on Contingency Tables by KL-Miner - First Experience. In: Foundations and Novel Approaches in Data Mining. Berlin: Springer, 2005

• Strossa, P., et al: Reporting Data Mining Results in a Natural Language. In: ibid.

• Kováč, M., et al: Ferda, New Visual Environment for Data Mining. Znalosti 2006

• LM Report Asistent. Znalosti 2007

• Lispminer.vse.cz, ferda.sourceforge.net/