KDD for Personalization



Bamshad Mobasher - DePaul University, Chicago

Bettina Berendt - Humboldt University Berlin

Myra Spiliopoulou - Leipzig Graduate School of Management

KDD for Personalization: PKDD 2001 Tutorial

September 6, 2001


Web Personalization

• The Problem

– dynamically serve customized content (pages, products, recommendations, etc.) to users based on their profiles, preferences, or expected interests

• Personalization v. Customization

– In customization, the user controls and customizes the site or the product based on his/her preferences

– usually manual, but sometimes semi-automatic based on a given user profile

– Personalization is done automatically based on the user’s actions, the user’s profile, and (possibly) the profiles of others with “similar” profiles

[I-2]

Customization Example: my.yahoo.com

[I-3]

Personalization Example: amazon.com

[I-4]

A simplified scheme for personalization

[Figure: the user selects information object(s); the system recommends other information object(s) related to them.

- what kind? document etc., query

- how? request, specification, rating

- why? similarity (syntactic/semantic), co-occurrence in the user's other navigation histories, co-occurrence in other users' navigation histories]

[I-5]

Know Thy Customer: Knowledge is Power

“Relationships based on customer insight propel an organization from simply treating customers efficiently to treating them relative to their needs, preferences, and value potential. . . .

“Knowing the customer is paramount in today's marketplace where the customer has more options, greater flexibility and higher expectations. . . . ”

John C. Nash (Accenture) in [46]

[I-6]

Customer knowledge implies:

1.) Acquisition of customer data

2.) Analysis of customer data

3.) Action in accordance with the gained insights

[I-7]

Acquisition of customer data

Customer data are recordings of:

• preferences

• transactions

• pre-sales contacts

• after-sales support

• demographic information

Some of these data:

• may be purchased from third parties

• may be held in multiple disparate databases that serve completely different purposes

• are of varying quality with respect to error rates, reliability, coverage, representativeness

→ Data Preparation

[I-8]

Analysis of customer data

Data analysis should provide feedback on questions like

• Which users will become customers?

• Which customers will return?

• Who is more likely to respond to a promotion action?

• Who would be interested in cross-sale/up-sale suggestions?

closely related to questions like

• Is the Web site appropriately designed to serve the organisation's goals?

• Are the customers satisfied?

• Are the customers satisfied enough to come again?

• Are the customers satisfied enough to become promoters of the site?

→ Data Mining

[I-9]

Action in accordance with the gained insights

• Alignment of the marketing policy

• Alignment of the supply chain, including after-sales support

• Adjustment of the web site

– static site re-design

– Browsing/Navigation suggestions

– Recommendations on the page

– Intelligent assistance

– Personalized layout and content

Fact: The time lag between insight and action should be minimized.

[I-10]

The action should create value

• for the customer

• for the organisation

[I-11]

A short excursion on value creation

In B2C e-commerce, it is not sufficient to:

• offer an existing product through the Internet

• digitize part/all of the merchandizing chain

• introduce a brilliant new product in the market

The product must bring added value to

• win the customer → Customer Conversion

• retain the customer → Customer Retention

[I-12]

The model of Kuhlen considers the following types of value [32]:

(1) comparative

(2) improving efficiency

(3) improving effectivity

(4) integrative

(5) organisational

(6) strategic

(7) innovative

[I-13]

From Acquisition to Action

• There is no lack of data.

• Clickstream data accumulate at a tremendous pace.

• Demographic data can be acquired.

• Customer profiles are available or can be acquired.

• There is no lack of methodologies for data analysis.

• The ability to exploit the data increases at a much slower pace [46] and the number of personalized Web sites is not really large.

• The tolerable elapsed time between acquisition and action is low [16].

[I-14]

Personalization: An HCI perspective

= does personalization increase usability?

A Web site’s usability is high if users

- achieve their goals / perform their tasks in little time,

- do so with a low error rate,

- experience high subjective satisfaction.

Usability testing:

- experts and "normal" users

- questionnaires and experiments

- qualitative and quantitative methods

Usability is a special concern on the Web because unlike with other products / software, "users experience usability first and pay later". (Nielsen [49])

[I-15]

[DP-1]

Data Preparation for Personalization

[DP-2]

Web Usage Mining

• Discovery of meaningful patterns from data generated by client-server transactions on one or more Web servers

• Typical Sources of Data

– automatically generated data stored in server access logs, referrer logs, agent logs, and client-side cookies

– e-commerce and product-oriented user events (e.g., shopping cart changes, ad or product click-throughs, etc.)

– user profiles and/or user ratings

– meta-data, page attributes, page content, site structure

What’s in a Typical Server Log?

<ip_addr> <base_url> - <date> <method> <file> <protocol> <code> <bytes> <referrer> <user_agent>

203.30.5.145 www.acr-news.org - [01/Jun/1999:03:09:21 -0600] "GET /Calls/OWOM.html HTTP/1.0" 200 3942 "http://www.lycos.com/cgi-bin/pursuit?query=advertising+psychology&maxhits=20&cat=dir" "Mozilla/4.5 [en] (Win98; I)"

203.30.5.145 www.acr-news.org - [01/Jun/1999:03:09:23 -0600] "GET /Calls/Images/earthani.gif HTTP/1.0" 200 10689 "http://www.acr-news.org/Calls/OWOM.html" "Mozilla/4.5 [en] (Win98; I)"

203.30.5.145 www.acr-news.org - [01/Jun/1999:03:09:24 -0600] "GET /Calls/Images/line.gif HTTP/1.0" 200 190 "http://www.acr-news.org/Calls/OWOM.html" "Mozilla/4.5 [en] (Win98; I)"

203.30.5.145 www.acr-news.org - [01/Jun/1999:03:09:25 -0600] "GET /Calls/Images/red.gif HTTP/1.0" 200 104 "http://www.acr-news.org/Calls/OWOM.html" "Mozilla/4.5 [en] (Win98; I)"

203.252.234.33 www.acr-news.org - [01/Jun/1999:03:32:31 -0600] "GET / HTTP/1.0" 200 4980 "" "Mozilla/4.06 [en] (Win95; I)"

203.252.234.33 www.acr-news.org - [01/Jun/1999:03:32:35 -0600] "GET /Images/line.gif HTTP/1.0" 200 190 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"

203.252.234.33 www.acr-news.org - [01/Jun/1999:03:32:35 -0600] "GET /Images/red.gif HTTP/1.0" 200 104 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"

203.252.234.33 www.acr-news.org - [01/Jun/1999:03:32:35 -0600] "GET /Images/earthani.gif HTTP/1.0" 200 10689 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"

203.252.234.33 www.acr-news.org - [01/Jun/1999:03:33:11 -0600] "GET /CP.html HTTP/1.0" 200 3218 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
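As a rough illustration of how the fields in this log format might be pulled apart, the following sketch uses a regular expression. The pattern, the function name `parse_log_line`, and the assumption that fields are separated by single spaces are ours, not part of the tutorial; real logs may need a more tolerant pattern.

```python
import re

# Hypothetical pattern for the field layout shown above; assumes
# single-space separators and double-quoted request, referrer, and agent.
LOG_PATTERN = re.compile(
    r'(?P<ip_addr>\S+) (?P<base_url>\S+) - \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<file>\S+) (?P<protocol>[^"]+)" '
    r'(?P<code>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_log_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

line = ('203.30.5.145 www.acr-news.org - [01/Jun/1999:03:09:24 -0600] '
        '"GET /Calls/Images/line.gif HTTP/1.0" 200 190 '
        '"http://www.acr-news.org/Calls/OWOM.html" '
        '"Mozilla/4.5 [en] (Win98; I)"')
entry = parse_log_line(line)
print(entry["file"], entry["code"], entry["referrer"])
```

Lines that do not fit the pattern return None, so they can be set aside for inspection during data cleaning.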

[DP-4]

The Web Usage Mining Process

[Figure: three phases, Preprocessing, Pattern Discovery, and Pattern Analysis. Raw Usage Data and Content and Structure Data are preprocessed into Preprocessed Clickstream Data; pattern discovery yields Rules, Patterns, and Statistics; pattern analysis yields "Interesting" Rules, Patterns, and Statistics]

[DP-5]

Usage Data Preprocessing

[Figure: preprocessing pipeline with the stages Data Cleaning, User/Session Identification, Page View Identification, Path Completion, and Episode Identification; inputs: Raw Usage Data, Site Structure and Content; outputs: Server Session File, Episode File, Usage Statistics]

[DP-6]

Data Preprocessing for Web Usage Mining

• Data cleaning

– remove irrelevant references and fields in server logs

– remove references due to spider navigation

– remove erroneous references

– add missing references due to caching (done after sessionization)

• Data integration

– synchronize data from multiple server logs

– integrate e-commerce and application server data

– integrate meta-data (e.g., content labels)

– integrate demographic / registration data

[DP-7]

Data Preparation for Web Usage Mining (Cooley, Mobasher, Srivastava, 1999 [15])

• Data Transformation

– user identification

– sessionization / episode identification

– pageview identification

• a pageview is a set of page files and associated objects that contribute to a single display in a Web browser

• Data Reduction

– sampling and dimensionality reduction (ignoring certain pageviews / items)

• Identifying User Transactions (i.e., sets or sequences of pageviews, possibly with associated weights)

[DP-8]

User and Session Identification: Need for Reliable Usage Data

• Validity of results in Web usage mining is affected by the ability to:

– distinguish among different users of a site

– reconstruct the activities of the users within the site

• Reliable usage data are difficult to obtain

– proxy servers and anonymizers

– rotating IP addresses / connections through ISPs

– missing references due to caching

– inability of servers to distinguish among different visits

[DP-9]

Identifying Users and Sessions

• Server log L is a list of log entries, each containing timestamp, host identifier, URL request (including URL stem and query), referrer, agent, cookie, etc.

• User identification and sessionization

– user activity log is a sequence of log entries in L belonging to the same user

– user identification is the process of partitioning L into a set of user activity logs

– the goal of sessionization is to further partition each user activity log into sequences of entries corresponding to each user visit

[DP-10]

Sessionization Heuristics

• Real v. Constructed Sessions

– Conceptually, the log L is partitioned into an ordered collection of “real” sessions R

– Each heuristic h partitions L into an ordered collection of “constructed sessions” C_h

– The ideal heuristic h*: C_h* = R

• Two Basic Types of Sessionization Heuristics

– Time-oriented heuristics

– Navigation-oriented heuristics

[DP-11]

Time-Oriented Heuristics

• Consider boundaries on time spent on individual pages or in the entire site during a single visit

– Boundaries can be based on a maximum session length or maximum time allowable for each pageview

– Additional granularity can be obtained by using different boundaries for different (types of) pageviews

h1: Given t0, the timestamp of the first request in a constructed session S, and a threshold θ, the request with timestamp t is assigned to S iff t - t0 ≤ θ.

h2: Given t1, the timestamp of a request in constructed session S, and a threshold δ, the next request with timestamp t2 is assigned to S iff t2 - t1 ≤ δ.
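The two time-oriented heuristics can be sketched as follows. This is a minimal illustration, assuming the input is one user's request timestamps in ascending order and that all names (`sessionize_h1`, `sessionize_h2`, `theta`, `delta`) and the sample times are ours.

```python
def sessionize_h1(timestamps, theta):
    """h1: a request joins the current session iff t - t0 <= theta,
    where t0 is the timestamp of the session's first request."""
    sessions = []
    for t in timestamps:
        if sessions and t - sessions[-1][0] <= theta:
            sessions[-1].append(t)
        else:
            sessions.append([t])  # start a new constructed session
    return sessions

def sessionize_h2(timestamps, delta):
    """h2: a request joins the current session iff t2 - t1 <= delta,
    where t1 is the timestamp of the previous request."""
    sessions = []
    for t in timestamps:
        if sessions and t - sessions[-1][-1] <= delta:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions

# Hypothetical request times of one user, in minutes.
times = [0, 8, 17, 26, 35, 60]
print(sessionize_h1(times, theta=30))  # [[0, 8, 17, 26], [35, 60]]
print(sessionize_h2(times, delta=10))  # [[0, 8, 17, 26, 35], [60]]
```

On the same data the two heuristics cut at different places: h1 limits total session duration, while h2 limits the gap between consecutive requests.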

[DP-12]

Navigation-Oriented Heuristics

• Take the linkage between pages into account

– “linkage” can be based on site topology (e.g., split a session at a request that could not have been reached from previous requests in the session)

– or can be usage-based (using referrers in log entries)

• usually more restrictive than topology-based heuristics and more difficult to implement in frame-based sites

href: Given two consecutive requests p and q, with p belonging to constructed session S. Then q is assigned to S if the referrer for q was previously invoked in S, or if the referrer for q is “undefined” and tq - tp ≤ Δ (the time delay Δ allows for proper loading of frameset pages).
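A possible reading of the referrer-based heuristic href is sketched below. The tuple layout, the use of None for an undefined referrer, and the names `sessionize_href` and `max_delay` (standing in for Δ) are assumptions made for this example.

```python
def sessionize_href(requests, max_delay):
    """requests: (timestamp, url, referrer) tuples of one user, in log order.
    A request q joins the current session S if its referrer was already
    requested in S, or if its referrer is undefined (None) and it follows
    the previous request within max_delay."""
    sessions = []
    for t, url, referrer in requests:
        if sessions:
            current = sessions[-1]
            seen_urls = {u for _, u, _ in current}
            previous_t = current[-1][0]
            if referrer in seen_urls or (
                referrer is None and t - previous_t <= max_delay
            ):
                current.append((t, url, referrer))
                continue
        sessions.append([(t, url, referrer)])  # open a new session
    return sessions

# Hypothetical requests (timestamps in seconds).
requests = [
    (0, "/", None),
    (5, "/a.html", "/"),
    (7, "/nav.html", None),          # undefined referrer, short delay
    (400, "/b.html", "/x.html"),     # referrer never seen: new session
    (410, "/c.html", "/b.html"),
]
for session in sessionize_href(requests, max_delay=10):
    print([url for _, url, _ in session])
```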

[DP-13]

Measures for Sessionization Accuracy (Berendt, Mobasher, Spiliopoulou, 2001 [7])

• A heuristic h maps entries in the log L into elements of constructed sessions, such that:

– (a) each entry in L is mapped to exactly one element of a constructed session

– (b) the mapping is order-preserving

• Measures quantify the successful mappings of real sessions to constructed sessions

– a measure M evaluates a heuristic h based on the differences between C_h and R

– each measure assigns to h a value M(h) ∈ [0,1] so that M(h*) = 1

[DP-14]

Measures for Sessionization Accuracy

• Categorical and Gradual Measures

– categorical measures: based on the number of real sessions that are reconstructed by the heuristics

– gradual measures: based on the degree to which the real sessions are reconstructed by the heuristics

[DP-15]

Categorical Measures

• Based on the notion of “complete reconstruction”

– a real session is completely reconstructed if all its elements are contained in the same constructed session

– the measure M_cr(h) is the ratio of the number of completely reconstructed real sessions in C_h to the total number of real sessions |R|

[DP-16]

Categorical Measures

• Derived categorical measures:

– M_crs considers only completely reconstructed real sessions whose first element is also the first element of a constructed session

– M_cre considers only completely reconstructed real sessions whose last element is also the last element of a constructed session

– M_crse considers only completely reconstructed real sessions with correct starts and ends

• in the absence of overlapping real sessions for individual users, this gives the number of constructed sessions that are identical to corresponding real sessions

[DP-17]

Gradual Measures

• Allow for measuring partial overlaps between real and constructed sessions

– degree of overlap between a real session r and a constructed session c, deg_o(r,c), is the number of elements they have in common divided by the total number of elements in r.

– degree of overlap for a real session r is the maximum deg_o(r,c) over all constructed sessions c.

– the measure M_o(h) is the average degree of overlap over all real sessions

– if a real session is completely reconstructed, its overlap degree is 1

[DP-18]

Gradual Measures

• To take the size of the constructed session into account, we define the degree of similarity

– deg_s(r,c) = | r ∩ c | / | r ∪ c |

– M_s(h) is the average degree of similarity over all real sessions

– if a constructed session is identical to a real session, the similarity degree of that real session is 1
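The categorical measure M_cr and the two gradual measures can be computed along these lines. The sketch assumes sessions are given as lists of unique log-entry identifiers, takes the per-session similarity as the maximum over constructed sessions (by analogy with the overlap degree), and uses function names of our own choosing.

```python
def m_cr(real, constructed):
    """Share of real sessions whose elements all lie in one constructed session."""
    hits = sum(any(set(r) <= set(c) for c in constructed) for r in real)
    return hits / len(real)

def m_o(real, constructed):
    """Average over real sessions r of max_c |r ∩ c| / |r|."""
    degrees = [
        max(len(set(r) & set(c)) / len(set(r)) for c in constructed)
        for r in real
    ]
    return sum(degrees) / len(degrees)

def m_s(real, constructed):
    """Average over real sessions r of max_c |r ∩ c| / |r ∪ c|."""
    degrees = [
        max(len(set(r) & set(c)) / len(set(r) | set(c)) for c in constructed)
        for r in real
    ]
    return sum(degrees) / len(degrees)

# Hypothetical example: log entries 1..5, two real sessions,
# and a heuristic that cut the log one entry too early.
real = [[1, 2, 3], [4, 5]]
constructed = [[1, 2], [3, 4, 5]]
print(m_cr(real, constructed))  # 0.5
print(round(m_o(real, constructed), 3))  # 0.833
print(round(m_s(real, constructed), 3))  # 0.667
```

In this example the second real session is completely reconstructed (overlap degree 1) yet its similarity degree is only 2/3, because the constructed session also contains entry 3.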

[DP-19]

Which Measures?

• The choice of the measures depends on the goals of usage analysis, for example:

– “complete reconstruction” may be appropriate for clustering and association-based analyses (it correctly shows the set of pages accessed together)

• it also preserves the sequential order of accesses, so it can be used for the analysis of users’ navigational behavior

– M_crs: useful for analyzing access to entry points

– M_cre: useful for analyzing access to exit points

– overlap-based measures can be useful for comparing the overall effectiveness of sessionization heuristics in grouping pages or objects

[DP-20]

Which Sessionization Heuristics?

• The choice of sessionization heuristic depends on the characteristics of the data

– if individual users visit the site in short but temporally dense sessions, h2 may perform better than h1

– in cases when timestamps are not reliable (e.g., using integrated data across many log files), href may be a better choice for sessionization

– referrer-based heuristics tend to perform worse in highly dynamic, frame-based sites

[DP-21]

Comparison of Sessionization Heuristics

• cookies used to identify unique users

• server generated session variable used to identify “real” sessions

• site was frame-based and highly dynamic

• thresholds of 30 and 10 minutes were used for h1 and h2, respectively

• href performed poorly, due to propagated errors in misclassified frameset references

• 30% of users had multiple IP addresses (coming from behind proxy servers)

[Chart: values of M_cr, M_crs, M_cre, M_crse, M_o, and M_s (axis range 0.50 to 1.00) for the heuristics h1-30, h2-10, and h-ref]

[DP-22]

Mechanisms for User Identification

Method | Description | Privacy Concerns | Advantages | Disadvantages

IP Address + Agent | Assume each unique IP address/Agent pair is a unique user | Low | Always available. No additional technology required. | Not guaranteed to be unique. Defeated by rotating IPs.

Embedded Session Ids | Use dynamically generated pages to associate ID with every hyperlink | Low to medium | Always available. Independent of IP addresses. | Cannot capture repeat visitors. Additional overhead for dynamic pages.

Registration | User explicitly logs in to the site. | Medium | Can track individuals, not just browsers | Many users won't register. Not available before registration.

Cookie | Save ID on the client machine. | Medium to high | Can track repeat visits from same browser. | Can be turned off by users.

Software Agents | Program loaded into browser and sends back usage data. | High | Accurate usage data for a single site. | Likely to be rejected by users.
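The first mechanism in the table, IP address plus agent, amounts to partitioning the log by that pair. A minimal sketch follows, assuming log entries have already been parsed into dicts with hypothetical keys "ip", "agent", and "url".

```python
from collections import defaultdict

def identify_users(entries):
    """Partition log entries into user activity logs, treating each
    unique (IP address, agent) pair as one user."""
    logs = defaultdict(list)
    for entry in entries:
        logs[(entry["ip"], entry["agent"])].append(entry)
    return dict(logs)

# Hypothetical parsed entries, in log order.
entries = [
    {"ip": "203.30.5.145", "agent": "Mozilla/4.5", "url": "/Calls/OWOM.html"},
    {"ip": "203.252.234.33", "agent": "Mozilla/4.06", "url": "/"},
    {"ip": "203.30.5.145", "agent": "Mozilla/4.5", "url": "/CP.html"},
]
user_logs = identify_users(entries)
for user, log in user_logs.items():
    print(user, [e["url"] for e in log])
```

As the table notes, this grouping is not guaranteed to be unique: several people behind one proxy with the same browser collapse into a single "user", and one person with a rotating IP address is split into several.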

[DP-23]

Impact of User Identification Heuristics

[Charts: values of M_cr, M_crs, M_cre, M_crse, M_o, and M_s (axis range 0.30 to 1.00); one panel compares h1-30-real with h1-30-ipa, the other compares h-ref-real with h-ref-ipa]

These experiments show the impact of using the IP+Agent heuristic for user identification on sessionization heuristics (as compared to cookies)

[DP-24]

Inferring User Transactions from Sessions

• Observation: reference lengths follow an exponential distribution

• Page types correlate with reference lengths

• Page types: navigational, content, or hybrid

• Can automatically classify pages as navigational or content using statistical modeling

• A transaction can be defined as an intra-session path ending in a content page, or as a set of content pages in a session

[Figure: histogram of page reference lengths (secs), with regions labeled navigational pages and content pages]

[DP-25]

Path Completion

• Refers to the problem of inferring missing user references due to caching.

• Effective path completion requires extensive knowledge of the link structure within the site

• Referrer information in server logs can also be used in disambiguating the inferred paths.

• Problem gets much more complicated in frame-based sites.
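One simple way to use referrer information for path completion is sketched here. It assumes that a mismatch between a request's referrer and the previously logged page means the user stepped back through cached pages, and it re-inserts those pages in reverse visit order. The function name, the tuple layout, and the small example log are hypothetical, and the sketch ignores site topology entirely.

```python
def complete_path(requests):
    """requests: (url, referrer) pairs of one session, in log order.
    Returns the inferred navigation path including cached back-steps."""
    path = []
    for url, referrer in requests:
        if path and referrer is not None and path[-1] != referrer:
            earlier = path[:-1]
            if referrer in earlier:
                # Index of the most recent earlier visit to the referrer.
                last = len(earlier) - 1 - earlier[::-1].index(referrer)
                # Assume the user backed up through the cached pages.
                path.extend(reversed(earlier[last:]))
        path.append(url)
    return path

# Hypothetical log: the request for C names B as referrer although the
# last logged page was E, so two back-steps (to D, then B) are inferred.
log = [("A", None), ("B", "A"), ("D", "B"), ("E", "D"), ("C", "B")]
print(" => ".join(complete_path(log)))  # A => B => D => E => D => B => C
```

When the referrer never appeared earlier in the session, the sketch leaves the path unchanged rather than guessing.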

[DP-26]

Path Completion - An Example

User’s navigation path: A => B => D => E => D => B => C

URL | Referrer

A | --

B | A

D | B

E | D

C | B

[Figure: site link structure over pages A, B, C, D, E, F]

• There may be multiple candidates for completing the path. For example, consider the two paths: E => D => B => C and E => D => B => A => C.

• In this case, the referrer field allows us to partially disambiguate. But what about: E => D => B => A => B => C?

• One heuristic: always take the path that requires the fewest

[DP-27]

Integrating E-Commerce Events

• Either product oriented or visit oriented

• Not necessarily a one-to-one correspondence with user actions

• Used to track and analyze conversion of browsers to buyers

• Major difficulty for E-commerce events is defining and implementing the events for a site

– however, in contrast to clickstream data, getting reliable preprocessed data is not a problem

• Another major challenge is the successful integration with clickstream data

[DP-28]

Product-Oriented Events

• Product View

– Occurs every time a product is displayed on a pageview

– Typical Types: Image, Link, Text

• Product Click-through

– Occurs every time a user “clicks” on a product to get more information

• Category click-through

• Product detail or extra detail (e.g. large image) click-through

• Advertisement click-through

[DP-29]

Product-Oriented Events

• Shopping Cart Changes

– Shopping Cart Add or Remove

– Shopping Cart Change - quantity or other feature (e.g. size) is changed

• Product Buy or Bid

– Separate buy event occurs for each product in the shopping cart

– Auction sites can track bid events in addition to the product purchases

[DP-30]

Content and Structure Preprocessing

• Processing the content and structure of the site is often essential for successful usage analysis

• Two primary tasks:

– determine what constitutes a unique page file (i.e., pageview)

– represent content and structure of the pages in a quantifiable form

[DP-31]

Content and Structure Preprocessing

• Basic elements in content and structure processing

– creation of a site map

• captures linkage and frame structure of the site

• also needs to identify script templates for dynamically generated pages

– extracting important content elements in pages

• meta-information, keywords, internal and external links, etc.

– identifying and classifying pages based on their content and structural characteristics

[DP-32]

Quantifying Content and Structure

• Static Pages

– All of the information is contained within the HTML files for a site

– Each file can be parsed to get a list of links, frames, images, and text

– Files can be obtained through the file system, or HTTP requests from an automated agent (site spider)

[DP-33]

Quantifying Content and Structure

• Dynamic Pages

– Pages do not exist until they are created due to a specific request

– Relevant information can come from a variety of sources: Templates, databases, scripts, HTML, etc.

– Three methods of obtaining content and structure information:

• Series of HTTP requests from a site mapping tool

• Compile information from internal sources

• Content server tools

Integrating content and structure I

Domain knowledge: content

Concept hierarchies

- purpose: group pages by their content

- method: analyze text, meta-tags, and/or URL (query string)

- grouping by classification or clustering

[Figure: example of a content-based concept hierarchy. Entertainment > Performing Arts, Music, ...; Music > Artists, Genres, New Releases, ...; Genres > Blues, Jazz, New Age, ...]

[DP-34]

Integrating content and structure II

Content profiles from feature clusters

1. vector space model: each unique word in corpus = one dimension, each page(view) is a vector with a non-zero weight for each word in that page(view), zero weight for other words

2. feature - pageview matrix (note: "feature" = word, "pageview" because of frames)

pageview | music | jazz | artist | ...

pv1 | 1.00 | 0.80 | 0.05 | ...

pv2 | 1.00 | 0.00 | 0.70 | ...

3. features as weighted vectors of pageviews

jazz = [ <pv1,0.80>, <pv2,0.00>, ... ]

4. group features -> feature clusters -> content profiles

[DP-35]
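Step 3 of the slide above, reading features as weighted vectors of pageviews, is just a transposition of the feature-pageview matrix. A small sketch using the example weights from the slide; the variable and function names are ours.

```python
# Pageview vectors over the features (words), as in the example matrix.
features = ["music", "jazz", "artist"]
pageview_vectors = {
    "pv1": [1.00, 0.80, 0.05],
    "pv2": [1.00, 0.00, 0.70],
}

def feature_vectors(features, pageview_vectors):
    """Transpose: represent each feature as a list of <pageview, weight> pairs."""
    return {
        feature: [(pv, weights[i]) for pv, weights in pageview_vectors.items()]
        for i, feature in enumerate(features)
    }

vectors = feature_vectors(features, pageview_vectors)
print(vectors["jazz"])  # [('pv1', 0.8), ('pv2', 0.0)]
```

Feature vectors in this form can then be clustered (step 4) with any standard clustering method to obtain content profiles.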

Integrating content and structure III

Structure

- purpose: group pages by their hyperlink structure

- grouping by classification or clustering

- ex. page types in Pirolli et al. [54] and Cooley et al. [15]: head, navigation, content, look-up, personal

- ex. path distance to a reference page

- structure as weighted vector of page(view)s

[Figure: A.html -> B.html (d = 1), B.html -> C.html (d = 2)]

S = [ <A.html,0>, <B.html,1>, <C.html,3>, ... ] (path distances)

S = [ <A.html,0>, <B.html,1>, <C.html,0>, ... ] (only B content page)

[DP-36]

Relating content and structure to mined usage I: Content/structure mining as pre-/post-processing steps

Ex. online catalog search (Berendt & Spiliopoulou [8, 6]):

1. service-based concept hierarchy: which query options?

[Figure: concept hierarchy. Info on schools > list of schools, indiv. school, ...; list of schools > 1 parameter, 2 parameters, 3 parameters, ...; parameter choices such as Location, Name, ..., Location+Name, ...]

[DP-37]

Relating content and structure to mined usage I

2. discovering and comparing navigation patterns in classified pages

part of a resulting WUM navigation pattern:

[DP-38]

Relating content and structure to mined usage I

Ex. WebSIFT Information Filter (from Cooley [14]):

Mined knowledge | Domain knowledge source | Interesting belief example

general usage statistics | site structure | The head page is not the most common entry point

general usage statistics | site content | A page designed to provide content is being used as a navigation page

frequent itemsets | site structure | A set of pages is frequently accessed together, but not directly linked

usage clusters | site content | A usage cluster contains pages from multiple content categories

=> discover patterns at different levels of abstraction, discover deviations from intended usage

[DP-39]

Relating content and structure to mined usage II: Usage, content, and structure mining as 3 ways of deriving a common kind of representation

- a vector of tuples <pageview,weight>:

usage: sessions / visits, or parts of them (past + current)

content: features

structure: pages and their characteristics

- unordered or ordered collections

=> identify clusters that are similar, where similarity is by usage, content, or structure

Mobasher, Dai, Luo, Sun, & Zhu [44]

[DP-40]

Pattern Discovery for Personalization

Myra Spiliopoulou HHL . . . [PD-1]

We identify the following aspects of the personalization services, when envisaged as the result of pattern discovery:

Visibility:

• personal recommendation

• silent dynamic adjustment

• static page/site adjustment

Service element:

• (link to) page

• application object

Matching based on:

• user profiles

• user ratings

• user behaviour

• content of objects

Acquisition till action:

• all steps on-line

• off-line pattern discovery & on-line matching

[PD-2]

Pattern Discovery: Adaptive web sites

The approach of Perkowitz & Etzioni [52, 53]

The IndexFinder consists of three phases:

1. Log processing: Establishment of sessions as sets of page requests

2. Cluster mining: Grouping of co-occurring non-linked pages with help of the site graph

3. Conceptual clustering:

• The representative concept of each cluster is identified.

• Cluster members not adhering to this concept are removed from the cluster.

• Pages adhering to this concept and not appearing in the cluster are attached to the cluster.

[PD-3]

For each cluster, the IndexFinder presents to the Web designer:

• An index page with links to all pages of a cluster

The Web designer decides:

• whether the new page should indeed be established

• what its label should be

• where it should be located in the site

According to our categorization:

Visibility: Static page/site adjustment

Service element: page containing single application object

Matching based on: user behaviour and page content

Off-line pattern discovery

[PD-4]

Pattern Discovery for Recommendations

The Collaborative Filtering Approach

Main idea: The objects suggested to a user are those preferred by users similar to her.

1. The user's transaction is matched against logged transactions.

2. The matches are ranked.

3. The best match (or set of matches) is selected.

4. The objects that were shown in the selected transactions are ranked, excluding objects already seen.

5. The objects with the highest rank are shown to the user.

All steps on-line

[PD-5]
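The five steps of the collaborative filtering approach can be sketched as below. The slide does not prescribe a similarity function or a ranking rule; Jaccard similarity, similarity-weighted scoring, and the names `recommend`, `k`, and `n` are assumptions of this example.

```python
def jaccard(a, b):
    """Assumed similarity between two transactions (sets of objects)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(active, logged, k=2, n=3):
    """active: set of objects in the user's current transaction.
    logged: list of sets, the logged transactions of other users."""
    # Steps 1-2: match against logged transactions and rank the matches.
    ranked = sorted(logged, key=lambda t: jaccard(active, t), reverse=True)
    # Step 3: select the k best matches.
    best = ranked[:k]
    # Step 4: score objects of the selected transactions, excluding seen ones.
    scores = {}
    for transaction in best:
        for obj in transaction - active:
            scores[obj] = scores.get(obj, 0.0) + jaccard(active, transaction)
    # Step 5: return the n highest-ranked objects (ties broken by name).
    return sorted(scores, key=lambda obj: (-scores[obj], obj))[:n]

# Hypothetical transactions.
logged = [{"a", "b", "c"}, {"a", "d"}, {"e", "f"}, {"a", "b", "c", "d"}]
print(recommend({"a", "b"}, logged))  # ['c', 'd']
```

Because every step runs against the full set of logged transactions, the cost of each recommendation grows with the size of the log, which is the motivation for moving pattern discovery off-line.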

Pattern Discovery for Recommendations

The Data Mining Approach

Main idea: User similarity can be defined in terms of behaviour, interests, preferences etc. that can be modelled off-line

1. Pattern discovery over the logged data

2. The contents of the user's transaction are matched against the discovered patterns.

3. The matches are ranked.

4. The objects associated with the best matches are ranked, excluding objects already seen.

5. The objects with the highest rank are shown to the user.

so that

⇒ The voluminous logged data are only processed off-line.

⇒ On-line matching is performed against derived patterns.

[PD-6]

Pattern Discovery: Recommendations on correlated items

The approach of Vucetic and Obradovic [50]

The recommendation problem is defined as:

Given the ratings of the active user on a set of items, what will her ratings be on the remaining items?

Main idea: The ratings of an item can be predicted from the ratings on correlated items.

Visibility: Personal recommendation

Service element: application object

Matching based on: Ratings of correlated items

Off-line discovery of predictors for the impact of item correlation on ratings

[PD-7]

Methodology:

• The rating of each item given another item is approximated using a linear function (named: expert).

• The average correlation among pairs of items is approximated using random sampling over the user ratings.

• A weighting scheme is proposed to deal with the fact that users with similar preferences may provide different ratings for the same set of items.

In this scheme:

• The linear experts for all pairs of items can be computed off-line.

• The ratings for an active user are predicted from the set of pairs of items rather than the set of user ratings.

PKDD 2001 Tutorial: “KDD for Personalization” [PD-8]
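
The idea of pairwise linear experts can be illustrated with a small sketch. It is not the method of [50]: the experts are plain least-squares lines, and the predictions are combined by an unweighted average, whereas the slide mentions a weighting scheme that is not reproduced here. All names (`fit_expert`, `train_experts`, `predict`) and the toy ratings are assumptions.

```python
from itertools import permutations


def fit_expert(xs, ys):
    """Least-squares line y = a*x + b through paired ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my  # constant predictor when x does not vary
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def train_experts(ratings):
    """Off-line: one expert per ordered item pair (i, j), predicting j from i."""
    items = sorted({i for r in ratings.values() for i in r})
    experts = {}
    for i, j in permutations(items, 2):
        pairs = [(r[i], r[j]) for r in ratings.values() if i in r and j in r]
        if len(pairs) >= 2:
            xs, ys = zip(*pairs)
            experts[(i, j)] = fit_expert(xs, ys)
    return experts


def predict(experts, active, target):
    """On-line: average the experts that predict `target` from rated items."""
    preds = [a * active[i] + b
             for (i, j), (a, b) in experts.items()
             if j == target and i in active]
    return sum(preds) / len(preds) if preds else None


ratings = {
    "u1": {"A": 1, "B": 2, "C": 1},
    "u2": {"A": 3, "B": 4, "C": 3},
    "u3": {"A": 5, "B": 6, "C": 5},
}
experts = train_experts(ratings)
prediction = predict(experts, {"A": 4, "C": 4}, "B")
print(prediction)
```

Only `predict` touches the active user's ratings, which mirrors the off-line/on-line split stated on the slide.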

Pattern Discovery Repeat-buying theory for personalization

The approach of Geyer-Schulz et al [25]

Main idea:

=> Recommendations are based on correlated products.

=> Correlations can be identified with Ehrenberg's repeat-buying theory,

=> after adjusting it to the particularities of anonymous user sessions.

According to our categorization:

Visibility: Recommendation of information products

Service element: application object or URL

Matching based on: user preferences for application objects

Off-line discovery of correlated application objects

PKDD 2001 Tutorial: “KDD for Personalization” [PD-9]

Ehrenberg's repeat-buying theory

• predicts buyer behaviour from (a) penetration and (b) average purchase frequency of an item

• by providing a reference model that characterizes repeated co-occurring purchases of items as random or not random

where

penetration refers to the preference of a customer for a brand

average purchase frequency refers to repeated purchases of the item, ignoring characteristics of the item, amount of the item and size of the purchase as a whole.

PKDD 2001 Tutorial: “KDD for Personalization” [PD-10]

Assumptions of [25]:

• The probability of r co-occurrences of two products in subsequent purchases follows a logarithmic series distribution.

• Subsequent purchases of the same customer(s) can be regarded as equivalent to a set of purchase sessions during the log period.

Methodology:

• Computation of the frequency distributions of all co-occurrences of product pairs, counting one co-occurrence per session only

• Elimination of distributions with a small number of observations

• Elimination of a given percentile of the high repeat-buy pairs

• Computation of the co-occurrence predictor for each pair

so that outliers for each predictor can be observed as correlated items.

PKDD 2001 Tutorial: “KDD for Personalization” [PD-11]
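
To make the first assumption concrete, the sketch below evaluates the logarithmic series distribution in its standard form, P(r) = -q^r / (r ln(1 - q)) for r = 1, 2, ..., and fits q to an observed mean by bisection. The histogram, the function names and the fitting procedure are assumptions for illustration; the slides do not state how [25] estimates the parameter.

```python
import math


def lsd_pmf(r, q):
    """Logarithmic series probability of r co-occurrences (r >= 1, 0 < q < 1)."""
    return -(q ** r) / (r * math.log(1.0 - q))


def lsd_mean(q):
    """Mean of the logarithmic series distribution; increasing in q."""
    return -q / ((1.0 - q) * math.log(1.0 - q))


def fit_q(observed_mean, tol=1e-12):
    """Bisection for q with lsd_mean(q) == observed_mean (needs mean > 1)."""
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if lsd_mean(mid) < observed_mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical histogram for one product pair: r -> number of observations.
observed = {1: 60, 2: 20, 3: 10, 4: 6, 5: 4}
n = sum(observed.values())
mean_r = sum(r * c for r, c in observed.items()) / n
q_hat = fit_q(mean_r)
expected = {r: n * lsd_pmf(r, q_hat) for r in observed}
for r in observed:
    print(r, observed[r], round(expected[r], 1))
```

Comparing `observed` with `expected` per value of r is one way to look for the outliers mentioned above; how large a deviation counts as an outlier is left open here.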

PKDD 2001 Tutorial: “KDD for Personalization”

Pattern Discovery Association mining for personalization

Basic Idea: match left-hand side of rules with the active user session and recommend items in the rule’s consequent

Essential to store patterns in efficient data structures

• searching all rules in real time is computationally inefficient

Ordering of accessed pages is not taken into account

Good recommendation accuracy, but the main problem is “coverage”

• high support thresholds lead to low coverage and may eliminate important, but infrequent items from consideration

• low support thresholds result in very large model sizes and a computationally expensive pattern discovery phase

[PD-12]

PKDD 2001 Tutorial: “KDD for Personalization”

Association Mining - Basic Concepts

We start with a set I of items and a set D of transactions. A transaction T is a set of items (a subset of I):

$$I = \{i_1, i_2, \ldots, i_m\}, \qquad T \subseteq I$$

An Association Rule is an implication on itemsets X and Y, denoted by X ==> Y, where

$$X \subseteq I, \quad Y \subseteq I, \quad X \cap Y = \emptyset$$

The rule meets a minimum confidence of c, meaning that c% of transactions in D which contain X also contain Y. In addition for each itemset a minimum support of s must be satisfied:

$$c \le \frac{|X \cup Y|}{|X|}, \qquad s \le \frac{|X \cup Y|}{|D|}$$

where |X| denotes the number of transactions in D that contain the itemset X.

[PD-13]
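
The definitions of support and confidence, together with the basic idea of matching rule left-hand sides against the active session, can be sketched as follows. The transactions, the rule list and the function names are hypothetical; rule discovery itself (e.g. with Apriori) is assumed to have happened already.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(x, y, transactions):
    """|X u Y| / |D| for the rule X ==> Y."""
    return count(set(x) | set(y), transactions) / len(transactions)


def confidence(x, y, transactions):
    """|X u Y| / |X| for the rule X ==> Y."""
    n_x = count(x, transactions)
    return count(set(x) | set(y), transactions) / n_x if n_x else 0.0


def recommend(session, rules, transactions, min_conf):
    """Fire every rule whose left-hand side is contained in the session."""
    session = set(session)
    recs = {}
    for x, y in rules:
        if set(x) <= session:
            conf = confidence(x, y, transactions)
            if conf >= min_conf:
                for item in set(y) - session:
                    recs[item] = max(recs.get(item, 0.0), conf)
    return recs


D = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
rules = [({"A"}, {"B"}), ({"A", "B"}, {"C"})]
recs = recommend({"A"}, rules, D, min_conf=0.5)
print(recs)
```

The loop over all rules is exactly the real-time search that the slides call too expensive for large rule sets; it is kept here only for clarity.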

Pattern Discovery Associated/Dissociated items and users

The approach of Lin, Alvarez & Ruiz [37]

Main idea:

=> Users are associated to each other in terms of how they rate items.

=> Items are associated to each other with respect to user preferences.

Associations among items can be found off-line.

Associations to the active user can be found on-line.

According to our categorization:

Visibility: Personal recommendation

Service element: application object

Matching based on: associations among items and among users

On-line discovery of assoc. rules with given RHS

PKDD 2001 Tutorial: “KDD for Personalization” [PD-14]

Methodology:

• Recommendations are subject to minimum confidence and minimum number of rules constraints.

• The miner discovers association rules iteratively, until the desired number of rules is extracted. The support cutoff is adjusted in each iteration.

• Rules concern both items and users:

[User1:like] AND [User2:dislike] => [TargetUser:like]

[Item1:like] AND [Item2:like] => [TargetItem:like]

• Candidate items are computed from associations involving users similar to the active user. (on-line)

• Scores of items are computed from associations reflecting user preferences. (off-line)

• The candidate items with highest scores are suggested to the active user. (on-line)

PKDD 2001 Tutorial: “KDD for Personalization” [PD-15]

PKDD 2001 Tutorial: “KDD for Personalization”

Pattern Discovery Association mining for personalization

The approach of Mobasher, et al, 2001 [45]

Main Idea: avoid offline generation of all association rules; generate recommendations directly from itemsets

• discovered frequent itemsets are stored in an “itemset graph” (an extension of the lexicographic tree structure of Agrawal, et al 1999 [2])

• recommendation generation can be done in constant time by doing a directed search to a limited depth

According to our categorization

Visibility: Personal recommendations or silent dynamic adjustment

Service element: pageview

Matching based on: user behaviour

[PD-16]

PKDD 2001 Tutorial: “KDD for Personalization”

Methodology:

• Construct Frequent Itemset Graph

– each node at depth d in the graph corresponds to an itemset I of size d and is linked to the itemsets of size d+1 (at level d+1) that contain I

– the single root node at level 0 corresponds to the empty itemset

• frequent itemsets are matched against a user's active session S by performing a search of the graph to depth |S|

• a recommendation r is an item at level |S|+1 whose recommendation score is the confidence of rule S ==> r

[PD-17]
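
A minimal sketch of generating recommendations directly from frequent itemsets is given below. As a simplification, a dictionary from itemsets to support counts stands in for the frequent itemset graph, so the lookup of children is a scan rather than a directed graph search; the counts and the name `recommend_from_itemsets` are hypothetical.

```python
def recommend_from_itemsets(session, supports, min_conf=0.0):
    """Score each item r with conf(S ==> r) = supp(S u {r}) / supp(S)."""
    s = frozenset(session)
    if s not in supports:
        return {}  # the session itself is not a frequent itemset
    recs = {}
    for itemset, sup in supports.items():
        # Children of S in the graph: frequent itemsets of size |S|+1 containing S.
        if len(itemset) == len(s) + 1 and s < itemset:
            (r,) = itemset - s
            conf = sup / supports[s]
            if conf >= min_conf:
                recs[r] = conf
    return recs


# Hypothetical support counts of the frequent itemsets.
supports = {
    frozenset("A"): 4, frozenset("B"): 4, frozenset("C"): 4,
    frozenset("AB"): 3, frozenset("AC"): 3, frozenset("BC"): 3,
    frozenset("ABC"): 2,
}
recs = recommend_from_itemsets({"A", "B"}, supports)
print(recs)
```

In an actual graph each node would hold links to its children, so that only the children of the node for S are visited.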

PKDD 2001 Tutorial: “KDD for Personalization”

Pattern Discovery Sequence mining for personalization

Main Idea: take the ordering of accessed items into account

Two basic approaches

• use contiguous sequences (e.g., Web navigational patterns)

• use general sequential patterns

Contiguous sequential patterns are often modeled as Markov chains and used for prefetching (i.e., predicting the next user access based on previously accessed pages)

In the context of recommendations, they can achieve higher accuracy than other methods, but it may be difficult to obtain reasonable coverage

[PD-18]
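
Next-access prediction with a Markov chain can be sketched in a few lines. The example assumes a first-order chain estimated from raw transition counts; the sessions and the names `train_markov` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict


def train_markov(sessions):
    """Count page-to-page transitions over contiguous pairs of each session."""
    counts = defaultdict(Counter)
    for s in sessions:
        for cur, nxt in zip(s, s[1:]):
            counts[cur][nxt] += 1
    return counts


def predict_next(counts, page):
    """Most likely next page and its estimated transition probability."""
    if page not in counts:
        return None  # no coverage: the page was never seen as a predecessor
    nxt, n = counts[page].most_common(1)[0]
    return nxt, n / sum(counts[page].values())


sessions = [["A", "B", "C"], ["A", "B", "D"], ["A", "B", "C"], ["B", "C"]]
model = train_markov(sessions)
print(predict_next(model, "B"))
print(predict_next(model, "Z"))
```

The `None` case shows the coverage problem in miniature: a state that never occurred in the training sessions yields no prediction at all.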

PKDD 2001 Tutorial: “KDD for Personalization”

Pattern Discovery Sequence mining for personalization

Markov chain representation often leads to high space complexity due to model sizes

Some Solutions

• selective Markov Models (Deshpande, Karypis, 2000 [17])

use various pruning strategies to reduce the number of states (e.g., support or confidence pruning, error pruning)

• longest repeating subsequences (Pitkow, Pirolli, 1999 [])

similar to support pruning, used to focus only on significant navigational paths

• increased coverage can be achieved by using all-Kth-order models (i.e., using all possible sizes for user histories)

[PD-19]

Pattern Discovery Sequence mining for personalization

The approach of Gaul & Schmidt-Thieme [24]

Main idea:

=> Recommendations are based on frequent patterns of past behaviour.

=> A recommender is a predictor for a class of events.

=> The constellation of the recommenders for all classes returns the best recommendations for a given user history.

According to our categorization:

Visibility: Recommendation

Service element: URLs, site objects

Matching based on: navigation histories and URL proximity

Off-line training of classifiers := local recommender systems

PKDD 2001 Tutorial: “KDD for Personalization” [PD-20]

A generic framework:

• with measures for the quality of a recommendation, taking the distance between candidate URLs into account

• distinguishing between dynamic and static recommenders that do/do not take user histories into account

• combining local recommender systems, each of which predicts a class of events

where a class can be one user history, a group of histories or the whole dataset.

Thereby, a navigation history is

• a set of events

• a sequence of events

• a more complex structure of co-occurring events

PKDD 2001 Tutorial: “KDD for Personalization” [PD-21]

Pattern Discovery Usage profiles for personalization

The approach of Mobasher et al [43, 42]

Two types of usage profiles:

• Clusters of similar user transactions, enhanced by a weighting scheme that removes pages with support less than a mean value

• Clusters of pages accessed together

aggregating the members of each cluster into one representative profile

According to our categorization:

Visibility: Personal recommendation or silent dynamic adjustment

Service element: pageview

Matching based on: user behaviour. Also: page content in [44]

Off-line discovery of aggregate profiles

PKDD 2001 Tutorial: “KDD for Personalization” [PD-22]

Aims:

=> achieve similar performance to on-line collaborative filtering

=> using a minimal number of pageviews for the active user

Methodology:

• Preprocessing phase

– Assignment of weights to the pageviews

– Significance testing, based on page stay time

– Normalization of pageview weights

• PACT: Profile Aggregation based on Clustering Techniques

1. Clustering of usage data to establish the aggregate profiles

2. Materialization of the profiles as vectors of (page, weight) pairs

3. Scan of the user's history by means of a sliding window that allows only a set of page accesses to be considered in the profile

4. Matching the user session with each profile

5. Match ranking

PKDD 2001 Tutorial: “KDD for Personalization” [PD-23]

PKDD 2001 Tutorial: “KDD for Personalization”

A Framework for Personalization Based on Aggregate Profiles

Offline Phase

[PD-24]

PKDD 2001 Tutorial: “KDD for Personalization”

A Framework for Personalization Based on Aggregate Profiles

Online Phase

Input from the batch process: Usage Profiles, Content Profiles

• Match current user’s activity against the discovered profiles

• Each recommended item is assigned a score based on

– matching criteria and quality of aggregate profiles

– “information value” of the item based on domain knowledge

[PD-25]

PKDD 2001 Tutorial: “KDD for Personalization”

Aggregate Profiles Based on Clustering Transactions (PACT) (Mobasher, et al, [42, 43])

• Input

– set of relevant pageviews in preprocessed log

$$P = \{p_1, p_2, \ldots, p_n\}$$

– set of user transactions

$$T = \{t_1, t_2, \ldots, t_m\}$$

– each transaction is a pageview vector

$$\vec{t} = \langle w(p_1, t), w(p_2, t), \ldots, w(p_n, t) \rangle$$

[PD-26]

PKDD 2001 Tutorial: “KDD for Personalization”

Aggregate Profiles Based on Clustering Transactions (PACT)

• Transaction Clusters

– each cluster contains a set of transaction vectors

– for each cluster compute the centroid as cluster representative

$$\vec{c} = \langle u_1^c, u_2^c, \ldots, u_n^c \rangle$$

• Aggregate Usage Profiles

– a set of pageview-weight pairs: for transaction cluster C, select each pageview $p_i$ such that $u_i^c$ (in the cluster centroid) is greater than a pre-specified threshold

[PD-27]
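
Deriving an aggregate usage profile from one transaction cluster can be sketched as below. The clustering step is assumed to have been done already; the binary weights, the threshold value and the name `aggregate_profile` are illustrative assumptions.

```python
def aggregate_profile(cluster, pageviews, threshold):
    """Centroid of the cluster, restricted to weights above the threshold."""
    n = len(cluster)
    centroid = [sum(t[i] for t in cluster) / n for i in range(len(pageviews))]
    return {p: u for p, u in zip(pageviews, centroid) if u > threshold}


pageviews = ["A.html", "B.html", "C.html", "D.html"]
# One hypothetical cluster of transaction vectors with weights w(p_i, t).
cluster = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
]
profile = aggregate_profile(cluster, pageviews, threshold=0.5)
for page, weight in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(f"{weight:.2f} {page}")
```

With binary weights the centroid entry of a page is simply the fraction of the cluster's transactions that contain it.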

PKDD 2001 Tutorial: “KDD for Personalization”

Example Aggregate Profiles

• Example Profiles based on the PACT method

– Based on data from the Association for Consumer Research Site:

1.00 Call for Papers
0.67 ACR News Special Topics
0.67 CFP: Journal of Psychology and Marketing I
0.67 CFP: Journal of Psychology and Marketing II
0.67 CFP: Journal of Consumer Psychology II
0.67 CFP: Journal of Consumer Psychology I

1.00 CFP: Winter 2000 SCP Conference
1.00 Call for Papers
0.36 CFP: ACR 1999 Asia-Pacific Conference
0.30 ACR 1999 Annual Conference
0.25 ACR News Updates
0.24 Conference Update

[PD-28]

PKDD 2001 Tutorial: “KDD for Personalization”

Hypergraph-Based Clustering (Han, Karypis, Kumar, Mobasher, 1998 [26])

• Construct a hypergraph from sets of related items

– Each hyperedge represents a frequent itemset

– Weight of each hyperedge can be based on the characteristics of frequent itemsets or association rules (e.g., support, confidence, interest, etc.)

[PD-29]

PKDD 2001 Tutorial: “KDD for Personalization”

Hypergraph-Based Clustering

• Recursively partition hypergraph so that each partition contains only highly connected items

– Given a hypergraph we find a k-way partitioning such that the weight of the hyperedges that are cut is minimized

– The fitness of partitions is measured in terms of the ratio of weights of cut edges to the weights of uncut edges within the partitions

– The connectivity measures the percentage of edges within the partition with which the vertex is associated (used for filtering partitions)

– Vertices from partial edges can be added back to clusters based on a user-specified overlap factor

[PD-30]

PKDD 2001 Tutorial: “KDD for Personalization”

Profiles Based on Hypergraph Clusters (Mobasher, Cooley, Srivastava, 1999 [41])

• Input

– input for clustering is the set of large itemsets from the association rule module

– each itemset is a hyperedge (weights are a function of the interest of the itemset)

$$Interest(I) = \frac{support(I)}{\prod_{i \in I} support(i)}$$

– In practice one can use the log of the interest to prevent a few highly frequent patterns from totally dominating

[PD-31]
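
The interest of an itemset and its logarithm as a hyperedge weight can be computed as in the following sketch. The supports are made-up relative frequencies and the function names are hypothetical.

```python
import math


def interest(itemset, supports):
    """support(I) divided by the product of the supports of its single items."""
    denom = math.prod(supports[frozenset([i])] for i in itemset)
    return supports[frozenset(itemset)] / denom


def hyperedge_weight(itemset, supports):
    """Log of the interest, which dampens very large interest values."""
    return math.log(interest(itemset, supports))


# Hypothetical relative supports.
supports = {
    frozenset(["A"]): 0.5,
    frozenset(["B"]): 0.5,
    frozenset(["A", "B"]): 0.5,
}
print(interest({"A", "B"}, supports))
print(hyperedge_weight({"A", "B"}, supports))
```

An interest of 1 means the items co-occur as often as independence would predict, so its log weight is 0; values above 1 indicate positive association.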

PKDD 2001 Tutorial: “KDD for Personalization”

Profiles Based on Hypergraph Clusters

• Aggregate Profiles (Item/Pageview Clusters)

– clustering program directly outputs a set of overlapping pageview clusters

– the weight associated with pageview p in a cluster C is based on the connectivity value of p in the hypergraph partition:

$$conn(p, C) = \frac{|\{e \mid e \subseteq C,\ p \in e\}|}{|\{e \mid e \subseteq C\}|}$$

[PD-32]
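
The connectivity measure translates directly into code. In this sketch hyperedges and clusters are plain Python sets, and the example hypergraph is invented for illustration.

```python
def connectivity(p, cluster, hyperedges):
    """Fraction of the hyperedges lying inside the cluster that contain p."""
    inside = [e for e in hyperedges if e <= cluster]
    if not inside:
        return 0.0
    return sum(1 for e in inside if p in e) / len(inside)


hyperedges = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}, {"C", "D"}]
cluster = {"A", "B", "C"}
weights = {p: connectivity(p, cluster, hyperedges) for p in sorted(cluster)}
print(weights)
```

The edge {C, D} is ignored because it is not fully contained in the cluster, so it contributes to neither numerator nor denominator.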

PKDD 2001 Tutorial: “KDD for Personalization”

Recommendation Engine for Using Aggregate Profiles

• Match user’s activity against discovered profiles

– a sliding window over the active session to capture the current user’s “short-term” history depth

– profiles and the active session are treated as vectors

– matching score is computed based on the similarity between vectors (e.g., normalized cosine similarity)

• Recommendation scores are based on

– matching score to aggregate profiles

– “information value” of the recommended item (e.g., link distance of the recommendation to the active session)

– recommendations are contributed by multiple profiles

[PD-33]

PKDD 2001 Tutorial: “KDD for Personalization”

Active Session Window

• Example: Session window of size 5

A.html => B.html => C.html => D.html => E.html => D.html => F.html

[Figure: active user session with the session window marked]

• Associating weight with items in the active session:

– assigned by site owner based on perceived importance

– based on recency (recent pages weighted higher) or time spent on pages

– based on page types (e.g., content v. navigational)

[PD-34]

PKDD 2001 Tutorial: “KDD for Personalization”

Example: Recommendations Based on PACT

Example profiles:

PROFILE 0
-------------
1.00 D.html
0.50 A.html
0.50 C.html
0.50 E.html

PROFILE 1
-------------
1.00 A.html
0.50 B.html
0.50 C.html
0.50 D.html
0.50 E.html
0.50 F.html

PROFILE 2
-------------
0.75 B.html
0.75 F.html
0.50 A.html
0.50 C.html
0.25 D.html

Current User Session U: A.html => B.html => C.html => E.html

Assume session window size of 3 and unit weights, using (cosine) similarity between active session and each profile:

Sim(U, P0) = (0.5+0.5) / SQRT(1.75 * 3) = 0.44
Sim(U, P1) = (0.5+0.5+0.5) / SQRT(2.5*3) = 0.20
Sim(U, P2) = (0.75+0.5) / SQRT(1.69*3) = 0.25

Candidate Recommendations:

P0: D.html (SQRT(0.44*1.00) = 0.66), A.html (SQRT(0.44*0.50) = 0.47)

P1: A.html (SQRT(0.20*1.00) = 0.45), D.html (SQRT(0.20*0.50) = 0.32), F.html (SQRT(0.20*0.50) = 0.32)

P2: F.html (SQRT(0.22*0.75) = 0.41), A.html (SQRT(0.22*0.50) = 0.33), D.html (SQRT(0.22*0.25) = 0.23)

[PD-35]
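
The matching step of this example can be sketched with a plain cosine similarity over the last three pages of the session. The scoring rule (square root of similarity times profile weight) follows the slide; keeping the maximum score per page across profiles is an assumption, as are the function names. Only the P0 similarity (about 0.44) is checked against the slide here; the sketch makes no attempt to reproduce the other printed figures.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as dicts."""
    dot = sum(w * v.get(p, 0.0) for p, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(session, profiles, window=3):
    """Score pages outside the session window by sqrt(similarity * weight)."""
    active = {p: 1.0 for p in session[-window:]}  # unit weights
    scores = {}
    for profile in profiles:
        sim = cosine(active, profile)
        for page, weight in profile.items():
            if page not in active:
                score = math.sqrt(sim * weight)
                scores[page] = max(scores.get(page, 0.0), score)
    return scores


profiles = [
    {"D.html": 1.0, "A.html": 0.5, "C.html": 0.5, "E.html": 0.5},
    {"A.html": 1.0, "B.html": 0.5, "C.html": 0.5, "D.html": 0.5,
     "E.html": 0.5, "F.html": 0.5},
    {"B.html": 0.75, "F.html": 0.75, "A.html": 0.5, "C.html": 0.5,
     "D.html": 0.25},
]
session = ["A.html", "B.html", "C.html", "E.html"]
window_vector = {p: 1.0 for p in session[-3:]}
sim_p0 = cosine(window_vector, profiles[0])
scores = recommend(session, profiles)
print(round(sim_p0, 2))
print(sorted(scores))
```

Note that A.html remains a candidate: it was visited, but it lies outside the window of size 3, as in the slide's candidate lists.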

PKDD 2001 Tutorial: “KDD for Personalization”

Integration of Content Profiles (Mobasher, et al., 2000 [44])

• Cluster features over the n-dimensional space of pageviews

• For each feature cluster derive a content profile by collecting pageviews in which these features appear as significant (represented as overlapping collections of pageview-weight pairs)

Weight | Pageview ID | Significant Features (stems)
1.00 | CFP: One World One Market | world challeng busi co manag global
0.63 | CFP: Int'l Conf. on Marketing & Development | challeng co contact develop intern
0.35 | CFP: Journal of Global Marketing | busi global
0.32 | CFP: Journal of Consumer Psychology | busi manag global

Weight | Pageview ID | Significant Features (stems)
1.00 | CFP: Journal of Psych. & Marketing | psychologi consum special market
1.00 | CFP: Journal of Consumer Psychology I | psychologi journal consum special market
0.72 | CFP: Journal of Global Marketing | journal special market
0.61 | CFP: Journal of Consumer Psychology II | psychologi journal consum special
0.50 | CFP: Society for Consumer Psychology | psychologi consum special
0.50 | CFP: Conf. on Gender, Market., Consumer Behavior | journal consum market

[PD-36]

PKDD 2001 Tutorial: “KDD for Personalization”

Integration of Content Profiles

• Integration with Recommendation Engine

– Usage and content profiles have similar representation, so they can be used by the recommendation engine in the same way

• Item weights in profiles must be normalized, so content and usage profiles can be compared on the same scale

– One approach: match active user session with all profiles (both content and usage); then use the maximal recommendation score for candidate recommendations

– Another approach: use content profiles for generating recommendations only if no matching usage profile (with sufficient confidence) is found

[PD-37]

PKDD 2001 Tutorial: “KDD for Personalization”

Evaluating Personalization

[E-1]

PKDD 2001 Tutorial: "KDD for Personalization"

Evaluating usability: goals / tasks?

Recall operational definition:

A Web site’s usability is high if users

- achieve their goals / perform their tasks in little time,
- do so with a low error rate,
- experience high subjective satisfaction.

Depending on the site, relevant goals / tasks may be to:

- locate content (search),
- learn,
- stay in the site, return to the site, buy... => E-metrics
- ...

[E-2]

PKDD 2001 Tutorial: "KDD for Personalization"

Evaluating usability: methodological caveats

Comparisons of sites with/without personalization, or before/after personalization introduced, with respect to "normal user behavior" (server logs):

usually a quasi-experiment

- many uncontrolled variables (e.g., user intentions)
- possibly several differences between sites/site versions

=> causal attribution of success to personalization becomes difficult

Questionnaire data:

self-reports are often biased; observation of behavior in experiments advisable

[E-3]

PKDD 2001 Tutorial: "KDD for Personalization"

Evaluating usability: results I

CyberBehavior Research Center 1999 survey

- 81% of 694 respondents have visited a personalized site
- 64% of those found it useful: helpful, time saving
- perceived usefulness changes with product (books > music > inf. technol. > news/articles > other)
- main problems: privacy, ineffectiveness when behavior did not reflect user "personally" (e.g., buying a gift)
- concern that possible choices may be limited
- little difference of opinion between personalization occurring in response to behavior or to solicited input

[E-4]

PKDD 2001 Tutorial: "KDD for Personalization"

Evaluating usability: results II

Belkin [3], reviewing studies of recommendations in IR systems carried out at Rutgers Univ. since 1995:

- measures of performance and subj. satisfaction
- relevance feedback worked well, but better with both increased knowledge of how it worked, and with increased control by the user of its suggestions
- relevance feedback + term suggestion performed better than, and was preferred to, pure relevance feedback
- users preferred to save effort: were willing to hand over the subsidiary task of term selection to a system they trusted

[E-5]

PKDD 2001 Tutorial: "KDD for Personalization"

Evaluating usability: results III

Nielsen Net Ratings 1999

registered visitors of portal sites, i.e., those who can customize,

- spend > 3 times longer at home portal than others
- view 3-4 times more pages

[E-6]

PKDD 2001 Tutorial: "KDD for Personalization"

Why are results scarce? Possible reasons

Perkowitz & Etzioni [52]: Adaptive web sites: an AI challenge.

"In essence, web design is a problem in user interface design. However, ... few web designers can afford to subject their web sites to formal usability testing in special labs."

Nielsen [47]: Personalization is over-rated.

"Web personalization is much over-rated and mainly used as a poor excuse for not designing a navigable website."

Lighthouse on the Web [36], quoting from Mainspring and User Interface Engineering

"Personalization costs. ... You’re more likely to get a good return on your efforts ... by fixing other problems, such as difficulty in locating content."

[E-7]

PKDD 2001 Tutorial: "KDD for Personalization"

Can other results be transferred?

Research on adaptive educational software since ~ 1970

- usually, user control helpful for learning; adaptive interfaces particularly helpful for novices
- interfaces changing over time: difficult to learn
- adaptive presentation (more info depending on user knowledge) improves comprehension and reduces reading time
- adaptive link annotation
  - can reduce no. of visited pages + learning time
  - encourages novices to navigate non-sequentially
  - enables users to rate the difficulty of a page better

[E-8]

PKDD 2001 Tutorial: "KDD for Personalization"

Can other results be transferred? (contd.)

- adaptive link ordering improves user performance in information search tasks
- but unstable order of options is confusing for novices, so hiding is better for novices
- for novices, direct guidance is useful ("next" link is most popular choice)
- the more users agree with the system’s suggestions, the better their test results

(surveys in [11,12])

[E-9]

PKDD 2001 Tutorial: "KDD for Personalization"

Further factors affecting subjective satisfaction

- users don’t like to be recognized too soon
- users want to be anonymous, at least at certain times
- users want openness / disclosure
- must match user’s interests at the moment
- users don’t want extra work: "paradox of the active user"
- people don’t want relationships with corporations, but with other people
- be specific without being exclusive (non-monetary rewards better than differential pricing)
- consider information structure on Web
- user control (general guideline for software development)

respect the user!

[E-10]

Pattern Evaluation from the Business Perspective

PKDD 2001 Tutorial: “KDD for Personalization” Myra Spiliopoulou HHL . . . [E-11]

User Satisfaction & Business Success

A company operating a Web site should take care to create value for its (prospective) customers:

=> If there is no added value for the users, they will not buy and they will not come again.

=> If the users/customers are not satisfied, they will not buy and/or they will not come again.

=> User/Customer satisfaction is a prerequisite for winning them to the company.

“Winning” means:

• Conversion: The user becomes a customer.

• Retention: The customer stays loyal.

PKDD 2001 Tutorial: “KDD for Personalization” [E-12]

User Satisfaction Modelling

Indicators that require interaction with the user:

• Interactivity
• Ease of use
• Pleasing environment, entertaining environment
• Multiple navigation metaphors
• . . .
• Value creation, as perceived by the user

Indicators that can be measured/approximated without user interaction:

• Pages per visitor
• Duration of stay
• Visitors per page [60]
• Response time [60]

PKDD 2001 Tutorial: “KDD for Personalization” [E-13]

User Satisfaction Computation

• Identification of a set of satisfaction indicators
• Design of an appropriate questionnaire
• Presentation of the questionnaire to a representative user sample
• Analysis of the responses
• Conclusions on the impact of the correlations among the satisfaction indicators

PKDD 2001 Tutorial: “KDD for Personalization” [E-14]

User Satisfaction An experiment: The study of Eighmey [21]

• Factors reflecting user satisfaction:

– Ease of use
– Information utility of the presented content
– Attractiveness of the presentation metaphor
– . . .

• Experimental settings for the evaluation of a set of commercial sites:

– Mapping of the factors on a questionnaire
– Establishment of a group of representative users
– Experimentation on a local computer pool (in vitro)
– Statistical analysis of the user responses
– Ranking of the factors by importance

PKDD 2001 Tutorial: “KDD for Personalization” [E-15]

The findings of [21] are:

• Quality of the presentation metaphor: Entertainment when accessing the site plays the most important role.

• Information utility: The amount of information made available is the second most important factor.

A further finding is:

The web sites tested did not “make a strong and useful connection with the interests of the study participants” and did not “succeed in creating a context and sense of community needed to build a continuing relationship with web site users”.

PKDD 2001 Tutorial: “KDD for Personalization” [E-16]

M. Spendolini condenses five years of anti-customer-satisfaction reports into the question:

Is “Customer Satisfaction” Irrelevant?

Answer: Customer measurement systems should be revisited. [56]

PKDD 2001 Tutorial: “KDD for Personalization” [E-17]

User Satisfaction & Business Success

• User/Customer satisfaction is a prerequisite for a web-site's success.

• User/Customer satisfaction does not imply a web-site's success.

because:

• The goal of a web-site is not to make users happy.

• The goal of a web-site is to contribute to business success.

PKDD 2001 Tutorial: “KDD for Personalization” [E-18]

User Satisfaction & Business Success

• Awareness
• Contact
• Conversion
• Retention

and

• Abandonment instead of conversion
• Attrition instead of retention

How can these concepts be translated into indicators computable upon customer data?

PKDD 2001 Tutorial: “KDD for Personalization” [E-19]

Business Success From the viewpoint of the Site

• “Site efficiency” as

– Number of page requests
– Duration of site visits [20]

• Site quality

– Response time
– Supported navigation modes
– Discoverability
– Accessibility
– Pages per visitor
– Visitors per page

[60]

PKDD 2001 Tutorial: “KDD for Personalization” [E-20]

Business Success From hits to loyal customers

The model of Berthon et al [9]

[Figure: Site Users, Short-time Visitors, Active Investigators, Customers, Loyal Customers]

PKDD 2001 Tutorial: “KDD for Personalization” [E-21]

The user-types in [9]:

Short-time visitors: Stay in the site for a very short time

Active investigators: Stay in the site longer and access many pages

Customers: Perform a purchase

Loyal customers: Customers that re-visit the site

to the effect that:

• The distinction between short-time visitors and active investigators is based on

– duration of stay
– number of page requests

• The notion of “customer” is well-defined from the business perspective.

• If re-visits can be traced, loyalty can be measured.

PKDD 2001 Tutorial: “KDD for Personalization” [E-22]

Modelling ambiguities:

• A user that performs only a few page requests can be

– a short time visitor
– an experienced customer

• A page request can be

– a hit
– a page
– a frameset

Do failed requests count?

• Is a customer that returns but makes no purchase still loyal?

Actionability of the model:

If contact efficiency is 20% and conversion efficiency is 2%, what should be done?

PKDD 2001 Tutorial: “KDD for Personalization” [E-23]

Business Success From hits to loyal customers

Contact & Conversion efficiency of pages [58]

Page: Invocation of static URL or script

Target page: Page, whose invocation corresponds to the fulfilment of the site's goal

• Ad click
• Product ordering
• Retrieval of a single document from an archive
• . . .

Action page: Page, whose invocation is a prerequisite for reaching a target page

• Product inspection
• Query towards an archive or on-line catalog
• . . .

PKDD 2001 Tutorial: “KDD for Personalization” [E-24]

Term refinement by Spiliopoulou & Pohle [58]:

Active investigator: User accessing an action page

Customer: Active investigator accessing a target page

and

• Contact efficiency of a page =

$$\frac{|\text{SessionsContainingThePage}|}{|\text{AllSessions}|}$$

• Conversion efficiency of a page towards a target page over a group of connecting paths =

$$\frac{|\text{SessionsContainingAConnectingPath}|}{|\text{SessionsContainingAnActionPage}|}$$

PKDD 2001 Tutorial: “KDD for Personalization” [E-25]
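
Both ratios can be computed from sessionized log data, as in the sketch below. It rests on a simplifying assumption that is not part of [58]: a "connecting path" is taken to be any continuation of the session in which the target page occurs after the action page. Sessions, page names and function names are hypothetical.

```python
def contact_efficiency(page, sessions):
    """Share of all sessions that contain the page."""
    return sum(1 for s in sessions if page in s) / len(sessions)


def conversion_efficiency(action, target, sessions):
    """Share of sessions with the action page that later reach the target."""
    with_action = [s for s in sessions if action in s]
    if not with_action:
        return 0.0
    converted = sum(
        1 for s in with_action if target in s[s.index(action) + 1:]
    )
    return converted / len(with_action)


sessions = [
    ["home", "catalog", "order"],
    ["home", "catalog"],
    ["home"],
    ["catalog", "product", "order"],
    ["home", "about"],
]
contact = contact_efficiency("catalog", sessions)
conversion = conversion_efficiency("catalog", "order", sessions)
print(contact, conversion)
```

Restricting the connecting paths (for example to paths of bounded length or through specific pages) would only change the test applied to each session in `conversion_efficiency`.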

The methodology of [58] to measure and improve contact and conversion efficiency of pages:

I. Specification of the action and target pages as abstract concepts in a service-based concept hierarchy

II. Discovery of frequent navigation patterns involving action pages

III. Discovery of frequent (and less frequent) patterns leading to target pages

IV. Pattern visualization to identify the pages at which the confidence drops (the users abandon the path)

PKDD 2001 Tutorial: “KDD for Personalization” [E-26]

Business Success From hits to customers

Micro-conversion rates by Lee et al [34]

Four steps until the purchase of a product:

1) Product impression: Seeing the hyperlink leading to a product

2) Click through: Following the link to the product

3) Basket placement: Selecting the product for purchase

4) Purchasing the product

and metrics for them:

Step | Metric
1) product impression | (none)
2) click through | look-to-click rate
3) basket placement | click-to-basket rate
4) purchase | basket-to-buy rate
1) to 4) overall | look-to-buy rate

PKDD 2001 Tutorial: “KDD for Personalization” [E-27]
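
The four micro-conversion rates are ratios of consecutive step counts. The sketch below assumes that the counts per step are already available for a product; the numbers and the name `micro_conversion_rates` are made up.

```python
def micro_conversion_rates(impressions, clicks, baskets, purchases):
    """Ratios between consecutive steps, plus the overall look-to-buy rate."""
    def ratio(a, b):
        return a / b if b else 0.0

    return {
        "look-to-click": ratio(clicks, impressions),
        "click-to-basket": ratio(baskets, clicks),
        "basket-to-buy": ratio(purchases, baskets),
        "look-to-buy": ratio(purchases, impressions),
    }


rates = micro_conversion_rates(impressions=1000, clicks=100, baskets=20, purchases=5)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

The look-to-buy rate equals the product of the three step rates, so a low overall rate can be traced to the step at which most users drop out.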

The methodology of [34] to monitor site effectiveness:

I. Identification of three aspects of a site for web merchandizing:

Merchandizing cues: Techniques for presenting and grouping products to motivate purchases

Shopping metaphors: Means offered to the shoppers for finding products of interest

Web design features: Site layout

II. Problem decomposition:

1. Classifying hyperlinks by their merchandizing purposes

2. Measuring and analysing traffic across those hyperlinks

3. Attributing the effectiveness of each hyperlink to merchandizing cues, shopping metaphor or design features

using a visualization technique based on starfield displays

PKDD 2001 Tutorial: “KDD for Personalization” [E-28]

Business Success Customer Loyalty

Loyalty is more than site re-visitation.

It relates to new purchases and their

• Recency
• Frequency
• Monetary value

Loyalty contributes to the customer's lifetime value.

PKDD 2001 Tutorial: “KDD for Personalization” [E-29]

Business Success Customer loyalty

Customer involvement by J. Lee et al [33]

Factors affecting customer loyalty:

• Trust
• Transaction costs

which in turn are affected by:

• Comprehensive information that suffices for a purchase decision
• Shared value in the form of common beliefs among customers
• Communication among customers and store
• Uncertainty on the product quality
• Specificity of the store
• Number of competitors

PKDD 2001 Tutorial: “KDD for Personalization” [E-30]

Hypotheses:

+ Comprehensive information, shared value and communication affect trust positively.
+ Trust has a positive impact on customer loyalty.
− Transaction costs have a negative impact on customer loyalty.
− Trust reduces transaction costs.
+ Uncertainty and number of competitors increase transaction costs.
− Specificity affects transaction costs negatively.

and after the first set of experiments:

+ Specificity has a positive impact on trust.

tested with questionnaires

[E-31]

The distinction between low and high involvement groups showed that:

HIGH group:
− Specificity has no impact on trust.
+ Specificity affects transaction costs negatively.

LOW group:
+ Uncertainty has a positive impact on trust.
− The number of competitors decreases transaction costs.
+ Shared value has a positive impact on trust.

indicating that the factors affecting customer loyalty (site re-visits) differ between the two groups.

[E-32]

Business Success: From hits to loyal customers

The e-metrics of NetGenesis [16]

Factors: What should be disseminated by the measures?
Framework: What is the basis of the analysis?
Formulae: What should be measured and how?

as a result of an interview-based study with 20 successful e-companies [16]

[E-33]

Business Success: From hits to loyal customers

e-Metrics Factors [16]

When measuring site (and business) success, marketeers consider:
• Awareness
• Acquisition vs. Abandonment
• Conversion vs. Attrition
• Retention vs. Churn

[E-34]

Business Success: From hits to loyal customers

e-Metrics Framework [16]

There is no agreed-upon definition of most factors:
• Is the base of the analysis a user, a session, a page request, a page impression or a hit?
• What is a session?
• When does a user become a customer?
• When is a customer assumed to have attrited?
• How is loyalty defined?

Thesis: A company-internal definition is necessary.

[E-35]

Business Success: From hits to loyal customers

e-Metrics Framework [16]

Example: The behaviour of a loyal customer in terms of
• visit duration
• number of visits during a period of time
• pages visited each time

is fundamentally different for
• customers that make purchases in a retail store
• customers that plan a major purchase, e.g. of a configurable product (contract, car)
• cooperation partners in a B2B setting

[E-36]

Business Success: From hits to loyal customers

e-Metrics Formulae [16]

A large set of metrics is proposed, including
• stickiness
• slipperiness
• focus

of parts of a site.

The identification and monitoring of
• optimal paths

is further suggested.

[E-37]
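The slides name these metrics without stating their formulae. As a rough sketch, the code below uses one frequently quoted formulation associated with the e-metrics report [16]: stickiness as frequency × duration × reach, and focus as the share of a section's pages seen on average. Both definitions, all names and the sample numbers are assumptions for illustration and should be checked against the report before use.

```python
def stickiness(visits, unique_visitors, total_view_time, total_unique_users):
    """Frequency x duration x reach for one site part over one period (assumed definition)."""
    frequency = visits / unique_visitors           # visits per visitor in the period
    duration = total_view_time / visits            # average viewing time per visit
    reach = unique_visitors / total_unique_users   # share of all known users who came
    return frequency * duration * reach

def focus(avg_pages_visited_in_section, pages_in_section):
    """Average share of a section's pages that is seen per visit (assumed definition)."""
    return avg_pages_visited_in_section / pages_in_section

# Hypothetical figures for one site section; viewing time in minutes.
s = stickiness(visits=400, unique_visitors=200,
               total_view_time=6000.0, total_unique_users=1000)
f = focus(avg_pages_visited_in_section=3, pages_in_section=12)
print(s, f)
```

Both functions take a site part as their unit of analysis, which is why the notion of a site part has to be fixed before such metrics can be computed.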

Business Success: From hits to loyal customers

e-Metrics Formulae [16]

Implications:

For parts of a site:
• stickiness
• slipperiness
• focus

Monitoring of
• optimal paths

⇒ The notion of site-part must be properly defined and disseminated to the data analysis software.
⇒ The monitoring of optimal paths must be implemented somehow.
⇒ The impact of the site structure must be understood and made explicit.

[E-38]

Business success and the role of KDD

1. To what should success metrics be applied?
• The whole population: User populations are rarely uniform.
• Each user/customer: Scalability might be an issue, speed also.
• Each group of users/customers

It is essential to distinguish among user/customer groups, e.g. in terms of
• experience
• interests
• demographics
• behaviour

and lifecycle value

[E-39]

Business success and the role of KDD

2. How should the metrics be computed?
• Mapping of statistical measures (accuracy, intercluster distance, confidence, support) onto business measures
• Incorporation of computation prerequisites into the mining core (→ Mine the gap!)
• Impact of the site structure

[E-40]
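As one illustration of mapping a statistical measure onto a business measure, the confidence of a rule "basket → buy", counted over sessions, can be read as a session-level basket-to-buy rate. The sketch below assumes that each session is reduced to a set of event labels; the labels, the function name and the sample sessions are hypothetical.

```python
def support_and_confidence(sessions, antecedent, consequent):
    """sessions: iterable of sets of event labels.

    Returns (support, confidence) of the rule antecedent -> consequent.
    """
    sessions = list(sessions)
    with_antecedent = [s for s in sessions if antecedent in s]
    with_both = [s for s in with_antecedent if consequent in s]
    support = len(with_both) / len(sessions) if sessions else 0.0
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence

# Hypothetical sessions, each reduced to the set of steps that occurred in it.
sessions = [
    {"click", "basket", "buy"},
    {"click", "basket"},
    {"click"},
    {"click", "basket", "buy"},
    {"impression"},
]
sup, conf = support_and_confidence(sessions, "basket", "buy")
print(sup, conf)
```

Counting per session rather than per product is a simplification; a product-level rate would need the events to carry product identifiers.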

→ Mine the gap!

• A user contacts a site in a sequence of sessions.
• The time inside a session plays a role.
• The elapsed time between sessions plays a role.
• The volatility of Web and population plays a role.
• eCRM observes both the individual sessions of a user and the whole lifecycle of the user.
• Both must be supported in a seamless way.
• The associated information must be integrated and exploited.
• The web-site structure affects everything.
• Ordering and repetition are important.
• If optimal paths are specified, suboptimal ones must be quantified and monitored.
• The behavioural patterns are affected by the site structure.

[E-41]
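The two time scales named above (time inside a session, elapsed time between sessions) can be made concrete with a small sketch. It assumes that a user's requests are already identified and that a new session starts after 30 minutes of inactivity; that threshold, the function names and the sample timestamps are assumptions, not part of the tutorial.

```python
from datetime import datetime, timedelta

def split_sessions(timestamps, timeout=timedelta(minutes=30)):
    """Group one user's request times into sessions; a gap above `timeout` starts a new one."""
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= timeout:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions

def durations_and_gaps(sessions):
    """Time spent inside each session, and elapsed time between consecutive sessions."""
    durations = [s[-1] - s[0] for s in sessions]
    gaps = [later[0] - earlier[-1] for earlier, later in zip(sessions, sessions[1:])]
    return durations, gaps

# Hypothetical request times of a single user.
requests = [
    datetime(2001, 9, 1, 10, 0), datetime(2001, 9, 1, 10, 5), datetime(2001, 9, 1, 10, 20),
    datetime(2001, 9, 3, 18, 0), datetime(2001, 9, 3, 18, 10),
]
sessions = split_sessions(requests)
durations, gaps = durations_and_gaps(sessions)
print(len(sessions), durations, gaps)
```

Keeping both lists per user is one simple way to look at individual sessions and the longer lifecycle side by side.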

Personalization and Privacy

[P-1]

Personalization and privacy

What is privacy?

"The right to be let alone." Warren & Brandeis [65]

includes
- limits on the government’s power to interfere with personal decisions
- physical privacy: limits on others’ ability to learn things about a person by accessing their property
- information privacy: the "right to control information about ourselves"

[P-2]

Personalization and privacy

Why is privacy a central concern for personalization?

(1) Adapting to a person requires data on that person

(2) The legal side: not all data may be collected/used

(3) The commercial side:
"The Internet industry is built on trust between businesses and their customers - and privacy is the number one ingredient in trust."
TrustE: How does Online Privacy Impact Your Bottom Line? [62]

[P-3]

What are the dangers to privacy?

technical: data are corrupted during entry, transfer, or storage

security: data are intercepted

unethical practices: data are used for novel purposes, sold to third parties, ...

basic: data are correctly and legally transferred and stored, but embody "knowledge about a person"
-> exacerbated by user ignorance (cf. widespread confusion or ignorance about what a cookie is: Ackerman, Cranor, & Reagle [1])

[P-4]

What data are transmitted during Web usage?

transferred by the browser
- IP address
- domain name (-> organization)
- platform: browser type and version
- referrer address
- query strings, form fill-ins

other technologies
- cookies
- globally unique identifiers
- web bugs

[P-5]
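Several of the browser-supplied items listed above typically end up in the web server's access log. The sketch below pulls them out of one log line, assuming the widely used "combined" log layout; the regular expression, the function name and the sample line are illustrative assumptions, and cookies or web bugs are not covered.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Pattern for one line in the "combined" access-log layout (assumed here).
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def transmitted_data(line):
    """Return the browser-supplied fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Query strings carry what the user typed or selected.
    fields["query"] = parse_qs(urlsplit(fields["url"]).query)
    return fields

# Hypothetical log line; the address and host names are documentation examples.
example = ('192.0.2.7 - - [06/Sep/2001:10:15:00 +0200] "GET /search?q=camera&zoom=3x HTTP/1.0" '
           '200 5120 "http://www.example.com/start.html" '
           '"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"')
record = transmitted_data(example)
print(record["ip"], record["referrer"], record["agent"], record["query"])
```

Even without cookies, the combination of IP address, platform string and referrer in such a record can already be person-relatable.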

User concerns about privacy

User concerns about privacy vary

- with respect to their severity: e.g., 27% marginally concerned, 56% pragmatic majority, 17% privacy fundamentalists (Ackerman et al. [1])

- with respect to the kind of data, e.g., credit card no. -> ... -> name -> ... -> email address -> ... -> favorite TV show (Ackerman et al. [1])

- depending on whether personal identity or profiling information is disclosed (Spiekermann, Grossklags, & Berendt [57])

[P-6]

How to protect privacy I: general

- avoid the generation of data ("data parsimony")
- try to protect generated data

How to protect privacy II: agents and methods

- state / law (German/European main stance)
- parties to the transaction / market: self-governance (US main stance)
- users / technology

[P-7]

What to protect: Data in relation to persons (personally identifiable data)

person-related data: "Jane Doe plays football."

person-relatable data: "The person is a male American famous tennis player, and will soon marry a famous German tennis player."

Note: IP addresses are at least person-relatable!

[P-8]

German / EU legal basics I

(German laws, EU directive 95/46/EC)

rights against the state -> rights against other private parties

anything that is not explicitly allowed is forbidden

person-related data may only be collected with the informed consent (opt-in!) about
- who: who collects the data
- what for: for what purpose
- how much: quality and amount necessary for purpose

informed consent: the greater the risk, the more detail must be explained

usage that deviates from any of these 3 is illegal

[P-9]

German / EU legal basics II

analysis / research:
- person-related data must be anonymized such that it cannot be related back to the person
- aggregate into groups >= 10
- if necessary, original data can be stored by a trustee

[P-10]
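The aggregation rule can be sketched as counting records per group and reporting only groups that reach the minimum size. The threshold of 10 is taken from the slide; the function name, the grouping key and the sample records are hypothetical, and suppressing small groups is only one ingredient of anonymization, not a guarantee of it.

```python
from collections import Counter

MIN_GROUP_SIZE = 10  # threshold taken from the slide ("groups >= 10")

def aggregate_with_suppression(records, key, min_size=MIN_GROUP_SIZE):
    """Count records per group and report only groups with at least `min_size` members."""
    counts = Counter(key(record) for record in records)
    return {group: n for group, n in counts.items() if n >= min_size}

# Hypothetical records: (page category, age band) per session, with no user identifier kept.
records = ([("cameras", "20-29")] * 12
           + [("cameras", "60-69")] * 3
           + [("lenses", "20-29")] * 10)
report = aggregate_with_suppression(records, key=lambda record: record)
print(report)
```

In this sample the group with only three members is dropped from the report, so the analysis sees group counts but no small, potentially re-identifiable cells.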

German / EU legal basics III: Implications for personalization

+ = legal, - = illegal, ? = controversial

+ analyzing non-person-relatable web usage data
+ using results to personalize a web page based on the current user’s current session
- using results to personalize based on past sessions
- using results to send unsolicited snail/e-mail
? cookies: web site must also function without cookies; problematic if user unaware of cookie setting
? P3P: is the delegation of my privacy preferences to a computer program still an expression of my human will?

Privacy statements must be opt-in (cf. software licence agreements: "I agree")

[P-11]

Further rights under EU directive 95/46/EC

- individuals can inspect and correct their data, and they can disallow usage
- independent institutions oversee data protection in each member country
- no data transfer to countries with inadequate data protection

EU - US: Safe Harbor Principles (July 2000)
- American enterprises that collect + process data from the EU voluntarily subject themselves to principles that correspond to the EU standard
- FTC control

[P-12]

US legal basics

Volokh [64]

- 4th Amendment: limits government’s power to search people, their homes, and their papers; trespass laws, ...
- government must not reveal medical histories etc.
- government must not reveal certain information: Privacy Act, Driver’s Privacy Protection Act, ...
- bars on third parties: video stores, lawyers, doctors, ..., "disclosure of private facts" tort
-> apply only to a narrow range of revelations

Information privacy gets protection from law of contract
-> applies only to parties to a contract.

Third parties: conflict information privacy - freedom of speech?

[P-13]

Self-governance: privacy seals

"Unlike self-regulation, self-governance requires that industry not act alone; rather, it must work in concert with existing laws and develop best practices. Self-governance relies on an informed marketplace that demands disclosure of privacy practices and the opportunity to exercise choice about how information is used. Government must fulfill its role by enforcing existing laws and assuring that industry continue to work toward ubiquitous adoption of best practices. Media and advocacy groups act as a collective conscience by scrutinizing the development of self-governance to assure it remains true to its underlying principles and goals and meets the challenges of evolving technologies and business models."
TRUSTe Online Privacy Resource Book [63]

US privacy seals: TRUSTe, BBBOnline, CPA Web Trust
www.truste.org, www.bbb-online.org, www.cpawebtrust.org

[P-14]

Self-governance: How does TRUSTe work?

- contract signed between TRUSTe and the Web site
- allows TRUSTe to address users’ privacy concerns regardless of their citizenship or the TRUSTe licensee
- users can bring their complaints to TRUSTe Watchdog
- Web site is required to respond quickly, TRUSTe can begin to mediate a resolution
- change in company practice, or in posted policy
- third party audit
- refer case to government authorities, usually FTC

"Companies acting outside the bounds of the TRUSTe license agreement may be in breach of contract and be subject to revocation of the TRUSTe seal. This may be the most powerful [TRUSTe] tool, because of ... public relations consequences ..."

Note: not compliant with EU standards (opt-out possible)

[P-15]

Technology: anonymizing web usage

[Diagram labels: user 1, user 2, ...; "GET x.html" to proxy server; "GET x.html" to web server]
Ex.: www.anonymizer.com

problem: proxy usually knows users’ identities

Mix networks; Crowds
[Diagram labels: user; web server; three links labelled "encryption"]
Ex.: www.freedom.net, www.onion-router.net, anon.inf.tu-dresden.de, www.research.att.com/projects/crowds

Pseudonymity and identity management

[P-16]

P3P: The Platform for Privacy Preferences

- an initiative of the World Wide Web Consortium (W3C) in conjunction with many industry partners including Microsoft
- P3P enables Web sites to express their privacy practices in a standard format that can be retrieved automatically and interpreted easily by user agents
- P3P allows the user agent to warn the user, or block communication altogether, if a selected Web site’s privacy policy does not comply with user preferences

[P-17]

P3P’s XML elements include (can be extended):

who: <RECIPIENT>
ours, delivery, same, other-recipient, unrelated, public

what for: <PURPOSE>
current, admin, develop, customization, tailoring, pseudo-analysis, pseudo-decision, individual-analysis, individual-decision, contact, telemarketing, history, other-purpose

how much: categories
physical, online, uniqueid, purchase, financial, computer, navigation, interactive, demographic, content, state, political, health, preference, location, government, other-category

[67]

[P-18]
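How a user agent might compare declared purposes and recipients with user preferences can be sketched as follows. The policy fragment is deliberately simplified: it uses only element names from the list above, omits namespaces and the other parts a real P3P policy carries, and represents the user's preferences as plain Python sets. All of this, including the function names, is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, non-namespaced policy fragment using element names listed on the slide.
POLICY = """
<STATEMENT>
  <PURPOSE><current/><tailoring/><telemarketing/></PURPOSE>
  <RECIPIENT><ours/><unrelated/></RECIPIENT>
</STATEMENT>
"""

def declared(statement, element):
    """Names of the child elements under PURPOSE or RECIPIENT in one statement."""
    node = statement.find(element)
    return {child.tag for child in node} if node is not None else set()

def check_policy(policy_xml, refused_purposes, refused_recipients):
    """Return the declared purposes and recipients that the user's preferences refuse."""
    statement = ET.fromstring(policy_xml)
    return {
        "purposes": declared(statement, "PURPOSE") & refused_purposes,
        "recipients": declared(statement, "RECIPIENT") & refused_recipients,
    }

# Hypothetical user preferences: what this user does not accept.
conflicts = check_policy(POLICY,
                         refused_purposes={"telemarketing", "contact"},
                         refused_recipients={"unrelated", "public"})
should_warn = any(conflicts.values())
print(conflicts, should_warn)
```

Whether the agent then warns the user or blocks communication altogether is a separate decision that this sketch leaves open.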

Problem: "soft" interaction, communication flow

In an experimental online store, agent Luci posed 56 questions in a sales dialogue.
- 35-50% of questions were non-legitimate / irrelevant
- still, 54% of participants answered at least 98% of the questions, although they had previously agreed to the sale and further usage of their data

(Spiekermann et al. [57])

[P-19]

Communication flow and "obedient" answering

[Figure labels: Q categories (peip, pepr, u, pd); top 10, product info, more product info, prod.inf./purch.opt., purchase]

Example questions:
- Do you consider yourself photogenic?
- How important are trend models to you?
- When do you usually take photos?
- What zoom do you want?

(Berendt [5])

[P-20]


Conclusions

[C-1]

Conclusions

powerful methods and software for personalization available,
but many questions remain, including:

- what are the relevant criteria of evaluation? how can they be combined?
- recommendations welcomed by users
- but: user reveals information, may not get a good return
  ... if there are not enough other users
  ... if that user is judged as an "uninteresting case"
- privacy concerns:
  => often, more data are collected than put to good use

[C-2]

(Some) future directions

more explicit user modeling
- integration with other data easier (XML etc.)
- involve the user in diagnosis, provide for opt-out / opt-in

anonymity, pseudonymity, and personalization
"opt-in with incentives": permission marketing

changing roles of participants:
- computers: knowledge organization and representation (-> personalization + information architecture design)
- users interact more strongly with one another
- service providers offer "real" personal assistants (Web communities)

[C-3]


References

[R-1]

References

[1] Ackerman, M.S., Cranor, L.F., and J. Reagle. Privacy in E-Commerce: Examining user scenarios and privacy preferences. In Proceedings of the ACM Conference on Electronic Commerce. See also http://www.research.att.com/library/trs/TRs/99/99.4/

[2] R. Agarwal, C. Aggarwal, and V. Prasad. A tree projection algorithm for generation of frequent itemsets. In Proceedings of the High Performance Data Mining Workshop, Puerto Rico, 1999.

[3] Belkin, N.J. (2000). Helping people find what they don't know. Communications of the ACM, 43(8), 58–61.

[4] Belkin, N.J., Cool, C., Head, J., Jeng, J., Kelly, D., Lin, S.J., Lobash, L., Park, S.Y., Savage-Knepshield, P., and Sikora, C. (2000). Relevance feedback versus local context analysis as term suggestion devices. In Proceedings of the Eighth Text Retrieval Conference TREC8. Washington, D.C.

[5] Berendt, B. (2001). Understanding web usage at different levels of abstraction: coarsening and visualising sequences. In Working Notes of the Workshop "WEBKDD 2001 – Mining Log Data Across All Customer Touchpoints", 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco, CA, August.

[6] Berendt, B. (2000). Web usage mining, site semantics, and the support of navigation. In Working Notes of the Workshop "Web Mining for E-Commerce – Challenges and Opportunities." 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. (pp. 83–93). Boston, MA, August.

[7] B. Berendt, B. Mobasher, M. Spiliopoulou, and J. Wiltshire. Measuring the accuracy of sessionizers for Web usage analysis. In Proceedings of the Web Mining Workshop at the First SIAM International Conference on Data Mining, Chicago, 2001.

[8] Berendt, B. and Spiliopoulou, M. (2000). Analysis of navigation behaviour in web sites integrating multiple information systems. The VLDB Journal, 9, 56–75.

[9] P. Berthon, L. F. Pitt, and R. T. Watson. The world wide web as an advertising medium. Journal of Advertising Research, 36(1):43–54, 1996.

[R-2]

[10] Brusilovsky, P. (1996). Methods and techniques of adaptive hypermedia. User Modeling and User-Adapted Interaction, 6, 87–129.

[11] Brusilovsky, P. (1997). Efficient techniques for adaptive hypermedia. In C. Nicholas and J. Mayfield (Eds.), Intelligent hypertext: Advanced techniques for the World Wide Web, Berlin: Springer. 12–30.

[12] Brusilovsky, P. and Eklund, J. (1998). A study of user model based link annotation in educational hypermedia. Journal of Universal Computer Science, 4, 429–448.

[13] Carroll, J.M. and Rosson, M.B. (1987). The paradox of the active user. In J.M. Carroll (Ed.), Interfacing Thought: Cognitive Aspects of Human-Computer Interaction. Cambridge, MA: MIT Press.

[14] Cooley, R. (2000). Web Usage Mining: Discovery and Application of Interesting Patterns from Web Data. University of Minnesota, Faculty of the Graduate School: Ph.D. dissertation. http://www.cs.umn.edu/research/websift/papers/rwc_thesis.ps

[15] Robert Cooley, Bamshad Mobasher, and Jaideep Srivastava. Data preparation for mining world wide web browsing patterns. Journal of Knowledge and Information Systems, 1(1), 1999.

[16] M. Cutler and J. Sterne. E-metrics – business metrics for the new economy. Technical report, NetGenesis Corp., http://www.netgen.com/emetrics, 2000. Access date: July 22, 2001.

[17] M. Deshpande and G. Karypis. Selective Markov models for predicting Web-page accesses. Technical Report #00-056, University of Minnesota, 2000.

[18] Dimitrova, V., Self, J., and Brna, P. (2000). Involving the learner in diagnosis – potentials and problems. In Web Information Technologies: Research, Education and Commerce. Montpellier, France, May.

[19] Directive 95/46/EC of the European Parliament and the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data. http://europa.eu.int/comm/internal_market/en/media/dataprot/law/index.htm

[20] X. Drèze and F. Zufryden. Testing web site design and promotional content. Journal of Advertising Research, 37(2):77–91, 1997.

[R-3]

[21] J. Eighmey. Profiling user responses to commercial web sites. Journal of Advertising Research, 37(2):59–66, May-June 1997.

[22] X. Fu, J. Budzik, and K. J. Hammond. Mining navigation history for recommendation. In Proc. 2000 International Conference on Intelligent User Interfaces, New Orleans, 2000.

[23] Garfinkel, S. (2000). Database Nation. The Death of Privacy in the 21st Century. Sebastopol, CA: O'Reilly.

[24] W. Gaul and L. Schmidt-Thieme. Recommender systems based on navigation path features. In [29], San Francisco, CA, Aug. 2001. ACM.

[25] A. Geyer-Schulz, M. Hahsler, and M. Jahn. A customer purchase incidence model applied to recommender systems. In [29], San Francisco, CA, Aug. 2001. ACM.

[26] E-H. Han, G. Karypis, V. Kumar and B. Mobasher. Hypergraph Based Clustering in High-Dimensional Data Sets: A Summary of Results. IEEE Bulletin of the Technical Committee on Data Engineering, (21) 1, 1998.

[27] T. Joachims, D. Freitag, and T. Mitchell. Webwatcher: A Tour Guide for the World Wide Web. In Proceedings of the 15th International Conference on Artificial Intelligence, Nagoya, Japan, 1997.

[28] Kobsa, A., J. Koenemann and W. Pohl (2001). Personalized hypermedia presentation techniques for improving online customer relationships. To appear in The Knowledge Engineering Review. http://www.ics.uci.edu/~kobsa/papers/2001-KER-kobsa.pdf

[29] R. Kohavi, B. Masand, M. Spiliopoulou, and J. Srivastava, editors. KDD'2001 Workshop WEBKDD'2001, San Francisco, CA, Aug. 2001. ACM.

[30] R. Kohavi, M. Spiliopoulou, and J. Srivastava, editors. KDD'2000 Workshop WEBKDD'2000 on Web Mining for E-Commerce – Challenges and Opportunities, Boston, MA, Aug. 2000. ACM.

[31] Kotwica, K. (1999). Survey: Website Personalization. Cyber Behavior Research Center. http://www.cio.com/forums/behavior/edit/survey7.html.

[32] R. Kuhlen. Informationsmarkt: Chancen und Risiken der Kommerzialisierung von Wissen. 2nd edition, 1996.

[R-4]

[33] J. Lee, J. Kim, and J. Y. Moon. What makes internet users visit cyber stores again? Key design factors for customer loyalty. In Proc. CHI'2000, pages 305–312, The Hague, NL, 2000. ACM.

[34] J. Lee, M. Podlaseck, E. Schonberg, R. Hoch, and S. Gomory. Analysis and visualization of metrics for online merchandizing. In [39], pages 123–138. 2000.

[35] H. Lieberman. Letizia: An Agent that Assists Web Browsing. In Proceedings of the 1995 International Joint Conference on Artificial Intelligence, Montreal, Canada, 1995.

[36] Lighthouse on the Web. (2000). Personalization goes one-on-one with reality. http://www.shorewalker.com/hype/hype60.html.

[37] C. R. W. Lin, S. A. Alvarez, and C. Ruiz. Collaborative recommendation via adaptive association rule mining. In [30], 2000.

[38] B. Liu, W. Hsu, and Y. Ma. Association rules with multiple minimum supports. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD-99, poster), San Diego, CA, 1999.

[39] B. Masand and M. Spiliopoulou, editors. Advances in Web Usage Mining and User Profiling: Proceedings of the WEBKDD'99 Workshop, LNAI 1836. Springer Verlag, July 2000.

[40] Mobasher, B., Cooley, R., and Srivastava, J. (2000). Automatic personalization based on web usage mining. Communications of the ACM, 43(8), 142–151.

[41] B. Mobasher, R. Cooley, and J. Srivastava. Creating adaptive web sites through usage-based clustering of URLs. In IEEE Knowledge and Data Engineering Workshop (KDEX'99), 1999.

[42] B. Mobasher, H. Dai, T. Luo, and M. Nakagawa. Improving the effectiveness of collaborative filtering on anonymous web usage data. 2001.

[43] B. Mobasher, H. Dai, T. Luo, M. Nakagawa, Y. Sun, and J. Wiltshire. Discovery of aggregate usage profiles for web personalization. In [30], 2000.

[R-5]

[44] B. Mobasher, H. Dai, T. Luo, Y. Sun, and J. Zhu. Integrating web usage and content mining for more effective personalization. In E-Commerce and Web Technologies, volume 1875 of LNCS. Springer Verlag, Sept. 2000.

[45] B. Mobasher, H. Dai, T. Luo, M. Nakagawa. Effective personalization based on association rule discovery from Web usage data. Technical Report 01-010, Department of Computer Science, DePaul University.

[46] J. C. Nash. Know thy customer – from customer knowledge to customer insight. White paper, Accenture, accenture CRM Portal http://www.crmproject.com, access date: July 22, 2001.

[47] Nielsen, J. (1998). Personalization is Over-Rated. Alertbox for October 4, 1998. http://www.useit.com/alertbox/981004.html

[48] Nielsen, J. (2001). Usability Metrics. Alertbox, January 21, 2001. http://www.useit.com/alertbox/20010121.html

[49] Nielsen, J. (2000). Designing Web Usability: The Practice of Simplicity. New Riders Publishing.

[50] Z. Obradovic and S. Vucetic. A regression-based approach for scaling-up personalized recommender systems in e-commerce. In [30], 2000.

[51] Parent, S., Mobasher, B., and Lytinen, S. (2001). An adaptive agent for web exploration based on concept hierarchies. In Proceedings of the 9th International Conference on Human Computer Interaction. New Orleans, LA, August.

[52] M. Perkowitz and O. Etzioni. Adaptive web sites: Automatically synthesizing web pages. In Proc. of AAAI/IAAI'98, pages 727–732, 1998.

[53] M. Perkowitz and O. Etzioni. Adaptive web sites. Special Section of the Communications of ACM on "Personalization Technologies with Data Mining", 43(8):152–158, Aug 2000.

[54] Pirolli, P., Pitkow, J., and Rao, R. Silk from a sow's ear: Extracting usable structures from the web. In CHI-96, Vancouver.

[R-6]

[55] Shneiderman, B. (1998). Designing the User Interface. Reading, MA: Addison-Wesley.

[56] M. Spendolini. Customer measurement systems – opportunities for improvement. White paper, MJS Associates, accenture CRM Portal http://www.crmproject.com, access date: July 22, 2001.

[57] Spiekermann, S., Grossklags, J., and Berendt, B. (2001). Stated privacy preferences versus actual behaviour in EC environments: a reality check. In Proceedings der 5. Internationalen Tagung Wirtschaftsinformatik 2001. Augsburg, Germany, September.

[58] M. Spiliopoulou and C. Pohle. Data mining for measuring and improving the success of web sites. In R. Kohavi and F. Provost, editors, Journal of Data Mining and Knowledge Discovery, Special Issue on E-commerce, volume 5, pages 85–114. Kluwer Academic Publishers, Jan.-Apr. 2001.

[59] Sterne, J. (1997). Do you know me? WebMaster Magazine, April, 1997. http://www.cio.com/archive/webbusiness/040197_customer.html

[60] T. Sullivan. Reading reader reaction: A proposal for inferential analysis of web server log files. In Proc. of the Web Conference'97, 1997.

[61] Thompson, M. (1999). Registered Visitors are a portal's best friend. The Industry Standard, June 7, 1999. http://www.thestandard.com.au/metrics/display/0,1283,901,00.html

[62] TrustE. (no date). How does Privacy Impact Your Bottom Line? http://www.truste.org/bus/pub_bottom.html

[63] TrustE. (2000). TrustE Online Privacy Resource Book. http://www.truste.org/about/oprah.doc

[64] Volokh, E. (2000). Personalization and privacy. Communications of the ACM, 43(8), 84–88.

[65] Warren, S. and Brandeis, L. The right to privacy. Harvard Law Review, 4, 193.

[66] Weber, G. (1996). Episodic learner modeling. Cognitive Science, 20, 195–236.

[67] W3C. The Platform for Privacy Preferences 1.0 (P3P1.0) Specification. http://www.w3.org/TR/2000/CR-P3P-20001215 and http://www.w3.org/TR/P3P.

[R-7]