Data Mining for Business Applications Web Analysis for E...


Nov. 2005, © Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg

Data Mining for Business Applications

Web Analysis for E-Business

Myra Spiliopoulou

http://omen.cs.uni-magdeburg.de/itikmd


Who am I?

Myra Spiliopoulou

Study of Mathematics - University of Athens

PhD in Computer Science - University of Athens

Habilitation in "Wirtschaftsinformatik" - HU Berlin

Guest professor in the University of Magdeburg

Professor of E-Business in the Leipzig Graduate School of Management

Professor of Business Informatics in the University of Magdeburg

Email: [email protected]

Homepage: http://omen.cs.uni-magdeburg.de/itikmd

Business Informatics


The research group KMD

KMD is the second Business Computer Science group in the Faculty of Computer Science, University of Magdeburg.

KMD is part of the Institute of Technical and Business Information Systems (ITI).

KMD was established in February 2003.

KMD stands for:

Knowledge Management & Knowledge Discovery in Business Information Systems


Agenda

Designing a Good Web Site: Web Usability

Web Site Evaluation: Modeling & Measuring Success

Designing a Good Web Site: Business Success

Designing a Good Web Site: Heuristics

Web Site Evaluation: Preparing & Mining

Web Site Evaluation: Understanding & Acting


Literature & Further Readings

Web Usability: Jakob Nielsen. Designing Web Usability. New Riders Publishing, 2000

Web Case Study: Hajo Hippner, Ulrich Küsters, Matthias Meyer, Klaus Wilde (eds.). Handbuch Web Mining im Marketing, Vieweg, 2002

– Wann werden Surfer zu Kunden? Navigationsanalyse zur Ermittlung des Konversionspotenzials verschiedener Bereiche einer Site (in German: "When do surfers become customers? Navigation analysis for determining the conversion potential of different areas of a site"), Bettina Berendt, Myra Spiliopoulou

For a more detailed discussion on Web mining technologies see: Tutorial “Web Usage Mining for E-Business Applications” at ECML/PKDD 2002 by Myra Spiliopoulou, Bamshad Mobasher and Bettina Berendt, Helsinki, Finland, Aug. 19, 2002, http://ecmlpkdd.cs.helsinki.fi/pdf/berendt-2.pdf

For a dedicated discussion on Web Evaluation see: Tutorial “Evaluation in Web Mining” at ECML/PKDD 2004 by Myra Spiliopoulou, Bettina Berendt, Ernestina Menasalvas, Pisa, Italy, Sept. 20, 2004, http://www.wiwi.hu-berlin.de/~berendt/evaluation04


Added Value and Success

An institution operating a Web site should care to create value for its (prospective) users/customers:

First, the users need a motivation for visiting this site instead of any competitor site.

Second, what the site offers to them should be satisfactory for them.

Third, the site should motivate them to establish a long-term relationship with the institution.

The marketing terms for company sites are:

– Conversion: The user becomes a customer.

– Retention: The customer stays loyal.


The side-effects of poorly designed sites

According to an old survey, the top reasons for abandoning a Website were:

– Could not find the item: 56% (62%)

– Site disorganized or confusing: 54% (61%)

– Pages downloaded too slowly: 53% (60%)

It has been recognized that:

Web users are particularly impatient and insist on instant gratification.

Web users consider a poorly designed site an indicator of low credibility (Fogg et al., 2001).

[GVU, http://www.gvu.gatech.edu/user_surveys/survey-1998-10/]


Defining Web Usability

[Lee, 1999]

"Web usability is the efficient, effective and satisfying completion of a specified task by any given Web user. Support of essential user tasks made possible by Web technology serves as the benchmark of usability."

(Derived from: ISO/DIS 9241-11, Ergonomic Requirements for office work with visual display terminals (VDTs). Part 11: Guidance on usability)


Variables Affecting Usability

System Variables

• Internet transmission speed

• Visual display device capabilities

• Capabilities and limitations of user input devices

User Characteristics

• Extent of computer use and Web experience and knowledge

• Age and disability-related limitations in memory and vision

• Reading ability

• Motivational factors

• Finding desired information by direct search or discovering new information by browsing

• Comprehending the information presented

• Specialized tasks specific to certain Web sites, e.g. the ordering and downloading of products

Web Usability

[Derived from: Lee, 1999]


Cognitive Framework: Reading a Web Page

The way users read a Web page has been studied extensively by means of eye-tracking analyses. Faraday proposed a tool that traces the critique of a user towards a Web page by monitoring the user's face.

Two phases of reading a Web page have been distinguished:

– Search phase: Viewer tries to find a salient entry point into the page

– Scanning phase: Extraction of information

[Faraday, 2000]


Reading a Web Page: The Search Phase

Motion: The most important variable; it is detected automatically, i.e. an object in motion "pops out"

Size: Larger objects are focused on in preference to smaller ones, both in order and in duration

Images: Are attended to in preference to text

Color: Brighter elements dominate over darker ones

Text style: Typographical cues are nonverbal devices for attracting and focusing attention

Position: On text pages, reading often begins in the top left area of the page; on non-textual pages, it often begins in the center

[Faraday, 2000]


Reading a Web Page: The Scanning Phase

Area

– Elements are grouped according to Figure and Ground relationships, forming a "Gestalt"

– Grouping or placing elements in proximity provides information about their relationships

Proximity and Reading Order

– Grouping follows a reading order, moving from left to right and top to bottom

– This fundamental axis dominates most design decisions

[Faraday, 2000]


Reading a Web Page: Predicting User Perception

[Faraday, 2000]


Page Design: Rules of thumb

Web pages should be dominated by content of interest to the user

Rules of thumb:

– Content should account for at least half of a page’s design

– Navigation should be kept below 20 percent of the space for destination pages

Web pages should always work independently of the screen resolution


Page Design: Response Times

Response times <= 0.1 seconds: User feels that the system is reacting instantaneously

Response times > 1.0 seconds: User feels disturbed in navigating through the information space and experiences an interrupted flow of thoughts

Response times > 10.0 seconds: The limit for keeping the user’s attention focused on the dialogue is exceeded; the user should be warned with an estimate of the waiting time

Slow response times often translate directly into a reduced level of trust

Due to the Internet’s architecture, it is virtually impossible to control download times exactly
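The thresholds above can be turned into a small check on measured response times. The following is a minimal sketch in Python; the function name and the band labels are hypothetical, and the label for the range between 0.1 and 1.0 seconds is an assumption, since the slide does not describe that range.

```python
def classify_response_time(seconds: float) -> str:
    """Map a measured response time to the perception bands on the slide."""
    if seconds <= 0.1:
        return "instantaneous"
    if seconds <= 1.0:
        # Assumption: the slide says nothing about 0.1 s < t <= 1.0 s.
        return "noticeable delay"
    if seconds <= 10.0:
        return "flow of thought interrupted"
    return "attention lost, show a time estimate"


# Hypothetical measurements in seconds.
for t in (0.05, 0.6, 3.0, 12.0):
    print(t, "->", classify_response_time(t))
```

Such a check could be run over logged response times to see how many requests fall into each band.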


Page Design: Linking

Links fall into three main categories:

Structural navigation links, e.g. home page buttons or links to a set of pages that are subordinate to the current one

Associative links within the page content, usually underlined words, pointing to pages with further information about the anchor text

"See Also" lists of additional references, helping users to find what they want if the current page isn’t the right one


Page Design: Linking

Links give the user’s eye something to rest on while scanning through an article, similar to call-outs in print media. As such, they should not exceed two to four words.

Link text should always provide information of what to expect behind the link.

If necessary, the text surrounding a link should provide sufficient additional information for the user to assess whether the page behind the link is worth loading.


Content Design: Writing for the Web

Users look first at the page’s main content area and scan it for headlines and other indicators of what the page is about

Reading from computer screens is about 25% slower than reading from paper

Write concisely: use less than 50 percent of the text you would have used to cover the same material in a print publication

Write for scannability

Use short paragraphs, subheadings and bulleted lists. Skimming instead of reading is a fact on the Web.


Site Design: The Home Page

The first immediate goal of any home page is to answer the first-time users’ questions "Where am I?" and "What does this site do?"

Experienced visitors use the home page as an entry point to the site’s navigation scheme

A home page should offer three features:

– A directory of the site’s main content areas (navigation)

– A summary of the most important news or promotions

– A search feature


Site Design: Navigation

Users should always be able to determine their location

– Relative to the Web as a whole

– Relative to the site’s structure

Navigation interfaces should not differ drastically from the majority of Web sites

Include a logo or other site identifier consistently on every page

Location relative to the site’s structure is usually given by partly showing the site's structure and highlighting the area of the current page


Site Navigation: Search Capabilities

About 50% of Web users are search dominant, approx. 20% are link dominant and the rest exhibit mixed behavior

Search should be easily available from every single page on the site

– Site structure and navigation still give the user important clues about where the search results are located in the information space

– Usability studies show that users are often unable to use boolean or advanced search features correctly


The Web (sub)site of the Thomaskirche-Bach 2000 club

http://www.thomaskirche-bach.de/deutsch/vereinframe.htm (accessed on 20.02.2002)

State of: Summer Term 2002


Additional Reading and References

Butler, Keith A., Jacob, Robert J.K. and John, Bonnie E. Human Computer Interaction: Introduction and Overview, Proc. of SIGCHI’98

Card, Stuart K., Pirolli, Peter, Van Der Wege, Mija, Morrison, Julie B., Reeder, Robert W., Schraedley, Pamela K. and Boshart, Jenea (2001). Information Scent as a Driver of Web Behavior Graphs: Results of a Protocol Analysis Method for Web Usability, Proc. of SIGCHI’01, Seattle, WA, USA, Mar. 31-Apr. 4

Faraday, Pete (2000). Visually Critiquing Web Pages, Proc. of the 6th Conference on Human Factors & the Web, Austin, Texas

Fogg, B.J., Marshall, J., Laraki, O., Osipovich, A., Varma, C., Fang, N., Paul, J., Rangnekar, A., Shon, J., Swani, P. and Treinen, M. (2001). What Makes Web Sites Credible? A Report on a Large Quantitative Study, Proc. of SIGCHI 2001, Seattle, WA, USA, Mar. 31-Apr. 4


The notion of a "good" Web site

The objective of a Web site is NOT

the maximisation of the number of visitors accessing it

the prolongation of the visitors' stay time

the inspection of a maximum number of pages/items/products

the satisfaction of the visitors

... but this is often a prerequisite for site success.

In general, the (abstract) objective of a Web site is

the contribution to the business objectives of its owner

with respect to the target groups accessing it

in a cost-effective way.

The "success" of a Web site is a measure of the degree to which the site satisfies its objective.


Business goals of a site (I)

1. Sale of products/services on-line

Amazon sells books (etc.) online. The site should help the users find the most suitable books for their needs, identify more related products of interest and, finally, purchase them in a secure and intuitive way.

Personalisation; Cross/Up-Selling; Site design

2. Marketing for products/services to be acquired off-line

Insurers, banks, application service providers etc.: providers of services based on a long-term relationship with the customer do not sell on-line to unknown users. The site should demonstrate to the users the quality of the product/service and the trustworthiness of its owner and initiate an off-line contact.


Business goals of a site (II)

3. Reduction of internal costs

Some banks offer online banking. Some insurers support online case registration. This reduces the need for human preprocessing and the likelihood of typing errors. The site should help the users locate and fill in the right forms and submit them in a secure and intuitive way.

4. Information dissemination

Google, IMDB etc. offer information by means of a search engine over a voluminous archive of high-quality data. The site should help the users find what they search for, assure them of the quality (precision and completeness) of the information provided, and also motivate them to access the products/services of the sponsors.


The Web Site of the Thomaskirche in Leipzig: Description and Objectives (1)

The Thomaskirche in Leipzig is the church in which Johann Sebastian Bach worked during his most creative years as a composer.

The Web site of the church operates as

contact point for the religious community

source of information about the concerts in the church

source of information on the church building itself, which is subject to restoration

entry point for the club Thomaskirche-Bach 2000

entry point for the e-shop ThomasShop

The Web site has two main entry pages:

– www.thomaskirche.org

– www.thomaskirche-bach2000.de

State of: Summer Term 2002


The Web Site of the Thomaskirche in Leipzig: Description and Objectives (2)

The target groups of the Web site are:

the members of the religious community

– located in Leipzig

– majority speaks German

the members of the club Thomaskirche-Bach 2000

visitors of the Thomaskirche, including tourists

– located all over the world

– not necessarily speaking German

persons interested in Bach and his music, in the Thomaskirche and the works of art in it, in Leipzig, in the charitable activities of the church, in the concerts taking place in the church

– located all over the world

– not necessarily speaking German

State of: Summer Term 2002


The Web Site of the Thomaskirche in Leipzig: Description and Objectives (3)

The Web site has several objectives towards its target groups.

The case study focussed on only one of those objectives:

the contribution of the site to the income of the Thomaskirche through

– donations

– purchases in the ThomasShop

The club Thomaskirche-Bach 2000 plays a key role in the acquisition of donations:

Its objective is the maintenance of the Thomaskirche.

Its members are potential donors.

Most information/links concerning donations are in its subsite.

State of: Summer Term 2002


A process overview for sales of products/services

The interaction of the potential customer with the company goes through three phases:

Information Acquisition → Negotiation & Transaction → After-Sales Support

The ratio of persons going from one phase to the next is the basis for a set of positive and negative measures:

Positive: Contact, Conversion, Retention

Negative: Abandonment, Attrition, Churn


From the visitor to the loyal customer: The model of Berthon et al. [BPW96]

Early realisation of the marketing measures for Web sites [BPW96]:

Conversion efficiency := Customers / Active investigators

Retention efficiency := Loyal Customers / Customers

whereby:

Active investigators are visitors that stay long in the site.

Customers are visitors that buy something.

Loyal customers are customers that come to buy again.

[Figure: groups of site users: Short-time visitors, Active Investigators, Customers, Loyal Customers]
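As an illustration, the two efficiency ratios of [BPW96] can be computed directly from visitor counts. This is a minimal sketch in Python; the function name and all counts are hypothetical, and how "staying long" is operationalised to count active investigators is left open here, as it is on the slide.

```python
def efficiency(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 for an empty base group."""
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for one observation period.
active_investigators = 400  # visitors that stay long in the site
customers = 60              # visitors that buy something
loyal_customers = 15        # customers that come to buy again

conversion_efficiency = efficiency(customers, active_investigators)
retention_efficiency = efficiency(loyal_customers, customers)

print(f"Conversion efficiency: {conversion_efficiency:.2f}")
print(f"Retention efficiency:  {retention_efficiency:.2f}")
```

With these counts, 15% of the active investigators become customers and 25% of the customers become loyal customers.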


From the visitor to the loyal customer: The micro-conversion rates of Lee et al. [LPSH01]

The model of Lee et al. [LPSH01] distinguishes among four steps until the purchase of a product:

Product impression

Click through

Basket placement

Product purchase

and introduces micro-conversion rates for them:

look-to-click rate: click throughs / product impressions

click-to-basket rate: basket placements / click throughs

basket-to-buy rate: product purchases / basket placements

look-to-buy rate: product purchases / product impressions

A session is a set of click operations performed during one visit. Clicks leading to product impressions and those corresponding to basket placements and purchases are uniquely identified as such.
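The four micro-conversion rates can be sketched in a few lines of Python once every click has been labelled with its step. The event labels, the function name and the sample counts below are hypothetical; the sketch assumes the log has already been preprocessed so that each event carries exactly one of the four labels.

```python
from collections import Counter


def micro_conversion_rates(events):
    """Compute the four micro-conversion rates from labelled click events."""
    counts = Counter(events)

    def ratio(step, base):
        # Guard against an empty base step.
        return counts[step] / counts[base] if counts[base] else 0.0

    return {
        "look_to_click": ratio("click_through", "impression"),
        "click_to_basket": ratio("basket", "click_through"),
        "basket_to_buy": ratio("purchase", "basket"),
        "look_to_buy": ratio("purchase", "impression"),
    }


# Hypothetical event log: 200 impressions, 50 click throughs,
# 10 basket placements and 4 purchases.
events = (["impression"] * 200 + ["click_through"] * 50
          + ["basket"] * 10 + ["purchase"] * 4)
rates = micro_conversion_rates(events)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

In this sample the look-to-buy rate equals the product of the other three rates (0.25 × 0.2 × 0.4 = 0.02), which shows at which step most prospective buyers are lost.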


The e-metrics model of Cutler & Sterne [CS00]

The e-metrics model of [CS00] encompasses:

Site-centric measures for regions of a site, including:

– Stickiness := Total time spent in the region / Number of visitors in the region

– Focus := Avg num of visited pages in the region / Number of pages in the region

– Slipperiness := the opposite of Stickiness (low Stickiness)

"Desirable value ranges" for each measure, depending on the purpose/objective of the region:

– A region used during information acquisition should be sticky.

– The pages accessed during the negotiation and transaction phase should be slippery.

How is a region defined?
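A minimal Python sketch of stickiness and focus for one region follows. It rests on two assumptions that the slide leaves open: a region is modelled as a plain set of page URLs, and "visited pages" means distinct pages per visitor. The function name, the URLs and the timings are hypothetical.

```python
def region_metrics(hits, region_pages):
    """Compute stickiness and focus for a region.

    hits: iterable of (visitor_id, page, seconds_on_page) tuples.
    region_pages: set of page URLs forming the region (an assumption).
    """
    total_time = 0.0
    pages_by_visitor = {}
    for visitor, page, seconds in hits:
        if page in region_pages:
            total_time += seconds
            pages_by_visitor.setdefault(visitor, set()).add(page)

    n_visitors = len(pages_by_visitor)
    if n_visitors == 0:
        return {"stickiness": 0.0, "focus": 0.0}

    # Average number of distinct region pages seen per visitor.
    avg_pages = sum(len(p) for p in pages_by_visitor.values()) / n_visitors
    return {
        "stickiness": total_time / n_visitors,   # seconds per visitor
        "focus": avg_pages / len(region_pages),  # share of the region seen
    }


# Hypothetical region of four shop pages and a tiny hit log.
region = {"/shop/a", "/shop/b", "/shop/c", "/shop/d"}
hits = [
    ("u1", "/shop/a", 30), ("u1", "/shop/b", 50),
    ("u2", "/shop/a", 40), ("u2", "/home", 10),
]
metrics = region_metrics(hits, region)
print(metrics)
```

Here the two visitors spend 120 seconds in the region in total, giving a stickiness of 60 seconds per visitor, and see 1.5 of the 4 region pages on average, giving a focus of 0.375.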


A Site is NOT for all users: Target Groups and User Segmentation

Truisms:

A site owner does not welcome all users equally.

A site cannot satisfy all users accessing it.

Hence, sites

are designed for some types of users

serve different user types to different degrees

User types are the result of:

User segmentation according to criteria of the site owner

User segmentation on the basis of personal characteristics

User segmentation with respect to recorded behaviour


User Segmentation in Predefined Business Segments

A company may partition its customers on the basis of

– the revenue it obtains or expects from them

– the (cost of) services it must offer them to obtain the revenue

There are different segmentation schemes, based on

– the characteristics of the customers

– the company portfolio

and producing a set of predefined classes.

For a Web application this means:

For each visitor: (1) assign her to the right segment asap, (2) motivate her to move to a segment of higher revenue

1. Classification

2. Association rules for cross selling & up selling

3. Recommendations & Personalisation
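To make the association rules for cross selling mentioned on this slide more concrete, the sketch below computes the two standard rule measures, support and confidence, for a single rule. It is not taken from the slides: the function name, the baskets and the product names are hypothetical, and a real analysis would use a rule mining algorithm over all candidate rules rather than checking one rule by hand.

```python
def rule_stats(baskets, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(baskets)
    # Baskets containing the antecedent, and those containing both sides.
    n_antecedent = sum(1 for b in baskets if antecedent <= set(b))
    n_both = sum(1 for b in baskets if (antecedent | consequent) <= set(b))
    support = n_both / n if n else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical purchase baskets.
baskets = [
    {"book", "cd"},
    {"book", "cd", "dvd"},
    {"book"},
    {"cd"},
    {"dvd", "book"},
]
support, confidence = rule_stats(baskets, {"book"}, {"cd"})
print(f"book -> cd: support={support:.2f}, confidence={confidence:.2f}")
```

In this sample, two of the five baskets contain both products (support 0.4) and two of the four baskets with a book also contain a CD (confidence 0.5), so a CD could be suggested to book buyers.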


User Segmentation in Unknown Segments

Web site visitors can be grouped on the basis of their interests, characteristics and navigational behaviour without assuming predefined groups.

There is much research on user grouping based on

– the properties and contents of the objects being visited

– the declared or otherwise known characteristics of the visitor

– (the order of the requests)

For a Web application this means:

For each visitor: (1) assign her to the right segment asap, (2) make suggestions based on the contents of the segment

1. Clustering

2. Recommendations & Personalisation


User Segmentation on navigational behaviour

Web site visitors exhibit different types of navigational behaviour.

Model I (simplistic): Some users navigate across links. Others prefer a search engine.

Model II [FGL+00]:

based on criteria like active time spent on-line and per page, pages and domains accessed etc.

Segments: Simplifiers, Surfers, Bargainers, Connectors, Routiners, Sportsters

Model III [Moe] for merchandising sites:

based on criteria like purchase intention, time spent on the site, number of searches initiated, types of pages visited etc.

Segments: Direct buying, Knowledge building, Search/Deliberation, Hedonic browsing


Data Mining for Web Site Evaluation

Web Mining encompasses models, algorithms and evaluation criteria for the analysis of web sites:

Evaluation criteria with respect to business success

Models:

– A site may be observed as a collection of pages, a graph of pages, an agent interacting with the user, …

– A page may be observed as a document, a hypertext document, a collection of links, a node in a graph, …

– The interaction with the web site may be observed as a set of clicks, a set of queries, a sequence of clicks/queries, a sequence of application-specific actions, …

– External information about users, contents, products, actions may be incorporated from the data warehouse or the corporate ontology

Data mining algorithms for: classification, clustering, assoc. rules, …

Data preparation algorithms


Literature and Further Readings (1)

[BPW96] P. Berthon, L.F. Pitt and R.T. Watson. The World Wide Web as an advertising medium. Journal of Advertising Research, 36(1), pp. 43-54, 1996.

[Ber02] Berendt, B. (2002). Using site semantics to analyze, visualize, and support navigation. Data Mining and Knowledge Discovery, 6, 37-59.

[BS00] Berendt, B. & Spiliopoulou, M. (2000). Analysis of navigation behaviour in web sites integrating multiple information systems. The VLDB Journal, 9, 56-75.

[CPCP01] Chi, E.H., Pirolli, P., Pitkow, J.E. (2000). The scent of a site: a system for analyzing and predicting information scent, usage, and usability of a Web site. In Proceedings CHI 2000 (pp. 161-168).

[CS00] M. Cutler and J. Sterne. E-metrics — Business metrics for the new economy. Technical report, NetGenesis Corp., http://www.netgen.com/emetrics (access date: July 22, 2001)

[DFAB98] Alan Dix, Janet Finlay, Gregory Abowd, Russell Beale. Human Computer Interaction. Prentice Hall Europe 1998. Cited after http://www.tau-web.de/hci/space/i7.html and http://www.tau-web.de/hci/space/x12.html.

[DZ97] X. Dreze and F. Zufryden. Testing web site design and promotional content. Journal of Advertising Research, 37(2), pp. 77-91, 1997.

© Myra Spiliopoulou, Bettina Berendt, Ernestina Menasalvas, ECML/PKDD 2004

Literature and Further Readings (2)

[FGL+00] J. Forsyth and T. McGuire and J. Lavoie. All visitors are not created equal. McKinsey marketing practice. McKinsey & Company. Whitepaper. 2000.

[Flem98] Fleming, J. (1998). Web Navigation. Designing the User Experience. Sebastopol, CA: O'Reilly.

[HSB02] K.-P. Huber, F. Säuberlich, C. Böhm. Kennzahlenbasiertes Web Controlling mit einer Web Scorecard. In "Handbuch Web Mining im Marketing" (eds. H. Hippner, M. Merzenich, K. Wilde). Vieweg, 2002 (in German)

[KNY00] Kato, H., Nakayama, T., & Yamane, Y. (2000). Navigation analysis tool based on the correlation between contents distribution and access patterns. In Working Notes of the Workshop "Web Mining for E-Commerce - Challenges and Opportunities." at SIGKDD-2000. Boston, MA (pp. 95-104).

[KP92] R.S. Kaplan, D.P. Norton. The Balanced Scorecard: Translating Strategy to Action. Boston MA. 1992

[KP03] Kohavi, R. and Parekh, R. Ten Supplementary Analyses to Improve E-Commerce Web Sites. In Proceedings of the WebKDD 2003 Workshop - Webmining as a Premise to Effective and Intelligent Web Applications, at SIGKDD-2003. August 27th, 2003, Washington DC, USA (pp. 29-36).

© Myra Spiliopoulou, Bettina Berendt, Ernestina MenasalvasECML/PKDD 2004


© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 45

Literature and Further Readings (3)

[KB04] A. Kralisch and B. Berendt. Cultural determinants of search behaviour on websites. In V. Evers, E. del Galdo, D. Cyr & C. Bonanni (Eds.), Designing for Global Markets 6. Proceedings of the IWIPS 2004 Conference on Culture, Trust, and Design Innovation. Vancouver, Canada, 8 - 10 July, 2004. Vancouver, BC: Product & Systems Internationalisation, Inc., pp. 61-74, 2004.

[Kuhl96] R. Kuhlen. Informationsmarkt: Chancen und Risiken der Kommerzialisierung von Wissen. 2nd edition, 1996 (in German)

[LPS+00] Junghoung Lee, M. Podlaseck, E. Schonberg, R. Hoch and S. Gomory. Analysis and visualization of metrics for online merchandizing. In "Advances in Web Usage Mining and User Profiling: Proc. of the WEBKDD'99 Workshop", LNAI 1836, Springer Verlag, pp. 123-138, 2000.

[Moe] W. Moe. Buying, searching, or browsing: Differentiating between online shoppers using in-store navigational clickstream. In Journal of Consumer Psychology.

[Niel00] Nielsen, J. (2000). Designing Web Usability: The Practice of Simplicity. New Riders Publishing.


© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 46

Literature and Further Readings (4)

[SF99] M. Spiliopoulou, L.C. Faulstich. WUM: A Tool for Web Utilization Analysis. In: Extended version of Proc. EDBT Workshop WebDB’98, LNCS 1590. Springer Verlag, Berlin Heidelberg New York, pp 184–203, 1999.

[Shne98] Shneiderman, B. (1998). Designing the User Interface. Strategies for Effective Human-Computer Interaction. 3rd edition. Reading, MA: Addison-Wesley.

[Ste03] Sterne, J. WebKDD in the Business World. Invited talk in the WebKDD 2003 Workshop - Webmining as a Premise to Effective and Intelligent Web Applications. at SIGKDD-2003. August 27th, 2003, Washington DC, USA.

[Spi99] M. Spiliopoulou. The laborious way from data mining to Web mining. Int. Journal of Comp. Sys., Sci. & Eng., Special Issue on "Semantics of the Web", 14, pp. 113–126, 1999.

[SP01] M. Spiliopoulou, C. Pohle. Data mining for measuring and improving the success of Web sites. In Journal of Data Mining and Knowledge Discovery, Special Issue on E-commerce, 5, pp. 85–114. Kluwer Academic Publishers. 2001

[Sul97] T. Sullivan. Reading reader reaction: A proposal for inferential analysis of web server log files. Proc. of the Web Conference'97, 1997.



© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 47

Thank you very much! There is more ...

Questions thus far?

© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 48

Agenda

Designing a Good Web Site: Web Usability

Web Site Evaluation: Modeling & Measuring Success

Designing a Good Web Site: Business Success

Designing a Good Web Site: Heuristics

Web Site Evaluation: Preparing & Mining

Web Site Evaluation: Understanding & Acting


© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 49

The Web-Server Log Data of the Thomaskirche

Data of the analysis:

Web-server log of www.thomaskirche-bach2000.de for a period of several months

– Extended-Log-Format

– No cookies

Data preparation:

1. Data cleaning

2a. Mapping of activities to users

2b. Session reconstruction

3. Mapping page invocations into application-specific concepts

State of: Summer Term 2002

© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 50

The Web-Server Log Data of the Thomaskirche

1. Data cleaning:

– Elimination of all invocations of scripts, pictures, navigation bars and similar artefacts appearing on each page: approximately one log record per page

– Elimination of irrelevant records (robots, administrator)

2. Sessionisation:

– One IP address assumed as one user

    – Low number of accesses per day

    – Low number of simultaneous accesses

– All accesses from the same IP address during the same period assigned to the same user

– No recognition of recurring users

– One session := One visit

3. Mapping of page invocations to application-specific concepts
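The first two preparation steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used for the Thomaskirche log: the record layout (IP address, timestamp, URL, user agent), the file suffixes treated as artefacts, the robot markers and the 30-minute inactivity threshold that closes a visit are all hypothetical choices.

```python
from datetime import datetime, timedelta

# Hypothetical rules: the suffixes treated as artefacts and the agent
# substrings treated as robots are assumptions, not the authors' lists.
ARTEFACT_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".cgi")
ROBOT_MARKERS = ("bot", "crawler", "spider")


def clean(records):
    """Keep only page requests that were not issued by robots."""
    kept = []
    for ip, timestamp, url, agent in records:
        if url.lower().endswith(ARTEFACT_SUFFIXES):
            continue  # picture, script or similar artefact
        if any(marker in agent.lower() for marker in ROBOT_MARKERS):
            continue  # irrelevant record (robot)
        kept.append((ip, timestamp, url))
    return kept


def sessionise(records, max_gap=timedelta(minutes=30)):
    """One IP address = one user; a gap longer than max_gap starts a new visit."""
    sessions = []
    last_seen = {}  # ip -> (time of last request, index of its session)
    for ip, timestamp, url in sorted(records, key=lambda r: r[1]):
        previous = last_seen.get(ip)
        if previous is None or timestamp - previous[0] > max_gap:
            sessions.append((ip, [url]))
            index = len(sessions) - 1
        else:
            index = previous[1]
            sessions[index][1].append(url)
        last_seen[ip] = (timestamp, index)
    return sessions


# Fictive log records, for illustration only.
log = [
    ("10.0.0.1", datetime(2002, 5, 1, 10, 0), "/index.html", "Mozilla/4.0"),
    ("10.0.0.1", datetime(2002, 5, 1, 10, 0), "/logo.gif", "Mozilla/4.0"),
    ("10.0.0.1", datetime(2002, 5, 1, 10, 5), "/shop/cds.html", "Mozilla/4.0"),
    ("10.0.0.2", datetime(2002, 5, 1, 10, 6), "/index.html", "ExampleBot/1.0"),
    ("10.0.0.1", datetime(2002, 5, 1, 15, 0), "/index.html", "Mozilla/4.0"),
]
sessions = sessionise(clean(log))
```

In this fictive log the picture request and the robot request are dropped, and the afternoon access from the same IP address becomes a second visit, because recurring users are not recognised.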

State of: Summer Term 2002


© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 51

The concept hierarchy CH-1 for the conversion of customers and donors

State of: Summer Term 2002

© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 52

The concept hierarchy CH-2 for the preferences of the visitors

State of: Summer Term 2002


© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 53

Analysis: Objectives and tools

The goals of the analysis were:

Gaining insights on the conversion efficiency of the site

Identification of pages that need improvement

but NOT the discovery of a posteriori interesting patterns.

The goals were pursued through the formulation of concrete mining queries.

Tools:

SAS Enterprise Miner for the concepts in CH-1

WUM for the concepts in CH-1 and CH-2

whereby both tools lead to the same findings for CH-1.

State of: Summer Term 2002

Mining techniques used: sequence mining and association rules discovery

© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 54

Association rules discovery (1)

Input: A set T of transactions involving items from an itemset I

Output: Frequent groups of items and rules derived from these groups

Approach: For a given frequency threshold σ:

1. Find all items that appear in k transactions, with k/|T| >= σ. They constitute L1, the set of all frequent 1-item itemsets.

2. For i=2,...

– Expand each itemset x in Li-1 by one item from I that is not already in x. The expanded itemsets constitute Ci.

– Remove from Ci all itemsets whose frequency in T is below σ. The remainder is Li, the set of all frequent i-item itemsets.

until Ci is empty.

3. For each frequent itemset x, generate rules of the form A -> B, where A and B are non-empty, disjoint, and A ∪ B = x.

Simplification of the Apriori algorithm: R. Agrawal, T. Imielinski and A. Swami. "Mining Association Rules Between Sets of Items in Large Databases", Proc. of SIGMOD'93, pp. 207-216, Washington DC, May 1993.


© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 55

Association rules discovery (2): Example of a fictive, tiny dataset

A fictive dataset as a non-1NF relation:

CustomerId   Items

1 chocolate, orange juice, toothpaste, rice

2 bananas, chocolate, toothpaste

3 milk, rice

4 bananas, chocolate

5 bananas, chocolate, orange juice

6 toothpaste, rice

7 chocolate, milk, rice

8 bananas, chocolate, rice
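The simplified level-wise procedure for frequent itemsets can be tried out on this fictive dataset. The following Python sketch follows the simplification (expand by any item of I, then prune by frequency) rather than the full Apriori candidate generation; the function names are illustrative assumptions.

```python
from itertools import combinations

# The fictive dataset, one set of items per customer.
transactions = [
    {"chocolate", "orange juice", "toothpaste", "rice"},
    {"bananas", "chocolate", "toothpaste"},
    {"milk", "rice"},
    {"bananas", "chocolate"},
    {"bananas", "chocolate", "orange juice"},
    {"toothpaste", "rice"},
    {"chocolate", "milk", "rice"},
    {"bananas", "chocolate", "rice"},
]


def frequency(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, sigma):
    """Level-wise search: expand frequent (i-1)-itemsets by one item, then prune."""
    items = set().union(*transactions)
    level = {frozenset([i]) for i in items
             if frequency({i}, transactions) >= sigma}        # L1
    result = {}
    while level:
        for x in level:
            result[x] = frequency(x, transactions)
        candidates = {x | {i} for x in level for i in items if i not in x}  # Ci
        level = {c for c in candidates
                 if frequency(c, transactions) >= sigma}      # Li
    return result


def rules(itemset):
    """All rules A -> B with A, B non-empty, disjoint and A ∪ B = itemset."""
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            yield frozenset(antecedent), itemset - frozenset(antecedent)


frequent = frequent_itemsets(transactions, sigma=0.5)
for itemset, freq in sorted(frequent.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), freq)
```

For σ = 0.5 the sketch finds the frequent itemsets {bananas}, {chocolate}, {rice} and {bananas, chocolate}; the only rules it can generate are bananas -> chocolate and chocolate -> bananas.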

© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 56

Association rules discovery (3): Items, itemsets and their frequency in the Example

I = {bananas, chocolate, milk, orange juice, toothpaste, rice}

Some itemsets, their frequencies, and their status for σ = 0.5:

Itemset                        Frequency   For σ = 0.5
{bananas, chocolate}           4/8         FREQUENT
{chocolate, milk, rice}        1/8         NOT FREQUENT
{orange juice}                 2/8         NOT FREQUENT
{bananas}                      4/8         FREQUENT
{orange juice, milk, rice}     0/8         NOT FREQUENT
I                              0/8         NOT FREQUENT



© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 57

Association rules discovery (4): Support, confidence and other statistical measures

Let A -> B be an association rule derived from an itemset x = A ∪ B.

To what extent does the dataset support the rule?

support(A->B) := support(x) := frequency of x in T

Given A, how confident are we that B will also appear?

confidence(A->B) := support(A->B) / support(A)

Obviously it holds that: confidence(B->A) = support(A->B) / support(B)

Is the likelihood that B appears higher when A is given than in the whole population?

lift(A->B) := confidence(A->B) / support(B)
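As a worked example, the three measures can be computed for the rule bananas -> chocolate on the fictive dataset of the earlier slides. The Python sketch below is a direct transcription of the definitions; the function names are illustrative.

```python
# The fictive dataset from the example slides, one set of items per customer.
transactions = [
    {"chocolate", "orange juice", "toothpaste", "rice"},
    {"bananas", "chocolate", "toothpaste"},
    {"milk", "rice"},
    {"bananas", "chocolate"},
    {"bananas", "chocolate", "orange juice"},
    {"toothpaste", "rice"},
    {"chocolate", "milk", "rice"},
    {"bananas", "chocolate", "rice"},
]


def support(itemset):
    """Frequency of the itemset in T."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(a, b):
    """confidence(A->B) = support(A ∪ B) / support(A)."""
    return support(a | b) / support(a)


def lift(a, b):
    """lift(A->B) = confidence(A->B) / support(B)."""
    return confidence(a, b) / support(b)


a, b = {"bananas"}, {"chocolate"}
print(support(a | b), confidence(a, b), confidence(b, a), lift(a, b))
```

Here support(bananas -> chocolate) = 4/8, confidence(bananas -> chocolate) = 1 and confidence(chocolate -> bananas) = 2/3. The lift is 1 / (6/8) = 4/3 > 1, so chocolate is more likely in baskets with bananas than in the whole population.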

© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 58

Sequence mining

Input: A set T of transactions, where a transaction is a sequence of items/events from an itemset I

Output: Subsequences of events that appear frequently in this dataset and order-preserving rules derived from these subsequences

Approach: Extensions of algorithms for association rules discovery by

1. order-respecting variations,

2. distinction between adjacent and non-adjacent events

or

Specialised algorithms, e.g. for

– web-server log analysis

– genome sequence analysis

– discovery of richer structures like trees and graphs
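The two extensions named above (respecting order, and distinguishing adjacent from non-adjacent events) can be illustrated by a small support counter for one given sequence. This Python sketch is not a sequence miner and not the algorithm of WUM or any other tool; the sessions and page names are fictive.

```python
def contains(session, pattern, adjacent=False):
    """True if pattern occurs in session in order; contiguously if adjacent."""
    if adjacent:
        n = len(pattern)
        return any(session[i:i + n] == pattern
                   for i in range(len(session) - n + 1))
    remaining = iter(session)
    # Each membership test consumes the iterator up to the match,
    # so later events are only searched after earlier ones.
    return all(event in remaining for event in pattern)


def sequence_support(sessions, pattern, adjacent=False):
    """Fraction of sessions that contain the pattern."""
    return sum(contains(s, pattern, adjacent) for s in sessions) / len(sessions)


# Fictive sessions, for illustration only.
sessions = [
    ["home", "shop", "product", "basket"],
    ["home", "news", "shop", "product"],
    ["home", "shop", "news"],
    ["news", "home"],
]
print(sequence_support(sessions, ["home", "shop"]))                 # gaps allowed
print(sequence_support(sessions, ["home", "shop"], adjacent=True))  # no gaps
```

The same pattern can have different support under the two readings: a session in which another page is visited between "home" and "shop" counts only when gaps are allowed.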


© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 59

Agenda

Designing a Good Web Site: Web Usability

Web Site Evaluation: Modeling & Measuring Success

Designing a Good Web Site: Business Success

Designing a Good Web Site: Heuristics

Web Site Evaluation: Preparing & Mining

Web Site Evaluation: Understanding & Acting

© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 60

Conversion rate and ideal navigation paths

Question: Contribution of pages to the conversion efficiency

Approach:

The pages were grouped according to their conversion potential into the concepts of the hierarchy CH-1.

An ideal path (cf. [CS00]) was specified, leading from low potential towards higher potential pages, taking the connectivity of the pages into account.

The ideal path was juxtaposed to frequent paths involving the concepts of CH-1.

Ideal path: LowPotentialCustomer → MediumPotentialCustomer → HighPotential-I → HighPotential-II → Customer

State of: Summer Term 2002


© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 61

Ideal path vs frequent paths for customer conversion

Sequence mining revealed that:

The ideal path was not frequent, i.e. there were hardly any sessions containing it as a whole.

Break point: MediumPotentialCustomer was invoked more than once in the observed sessions.

Interpretation: The visitors of the ThomasShop inspect more than one product before deciding to put one into the basket.

Approach: The frequencies of the subpaths of the ideal path were studied instead.

– [ LowPotentialCustomer MediumPotentialCustomer ]: >= 55%

– [ MediumPotentialCustomer HighPotential-I ]: LOW

– [ HighPotential-I HighPotential-II ]: LOW
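The subpath analysis can be sketched as follows. The sketch assumes that sessions have already been mapped to CH-1 concepts and that the rate for a subpath [A B] is the share of sessions containing A in which B occurs later; this definition, the helper names and the sessions are illustrative assumptions, not the measures computed by WUM or SAS Enterprise Miner.

```python
IDEAL_PATH = ["LowPotentialCustomer", "MediumPotentialCustomer",
              "HighPotential-I", "HighPotential-II", "Customer"]


def contains_contiguously(session, path):
    """True if the whole path occurs in the session without interruption."""
    n = len(path)
    return any(session[i:i + n] == path for i in range(len(session) - n + 1))


def follows(session, first, second):
    """True if `second` occurs somewhere after an occurrence of `first`."""
    return first in session and second in session[session.index(first) + 1:]


def conversion_rate(sessions, first, second):
    """Share of the sessions containing `first` that later reach `second`."""
    reached_first = [s for s in sessions if first in s]
    if not reached_first:
        return 0.0
    return sum(follows(s, first, second) for s in reached_first) / len(reached_first)


def subpath_rates(sessions, path):
    """Conversion rate for each pair of consecutive concepts on the path."""
    return {(a, b): conversion_rate(sessions, a, b)
            for a, b in zip(path, path[1:])}


L, M = "LowPotentialCustomer", "MediumPotentialCustomer"
H1, H2, C = "HighPotential-I", "HighPotential-II", "Customer"
# Fictive concept-level sessions, for illustration only.
sessions = [
    [L, M, M, H1, H2, C],
    [L, M, M],
    [L],
    [L, M, H1],
    [M, M, M],
]
rates = subpath_rates(sessions, IDEAL_PATH)
for (a, b), rate in rates.items():
    print(f"[ {a} {b} ]: {rate:.0%}")
```

In the fictive data even the session that ends in a purchase does not contain the ideal path contiguously, because MediumPotentialCustomer is invoked twice; this is the kind of break point that motivates looking at subpaths.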

State of: Summer Term 2002

© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 62

Ideal path and frequent paths for donor conversion

The ideal path for donations was similar:

LowPotentialDonor → MediumPotentialDonor → HighPotentialDonor

and was not frequent either.

Break point: MediumPotentialDonor was invoked more than once in the observed sessions.

Approach: The frequencies of the subpaths of the ideal path were studied instead.

– [ LowPotentialDonor MediumPotentialDonor ]: >= 55%

– [ MediumPotentialDonor HighPotentialDonor ]: <= 25%

The cross conversion of potential donors to potential customers was also low:

– [ MediumPotentialDonor LowPotentialCustomer ]: <= 20%

indicating limited synergy effects among the regions of the site.

State of: Summer Term 2002


© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 63

Preferences of the visitors in the MediumPotential-area: Donations

The low conversion rates from the MediumPotential-pages to the HighPotential-pages led to a closer inspection of the former.

[Figure: the part of the concept hierarchy below MediumPotentialDonor and BachClubPotential, with the concepts BachOrgan, News, Events, Donations, Membership, Description, Contact, BachClub, Info, Fill-In Form and ConcertCards; annotated frequencies: ~ 30%, 20%...30%, ~ 15%]

State of: Summer Term 2002

© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 64

Preferences of the visitors in the MediumPotential-area: ThomasShop (1)

The low conversion rates from the MediumPotential-pages to the HighPotential-pages led to a closer inspection of the former.

The impact of each product (type) on the conversion efficiency cannot be assessed in a straightforward way.

[Figure: the part of the concept hierarchy below MediumPotentialCustomer, with the product types AudioCDs, Souvenirs, Books, Clothes, Beverages and Leather; annotated frequencies: ~ 50%, ~ 30%]

State of: Summer Term 2002


© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 65

Preferences of the visitors in the MediumPotential-area: ThomasShop (2)

The impact of each product (type) on the conversion efficiency cannot be assessed in a straightforward way:

A product may stimulate purchases without being purchased itself.

Most subsessions inside the ThomasShop contained more than one product inspection.

We can only say that:

AudioCDs were most often the first choice of inspection in the ThomasShop.

State of: Summer Term 2002

© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 66

The findings, closely observed (1)

The conversion efficiency of the site varies among the page groups:

1. The LowPotential-pages show a conversion rate of more than 50% towards MediumPotential-pages.

2. The MediumPotential-pages, which lead to the forms for purchase and donations and should motivate the accomplishment of these activities, have a low conversion rate.

3. The HighPotential-pages have a relatively higher conversion rate but should be improved nonetheless: They allow the attrition of customers/donors just before the completion of the transaction.

State of: Summer Term 2002


© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 67

The findings, closely observed (2)

Example: The sequence template [ BachClubPotential Y ]

The conversion rates vary with Y between 5% and 30%.

Y stands for pages that can be reached from the same vertical navigation bar.

whereby:

The links in the navigation bar form two groups of subjects:

– The links of the first group are always visible.

– The links of the second group are visible upon clicking on a pop-up menu.

– The higher end of the conversion rates is observed for pages of the first group.
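A sequence template with a variable, such as [ BachClubPotential Y ], can be evaluated by counting for every page Y how often it follows the fixed concept. The Python sketch below assumes that Y must follow directly and that the rate is taken over all sessions containing BachClubPotential; both are assumptions made for illustration, and the sessions are fictive.

```python
from collections import Counter


def template_rates(sessions, anchor):
    """For the template [anchor Y]: share of the sessions containing
    anchor in which Y directly follows anchor, for each Y."""
    anchor_sessions = [s for s in sessions if anchor in s]
    counts = Counter()
    for s in anchor_sessions:
        successors = {s[i + 1] for i in range(len(s) - 1) if s[i] == anchor}
        counts.update(successors)  # each session counts at most once per Y
    return {y: n / len(anchor_sessions) for y, n in counts.items()}


# Fictive concept-level sessions, for illustration only.
sessions = [
    ["BachClubPotential", "News"],
    ["BachClubPotential", "News", "BachClubPotential", "Membership"],
    ["BachClubPotential", "Events"],
    ["BachClubPotential"],
    ["Info", "News"],
]
rates = template_rates(sessions, "BachClubPotential")
for page, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"[ BachClubPotential {page} ]: {rate:.0%}")
```

Comparing the resulting rates of pages from the always-visible group with those of pages behind the pop-up menu is then a matter of grouping the keys of the returned dictionary.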

State of: Summer Term 2002

© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 68

The concept hierarchy CH-2 for the preferences of the visitors

The two groups of subjects in the site's navigation bar:

Group 1: always visible

Group 2: visible via the pop-up menu

State of: Summer Term 2002


© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 69

Conclusion

A Web site is an interaction channel between an institution and a user.

A site is intended to satisfy user needs to the benefit of the institution.

User satisfaction → Web Usability

Institutional benefit → Business Success Measures

Data mining contributes to the evaluation of Web sites with

site models and site evaluation criteria

algorithms for data preparation and mining

Data mining experts contribute to the evaluation of Web sites by

understanding the objectives of a Web site

modeling it and designing the evaluation criteria for it

doing data mining and interpreting the results

© Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg 70

Thank you very much!

Questions?