Nov. 2005, © Myra Spiliopoulou, Otto-von-Guericke-Univ. Magdeburg
Data Mining for Business Applications
Web Analysis for E-Business
Myra Spiliopoulou
http://omen.cs.uni-magdeburg.de/itikmd
Who am I?
Myra Spiliopoulou
Study of Mathematics - University of Athens
PhD in Computer Science - University of Athens
Habilitation in Business Informatics ("Wirtschaftsinformatik") - HU Berlin
Guest professor in the University of Magdeburg
Professor of E-Business in the Leipzig Graduate School of Management
Professor of Business Informatics in the University of Magdeburg
Homepage: http://omen.cs.uni-magdeburg.de/itikmd
Business Informatics
The research group KMD
KMD is the second Business Computer Science group in the Faculty of Computer Science, University of Magdeburg.
KMD is part of the Institute of Technical and Business Information Systems (ITI).
KMD was established in February 2003.
KMD stands for:
Knowledge Management & Knowledge Discovery in Business Information Systems
Agenda
Designing a Good Web Site: Web Usability
Web Site Evaluation: Modeling & Measuring Success
Designing a Good Web Site: Business Success
Designing a Good Web Site: Heuristics
Web Site Evaluation: Preparing & Mining
Web Site Evaluation: Understanding & Acting
Literature & Further Readings
Web Usability: Jakob Nielsen. Designing Web Usability. New Riders Publishing, 2000
Web Case Study: Hajo Hippner, Ulrich Küsters, Matthias Meyer, Klaus Wilde (eds.) Handbuch Web Mining im Marketing, Vieweg, 2002
– Wann werden Surfer zu Kunden? Navigationsanalyse zur Ermittlung des Konversionspotenzials verschiedener Bereiche einer Site (When do surfers become customers? Navigation analysis to determine the conversion potential of different areas of a site), Bettina Berendt, Myra Spiliopoulou
For a more detailed discussion of Web mining technologies, see: Tutorial “Web Usage Mining for E-Business Applications” at ECML/PKDD 2002 by Myra Spiliopoulou, Bamshad Mobasher and Bettina Berendt, Helsinki, Finland, Aug. 19, 2002, http://ecmlpkdd.cs.helsinki.fi/pdf/berendt-2.pdf
For a dedicated discussion of Web evaluation, see: Tutorial “Evaluation in Web Mining” at ECML/PKDD 2004 by Myra Spiliopoulou, Bettina Berendt, Ernestina Menasalvas, Pisa, Italy, Sept. 20, 2004, http://www.wiwi.hu-berlin.de/~berendt/evaluation04
Agenda
Designing a Good Web Site: Web Usability
Web Site Evaluation: Modeling & Measuring Success
Designing a Good Web Site: Business Success
Designing a Good Web Site: Heuristics
Web Site Evaluation: Preparing & Mining
Web Site Evaluation: Understanding & Acting
Added Value and Success
An institution operating a Web site should strive to create value for its (prospective) users/customers:
First, users need a reason to visit this site rather than a competitor's site.
Second, what the site offers them should satisfy them.
Third, the site should motivate them to establish a long-term relationship with the institution.
The marketing terms for company sites are:
– Conversion: The user becomes a customer.
– Retention: The customer stays loyal.
The side-effects of poorly designed sites
According to a 1998 GVU survey, the top reasons for abandoning a Web site were:
– Could not find the item: 56% (62%)
– Site disorganized or confusing: 54% (61%)
– Pages downloaded too slowly: 53% (60%)
It has been recognized that:
Web users are particularly impatient and insist on instant gratification.
Web users consider a poorly designed site an indicator of low credibility (Fogg et al., 2001).
[GVU, http://www.gvu.gatech.edu/user_surveys/survey-1998-10/]
Defining Web Usability
[Lee, 1999]
"Web usability is the efficient, effective and satisfying completion of a specified task by any given Web user. Support of essential user tasks made possible by Web technology serves as the benchmark of usability."
(Derived from: ISO/DIS 9241-11, Ergonomic Requirements for office work with visual display terminals (VDTs). Part 11: Guidance on usability)
Variables Affecting Usability
System Variables
• Internet transmission speed
• Visual display device capabilities
• Capabilities and limitations of user input devices
User Characteristics
• Extent of computer use, Web experience and knowledge
• Age- and disability-related limitations in memory and vision
• Reading ability
• Motivational factors
User Tasks
• Finding desired information by direct search or discovering new information by browsing
• Comprehending the information presented
• Specialized tasks specific to certain Web sites, e.g. ordering and downloading products
[Derived from: Lee, 1999]
Cognitive Framework: Reading a Web Page
The way users read a Web page has been studied extensively by means of eye-tracking analyses. Faraday proposed a tool that captures a user's critique of a Web page by monitoring the user's face.
Two phases of reading a Web page have been distinguished:
– Search phase: The viewer tries to find a salient entry point into the page
– Scanning phase: The viewer extracts information
[Faraday, 2000]
Reading a Web Page: The Search Phase
Motion: The most important variable; it is detected automatically, i.e. an object in motion "pops out"
Size: Larger objects are attended to earlier and for longer than smaller ones
Images: Images are attended to in preference to text
Color: Brighter elements dominate over darker ones
Text style: Typographical cues are nonverbal devices for attracting and focusing attention
Position: On text pages, reading often begins in the top-left area; on non-textual pages, in the center
[Faraday, 2000]
Reading a Web Page: The Scanning Phase
Area
– Elements are grouped according to figure-ground relationships, forming a "Gestalt"
– Grouping or placing elements in proximity provides information about their relationships
Proximity and Reading Order
– Grouping follows a reading order, moving from left to right and top to bottom
– This fundamental axis dominates most design decisions
[Faraday, 2000]
Reading a Web Page: Predicting User Perception
[Faraday, 2000]
Agenda
Designing a Good Web Site: Web Usability
Web Site Evaluation: Modeling & Measuring Success
Designing a Good Web Site: Business Success
Designing a Good Web Site: Heuristics
Web Site Evaluation: Preparing & Mining
Web Site Evaluation: Understanding & Acting
Page Design: Rules of Thumb
Web pages should be dominated by content of interest to the user
Rules of thumb:
– Content should account for at least half of a page's design
– Navigation should be kept below 20 percent of the space on destination pages
Web pages should always work independently of the screen resolution
Page Design: Response Times
Response times <= 0.1 seconds: The user feels that the system is reacting instantaneously
Response times > 1.0 seconds: The user feels hindered in navigating the information space and experiences an interrupted flow of thought
Response times > 10.0 seconds: The limit for keeping the user's attention focused on the dialogue is exceeded; the user should be given an estimate of the remaining time
Slow response times often translate directly into a reduced level of trust
Due to the Internet's architecture, it is virtually impossible to control download times exactly
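The three response-time limits above can be read as a simple lookup. The sketch below is illustrative only: the function name `perceived_response` and the wording of the returned labels are assumptions, and the band between 0.1 and 1.0 seconds, which the slide does not characterise, is labelled as a noticeable delay that still leaves the flow of thought intact.

```python
def perceived_response(seconds: float) -> str:
    """Map a response time in seconds to the perception bands on the slide.

    Hypothetical helper: the thresholds 0.1 s, 1.0 s and 10.0 s come from
    the slide; the returned labels are illustrative wording.
    """
    if seconds < 0:
        raise ValueError("response time must be non-negative")
    if seconds <= 0.1:
        return "instantaneous"
    if seconds <= 1.0:
        # Not characterised on the slide: noticeable, but flow of thought intact.
        return "noticeable delay"
    if seconds <= 10.0:
        return "interrupted flow of thought"
    return "attention lost: show a time estimate"


for t in (0.05, 0.5, 3.0, 15.0):
    print(t, "->", perceived_response(t))
```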
Page Design: Linking
Links fall into three main categories:
Structural navigation links, e.g. home page buttons or links to a set of pages that are subordinate to the current one
Associative links within the page content, usually underlined words, pointing to pages with further information about the anchor text
See Also lists of additional references, helping users to find what they want if the current page isn’t the right one
Page Design: Linking
Links give the user's eye something to rest on while scanning an article, similar to call-outs in print media. As such, link anchors should be short, about two to four words.
Link text should always indicate what to expect behind the link.
If necessary, the text surrounding a link should provide sufficient additional information for the user to assess whether the page behind the link is worth loading.
Content Design: Writing for the Web
Users look first at the page's main content area and scan it for headlines and other indicators of what the page is about
Reading from computer screens is about 25% slower than reading from paper
Write concisely: use less than 50 percent of the text you would use to cover the same material in a print publication
Write for scannability
Use short paragraphs, subheadings and bulleted lists: on the Web, users skim rather than read.
Site Design: The Home Page
The first goal of any home page is to answer first-time users' questions "Where am I?" and "What does this site do?"
Experienced visitors use the home page as the entry point to the site's navigation scheme
A home page should offer three features:
– A directory of the site's main content areas (navigation)
– A summary of most important news or promotions
– A search feature
Site Design: Navigation
Users should always be able to determine their location
– Relative to the Web as a whole
– Relative to the site’s structure
Navigation interfaces should not differ drastically from the majority of Web sites
Include a logo or other site identifier consistently on every page
Location relative to the site's structure is usually indicated by showing part of the site's structure and highlighting the area of the current page
Site Navigation: Search Capabilities
About 50% of Web users are search-dominant, approx. 20% are link-dominant and the rest exhibit mixed behavior
Search should be easily available from every single page on the site
– Site structure and navigation still give users important clues about where the search results are located in the information space
– Usability studies show that users are often unable to use boolean or advanced search features correctly
The Web (sub)site of the Thomaskirche-Bach 2000 club
http://www.thomaskirche-bach.de/deutsch/vereinframe.htm (accessed 20.02.2002)
State of: Summer Term 2002
Additional Reading and References
Butler, Keith A., Jacob, Robert J.K. and John, Bonnie E. Human Computer Interaction: Introduction and Overview, Proc. of SIGCHI’98
Card, Stuart K., Pirolli, Peter, Van Der Wege, Mija, Morrison, Julie B., Reeder, Robert W., Schraedley, Pamela K. and Boshart, Jenea (2001). Information Scent as a Driver of Web Behavior Graphs: Results of a Protocol Analysis Method for Web Usability, Proc. of SIGCHI‘01, Seattle, WA, USA, Mar. 31-Apr. 4
Faraday, Pete (2000). Visually Critiquing Web Pages, Proc. of the 6th Conference on Human Factors & the Web, Austin, Texas
Fogg, B.J., Marshall, J., Laraki, O., Osipovich, A., Varma, C., Fang, N., Paul, J., Rangnekar, A., Shon, J., Swani, P. and Treinen, M. (2001). What Makes Web Sites Credible? A Report on a Large Quantitative Study, Proc. of SIGCHI 2001, Seattle, WA, USA, Mar. 31-Apr. 4
Agenda
Designing a Good Web Site: Web Usability
Web Site Evaluation: Modeling & Measuring Success
Designing a Good Web Site: Business Success
Designing a Good Web Site: Heuristics
Web Site Evaluation: Preparing & Mining
Web Site Evaluation: Understanding & Acting
The notion of a "good" Web site
The objective of a Web site is NOT
the maximisation of the number of visitors accessing it
the prolongation of the visitors' stay time
the inspection of a maximum number of pages/items/products
the satisfaction of the visitors
... but this is often a prerequisite for site success.
In general, the (abstract) objective of a Web site is
the contribution to the business objectives of its owner
with respect to the target groups accessing it
in a cost-effective way.
The "success" of a Web site is a measure of the degree to which the site satisfies its objective.
Business goals of a site (I)
1. Sale of products/services on-line
Amazon sells books (etc.) online. The site should help users find the books most suitable for their needs, identify further related products of interest and, finally, purchase them in a secure and intuitive way.
Personalisation
Cross-/Up-Selling
Site design
2. Marketing for products/services to be acquired off-line
Insurance companies, banks, application service providers etc.: providers of services based on a long-term relationship with the customer do not sell on-line to unknown users. The site should demonstrate to users the quality of the product/service and the trustworthiness of its owner, and initiate an off-line contact.
Business goals of a site (II)
3. Reduction of internal costs
Some banks offer online banking. Some insurance companies support online case registration. This reduces the need for human preprocessing and the likelihood of typing errors. The site should help users locate and fill in the right forms and submit them in a secure and intuitive way.
4. Information dissemination
Google, IMDB etc. offer information by means of a search engine over a voluminous archive of high-quality data. The site should help users find what they search for, assure them of the quality (precision and completeness) of the information provided, and also motivate them to access the products/services of the sponsors.
The Web Site of the Thomaskirche in Leipzig: Description and Objectives (1)
The Thomaskirche in Leipzig is the church in which Johann Sebastian Bach worked during his most creative years as a composer.
The Web-Site of the church operates as
contact point for the religious community
source of information about the concerts in the church
source of information on the church building itself, which is subject to restoration
entry point for the club Thomaskirche-Bach 2000
entry point for the e-shop ThomasShop
The Web-Site has two main entry pages:
– www.thomaskirche.org
– www.thomaskirche-bach2000.de
State of: Summer Term 2002
The Web Site of the Thomaskirche in Leipzig: Description and Objectives (2)
The target groups of the Web site are:
the members of the religious community
– located in Leipzig
– majority speaks German
the members of the club Thomaskirche-Bach 2000
visitors of the Thomaskirche, including tourists
– located all over the world
– not necessarily speaking German
persons interested in Bach and his music, in the Thomaskirche and the works of art in it, in Leipzig, in the charitable activities of the church, or in the concerts taking place in the church
– located all over the world
– not necessarily speaking German
State of: Summer Term 2002
The Web Site of the Thomaskirche in Leipzig: Description and Objectives (3)
The Web site has several objectives towards its target groups.
The case study focussed on only one of those objectives:
the contribution of the site to the income of the Thomaskirche
through
– donations
– purchases in the ThomasShop
The club Thomaskirche-Bach 2000 plays a key role in the acquisition of donations:
Its objective is the maintenance of the Thomaskirche.
Its members are potential donors.
Most information/links concerning donations are in its subsite.
State of: Summer Term 2002
Agenda
Designing a Good Web Site: Web Usability
Web Site Evaluation: Modeling & Measuring Success
Designing a Good Web Site: Business Success
Designing a Good Web Site: Heuristics
Web Site Evaluation: Preparing & Mining
Web Site Evaluation: Understanding & Acting
A process overview for sales of products/services
The interaction of the potential customer with the company goes through three phases:
Information Acquisition -> Negotiation & Transaction -> After-Sales Support
The ratio of persons going from one phase to the next is the basis for a set of positive and negative measures:
Positive: Contact, Conversion, Retention
Negative: Abandonment, Attrition, Churn
From the visitor to the loyal customer: The model of Berthon et al. [BPW96]
Early realisation of the marketing measures for Web sites [BPW96]:
Conversion efficiency := Customers / Active investigators
Retention efficiency := Loyal customers / Customers
whereby:
Active investigators are visitors who stay long in the site.
Customers are visitors who buy something.
Loyal customers are customers who come back to buy again.
[Figure: funnel of site users: short-time visitors, active investigators, customers, loyal customers]
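A minimal sketch of the two Berthon et al. ratios, assuming the number of visitors at each funnel stage has already been determined (for example, from sessionised logs). The class name `FunnelCounts` and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FunnelCounts:
    """Visitor counts per funnel stage (hypothetical numbers below)."""
    active_investigators: int  # visitors who stay long in the site
    customers: int             # visitors who buy something
    loyal_customers: int       # customers who come back to buy again


def conversion_efficiency(c: FunnelCounts) -> float:
    """Conversion efficiency := Customers / Active investigators."""
    return c.customers / c.active_investigators


def retention_efficiency(c: FunnelCounts) -> float:
    """Retention efficiency := Loyal customers / Customers."""
    return c.loyal_customers / c.customers


counts = FunnelCounts(active_investigators=400, customers=60, loyal_customers=15)
print(conversion_efficiency(counts))  # 0.15
print(retention_efficiency(counts))   # 0.25
```

Note that the two ratios use different denominators: conversion is measured against active investigators, not against all site users, so short-time visitors do not dilute it.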
From the visitor to the loyal customer: The micro-conversion rates of Lee et al. [LPSH01]
The model of Lee et al [LPSH01] distinguishes among four steps until the purchase of a product:
Product impression
Click through
Basket placement
Product purchase
and introduces micro-conversion rates for them:
look-to-click rate: click throughs / product impressions
click-to-basket rate: basket placements / click throughs
basket-to-buy rate: product purchases / basket placements
look-to-buy rate: product purchases / product impressions
A session is a set of click operations performed during one visit. Clicks leading to product impressions and those corresponding to basket placements and purchases are uniquely identified as such.
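The four rates can be computed by counting labelled events over all sessions. This sketch assumes each session is already a list of event labels; the label strings, the helper names and the zero-denominator convention are assumptions. It also checks that look-to-buy equals the product of the three step rates, which follows from the definitions because the intermediate counts cancel.

```python
from collections import Counter
from math import isclose

# Hypothetical event labels: the model only requires that each click is
# uniquely identifiable as one of these four kinds.
IMPRESSION, CLICK, BASKET, PURCHASE = "impression", "click", "basket", "purchase"


def _ratio(num: int, den: int) -> float:
    # Convention chosen for this sketch (not part of the model): 0.0 if den == 0.
    return num / den if den else 0.0


def micro_conversion_rates(sessions: list[list[str]]) -> dict[str, float]:
    """Count the four event kinds over all sessions and form the four rates."""
    n = Counter(event for session in sessions for event in session)
    return {
        "look-to-click": _ratio(n[CLICK], n[IMPRESSION]),
        "click-to-basket": _ratio(n[BASKET], n[CLICK]),
        "basket-to-buy": _ratio(n[PURCHASE], n[BASKET]),
        "look-to-buy": _ratio(n[PURCHASE], n[IMPRESSION]),
    }


# Made-up sessions: 6 impressions, 3 click throughs, 2 basket placements, 1 purchase.
sessions = [
    [IMPRESSION, IMPRESSION, CLICK, BASKET, PURCHASE],
    [IMPRESSION, CLICK, CLICK, BASKET],
    [IMPRESSION, IMPRESSION, IMPRESSION],
]
rates = micro_conversion_rates(sessions)
print(rates)

# look-to-buy is the product of the three step rates (intermediate counts cancel).
product = rates["look-to-click"] * rates["click-to-basket"] * rates["basket-to-buy"]
assert isclose(rates["look-to-buy"], product)
```

Reporting the three step rates separately shows where in the funnel a product loses visitors, which the single look-to-buy rate hides.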
The e-metrics model of Cutler & Sterne [CS00]
The e-metrics model of [CS00] encompasses:
Site-centric measures for regions of a site, including:
– Stickiness := Total time spent in the region / Number of visitors in the region
– Slipperiness := 1 / Stickiness
– Focus := Avg. number of visited pages in the region / Number of pages in the region
"Desirable value ranges" for each measure, depending on the purpose/objective of the region:
– A region used during information acquisition should be sticky.
– The pages accessed during the negotiation and transaction phase should be slippery.
How is a region defined?
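To make the region measures concrete, the sketch below computes Stickiness (total time spent in a region divided by the number of visitors in it), Focus (average number of visited region pages divided by the number of pages in the region) and Slipperiness, which it takes to be the reciprocal of Stickiness. The record layout `RegionVisit`, the choice to average Focus over visitors using distinct pages, the reciprocal reading of Slipperiness and the sample data are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionVisit:
    """One visit's activity inside a site region (hypothetical record layout)."""
    visitor_id: str
    pages: frozenset[str]  # region pages requested during the visit
    seconds: float         # time spent in the region during the visit


def stickiness(visits: list[RegionVisit]) -> float:
    """Total time spent in the region / number of visitors in the region."""
    visitors = {v.visitor_id for v in visits}
    return sum(v.seconds for v in visits) / len(visitors)


def slipperiness(visits: list[RegionVisit]) -> float:
    """Assumed here to be the reciprocal of stickiness."""
    return 1.0 / stickiness(visits)


def focus(visits: list[RegionVisit], region_pages: set[str]) -> float:
    """Avg. number of distinct region pages per visitor / number of region pages."""
    per_visitor: dict[str, set[str]] = {}
    for v in visits:
        per_visitor.setdefault(v.visitor_id, set()).update(v.pages & region_pages)
    avg_pages = sum(len(p) for p in per_visitor.values()) / len(per_visitor)
    return avg_pages / len(region_pages)


region = {"info", "concerts", "history", "shop"}  # hypothetical region
visits = [
    RegionVisit("a", frozenset({"info", "concerts"}), 120.0),
    RegionVisit("b", frozenset({"info"}), 30.0),
    RegionVisit("a", frozenset({"history"}), 50.0),
]
print(stickiness(visits), slipperiness(visits), focus(visits, region))
```

With these made-up visits the region has a stickiness of 100 seconds per visitor and a focus of 0.5; whether those values are "desirable" depends on whether the region serves information acquisition or the transaction phase.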
A Site is NOT for all users: Target Groups and User Segmentation
Truisms:
A site owner does not welcome all users equally.
A site cannot satisfy all users accessing it.
Hence, sites
are designed for some types of users
serve different user types to different degrees
User types are the result of:
User segmentation according to criteria of the site owner
User segmentation on the basis of personal characteristics
User segmentation with respect to recorded behaviour
User Segmentation in Predefined Business Segments
A company may partition its customers on the basis of
the revenue it obtains or expects from them
the (cost of) services it must offer them to obtain the revenue
There are different segmentation schemes, based on
– the characteristics of the customers
– the company portfolio
and producing a set of predefined classes.
For each visitor: (1) assign her to the right segment as soon as possible, (2) motivate her to move to a segment of higher revenue.
For a Web application this means:
1. Classification
2. Association rules for cross selling & up selling
3. Recommendations & Personalisation
User Segmentation in Unknown Segments
Web site visitors can be grouped on the basis of their interests, characteristics and navigational behaviour without assuming predefined groups.
There is much research on user grouping based on
– the properties and contents of the objects being visited
– the declared or otherwise known characteristics of the visitor
– (the order of the requests)
For each visitor: (1) assign her to the right segment as soon as possible, (2) make suggestions based on the contents of the segment.
For a Web application this means:
1. Clustering
2. Recommendations & Personalisation
User Segmentation on navigational behaviour
Web site visitors exhibit different types of navigational behaviour.
Model I (simplistic): Some users navigate across links. Others prefer a search engine.
Model II [FGL+00]:
based on criteria like active time spent on-line and per page, pages and domains accessed etc.
Segments: Simplifiers, Surfers, Bargainers, Connectors, Routiners, Sportsters
Model III [Moe] for merchandising sites:
based on criteria like purchase intention, time spent on the site, number of searches initiated, types of pages visited etc.
Segments: Direct buying, Search/Deliberation, Hedonic browsing, Knowledge building
Data Mining for Web Site Evaluation
Web Mining encompasses models, algorithms and evaluation criteria for the analysis of web sites:
Evaluation criteria with respect to business success
Models:
– A site may be observed as a collection of pages, a graph of pages, an agent interacting with the user, …
– A page may be observed as a document, a hypertext document, a collection of links, a node in a graph, …
– The interaction with the web site may be observed as a set of clicks, a set of queries, a sequence of clicks/queries, a sequence of application-specific actions, …
– External information about users, contents, products, actions may be incorporated from the data warehouse or the corporate ontology
Data mining algorithms for: classification, clustering, assoc. rules, …
Data preparation algorithms
Literature and Further Readings (1)
[BPW96] P. Berthon, L.F. Pitt and R.T. Watson. The World Wide Web as an advertising medium. Journal of Advertising Research, 36(1), pp. 43-54, 1996.
[Ber02] Berendt, B. (2002). Using site semantics to analyze, visualize, and support navigation. Data Mining and Knowledge Discovery, 6, 37-59.
[BS00] Berendt, B. & Spiliopoulou, M. (2000). Analysis of navigation behaviour in web sites integrating multiple information systems. The VLDB Journal, 9, 56-75.
[CPCP01] Chi, E.H., Pirolli, P., Pitkow, J.E. (2000). The scent of a site: a system for analyzing and predicting information scent, usage, and usability of a Web site. In Proceedings CHI 2000 (pp. 161-168).
[CS00] M. Cutler and J. Sterne. E-metrics: Business metrics for the new economy. Technical report, NetGenesis Corp., http://www.netgen.com/emetrics (access date: July 22, 2001)
[DFAB98] Alan Dix, Janet Finlay, Gregory Abowd, Russell Beale. Human Computer Interaction. Prentice Hall Europe 1998. Cited after http://www.tau-web.de/hci/space/i7.html and http://www.tau-web.de/hci/space/x12.html.
[DZ97] X. Dreze and F. Zufryden. Testing web site design and promotional content. Journal of Advertising Research, 37(2), pp. 77-91, 1997.
© Myra Spiliopoulou, Bettina Berendt, Ernestina Menasalvas, ECML/PKDD 2004
Literature and Further Readings (2)
[FGL+00] J. Forsyth, T. McGuire and J. Lavoie. All visitors are not created equal. McKinsey marketing practice. McKinsey & Company. Whitepaper. 2000.
[Flem98] Fleming, J. (1998). Web Navigation. Designing the User Experience. Sebastopol, CA: O'Reilly.
[HSB02] K.-P. Huber, F. Säuberlich, C. Böhm. Kennzahlenbasiertes Web Controlling mit einer Web Scorecard. In "Handbuch Web Mining im Marketing" (eds. H. Hippner, M. Merzenich, K. Wilde). Vieweg. 2002 (in German)
[KNY00] Kato, H., Nakayama, T., & Yamane, Y. (2000). Navigation analysis tool based on the correlation between contents distribution and access patterns. In Working Notes of the Workshop "Web Mining for E-Commerce - Challenges and Opportunities." at SIGKDD-2000. Boston, MA (pp. 95-104).
[KP92] R.S. Kaplan, D.P. Norton. The Balanced Scorecard: Translating Strategy to Action. Boston MA. 1992
[KP03] Kohavi, R. and Parekh, R. Ten Supplementary Analyses to Improve E-Commerce Web Sites. In Proceedings of the WebKDD 2003 Workshop - Webmining as a Premise to Effective and Intelligent Web Applications, at SIGKDD-2003. August 27th, 2003, Washington DC, USA (pp. 29-36).
Literature and Further Readings (3)
[KB04] A. Kralisch and B. Berendt. Cultural determinants of search behaviour on websites. In V. Evers, E. del Galdo, D. Cyr & C. Bonanni (Eds.), Designing for Global Markets 6. Proceedings of the IWIPS 2004 Conference on Culture, Trust, and Design Innovation. Vancouver, Canada, 8 - 10 July, 2004. Vancouver, BC: Product & Systems Internationalisation, Inc., pp. 61-74, 2004.
[Kuhl96] R. Kuhlen. Informationsmarkt: Chancen und Risiken der Kommerzialisierung von Wissen. 2nd edition, 1996 (in German)
[LPS+00] Juhnyoung Lee, M. Podlaseck, E. Schonberg, R. Hoch and S. Gomory. Analysis and visualization of metrics for online merchandizing. In "Advances in Web Usage Mining and User Profiling: Proc. of the WEBKDD'99 Workshop", LNAI 1836, Springer Verlag, pp. 123-138, 2000.
[Moe] W. Moe. Buying, searching, or browsing: Differentiating between online shoppers using in-store navigational clickstream. In Journal of Consumer Psychology.
[Niel00] Nielsen, J. (2000). Designing Web Usability: The Practice of Simplicity. New Riders Publishing.
Literature and Further Readings (4)
[SF99] M. Spiliopoulou, L.C. Faulstich. WUM: A Tool for Web Utilization Analysis. In: Extended version of Proc. EDBT Workshop WebDB’98, LNCS 1590. Springer Verlag, Berlin Heidelberg New York, pp 184–203, 1999.
[Shne98] Shneiderman, B. (1998). Designing the User Interface: Strategies for Effective Human-Computer Interaction. 3rd edition. Reading, MA: Addison-Wesley.
[Ste03] Sterne, J. WebKDD in the Business World. Invited talk in the WebKDD 2003 Workshop - Webmining as a Premise to Effective and Intelligent Web Applications. at SIGKDD-2003. August 27th, 2003, Washington DC, USA.
[Spi99] M. Spiliopoulou. The laborious way from data mining to Web mining. Int. Journal of Comp. Sys., Sci. & Eng., Special Issue on "Semantics of the Web", 14, pp. 113–126, 1999.
[SP01] M. Spiliopoulou, C. Pohle. Data mining for measuring and improving the success of Web sites. In Journal of Data Mining and Knowledge Discovery, Special Issue on E-commerce, 5, pp. 85–114. Kluwer Academic Publishers. 2001
[Sul97] T. Sullivan. Reading reader reaction: A proposal for inferential analysis of web server log files. Proc. of the Web Conference'97, 1997.
Thank you very much! There is more ...
Questions so far?
Agenda
Designing a Good Web Site: Web Usability
Web Site Evaluation: Modeling & Measuring Success
Designing a Good Web Site: Business Success
Designing a Good Web Site: Heuristics
Web Site Evaluation: Preparing & Mining
Web Site Evaluation: Understanding & Acting
The Web-Server Log Data of the Thomaskirche
Data of the analysis:
Web-server log of www.thomaskirche-bach2000.de
– Extended Log Format
– No cookies
– covering a period of several months
Data preparation:
1. Data cleaning
2a. Mapping of activities to users
2b. Session reconstruction
3. Mapping page invocations into application-specific concepts
State of: Summer Term 2002
The Web-Server Log Data of the Thomaskirche
1. Data cleaning:
Elimination of all invocations of scripts, pictures, navigation bars and similar artefacts appearing on each page, leaving approximately one log record per page
Elimination of irrelevant records (robots, administrator)
2. Sessionisation:
One IP address is assumed to correspond to one user, given
– the low number of accesses per day
– the low number of simultaneous accesses
All accesses from the same IP address during the same period are assigned to the same user; recurring users are not recognised
One session := One visit
3. Mapping of page invocations to application-specific concepts
State of: Summer Term 2002
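The first two preparation steps can be sketched as follows, assuming log lines have already been parsed into records. Everything concrete here is an assumption, not taken from the case study: the `LogRecord` layout, the file suffixes treated as page artefacts (navigation bars generally cannot be recognised by suffix alone), the substrings used to spot robots, the administrator IP, and the 30-minute inactivity timeout used to split one IP's accesses into visits, since the slide only says "during the same period".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LogRecord:
    """One parsed log entry (hypothetical layout; parsing is omitted)."""
    ip: str
    time: datetime
    url: str
    user_agent: str


# Illustrative assumptions, not values from the case study.
ARTEFACT_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")
ROBOT_MARKERS = ("bot", "crawler", "spider")
ADMIN_IPS = frozenset({"10.0.0.1"})
SESSION_TIMEOUT = timedelta(minutes=30)


def clean(records: list[LogRecord]) -> list[LogRecord]:
    """Step 1: drop embedded artefacts, robots and administrator requests."""
    return [
        r for r in records
        if not r.url.lower().endswith(ARTEFACT_SUFFIXES)
        and not any(m in r.user_agent.lower() for m in ROBOT_MARKERS)
        and r.ip not in ADMIN_IPS
    ]


def sessionise(records: list[LogRecord]) -> list[list[LogRecord]]:
    """Step 2: treat one IP as one user and split its requests into visits."""
    by_ip: dict[str, list[LogRecord]] = {}
    for r in sorted(records, key=lambda rec: (rec.ip, rec.time)):
        by_ip.setdefault(r.ip, []).append(r)
    sessions = []
    for recs in by_ip.values():
        current = [recs[0]]
        for prev, cur in zip(recs, recs[1:]):
            if cur.time - prev.time > SESSION_TIMEOUT:
                sessions.append(current)  # gap too long: close the visit
                current = []
            current.append(cur)
        sessions.append(current)
    return sessions


# Made-up records for illustration.
t0 = datetime(2002, 5, 1, 10, 0)
log = [
    LogRecord("1.2.3.4", t0, "/index.html", "Mozilla/4.0"),
    LogRecord("1.2.3.4", t0, "/logo.gif", "Mozilla/4.0"),
    LogRecord("1.2.3.4", t0 + timedelta(minutes=5), "/concerts.html", "Mozilla/4.0"),
    LogRecord("1.2.3.4", t0 + timedelta(minutes=60), "/shop.html", "Mozilla/4.0"),
    LogRecord("5.6.7.8", t0, "/index.html", "Googlebot/2.1"),
]
sessions = sessionise(clean(log))
print([[r.url for r in s] for s in sessions])
```

Cleaning removes the image request and the robot, and the 55-minute gap splits the remaining IP's requests into two visits. Because recurring users are not recognised, each such visit is treated independently in the later mining steps.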
The concept hierarchy CH-1 for the conversion of customers and donors
State of: Summer Term 2002
The concept hierarchy CH-2 for the preferences of the visitors
State of: Summer Term 2002
Analysis: Objectives and tools
The goals of the analysis were:
Gaining insights on the conversion efficiency of the site
Identification of pages that need improvement
but NOT the discovery of a posteriori interesting patterns.
The goals were pursued through the formulation of concrete mining queries.
Tools:
SAS Enterprise Miner for the concepts in CH-1
WUM for the concepts in CH-1 and CH-2
Both tools led to the same findings for CH-1.
State of: Summer Term 2002
Techniques used: sequence mining, association rules discovery
Association rules discovery (1)
Input: A set T of transactions involving items from an itemset I
Output: Frequent groups of items and rules derived from these groups
Approach: For a given frequency threshold σ:
1. Find all items that appear in k transactions with k/|T| >= σ. They constitute L1, the set of all frequent 1-item itemsets.
2. For i = 2, 3, ...
– Expand each itemset x in Li-1 by one item from I that is not already in x. The resulting itemsets constitute Ci, the set of candidate i-item itemsets.
– Remove from Ci all itemsets that appear in less than a fraction σ of the transactions. The remainder is Li, the set of all frequent i-item itemsets.
until Ci is empty.
3. For each frequent itemset x, generate rules of the form A -> B, where A and B are non-empty, A U B = x and their intersection is empty.
Simplification of the Apriori algorithm, cf.: Agrawal R., Imielinski T. and Swami A.: "Mining Association Rules Between Sets of Items in Large Databases", Proc. of SIGMOD'93, pp. 207-216, Washington DC, May 1993.
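The three steps can be sketched in Python as follows. Like the slide, this sketch omits the candidate pruning of the full Apriori algorithm (discarding candidates that have an infrequent subset) and recomputes support by scanning all transactions; the function names are illustrative.

```python
from itertools import combinations


def frequent_itemsets(transactions, sigma):
    """Level-wise search following steps 1 and 2.

    transactions: list of sets of items; sigma: minimum support as a fraction.
    Returns a dict mapping each frequent itemset (frozenset) to its support.
    """
    n = len(transactions)
    items = set().union(*transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Step 1: L1, the frequent 1-item itemsets.
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= sigma}
    result = {x: support(x) for x in level}
    while level:
        # Step 2a: expand each itemset of L(i-1) by one item not yet in it -> Ci.
        candidates = {x | {i} for x in level for i in items if i not in x}
        # Step 2b: keep the candidates with support >= sigma -> Li.
        level = {c for c in candidates if support(c) >= sigma}
        result.update({x: support(x) for x in level})
    return result


def rules(frequent):
    """Step 3: split each frequent itemset x into non-empty, disjoint A, B with A U B = x."""
    out = []
    for x in frequent:
        for k in range(1, len(x)):
            for a in combinations(sorted(x), k):
                A = frozenset(a)
                out.append((A, x - A))
    return out
```

On the example dataset of the next slide with σ = 0.5, this sketch finds {chocolate}, {rice}, {bananas} and {bananas, chocolate}, and derives the two rules bananas -> chocolate and chocolate -> bananas.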
Association rules discovery (2): Example of a fictitious, tiny dataset
A fictitious dataset as a non-1NF relation:
CustomerId  Items
1 chocolate, orange juice, toothpaste, rice
2 bananas, chocolate, toothpaste
3 milk, rice
4 bananas, chocolate
5 bananas, chocolate, orange juice
6 toothpaste, rice
7 chocolate, milk, rice
8 bananas, chocolate, rice
Association rules discovery (3): Items, itemsets and their frequency in the example
I = {bananas, chocolate, milk, orange juice, toothpaste, rice}
Some itemsets, their frequencies, and their status for σ = 0.5:
Itemset | Frequency | For σ = 0.5
{bananas, chocolate} | 4/8 | FREQUENT
{chocolate, milk, rice} | 1/8 | NOT FREQUENT
{orange juice} | 2/8 | NOT FREQUENT
{bananas} | 4/8 | FREQUENT
{orange juice, milk, rice} | 0/8 | NOT FREQUENT
I | 0/8 | NOT FREQUENT
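The frequencies in the table can be recomputed with a few lines of Python. The dictionary `T` simply encodes the fictitious dataset of the previous slide; `frequency` and `is_frequent` are illustrative names.

```python
# The fictitious dataset: CustomerId -> set of items.
T = {
    1: {"chocolate", "orange juice", "toothpaste", "rice"},
    2: {"bananas", "chocolate", "toothpaste"},
    3: {"milk", "rice"},
    4: {"bananas", "chocolate"},
    5: {"bananas", "chocolate", "orange juice"},
    6: {"toothpaste", "rice"},
    7: {"chocolate", "milk", "rice"},
    8: {"bananas", "chocolate", "rice"},
}
I = set().union(*T.values())


def frequency(itemset):
    """Return (number of transactions containing the whole itemset, |T|)."""
    count = sum(1 for items in T.values() if itemset <= items)
    return count, len(T)


def is_frequent(itemset, sigma=0.5):
    count, n = frequency(itemset)
    return count / n >= sigma


for x in [{"bananas", "chocolate"}, {"chocolate", "milk", "rice"},
          {"orange juice"}, {"bananas"}, {"orange juice", "milk", "rice"}, I]:
    count, n = frequency(x)
    print(sorted(x), f"{count}/{n}", "FREQUENT" if is_frequent(x) else "NOT FREQUENT")
```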
Association rules discovery (4): Support, confidence and other statistical measures
Let A -> B be an association rule derived from an itemset x = A U B.
To what extent does the dataset support the rule?
support(A->B) := support(x) := frequency of x in T
Given A, how confident are we that B will also appear?
confidence(A->B) := support(A->B) / support(A)
Obviously, it holds that: confidence(B->A) = support(A->B) / support(B)
Is the likelihood that B appears higher when A is given than in the whole population?
lift(A->B) := confidence(A->B) / support(B)
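The three measures can be checked on the example dataset. The sketch below uses exact fractions and the rule bananas -> chocolate; the function names are illustrative, and confidence is undefined (here: a ZeroDivisionError) when support(A) is 0.

```python
from fractions import Fraction

# The fictitious example dataset, one set of items per customer.
T = [
    {"chocolate", "orange juice", "toothpaste", "rice"},
    {"bananas", "chocolate", "toothpaste"},
    {"milk", "rice"},
    {"bananas", "chocolate"},
    {"bananas", "chocolate", "orange juice"},
    {"toothpaste", "rice"},
    {"chocolate", "milk", "rice"},
    {"bananas", "chocolate", "rice"},
]


def support(itemset):
    """Frequency of the itemset in T, as an exact fraction."""
    return Fraction(sum(1 for t in T if itemset <= t), len(T))


def confidence(A, B):
    # Raises ZeroDivisionError if support(A) == 0.
    return support(A | B) / support(A)


def lift(A, B):
    return confidence(A, B) / support(B)


A, B = {"bananas"}, {"chocolate"}
print(support(A | B), confidence(A, B), confidence(B, A), lift(A, B))
# prints: 1/2 1 2/3 4/3
```

Every banana buyer also bought chocolate (confidence 1), whereas only 2/3 of the chocolate buyers bought bananas. The lift of 4/3 > 1 says that chocolate is more likely among banana buyers than in the whole population.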
Sequence mining
Input: A set T of transactions, where a transaction is a sequence of items/events from an itemset I
Output: Subsequences of events that appear frequently in this dataset and order-preserving rules derived from these subsequences
Approach: Extensions of algorithms for association rules discovery by
1. order-respecting variations,
2. distinction between adjacent and non-adjacent events
or
Specialised algorithms, e.g. for
– web-server log analysis
– genome sequence analysis
– discovery of richer structures like trees and graphs
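The distinction between adjacent and non-adjacent events can be made concrete with two containment tests. This is a minimal sketch of support counting only, not one of the specialised algorithms mentioned above; the event names in the usage example are arbitrary.

```python
def contains_adjacent(seq, pattern):
    """True if the events of pattern occur in seq as one contiguous block."""
    m = len(pattern)
    return any(seq[i:i + m] == pattern for i in range(len(seq) - m + 1))


def contains_gapped(seq, pattern):
    """True if the events of pattern occur in seq in the same order, gaps allowed."""
    remaining = iter(seq)
    # 'in' advances the iterator, so each event must be found after the previous one.
    return all(event in remaining for event in pattern)


def sequence_support(sequences, pattern, adjacent=True):
    """Fraction of sequences that contain pattern (adjacent or non-adjacent)."""
    contains = contains_adjacent if adjacent else contains_gapped
    return sum(contains(s, pattern) for s in sequences) / len(sequences)
```

For the sequences [A B C], [A C], [B A C] and [C], the pattern [A C] has support 0.5 when its events must be adjacent, but 0.75 when gaps are allowed, because [A B C] then also counts.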
Agenda
Designing a Good Web Site: Web Usability
Web Site Evaluation: Modeling & Measuring Success
Designing a Good Web Site: Business Success
Designing a Good Web Site: Heuristics
Web Site Evaluation: Preparing & Mining
Web Site Evaluation: Understanding & Acting
Conversion rate and ideal navigation paths
Question: Contribution of pages to the conversion efficiency
Approach:
The pages were grouped according to their conversion potential into the concepts of the hierarchy CH-1.
An ideal path (cf. [CS00]) was specified, leading from low potential towards higher potential pages, taking the connectivity of the pages into account.
The ideal path was juxtaposed with frequent paths involving the concepts of CH-1.
Ideal path (figure): [ LowPotentialCustomer MediumPotentialCustomer HighPotential-I HighPotential-II Customer ]
State of: Summer Term 2002
Ideal path vs frequent paths for customer conversion
Sequence mining revealed that:
The ideal path was not frequent, i.e. hardly any sessions contained it as a whole.
Break point: MediumPotentialCustomer was invoked more than once in the observed sessions.
Interpretation: The visitors of the ThomasShop inspect more than one product before deciding to put one into the basket.
Approach: The frequencies of the subpaths of the ideal path were studied instead:
– [ LowPotentialCustomer MediumPotentialCustomer ]: >= 55%
– [ MediumPotentialCustomer HighPotential-I ]: LOW
– [ HighPotential-I HighPotential-II ]: LOW
State of: Summer Term 2002
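The comparison between the whole ideal path and its subpaths can be sketched as follows. The sessions are hypothetical concept sequences (as produced by step 3 of the data preparation), and `step_conversion` uses an assumed definition, namely the share of sessions visiting concept a in which a is immediately followed by b; the exact measure computed in the study is not given here.

```python
IDEAL_PATH = ["LowPotentialCustomer", "MediumPotentialCustomer",
              "HighPotential-I", "HighPotential-II", "Customer"]


def contains_path(session, path):
    """True if the session contains the path as a contiguous run of concepts."""
    m = len(path)
    return any(session[i:i + m] == path for i in range(len(session) - m + 1))


def path_frequency(sessions, path):
    """Fraction of all sessions that contain the whole path."""
    return sum(contains_path(s, path) for s in sessions) / len(sessions)


def step_conversion(sessions, a, b):
    """Assumed measure: among sessions visiting a, the share where a is directly followed by b."""
    visiting = [s for s in sessions if a in s]
    if not visiting:
        return 0.0
    return sum(contains_path(s, [a, b]) for s in visiting) / len(visiting)
```

In the test data below, the repeated MediumPotentialCustomer invocation in the first hypothetical session breaks the contiguous match of the whole ideal path, so only one of four sessions contains it (0.25), while the subpath [ MediumPotentialCustomer HighPotential-I ] is still counted there.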
Ideal path and frequent paths for donor conversion
The ideal path for donations was similar:
[ LowPotentialDonor MediumPotentialDonor HighPotentialDonor ]
and was not frequent either.
Break point: MediumPotentialDonor was invoked more than once in the observed sessions.
Approach: The frequencies of the subpaths of the ideal path were studied instead:
– [ LowPotentialDonor MediumPotentialDonor ]: >= 55%
– [ MediumPotentialDonor HighPotentialDonor ]: <= 25%
The cross conversion of potential donors to potential customers was also low:
– [ MediumPotentialDonor LowPotentialCustomer ]: <= 20%
indicating limited synergy effects among the regions of the site.
State of: Summer Term 2002
Preferences of the visitors in the MediumPotential area: Donations
The low conversion rates from the MediumPotential-pages to the HighPotential-pages led to a closer inspection of the former.
(Figure: concepts below MediumPotentialDonor and BachClubPotential: BachOrgan, News, Events, Donations, Membership, Description, Contact, BachClub, Info, Fill-In Form, ConcertCards; annotated with shares of ~30%, 20%...30% and ~15%)
State of: Summer Term 2002
Preferences of the visitors in the MediumPotential area: ThomasShop (1)
The low conversion rates from the MediumPotential-pages to the HighPotential-pages led to a closer inspection of the former.
The impact of each product (type) on the conversion efficiency cannot be assessed in a straightforward way.
(Figure: product concepts below MediumPotentialCustomer: AudioCDs, Souvenirs, Books, Clothes, Beverages, Leather; annotated with shares of ~50% and ~30%)
State of: Summer Term 2002
Preferences of the visitors in the MediumPotential area: ThomasShop (2)
The impact of each product (type) on the conversion efficiency cannot be assessed in a straightforward way:
A product may stimulate purchases without being purchased itself.
Most subsessions inside the ThomasShop contained more than one product inspection.
We can only say that:
AudioCDs were most often the first choice of inspection in the ThomasShop.
State of: Summer Term 2002
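Counting first choices, and the share of subsessions with several product inspections, is straightforward once each ThomasShop subsession has been mapped to concepts. The sketch assumes subsessions are given as lists of concept names; the product concepts are those of the ThomasShop figure, while the example data (including the non-product concept "Basket") are hypothetical.

```python
from collections import Counter

# Product concepts of the ThomasShop figure.
PRODUCTS = {"AudioCDs", "Souvenirs", "Books", "Clothes", "Beverages", "Leather"}


def first_choices(subsessions):
    """Count which product concept each subsession inspects first."""
    counts = Counter()
    for s in subsessions:
        first = next((c for c in s if c in PRODUCTS), None)
        if first is not None:
            counts[first] += 1
    return counts


def multi_inspection_share(subsessions):
    """Share of subsessions that contain more than one product inspection."""
    multi = sum(1 for s in subsessions if sum(c in PRODUCTS for c in s) > 1)
    return multi / len(subsessions)
```

Such counts say which product is inspected first, but, as noted above, not which product actually stimulated a purchase.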
The findings, closely observed (1)
The conversion efficiency of the site varies among the page groups:
1. The LowPotential-pages show a conversion rate of more than 50% towards MediumPotential-pages.
2. The MediumPotential-pages, which lead to the forms for purchases and donations and should motivate visitors to complete these activities, have a low conversion rate.
3. The HighPotential-pages have a relatively higher conversion rate but should be improved nonetheless: customers/donors still abandon them just before the transaction is completed.
State of: Summer Term 2002
The findings, closely observed (2)
Example: The sequence template [ BachClubPotential Y ],
where Y stands for pages that can be reached from the same vertical navigation bar.
The conversion rates vary with Y between 5% and 30%, whereby:
– The links in the navigation bar form two groups of subjects:
– The links of the first group are always visible.
– The links of the second group become visible upon clicking on a pop-up menu.
– The higher end of the conversion rates is observed for pages of the first group.
State of: Summer Term 2002
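A template such as [ BachClubPotential Y ] can be evaluated by collecting, for every page Y, the share of sessions in which BachClubPotential is directly followed by Y. This is a sketch under an assumed definition of the conversion rate (the exact measure used in the study is not given here); the sessions in the test data are hypothetical.

```python
from collections import Counter


def template_rates(sessions, a):
    """For the template [ a Y ]: per Y, the share of sessions visiting a
    in which a is directly followed by Y (each Y counted once per session)."""
    visiting = [s for s in sessions if a in s]
    counts = Counter()
    for s in visiting:
        followers = {s[i + 1] for i in range(len(s) - 1) if s[i] == a}
        counts.update(followers)
    return {y: c / len(visiting) for y, c in counts.items()}
```

Grouping the resulting rates by the two navigation-bar groups (always visible vs. pop-up) would then allow the comparison described above; which page belongs to which group is not encoded in this sketch.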
The concept hierarchy CH-2for the preferences of the visitors
The two groups of subjects in the site's navigation bar:Group 1: always visibleGroup 2: visible via the pop-up menu
State of: Summer Term 2002
Conclusion
A Web site is an interaction channel between an institution and a user.
A site is intended to satisfy user needs to the benefit of the institution.
User satisfaction: Web Usability
Institutional benefit: Business Success Measures
Data mining contributes to the evaluation of Web sites with
site models and site evaluation criteria
algorithms for data preparation and mining
Data mining experts contribute to the evaluation of Web sites by
understanding the objectives of a Web site
modeling it and designing the evaluation criteria for it
doing data mining and interpreting the results
Thank you very much!
Questions?