From User Access Patterns to Dynamic Hypertext Linking
Patrick Farrell, Siddharth Gudka, Mike Oxley, Simon Phillips
A Research Directions In Computing Presentation
Paper by T. Yan, M. Jacobsen, H. Garcia-Molina, U. Dayal
Agenda
• Introduction
• Some theory
• The paper
• A short critique
• After the paper
– Academic research
– The Authors’ work
• The technology in use today
• Conclusion
• Questions
Introduction
Hypothesis
That hyperlinks to unvisited and indirectly linked pages can be offered based upon pages the user has already visited
Experiment
a) to analyse log files to form clusters of commonly co-accessed pages
b) to categorize online users into the correct categories and offer appropriate links
Mass customisation
• Concept of adapting things to each user – on a large scale
• Economic benefit in adding value
• Satisfied shoppers also more likely to return
• What’s new?
– In the physical world, customisation doesn’t scale.
– Using technology and intelligent algorithms, it can.
Adaptive Web Sites
• Sites that automatically improve their organisation and presentation based on visitor access patterns
• We can cluster pages on a site together based on their co-occurrence frequency
– Likelihood that user will visit page P having visited Q
• For a user browsing the site, use session history to predict which pages a user may want to access – and so adapt site
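The co-occurrence idea can be illustrated with a minimal sketch. It assumes sessions are already available as sets of page paths. The function name `co_access_probability` and the sample pages are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import permutations


def co_access_probability(sessions):
    """Estimate P(visit p | visited q) from a list of sessions (sets of pages)."""
    page_counts = Counter()  # number of sessions containing each page
    pair_counts = Counter()  # number of sessions containing both q and p
    for session in sessions:
        pages = set(session)
        page_counts.update(pages)
        # Ordered pairs (q, p) with q != p, counted once per session.
        pair_counts.update(permutations(pages, 2))
    return {
        (q, p): count / page_counts[q]
        for (q, p), count in pair_counts.items()
    }


# Hypothetical sessions on an academic site.
sessions = [
    {"/courses", "/staff"},
    {"/courses", "/staff", "/research"},
    {"/courses", "/research"},
    {"/staff"},
]
probs = co_access_probability(sessions)
# Share of sessions containing /courses that also contain /staff.
print(probs[("/courses", "/staff")])
```

Pairs with a high estimated probability are candidates for being clustered together or offered as suggested links.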
The Paper
• Yan et al. implement an adaptive web site, based on user access logs.
• Paper discusses different approaches to clustering and implementation
• Experimental data is presented
– validating the concept of clustering on an academic site
– showing the value added by an adaptive website using their technique
• The log analysis software used is published
The paper - Justification
• Use the metaphor of a shopper browsing an online shop
• Adaptive site can provide links to similar items to those being browsed
– e.g. “Male Yuppie” browsing executive toys
– Might also be interested in sportswear
• As site grows, static links to ‘related’ content more of a challenge - dynamic is much better
• Many practical examples today – but not 10 years ago!
The Paper – System Design
[Figure: system architecture. Offline components: Access logs, Preprocess, Cluster, User Categories. Online components: End user, URL, Web Server, Link Generator, HTML Documents, HTML with suggestions.]
The paper - Preprocessing
• For each user session
– form an n-dimensional vector of the pages visited
– can weight vector elements using a metric
  • Number of hits to page
  • Estimate of time spent on page (possibly normalised)
• ‘Close’ session vectors in n-dimensional space form a cluster
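As a rough illustration of the preprocessing step, the sketch below builds one vector per session, weighted by hit count. The function `session_vectors`, the session ids and the page names are assumptions made for the example. A time-spent weighting would replace the hit counts with per-page durations.

```python
from collections import Counter


def session_vectors(sessions, normalise=False):
    """Turn each session (a list of page hits) into a vector over all n pages."""
    # The vector dimensions are the distinct pages seen in any session.
    pages = sorted({page for hits in sessions.values() for page in hits})
    vectors = {}
    for session_id, hits in sessions.items():
        counts = Counter(hits)
        vector = [float(counts[page]) for page in pages]
        if normalise:
            total = sum(vector)
            # Scale so the weights of one session sum to 1.
            vector = [v / total for v in vector] if total else vector
        vectors[session_id] = vector
    return pages, vectors


# Hypothetical sessions keyed by an arbitrary session id.
sessions = {
    "s1": ["/index", "/courses", "/courses", "/staff"],
    "s2": ["/index", "/research"],
}
pages, vectors = session_vectors(sessions)
print(pages)
print(vectors["s1"])
```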
The paper - Clustering
• Different algorithms to cluster vectors by ‘closeness’
• Paper uses Leader algorithm – with additional constraints
– Constraint: Minimum hits in a valid session
– Constraint: Minimum cluster size
• Algorithm fast and memory efficient
– But not order invariant
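A minimal sketch of single-pass leader-style clustering with the two constraints follows. It makes several assumptions: vectors hold hit counts (so their sum is the session's hit total), a session joins the nearest centroid within `max_distance`, and centroids are updated as running means. The slides do not state which variant the paper used, and all names and thresholds here are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leader_cluster(vectors, max_distance, min_hits=1, min_cluster_size=1):
    """Single-pass leader clustering; the result depends on input order."""
    clusters = []  # each cluster: {"centroid": [...], "members": [...]}
    for vector in vectors:
        if sum(vector) < min_hits:  # constraint: minimum hits in a valid session
            continue
        best, best_dist = None, None
        for cluster in clusters:
            dist = euclidean(vector, cluster["centroid"])
            if dist <= max_distance and (best is None or dist < best_dist):
                best, best_dist = cluster, dist
        if best is None:
            # No cluster is close enough, so this vector starts a new one.
            clusters.append({"centroid": list(vector), "members": [vector]})
        else:
            best["members"].append(vector)
            n = len(best["members"])
            # Incremental update of the running mean.
            best["centroid"] = [
                c + (x - c) / n for c, x in zip(best["centroid"], vector)
            ]
    # Constraint: drop clusters below the minimum size.
    return [c for c in clusters if len(c["members"]) >= min_cluster_size]


vectors = [[2, 1, 0, 0], [2, 2, 0, 0], [0, 0, 3, 1], [0, 0, 2, 1], [1, 0, 0, 0]]
clusters = leader_cluster(vectors, max_distance=1.5, min_hits=2, min_cluster_size=2)
print(len(clusters))
```

Each vector is examined once and only the centroids need to stay in memory, which is why the approach is fast and memory efficient. Because clusters are seeded by whichever vectors arrive first, feeding the same vectors in a different order can give different clusters.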
Dynamic Link Generation
• Use session history to track pages a user has visited
– Authors buffered logs in memory using a database
– Sessions part of most web servers now
• Match partial vector of session with pre-calculated categories to build list of appropriate pages
– Partial vector, so Euclidean distance not necessarily appropriate
– May be better to simply count matching categories
• Filter the suggestion list to remove pages visited - and possibly any already adjacent in navigation tree
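The matching and filtering steps might look like the sketch below. It assumes each pre-calculated category has been reduced to a set of pages and uses a simple overlap count rather than Euclidean distance. `suggest_links`, `min_overlap` and the page names are hypothetical.

```python
def suggest_links(visited, categories, adjacent=frozenset(), min_overlap=1):
    """Suggest unvisited pages from categories that overlap the partial session."""
    visited = set(visited)
    scores = {}
    for category_pages in categories:
        # Count how many pages of this category the user has already seen.
        overlap = len(visited & category_pages)
        if overlap < min_overlap:
            continue
        # Filter out pages already visited or adjacent in the navigation tree.
        for page in category_pages - visited - set(adjacent):
            scores[page] = max(scores.get(page, 0), overlap)
    # Strongest match first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda page: (-scores[page], page))


categories = [
    {"/courses", "/timetable", "/exams"},
    {"/research", "/publications", "/staff"},
]
links = suggest_links(
    visited=["/courses", "/timetable", "/staff"],
    categories=categories,
    adjacent={"/publications"},
)
print(links)
```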
Paper – Experimental results
• Time spent on particular pages follows Zipfian distribution – not useful for page weight
• The authors present a number of experimental results about clustering algorithm parameters, e.g. min. cluster size
• Found clusters on academic website that were not evident from hypertext layout – so clustering serves purpose.
Critique
• Paper presents new concept of clustering web accesses – but essentially draws together existing work in other fields
• Makes key simplifications
– Ignores any web caching, proxies, etc.
– Considering all pages in a session as being in a category is naïve – e.g. navigation pages, indexes, etc.
• Weakness in experiments
– Authors invented nominal ‘sessions’ based on unique end-user addresses as server didn’t support sessions
– Only present data for one site
  • 2,709 sessions – of which 50% were in the same cluster!
Further Work
• Garcia-Molina
– Beyond Document Similarity: Understanding Value-Based Search and Browsing Technologies (2000)
  • Discusses judging value of web documents based on user behaviour
• Dayal
– Knowledge-Based Support Services: Monitoring and Adaptation (2000)
  • Discusses a Knowledge-Based Service deployed within HP to deliver customer support services.
  • System adapts based on observed user patterns and evolving needs
Related Work
• Web Prefetching (Jiang & Kleinrock, 1998)
– Addresses slow access speeds of World Wide Web
  • PREDICTION MODULE: Computes access probabilities.
  • THRESHOLD MODULE: Computes prefetch thresholds.
– Uses clustering to divide users into categories by access probability
• Restoring Meaningful Episodes in a Proxy Log (Lou et al. 2001)
– Extracting user’s activity information from proxy logs
– Classifies individual requests into meaningful semantic elements
– Semantics-based CUT-AND-PICK approach
Related Work
• SUGGEST (Baraglia et al. 2002, 2004)
– No off-line component
– Quality metric to estimate effectiveness of suggestions
• Media Agents (Wenyin et al. 2003)
– Automatic collection of semantic indices of multimedia data
– Semantic descriptions from content of documents
– User’s interaction refines semantic indices and suggests other multimedia data
Applications & The Paper
• Custom application - Analog
• Means: uses clustering tech to analyse log files
• End: to dynamically generate possibly interesting links
• Successful (to an extent)
1996-2005 Technology Directions
• Clustering Documents: Vivisimo, Google Labs
• Collaborative Filtering: Amazon, Flickr, Tivo
Amazon.com
• Uses recommendation algorithm
– person who bought ‘x’ also bought ‘y’
• Item-to-item collaborative filtering
– provides recommendations based on grouped items, not customers
Essence:
For each item in product catalog, I1
    For each customer C who purchased I1
        For each item I2 purchased by customer C
            Record that a customer purchased I1 and I2
    For each item I2
        Compute the similarity between I1 and I2
Amazon.com
• Creates vectors where each vector is an item with M dimensions (customers)
• Similarity between two items computed by measuring cosine of angle between two vectors.
• Offline computation theoretically expensive: O(N²M)
• In practice only O(NM) as most customers have few purchases.
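The item-to-item computation can be sketched as follows. This is not Amazon's implementation. It assumes binary purchase vectors (bought or not bought) and iterates over customers rather than catalogue items, which visits the same co-purchased pairs. The function name and sample data are hypothetical.

```python
import math
from collections import defaultdict


def item_similarities(purchases):
    """purchases: dict of customer -> set of items. Returns cosine per item pair."""
    buyers = defaultdict(set)  # item -> customers who bought it (the item's vector)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    # Record co-purchases; only pairs sharing at least one customer appear,
    # which is why sparse purchase data keeps the work well below N^2 * M.
    co_counts = defaultdict(int)
    for items in purchases.values():
        for i1 in items:
            for i2 in items:
                if i1 != i2:
                    co_counts[(i1, i2)] += 1

    # For binary vectors, cosine = shared buyers / sqrt(|buyers1| * |buyers2|).
    return {
        (i1, i2): shared / math.sqrt(len(buyers[i1]) * len(buyers[i2]))
        for (i1, i2), shared in co_counts.items()
    }


purchases = {
    "alice": {"book", "lamp"},
    "bob": {"book", "lamp", "pen"},
    "carol": {"book"},
}
sims = item_similarities(purchases)
print(round(sims[("book", "lamp")], 3))
```

Pairs that never share a customer are absent from the result, so their similarity is implicitly zero.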
Conclusion
• The paper was on the right track
• Appreciated applicability of clustering to e-commerce
• Hypothesis proved by experiment
• Failed to address or even predict scalability issues
References
• Authors’ Work
– Yan, T., Jacobsen, M., Garcia-Molina, H., Dayal, U., ‘From User Access Patterns to Dynamic Hypertext Linking,’ In: Fifth International World Wide Web Conference, 1996 (Paris, France)
– Paepcke, A., Garcia-Molina, H., Rodriquez, G. and Cho, J., ‘Beyond Document Similarity: Understanding Value-Based Search and Browsing Technologies’, In: Stanford University Technical Report, 2000
– Delic, K. A. and Dayal, U., ‘Knowledge-Based Support Services: Monitoring and Adaptation,’ In: Proceedings of the 11th international Workshop on Database and Expert Systems Applications, IEEE Computer Society, 2000
• Related Work
– Baraglia, R., Silvestri, F., Palmerini, P., ‘On-line Generation of Suggestions for Web Users’, In: Proceedings of IEEE International Conference on Information Technology: Coding and Computing, April 2004
– Baraglia, R., Palmerini, P., ‘A web usage mining system’, In: Proceedings of IEEE International Conference on Information Technology: Coding and Computing, April 2002
– Wenyin, L., Chen, Z., Lin, F., Zhang, H., Ma, W., ‘Ubiquitous Media Agents: A framework for managing personally accumulated multimedia files,’ 9th ACM international conference on multimedia, 2003 (Toronto, Canada)
– Jiang, Z., Kleinrock, L., ‘Web prefetching in a mobile environment’, IEEE Personal Communications 5(5): 25 – 34, October 1998
– Lou, W., Lu, H., Liu, G., Yiang, Q., ‘Restoring Meaningful Episodes in a Proxy Log’, 2001.
– Ungar, L., Foster, D., ‘Clustering Methods For Collaborative Filtering’, In: AAAI Workshop On Recommendation Systems, 1998.
– Linden, G., Smith, B., York, J., ‘Amazon.com Recommendations: Item-to-Item Collaborative Filtering’, In: IEEE Internet Computing, Vol. 7, No. 1, Jan 2003.