Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data
Srivastava J., Cooley R., Deshpande M., Tan P.N.
Appeared in SIGKDD Explorations, Vol. 1, Issue 2, 2000
Web Mining
• What is it? Data mining efforts associated with the Web
• What kinds are there?
  • Content Mining
  • Structure Mining
  • Usage Mining
Web Data
• Content: e.g. text and graphics
• Structure: e.g. HTML tags
• Usage: e.g. IP address, page reference, date/time
• User profile: e.g. registration data, customer profile
Web Usage Mining
• The application of data mining techniques to discover usage patterns from Web data.
• Three phases
  • Preprocessing
  • Pattern discovery
  • Pattern analysis
Data Sources
• Where can usage data be collected from?
• Server Level Collection
  • The web server log records the browsing behavior of site visitors, but cached page views are not recorded.
  • Packet sniffing extracts usage data directly from TCP/IP packets.
Data Sources (contd.)
Sample Web Server Log
#  IP Address  Userid  Time  Method/URL/Protocol  Status  Size  Referrer  Agent
1  123.456.78.9 - [25/Apr/1998:03:04:41 -0500] "GET A.html HTTP/1.0" 200 3290 - Mozilla/3.04 (Win95, I)
2  123.456.78.9 - [25/Apr/1998:03:05:34 -0500] "GET B.html HTTP/1.0" 200 2050 A.html Mozilla/3.04 (Win95, I)
3  123.456.78.9 - [25/Apr/1998:03:05:39 -0500] "GET L.html HTTP/1.0" 200 4130 - Mozilla/3.04 (Win95, I)
4  123.456.78.9 - [25/Apr/1998:03:06:02 -0500] "GET F.html HTTP/1.0" 200 5096 B.html Mozilla/3.04 (Win95, I)
5  123.456.78.9 - [25/Apr/1998:03:06:58 -0500] "GET A.html HTTP/1.0" 200 3290 - Mozilla/3.01 (X11, I, IRIX6.2, IP22)
6  123.456.78.9 - [25/Apr/1998:03:07:42 -0500] "GET B.html HTTP/1.0" 200 2050 A.html Mozilla/3.01 (X11, I, IRIX6.2, IP22)
7  123.456.78.9 - [25/Apr/1998:03:07:55 -0500] "GET R.html HTTP/1.0" 200 8140 L.html Mozilla/3.04 (Win95, I)
8  123.456.78.9 - [25/Apr/1998:03:09:50 -0500] "GET C.html HTTP/1.0" 200 1820 A.html Mozilla/3.01 (X11, I, IRIX6.2, IP22)
9  123.456.78.9 - [25/Apr/1998:03:10:02 -0500] "GET O.html HTTP/1.0" 200 2270 F.html Mozilla/3.04 (Win95, I)
10 123.456.78.9 - [25/Apr/1998:03:10:45 -0500] "GET J.html HTTP/1.0" 200 9430 C.html Mozilla/3.01 (X11, I, IRIX6.2, IP22)
11 123.456.78.9 - [25/Apr/1998:03:12:23 -0500] "GET G.html HTTP/1.0" 200 7220 B.html Mozilla/3.04 (Win95, I)
12 209.456.78.2 - [25/Apr/1998:05:05:22 -0500] "GET A.html HTTP/1.0" 200 3290 - Mozilla/3.04 (Win95, I)
13 209.456.78.3 - [25/Apr/1998:05:06:03 -0500] "GET D.html HTTP/1.0" 200 1680 A.html Mozilla/3.04 (Win95, I)
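The log entries above follow a fixed field order, so each one can be split into fields with a regular expression. The following is a minimal sketch, assuming exactly the layout of the sample (IP address, user id, bracketed timestamp, quoted request, status, size, referrer, agent). The names LOG_PATTERN and parse_log_line are illustrative and do not come from the slides; real server logs may use a different layout.

```python
import re
from datetime import datetime

# Field layout assumed from the sample log; the optional leading
# digits absorb the row number printed in front of each sample entry.
LOG_PATTERN = re.compile(
    r'(?:\d+\s+)?'
    r'(?P<ip>\S+) (?P<userid>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) (?P<referrer>\S+) (?P<agent>.+)'
)


def parse_log_line(line):
    """Return a dict of fields for one log entry, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Timestamps look like 25/Apr/1998:03:05:34 -0500
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = int(record["size"])
    # A dash in the referrer column means no referrer was recorded.
    if record["referrer"] == "-":
        record["referrer"] = None
    return record


entry = ('2 123.456.78.9 - [25/Apr/1998:03:05:34 -0500] '
         '"GET B.html HTTP/1.0" 200 2050 A.html Mozilla/3.04 (Win95, I)')
record = parse_log_line(entry)
print(record["url"], record["referrer"], record["agent"])
```

Parsing the referrer and agent into their own fields matters for the later preprocessing step, where they help tell users and sessions apart.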
Data Sources (contd.)
• Client Level Collection
  • By using remote agents, e.g. Java applets (overhead) or JavaScript (not able to capture all user clicks)
  • By modifying the source code of an existing browser, e.g. Mosaic (hard to convince users to use the modified browser)
Data Sources (contd.)
• Proxy Level Collection
  • A proxy is an intermediate level of caching between the web server and the client browser.
  • Proxy data characterizes the browsing behavior of a group of users sharing a common proxy server.
Data Abstractions
• User: a single individual who is accessing files from one or more Web servers through a browser
• Page View: every file displayed on the user's browser at one time
• Click Stream: a sequential series of page view requests
• User Session: the click stream of page views for a single user across the entire Web
• Server Session: the set of page views in a user session for a particular Web site
• Episode: any semantically meaningful subset of a user or server session
Web Usage Mining Process
Preprocessing
• Usage Preprocessing
  • The most difficult task, due to the incompleteness of the available data (IP address, agent, server-side click stream)
  • Single IP address / multiple server sessions
  • Multiple IP addresses / single server session
  • Multiple IP addresses / single user
  • Multiple agents / single user
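Because of the ambiguities listed above, an IP address alone cannot identify a user or a session. One common heuristic is to treat each (IP address, agent) pair as a separate user and to start a new session whenever the gap between two requests is too long. The sketch below illustrates that heuristic only; the 30-minute timeout and the name sessionize are assumptions for illustration, not values taken from the slides, and the heuristic does not resolve the multiple-IP cases.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sessionize(records, timeout=timedelta(minutes=30)):
    """Group (ip, agent, time, url) records into server sessions.

    Returns a list of ((ip, agent), [url, ...]) tuples. The timeout is an
    assumed threshold, not a value prescribed by the source.
    """
    hits_by_user = defaultdict(list)
    for ip, agent, time, url in records:
        hits_by_user[(ip, agent)].append((time, url))

    sessions = []
    for user, hits in hits_by_user.items():
        hits.sort()  # order each user's requests by time
        current = [hits[0][1]]
        for (prev_time, _), (time, url) in zip(hits, hits[1:]):
            if time - prev_time > timeout:
                # Gap too long: close the session and start a new one.
                sessions.append((user, current))
                current = []
            current.append(url)
        sessions.append((user, current))
    return sessions


# Rows 1, 2, 5 and 6 of the sample log: same IP address, two agents.
WIN95 = "Mozilla/3.04 (Win95, I)"
IRIX = "Mozilla/3.01 (X11, I, IRIX6.2, IP22)"
sample = [
    ("123.456.78.9", WIN95, datetime(1998, 4, 25, 3, 4, 41), "A.html"),
    ("123.456.78.9", WIN95, datetime(1998, 4, 25, 3, 5, 34), "B.html"),
    ("123.456.78.9", IRIX, datetime(1998, 4, 25, 3, 6, 58), "A.html"),
    ("123.456.78.9", IRIX, datetime(1998, 4, 25, 3, 7, 42), "B.html"),
]
for user, pages in sessionize(sample):
    print(user[1], pages)
```

On these four rows the single IP address is split into two sessions, one per agent, which is the "single IP address / multiple server sessions" case.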
Preprocessing (contd.)
• Content Preprocessing
  • Converting text, images, and scripts into useful forms (e.g. vectors of words)
  • Classification or clustering algorithms can be used to filter discovered patterns based on topic or intended use
• Structure Preprocessing
  • Hyperlinks between page views
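As an illustration of the "vectors of words" form mentioned under content preprocessing, a page can be reduced to word counts over a fixed vocabulary. This is a minimal sketch under simplifying assumptions: tags are stripped with a crude regular expression rather than a real HTML parser, and the vocabulary and the name word_vector are hypothetical.

```python
import re
from collections import Counter


def word_vector(html, vocabulary):
    """Count how often each vocabulary word occurs in the page text."""
    text = re.sub(r"<[^>]+>", " ", html)        # crude tag removal
    words = re.findall(r"[a-z]+", text.lower())  # lowercase word tokens
    counts = Counter(words)
    return [counts[word] for word in vocabulary]


page = "<h1>Music</h1> <p>Buy music online</p>"
print(word_vector(page, ["music", "buy", "video"]))  # [2, 1, 0]
```

Vectors of this kind can then be fed to a classification or clustering algorithm to group pages by topic.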
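Pattern Discovery
• Statistical Analysis
  • Page views, viewing time, length of navigational path
• Association Rules
  • Apriori algorithm: finds pages that are frequently accessed together, e.g. a correlation between users who visit one page and users who visit another
• Clustering
  • Usage clustering: groups users with similar browsing patterns, useful for inferring user demographics
  • Page clustering: groups pages having related content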
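To make the association-rule step concrete, the sketch below finds sets of pages that occur together in at least a given fraction of sessions, using the level-wise join and prune idea of Apriori. It is a compact illustration, not an optimized implementation; the session data, the 0.5 support threshold, and the function name apriori are assumptions for the example.

```python
from itertools import combinations


def apriori(sessions, min_support):
    """Return {frozenset(pages): support} for every frequent page set."""
    transactions = [frozenset(session) for session in sessions]
    total = len(transactions)

    def support(itemset):
        # Fraction of sessions that contain every page in the itemset.
        return sum(itemset <= t for t in transactions) / total

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: single pages.
    pages = {page for t in transactions for page in t}
    current = keep_frequent({frozenset([page]) for page in pages})

    frequent = {}
    size = 1
    while current:
        frequent.update(current)
        size += 1
        # Join: merge frequent sets whose union has exactly `size` pages.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == size}
        # Prune: every (size - 1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, size - 1))}
        current = keep_frequent(candidates)
    return frequent


# Hypothetical sessions, each a set of pages viewed together.
sessions = [["A", "B", "F"], ["A", "B", "C"], ["A", "C"], ["B", "F"]]
for pages, sup in sorted(apriori(sessions, 0.5).items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(pages), sup)
```

A rule such as A → B can then be scored by its confidence, support({A, B}) / support({A}).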
Pattern Discovery (contd.)
• Classification
  • Example: 30% of users who placed an online order in /Product/Music are in the 18-25 age group and live on the West Coast.
• Sequential Patterns
  • Time-ordered set of sessions: predicting future visit patterns to decide where to place advertisements
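A very simple way to see how time-ordered data supports prediction is to count which page most often follows each page. The sketch below is a deliberate simplification: it counts only direct page-to-page transitions inside ordered page sequences and is not a full sequential pattern mining algorithm. The sequences and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def next_page_counts(sequences):
    """Count, for each page, how often each other page directly follows it."""
    transitions = defaultdict(Counter)
    for pages in sequences:
        for current, following in zip(pages, pages[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions, page):
    """Return the most frequent successor of a page, or None if unseen."""
    if page not in transitions:
        return None
    return transitions[page].most_common(1)[0][0]


# Hypothetical time-ordered page sequences.
sequences = [["A", "B", "F"], ["A", "B", "C"], ["A", "C"]]
transitions = next_page_counts(sequences)
print(predict_next(transitions, "A"))  # B
```

A predicted next page is a natural candidate location for an advertisement aimed at users currently on the preceding page.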
Pattern Analysis
• Motivation: filter out uninteresting rules or patterns from the set found in the pattern discovery phase.
Application Areas
Examples
• Personalization: http://aztec.cs.depaul.edu/scripts/ACR2/
• Business: http://www.accrue.com/