Web Mining - Sharifce.sharif.edu/courses/85-86/1/ce925/resources/root/class... · December 24, 2006...
Web Mining
Kyumars Sheykh Esmaili
Data Mining Course, Sharif University of Technology
Fall 2006
December 24, 2006 Web Mining 2
Table of Contents
- Introduction
- Web Content Mining
  - Feature Selection and Similarity Measures
- Web Structure Mining
  - Web as Social Network
  - Features and Similarity Measures
  - Social Network Analysis Algorithms
    - PageRank
    - Cyber-Communities
      - HITS
      - CT
- Web Content-Structure Clustering
- Web Usage Mining
- Some Concrete Applications of Web Mining
  - Focused Crawling
  - Web Search Result Clustering
- Summary
Introduction
Information overload on the Web: size
- 2001: 6 exabytes (1 exabyte = 10^18 bytes) of new information created; 10 billion (non-spam) e-mail messages sent per day.
- 2002: 12 exabytes of new information created.
- 2003: the public Internet contained about 1 trillion pages and was growing by approximately 8 million pages per day.
- 2005: 35 billion messages per day.
Challenges in WWW Interactions
- Finding relevant information
- Creating knowledge from the information available
- Personalization of the information
- Learning about customers / individual users
Web mining can play an important role!
Introduction
Web mining: the use of data mining techniques to automatically discover and extract information from Web documents and services.
Web mining research integrates research from several research communities:
- Database (DB)
- Information retrieval (IR)
- The sub-areas of machine learning (ML)
- Natural language processing (NLP)
Web Data
- Web pages
- Intra-page structures
- Inter-page structures
- Usage data
- Supplemental data
  - Profiles
  - Registration information
  - Cookies
Web Data Categories
Web Data
- Content Data
  - Free Texts
  - HTML Files
  - XML Files
  - Dynamic Content
  - Multimedia
- Structure Data
  - Static Link
  - Dynamic Link
- Usage Data
- User Profile Data
Web Mining Taxonomy
Web Mining
- Web Structure Mining
- Web Content Mining
- Web C-S (Content-Structure) Mining
- Web Usage Mining
Web Mining: Subtasks
- Resource finding: the task of retrieving the intended Web documents.
- Information selection and pre-processing: automatically selecting and pre-processing specific information from the retrieved Web resources.
- Generalization: automatic discovery of patterns in Web sites.
- Analysis: validation and/or interpretation of the mined patterns.
Feature Selection for Web Mining
For the purposes of automated text classification, text features should be:
- Relatively few in number
- Moderate in frequency of assignment
- Low in redundancy
- Low in noise
- Related in semantic scope to the classes to be assigned
- Relatively unambiguous in meaning
Feature Selection
Potential features:
- BODY
- META
- TITLE
- Snippet: the sentences attached to URL u when it appears in search results
- Anchor Window: the anchor text and the text around the hyperlink v->u in the source page v
- MT: the union of META and TITLE content
- BMT: the union of BODY, META and TITLE content
Feature Selection for Content Mining
[Figure: Percentage of Web pages with words in HTML tags]
Feature Selection for Web Pages
[Figure: Classification performance for various representations of Web pages]
Vector Space Model for Content Similarity
- IR systems usually adopt index terms to process queries.
- Index term:
  - a keyword or group of selected words
  - any word (more general)
- Stemming might be used, e.g., connect: connecting, connection, connections
- An inverted file is built for the chosen index terms.
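A minimal Python sketch of an inverted file with stemming, under stated assumptions: tokens are whitespace-separated, and `crude_stem` is a hypothetical suffix-stripping rule written only so that the connect example collapses to one index term (it is not the Porter stemmer or any stemmer these slides name).

```python
from collections import defaultdict

# Crude suffix list: an illustrative assumption, not a real stemming algorithm
SUFFIXES = ("ions", "ion", "ing", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_inverted_file(docs):
    """Map each stemmed index term to the sorted ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            postings[crude_stem(token)].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


# Hypothetical three-document collection
docs = {1: "connecting networks", 2: "a connection", 3: "connections matter"}
inverted = build_inverted_file(docs)
print(inverted["connect"])  # [1, 2, 3]
```

All three surface forms land in the same posting list, which is what lets a query on "connect" retrieve every document.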
Vector Space Model - Basic Concepts
- $k_i$ is an index term
- $d_j$ is a document
- $t$ is the number of index terms
- $K = (k_1, k_2, \ldots, k_t)$ is the set of all index terms
- $w_{ij} \ge 0$ is a weight associated with $(k_i, d_j)$
- $w_{ij} = 0$ indicates that the term does not belong to the document
- $\vec{d_j} = (w_{1j}, w_{2j}, \ldots, w_{tj})$ is the weighted vector associated with document $d_j$
- $g_i(\vec{d_j}) = w_{ij}$ is a function that returns the weight associated with the pair $(k_i, d_j)$
The Vector Space Model
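$$\mathrm{sim}(d_k, d_j) = \cos\Theta = \frac{\vec{d_k} \cdot \vec{d_j}}{|\vec{d_k}|\,|\vec{d_j}|} = \frac{\sum_{i=1}^{t} w_{ik}\, w_{ij}}{|\vec{d_k}|\,|\vec{d_j}|}, \qquad |\vec{d_j}| = \sqrt{\sum_{i=1}^{t} w_{ij}^2}$$
Since $w_{ik} \ge 0$ and $w_{ij} \ge 0$, $0 \le \mathrm{sim}(d_k, d_j) \le 1$.
A document is retrieved even if it matches the target document's terms only partially.
[Figure: vectors $d_j$ and $d_k$ in the term space (axes i and j), separated by the angle Θ]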
The Vector Space Model: Example
[Figure: documents d1 to d7 placed in the space of index terms k1, k2, k3]

Query q = (1, 1, 1), |q| = 1.73

Doc   k1  k2  k3   q • dj   |dj|   Sim(dj, q)
d1     1   0   1     2      1.41     0.82
d2     1   0   0     1      1.00     0.58
d3     0   1   1     2      1.41     0.82
d4     1   0   0     1      1.00     0.58
d5     1   1   1     3      1.73     1.00
d6     1   1   0     2      1.41     0.82
d7     0   1   0     1      1.00     0.58
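The Sim(dj, q) column can be recomputed from the binary weights in the table with the cosine formula of the vector space model. This is a sketch; the function name `cosine` and the dictionary layout are illustrative choices.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Binary term weights (k1, k2, k3) copied from the example table
docs = {
    "d1": (1, 0, 1), "d2": (1, 0, 0), "d3": (0, 1, 1), "d4": (1, 0, 0),
    "d5": (1, 1, 1), "d6": (1, 1, 0), "d7": (0, 1, 0),
}
q = (1, 1, 1)

for name, vec in docs.items():
    print(name, round(cosine(q, vec), 2))
```

The printed values match the Sim(dj, q) column up to rounding; d5, which contains every query term, scores 1.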
The Vector Space Model - Weighting
$$\mathrm{sim}(q, d_j) = \frac{\sum_{i=1}^{t} w_{ij}\, w_{iq}}{|\vec{d_j}|\,|\vec{q}|}$$
How should the weights $w_{ij}$ and $w_{iq}$ be computed? A good weight must take two effects into account:
- Quantification of intra-document contents (similarity): the tf factor, the term frequency within a document
- Quantification of inter-document separation (dissimilarity): the idf factor, the inverse document frequency
$$w_{ij} = \mathrm{tf}(i,j) \times \mathrm{idf}(i)$$
The Vector Space Model - Weighting (Example)
- A collection includes 10,000 documents.
- The term A appears 20 times in a particular document.
- The maximum frequency of any term in this document is 50.
- The term A appears in 2,000 of the collection's documents.
- $f(i,j) = \mathrm{freq}(i,j) / \max_l \mathrm{freq}(l,j) = 20/50 = 0.4$
- $\mathrm{idf}(i) = \log_2(N/n_i) = \log_2(10{,}000/2{,}000) = \log_2 5 = 2.32$
- $w_{ij} = f(i,j) \times \log_2(N/n_i) = 0.4 \times 2.32 = 0.93$
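The weighting example, written as a small function. The slide does not state the base of the logarithm; base 2 is assumed here because it is the base that gives log(5) = 2.32. `tf_idf` is a hypothetical helper name.

```python
import math


def tf_idf(freq, max_freq, n_docs, doc_freq):
    """Normalized term frequency times inverse document frequency (log base 2)."""
    tf = freq / max_freq                # f(i,j) = freq(i,j) / max_l freq(l,j)
    idf = math.log2(n_docs / doc_freq)  # idf(i) = log2(N / n_i)
    return tf * idf


w = tf_idf(freq=20, max_freq=50, n_docs=10_000, doc_freq=2_000)
print(round(w, 2))  # 0.93
```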
Social network analysis
Social network analysis is the study of social entities (people in an organization, called actors) and their interactions and relationships. The interactions and relationships can be represented with a network or graph, where each vertex (or node) represents an actor and each link represents a relationship.
From the network, we can study the properties of its structure, and the role, position and prestige of each social actor. We can also find various kinds of sub-graphs, e.g., communities formed by groups of actors.
Social network and the Web
Social network analysis is useful for the Web because the Web is essentially a virtual society, and thus a virtual social network, in which each page is a social actor and each hyperlink is a relationship.
Many results from social network can be adapted and extended for use in the Web context.
Web Structure Mining
- The Web consists not only of pages, but also of hyperlinks pointing from one page to another.
- These hyperlinks contain an enormous amount of latent human annotation.
- Assumption: a link from page A to page B is a recommendation of page B by A.
- If A and B are connected by a link, there is a higher probability that they are on the same topic.
Web Link Analysis
Used for:
- Ordering documents matching a user query: ranking
- Deciding what pages to add to a collection: crawling
- Page categorization
- Finding related pages
- Finding duplicated Web sites
Structural Similarity Measures
We must define the similarity of two nodes.
Method I: for page A and page B, A is related to B if there is a hyperlink from A to B, or from B to A.
- Not so good: consider the home pages of IBM and Microsoft, which are closely related but unlikely to link to each other.
[Figure: a hyperlink between Page A and Page B]
Structural Similarity Measures
Method II (from bibliometrics):
- Co-citation: the similarity of A and B is measured by the number of pages that cite both A and B.
- Bibliographic coupling: the similarity of A and B is measured by the number of pages cited by both A and B.
[Figure: co-citation (pages citing both Page A and Page B) and bibliographic coupling (Page A and Page B citing the same pages)]
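Both measures can be read off the adjacency matrix A, with A(i, j) = 1 if page i points to page j (the same convention the HITS slides use): co-citation of a and b is the overlap of columns a and b (an entry of A^T A), and bibliographic coupling is the overlap of rows a and b (an entry of A A^T). The five-page graph below is hypothetical.

```python
def cocitation(adj, a, b):
    """Number of pages that link to both a and b (overlap of columns a and b)."""
    return sum(1 for row in adj if row[a] and row[b])


def coupling(adj, a, b):
    """Number of pages that both a and b link to (overlap of rows a and b)."""
    return sum(1 for x, y in zip(adj[a], adj[b]) if x and y)


# Hypothetical 5-page graph: adj[i][j] = 1 if page i links to page j
adj = [
    [0, 0, 0, 1, 1],  # page 0 -> pages 3, 4
    [0, 0, 0, 1, 1],  # page 1 -> pages 3, 4
    [0, 0, 0, 1, 0],  # page 2 -> page 3
    [0, 0, 0, 0, 0],  # page 3 has no out-links
    [0, 0, 0, 0, 0],  # page 4 has no out-links
]
print(cocitation(adj, 3, 4))  # 2: pages 0 and 1 cite both 3 and 4
print(coupling(adj, 0, 1))    # 2: pages 0 and 1 both cite 3 and 4
```

Pages 3 and 4 never link to each other, so Method I would call them unrelated, while co-citation scores them 2.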
Using the Link Structure of the Web (cont.)
There are two well-known link-structure-based algorithms for ranking:
- PageRank
- HITS
Nearly all other algorithms are based on these, e.g., SALSA, Clever, …
PageRank
- Introduced by Page et al. (1998)
- An offline algorithm (query-independent)
- A page's weight is determined by the ranks of its parents (the pages linking to it)
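A minimal power-iteration sketch of PageRank, assuming a damping factor d = 0.85 and uniform redistribution of rank from pages without out-links; neither detail appears on these slides. It illustrates the point above: each page's rank is assembled from the ranks of its parents, with each parent splitting its rank evenly over its out-links. Every page must appear as a key of `links`; the three-page web is hypothetical.

```python
def pagerank(links, d=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    d (damping factor) = 0.85 is an assumed, commonly used value. A page with no
    out-links is treated as linking to every page, so the total rank stays 1.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a baseline share, then rank flows from parents to children
        new_rank = {p: (1.0 - d) / n for p in pages}
        for p in pages:
            targets = links[p] or pages
            share = d * rank[p] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}  # hypothetical 3-page web
ranks = pagerank(links)
print({p: round(r, 3) for p, r in ranks.items()})
```

C ends up highest because it has two parents, and A, whose only parent is C, inherits much of C's rank. Because the computation needs only the link graph, it can run offline, before any query arrives.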
A Practical Example for PageRank
What is a cyber-community?
A community on the Web is a group of Web pages sharing a common interest.
- E.g., a group of Web pages talking about pop music
- E.g., a group of Web pages interested in data mining
Main properties:
- Pages in the same community should be similar to each other in content.
- Pages in one community should differ from the pages in another community.
- This is similar to a cluster.
Cyber Communities
Two different types of communities
- Explicitly-defined communities: well-known ones, such as the resources listed by Yahoo!
  - E.g., a directory tree: Arts, with subcategories Music (Classic, Pop) and Painting
- Implicitly-defined communities: communities that are unexpected or invisible to most users
  - E.g., the group of Web pages interested in a particular singer
Different types of communities
- The explicit communities are easy to identify, e.g., Yahoo!, InfoSeek, Clever System.
- To extract the implicit communities, we need to analyze the Web graph objectively.
- In research, people are more interested in the implicit communities.
Methods of clustering
- Clustering methods based on co-citation analysis
- Methods derived from HITS (Kleinberg), using the co-citation matrix
- The CT method
HITS: Hubs and Authority
- Hub: a Web page that links to a collection of prominent sites on a common topic.
- Authority: a page with authoritative content on a broad topic; a Web page pointed to by hubs.
- Mutually reinforcing relationship: a good authority is a page that is pointed to by many good hubs, while a good hub is a page that points to many good authorities.
Authority and Hubness
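[Figure: pages 2, 3 and 4 point to page 1; page 1 points to pages 5, 6 and 7]
With $x$ denoting the authority weight and $y$ the hub weight:
$$x(1) = y(2) + y(3) + y(4) \qquad y(1) = x(5) + x(6) + x(7)$$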
HITS Steps (1)
Creating root and base sets
HITS Steps (2)
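Calculating Weights
- Authority weight: $x_p = \sum_{q:\, q \to p} y_q$
- Hub weight: $y_p = \sum_{q:\, p \to q} x_q$
- Matrix notation: $A$ is the adjacency matrix, with $A(i, j) = 1$ if the i-th page points to the j-th page; then $x = A^{T} y$ and $y = A x$.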
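The update rules can be iterated until the weights stabilize. The sketch below assumes a fixed number of iterations and Euclidean normalization after each step (normalization is part of Kleinberg's method; the iteration count is an arbitrary choice here). It uses the adjacency convention from the slide, A(i, j) = 1 if page i points to page j, and a hypothetical six-page graph.

```python
import math


def hits(adj, iterations=50):
    """Iterate authority x = A^T y and hub y = A x with unit-length normalization.

    adj[i][j] = 1 if page i points to page j.
    """
    n = len(adj)
    x = [1.0] * n  # authority weights
    y = [1.0] * n  # hub weights
    for _ in range(iterations):
        # Authority of page j: sum of the hub weights of the pages pointing to j
        x = [sum(adj[i][j] * y[i] for i in range(n)) for j in range(n)]
        # Hub of page i: sum of the authority weights of the pages i points to
        y = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Normalize so the weights stay bounded (guard against an all-zero vector)
        norm_x = math.sqrt(sum(v * v for v in x)) or 1.0
        norm_y = math.sqrt(sum(v * v for v in y)) or 1.0
        x = [v / norm_x for v in x]
        y = [v / norm_y for v in y]
    return x, y


# Hypothetical graph: pages 0, 1, 2 point to pages 3 and 4; page 2 also points to page 5
adj = [[0] * 6 for _ in range(6)]
for src, dst in [(0, 3), (0, 4), (1, 3), (1, 4), (2, 3), (2, 4), (2, 5)]:
    adj[src][dst] = 1
authority, hub = hits(adj)
print([round(v, 2) for v in authority])
print([round(v, 2) for v in hub])
```

Pages 3 and 4 come out as the strongest authorities, and page 2, which points to every authority, is the strongest hub: the mutual reinforcement described above.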
Final Result of HITS
HITS Results – 3D perspective
A Practical Example for HITS
Difference between PageRank and HITS
- PageRank is computed for all Web pages stored in the database, prior to the query; HITS is performed on the set of retrieved Web pages, for each query.
- HITS computes authorities and hubs; PageRank computes authorities only.
- PageRank is non-trivial to compute; HITS is easy to compute, but real-time execution is hard.
A cheaper method
- The previous methods are expensive.
- There is another, simpler method called community trawling (CT).
- It has been applied to a graph of 200 million pages, where it worked very well.
Basic idea of CT
- Definition of communities: dense directed bipartite subgraphs
- Bipartite graph: the nodes are partitioned into two sets, F and C, and every directed edge in the graph is directed from a node u in F to a node v in C.
- The graph is dense if many of the possible edges between F and C are present.
[Figure: fans (F) on one side, centers (C) on the other]
Basic idea of CT
- Bipartite cores: an (i, j) bipartite core is a complete bipartite subgraph with at least i nodes from F and at least j nodes from C, where i and j are tunable parameters.
- Every community has such a core for some i and j.
[Figure: an (i=3, j=3) bipartite core]
Basic idea of CT
- A bipartite core is the identity of a community.
- Extracting all the communities therefore amounts to enumerating all the bipartite cores on the Web.
- The authors invented an efficient algorithm to enumerate the bipartite cores; its main idea is iterative pruning (elimination-generation pruning).
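The slides do not spell out the elimination-generation procedure, so the following is only a simplified sketch of the pruning idea, not the authors' algorithm: it repeatedly drops edges whose source has fewer than j out-links or whose target has fewer than i in-links (such edges cannot belong to an (i, j) core), then enumerates the surviving complete i x j cores by brute force. That brute-force step would not scale to a Web-sized graph; it assumes i, j >= 1, and the edge list is hypothetical.

```python
from itertools import combinations


def prune(edges, i, j):
    """Repeatedly drop edges that cannot lie in any (i, j) core.

    An edge u -> v survives only if u still has >= j out-links (so it can be a fan)
    and v still has >= i in-links (so it can be a center).
    """
    edges = set(edges)
    while True:
        out_deg, in_deg = {}, {}
        for u, v in edges:
            out_deg[u] = out_deg.get(u, 0) + 1
            in_deg[v] = in_deg.get(v, 0) + 1
        kept = {(u, v) for u, v in edges if out_deg[u] >= j and in_deg[v] >= i}
        if kept == edges:
            return kept
        edges = kept


def enumerate_cores(edges, i, j):
    """Brute-force list of (fans, centers) forming complete i x j bipartite cores.

    Cores larger than i x j show up as several overlapping i x j cores.
    """
    out_links = {}
    for u, v in prune(edges, i, j):
        out_links.setdefault(u, set()).add(v)
    cores = []
    for fans in combinations(sorted(out_links), i):
        # Centers must be linked to by every chosen fan
        common = set.intersection(*(out_links[f] for f in fans))
        for centers in combinations(sorted(common), j):
            cores.append((fans, centers))
    return cores


# Hypothetical graph: fans a, b, c all link to centers x, y, z; d links only to x
edges = [(f, c) for f in "abc" for c in "xyz"] + [("d", "x")]
print(enumerate_cores(edges, i=3, j=3))  # [(('a', 'b', 'c'), ('x', 'y', 'z'))]
```

The pruning pass removes d -> x before any enumeration happens, which is the point of pruning: most of the graph is discarded cheaply so that the expensive search runs on what is left.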
Content Link Clustering
With CLC, each Web page q in data set D is represented as three vectors, $q_{Out}$, $q_{In}$ and $q_{Kword}$, with dimensions M, N and L respectively.
- The i-th item of $q_{Out}$ indicates whether q has the i-th of the M out-links: 1 if yes, else 0. $q_{In}$ is defined in the same way over the N in-links.
- The k-th item of $q_{Kword}$ is the frequency with which the k-th of the L terms appears in page q.
Similarity Measure
The similarity of two pages Q and R is a linear combination of three parts:
$$S(Q, R) = p_{out}\, S(Q_{Out}, R_{Out}) + p_{in}\, S(Q_{In}, R_{In}) + p_{Kword}\, S(Q_{Kword}, R_{Kword})$$
where $p_{out} + p_{in} + p_{Kword} = 1$.
$S(Q_{Out}, R_{Out})$ is defined as the cosine of the two out-link vectors.
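A sketch of the combined measure. The default weights are the CLC setting (0.2, 0.3, 0.5) listed in this section; applying the cosine to the in-link and keyword vectors as well is an assumption, since the slides only define S for out-link vectors. The page vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def clc_similarity(q, r, p_out=0.2, p_in=0.3, p_kword=0.5):
    """Linear combination of out-link, in-link and keyword similarities.

    q and r are dicts holding "out", "in" and "kword" vectors. Cosine for the
    in-link and keyword parts is an assumption; the slides state it for out-links only.
    """
    if abs(p_out + p_in + p_kword - 1.0) > 1e-9:
        raise ValueError("weighting factors must sum to 1")
    return (p_out * cosine(q["out"], r["out"])
            + p_in * cosine(q["in"], r["in"])
            + p_kword * cosine(q["kword"], r["kword"]))


# Hypothetical pages: identical out-links and keywords, partly overlapping in-links
Q = {"out": [1, 0, 1], "in": [1, 1], "kword": [2, 0, 1]}
R = {"out": [1, 0, 1], "in": [0, 1], "kword": [2, 0, 1]}
print(round(clc_similarity(Q, R), 3))
print(round(clc_similarity(Q, R, 0, 0, 1), 3))  # term-based setting
```

Passing other weight triples reproduces the term-based and link-based variants, which is how the weighting factors let one study the contribution of each source.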
Tuning the similarity measure
- By varying the weighting factors in the similarity formula, it is possible to study the effects of out-links, in-links and terms on the clustering process.
- The results of term-based clustering are rather coarse and usually include very general groups that are totally different from each other from a semantic point of view.
  - E.g., for the topic “jaguar”, the “car” group and the “animal” group are two very general groups with very different semantic topics.
Tuning the similarity measure
- So term-based clustering could only roughly separate pages into general semantic groups and failed to handle finer cases,
  - like “racing car” and “car driver club”, since both pages may include terms such as “car”, “model”, etc.
- The main reasons for the poor “purity” of clusters produced by term-based clustering are:
  - Noise pages are included in clusters instead of being removed, since they share some unimportant terms with other pages.
  - Pages on different finer topics (but the same general topic) are mixed together.
Tuning the similarity measure
- Hyperlinks represent the authors’ view of the relationships among Web pages.
- Hyperlink-based clustering therefore expresses the “association” of pages.
- Hence, clusters produced by link-based clustering can be said to be of finer granularity.
- The problem of link-based clustering is that some similar pages (e.g., newly created pages) may not have enough co-citations/citations to be grouped together. That is, recall is somewhat low.
Tuning the similarity measure
- “T”, “L” and “CLC” denote the term-based ($p_{out}, p_{in}, p_{Kword}$ = (0, 0, 1)), link-based ((0.5, 0.5, 0)) and content-link coupled ((0.2, 0.3, 0.5)) clustering approaches, respectively.
- Parameters: the similarity threshold and the weighting factors.
- The label of each cluster is identified automatically from the term vector of the cluster's centroid.
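One plausible reading of labeling a cluster by its centroid term vector, sketched under assumptions: average the member pages' term vectors and report the `top_k` highest-weighted terms. The helper name, `top_k` and the example data are hypothetical.

```python
def centroid_label(term_vectors, vocabulary, top_k=3):
    """Label a cluster with the top_k terms of its centroid term vector."""
    n = len(term_vectors)
    # Centroid: average weight of each vocabulary term over the cluster's pages
    centroid = [sum(vec[k] for vec in term_vectors) / n for k in range(len(vocabulary))]
    ranked = sorted(range(len(vocabulary)), key=lambda k: centroid[k], reverse=True)
    return [vocabulary[k] for k in ranked[:top_k]]


# Hypothetical cluster of two pages over a four-term vocabulary
vocabulary = ["jaguar", "car", "racing", "animal"]
cluster = [[2, 3, 4, 0], [1, 2, 5, 0]]
print(centroid_label(cluster, vocabulary))  # ['racing', 'car', 'jaguar']
```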
Content Link Mining
Web Usage Mining
Web usage mining, also known as Web log mining: mining techniques to discover interesting usage patterns from the secondary data derived from users' interactions while surfing the Web, including:
- Web log data
- Click-stream data
- Cookies
- User queries
- Any other data resulting from users' interaction with the Web
Web Usage Mining Applications
- Target potential customers for electronic commerce
- Enhance the quality and delivery of Internet information services to the end user
- Improve Web server system performance
- Identify potential prime advertisement locations
- Facilitate personalization/adaptive sites
- Improve site design
- Fraud/intrusion detection
- Predict users' actions (allows prefetching)
Web Log Clustering Applications
- Association rules: find pages that are often viewed together
- Clustering:
  - Cluster users based on browsing patterns
  - Cluster pages based on content
Server Logs
Fields
- Client IP: 128.101.228.20
- Authenticated User ID: - -
- Time/Date: [10/Nov/1999:10:16:39 -0600]
- Request: "GET / HTTP/1.0"
- Status: 200
- Bytes: -
- Referrer: "-"
- Agent: "Mozilla/4.61 [en] (WinNT; I)"
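The fields above match the layout of the NCSA combined log format, which is assumed here; a regular expression can split one such line into named fields. The sample line is the slide's entry reassembled into that format, and `parse_log_line` is a hypothetical helper.

```python
import re

# Assumed NCSA combined format: host ident authuser [date] "request" status bytes "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of named fields, or None if the line does not match the format."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# The entry from the slide, rebuilt as one combined-format line
line = ('128.101.228.20 - - [10/Nov/1999:10:16:39 -0600] "GET / HTTP/1.0" 200 - '
        '"-" "Mozilla/4.61 [en] (WinNT; I)"')
entry = parse_log_line(line)
print(entry["ip"], entry["status"], entry["agent"])
```

Lines that do not match return None, which gives the data-cleaning step an easy way to discard malformed entries.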
WUM – Pre-Processing
- Data cleaning: removes log entries that are not needed for the mining process
- Data integration: synchronizes data from multiple server logs
- User identification: associates page references with different users
- Session/episode identification: groups a user's page references into user sessions
- Path completion: fills in page references missing due to browser and proxy caching
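A sketch of the session identification step only. It assumes user identification has already produced a `user_key` per request and uses a 30-minute inactivity timeout, a common heuristic that the slides do not specify. Names and log records are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed inactivity timeout (a common heuristic, not given in the slides)
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(entries):
    """Split each user's page references into sessions.

    entries: iterable of (user_key, timestamp, url). A new session starts when the
    gap since the same user's previous request exceeds SESSION_TIMEOUT.
    """
    sessions = {}
    last_seen = {}
    for user, ts, url in sorted(entries, key=lambda e: (e[0], e[1])):
        if user not in last_seen or ts - last_seen[user] > SESSION_TIMEOUT:
            sessions.setdefault(user, []).append([])  # open a new session
        sessions[user][-1].append(url)
        last_seen[user] = ts
    return sessions


# Hypothetical cleaned log records
t0 = datetime(1999, 11, 10, 10, 16, 39)
log = [
    ("u1", t0, "/"),
    ("u1", t0 + timedelta(minutes=5), "/a.html"),
    ("u1", t0 + timedelta(hours=2), "/b.html"),
    ("u2", t0, "/"),
]
print(sessionize(log))
```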
WUM – Association Rule Generation
Discovers the correlations between pages that are most often referenced together in a single server session. It provides information such as:
- What sets of pages are frequently accessed together by Web users?
- What page will be fetched next?
- What paths are frequently accessed by Web users?
Association rule: $A \Rightarrow B$ [Support = 60%, Confidence = 80%]
Example: “50% of visitors who accessed URLs /infor-f.html and labo/infos.html also visited situation.html”
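Support and confidence for a rule A => B can be computed directly from sessions: support is the fraction of sessions containing A and B together, and confidence is that support divided by the support of A. The sessions below are invented for illustration and reuse the page names from the example.

```python
def support(sessions, pages):
    """Fraction of sessions that contain every page in `pages`."""
    pages = set(pages)
    return sum(1 for s in sessions if pages <= set(s)) / len(sessions)


def confidence(sessions, antecedent, consequent):
    """Confidence of antecedent => consequent: support(A and B) / support(A)."""
    base = support(sessions, antecedent)
    if base == 0:
        return 0.0
    return support(sessions, set(antecedent) | set(consequent)) / base


# Hypothetical sessions reusing the page names from the example rule
sessions = [
    {"/infor-f.html", "labo/infos.html", "situation.html"},
    {"/infor-f.html", "labo/infos.html"},
    {"/infor-f.html", "labo/infos.html", "situation.html"},
    {"situation.html"},
]
A = {"/infor-f.html", "labo/infos.html"}
B = {"situation.html"}
print(support(sessions, A | B), round(confidence(sessions, A, B), 2))  # 0.5 0.67
```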
WUM – Clustering
Groups together a set of items having similar characteristics.
- User clusters: discover groups of users exhibiting similar browsing patterns.
- Page recommendation: a user's partial session is classified into a single cluster, and the links contained in this cluster are recommended.
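A sketch of the page-recommendation step under stated assumptions: clusters are sets of pages, a partial session is assigned to the cluster with the highest Jaccard overlap (the slides do not say which measure is used), and the unvisited pages of that cluster are recommended. Cluster contents are hypothetical.

```python
def recommend(partial_session, clusters):
    """Classify a partial session into its best-matching cluster and recommend
    that cluster's pages the user has not visited yet.

    Matching by Jaccard overlap is an assumption; partial_session must be non-empty.
    """
    visited = set(partial_session)

    def jaccard(cluster):
        return len(visited & cluster) / len(visited | cluster)

    best = max(clusters, key=jaccard)
    return sorted(best - visited)


# Hypothetical page clusters mined from earlier sessions
clusters = [
    {"/shuttle/a.html", "/shuttle/b.html", "/shuttle/c.html"},
    {"/history/x.html", "/history/y.html"},
]
print(recommend(["/shuttle/a.html"], clusters))  # ['/shuttle/b.html', '/shuttle/c.html']
```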
Web Usage Clustering: Sample Results
- Clients who often access /products/software/webminer.html tend to be from educational institutions.
- Clients who placed an online order for software tend to be students in the 20-25 age group and live in the United States.
- 75% of clients who download software from /products/software/demos/ visit between 7:00 and 11:00 pm on weekends.
Focused Crawling
- Only visit links from a page if that page is determined to be relevant.
- The classifier is static after the learning phase.
- Components:
  - Classifier, which assigns a relevance score to each page based on the crawl topic.
  - Distiller, which identifies hub pages.
  - Crawler, which visits pages based on the classifier and distiller scores.
Focused Crawling
- The classifier also determines how useful outgoing links are.
- Hub pages contain links to many relevant pages; they must be visited even if they do not have a high relevance score.
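A best-first sketch of the crawl loop, assuming the classifier and distiller scores are already available as dictionaries (`relevance`, `hub_score`) and the Web is an in-memory `out_links` map; a real focused crawler fetches and classifies pages on the fly. A page's links are followed only if the page is relevant or is a hub, and the frontier is ordered by the parent's best score. The threshold, budget and graph are hypothetical.

```python
import heapq


def focused_crawl(start, out_links, relevance, hub_score, budget=10, threshold=0.5):
    """Best-first crawl that expands only relevant pages or hubs.

    out_links: dict page -> list of linked pages (stands in for fetching).
    relevance / hub_score: dict page -> score in [0, 1] (stand-ins for the
    classifier and the distiller).
    """
    frontier = [(-1.0, start)]  # min-heap on negated priority
    seen = {start}
    visited = []
    while frontier and len(visited) < budget:
        _, page = heapq.heappop(frontier)
        visited.append(page)
        score = max(relevance.get(page, 0.0), hub_score.get(page, 0.0))
        if score < threshold:
            continue  # neither relevant nor a hub: do not follow its links
        for link in out_links.get(page, []):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return visited


out_links = {"seed": ["hub", "off"], "hub": ["r1", "r2"], "off": ["junk"]}
relevance = {"seed": 0.9, "hub": 0.2, "off": 0.1, "r1": 0.8, "r2": 0.7}
hub_score = {"hub": 0.9}
print(focused_crawl("seed", out_links, relevance, hub_score))
```

The low-relevance "hub" page is still expanded because of its distiller score, while the links of "off" are never followed.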
Focused Crawling
Motivation
- In the Web search context: organizing Web pages (search results) into groups, so that different groups correspond to different user needs.
  - E.g., the query “engine” may refer to a search engine, a car part, or Engine Corp.
- Why not other data mining techniques?
(1) Using Contents of Documents
- Creating clusters based on the snippets returned by Web search engines.
- Clusters based on snippets are almost as good as clusters created using the full text of Web documents.
- Suffix Tree Clustering (STC): an incremental, O(n)-time algorithm
  - Linear
  - Incremental
  - Overlapping
  - Can be extended to hierarchical clustering
STC algorithm
- Step 1: Cleaning
  - Stemming
  - Sentence boundary identification
  - Punctuation elimination
- Step 2: Suffix tree construction
  - Produces base clusters (internal nodes)
  - Base clusters are scored based on size and phrase score (which depends on length and word “quality”)
- Step 3: Merging base clusters
  - Highly overlapping clusters are merged
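A faithful STC implementation builds a generalized suffix tree over the snippets; the sketch below does not. It is a simplified approximation of steps 2 and 3, assuming whitespace-tokenized, already-cleaned snippets: it enumerates word phrases of up to `max_len` words by brute force (so it is not linear-time), treats every phrase shared by at least two snippets as a base cluster, skips scoring, and merges base clusters whose document sets overlap by more than half in both directions. Treat the 0.5 threshold as an assumption; the snippets are hypothetical.

```python
from collections import defaultdict


def base_clusters(docs, max_len=3):
    """Map each phrase (tuple of up to max_len words) shared by >= 2 docs to its doc ids."""
    phrase_docs = defaultdict(set)
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for start in range(len(words)):
            for end in range(start + 1, min(start + max_len, len(words)) + 1):
                phrase_docs[tuple(words[start:end])].add(doc_id)
    return {p: d for p, d in phrase_docs.items() if len(d) >= 2}


def merge_clusters(clusters, threshold=0.5):
    """Union base clusters whose doc sets overlap by more than threshold both ways."""
    items = list(clusters.items())
    parent = list(range(len(items)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a in range(len(items)):
        for b in range(a + 1, len(items)):
            da, db = items[a][1], items[b][1]
            common = len(da & db)
            if common / len(da) > threshold and common / len(db) > threshold:
                parent[find(a)] = find(b)
    groups = defaultdict(lambda: (set(), set()))
    for idx, (phrase, doc_ids) in enumerate(items):
        labels, members = groups[find(idx)]
        labels.add(" ".join(phrase))
        members.update(doc_ids)
    return list(groups.values())


snippets = ["jaguar car dealer", "used jaguar car prices",
            "jaguar cat habitat", "big jaguar cat facts"]
for labels, members in merge_clusters(base_clusters(snippets)):
    print(sorted(labels), sorted(members))
```

On these snippets the shared phrases yield a car cluster, a cat cluster and a broad "jaguar" cluster covering all four; overlapping clusters and phrase labels both fall out of the phrase-based approach.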
(2) Using Users' Usage Logs
- Advantage: relevance information is objectively reflected by the usage logs.
- An experimental result on www.nasa.gov/:
  - Cluster 1: /shuttle/missions/41-c/news, /shuttle/missions/61-b…
  - Cluster 2: /history/apollo/sa-2/news/, /history/apollo/sa-2/images…
  - Cluster 3: /software/winvn/userguide/3_3_2.htm, /software/winvn/userguide/3_3_4.htm, …
  - …
(3) Using hyperlinks
- For each URL P in the search results R, we extract all its out-links as well as its top n in-links, using the services of AltaVista.
- This yields all N distinct out-links and M distinct in-links for the URLs in R.
- Each page P in R (the result set) is represented as two vectors:
  - $P_{Out}$ (N-dimensional)
  - $P_{In}$ (M-dimensional)
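Building the two binary vectors is a bookkeeping step. This sketch assumes the out-link and in-link lists have already been fetched (no fetching is shown), and the page and link names are hypothetical.

```python
def link_vectors(pages):
    """Build binary P_Out and P_In vectors for each page in a result set.

    pages: dict url -> {"out": [...], "in": [...]} with already-fetched link lists.
    Returns the N out-link dimensions, the M in-link dimensions and the vectors.
    """
    out_vocab = sorted({u for p in pages.values() for u in p["out"]})  # N distinct out-links
    in_vocab = sorted({u for p in pages.values() for u in p["in"]})    # M distinct in-links
    vectors = {}
    for url, links in pages.items():
        out_set, in_set = set(links["out"]), set(links["in"])
        vectors[url] = (
            [1 if u in out_set else 0 for u in out_vocab],
            [1 if u in in_set else 0 for u in in_vocab],
        )
    return out_vocab, in_vocab, vectors


pages = {
    "p1": {"out": ["a", "b"], "in": ["x"]},
    "p2": {"out": ["b", "c"], "in": ["x", "y"]},
}
out_vocab, in_vocab, vectors = link_vectors(pages)
print(out_vocab, in_vocab, vectors)
```

The resulting vectors can then be compared with the cosine measure, just as term vectors are.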
(3) Using Hyperlinks: continued
Concerns About Current Methods
- Each method has pros and cons.
- Using hyperlinks: the best accuracy, with still some room for improvement.
- STC: best for browsing and for incrementality.
Sample systems
- Scatter/Gather
- Grouper
- Carrot2
- Vivisimo
- Mapuccino
- SHOC
Grouper
- Online
- Operates on query result snippets
- Clusters together documents with large common subphrases
- Suffix Tree Clustering (STC)
- STC induces labeling
Summary
Web Mining
- Web Structure Mining
- Web Content Mining
  - Web Page Content Mining
  - Search Result Mining
- Web Usage Mining
  - General Access Pattern Tracking
  - Customized Usage Tracking
Thank You