Web Mining Final
Uploaded by: chag-vaibhav-p
-
8/2/2019 Web Mining Final
1/21
Web Mining
-
Outline
Hype of the Web
Difficulties with the Web
Web Mining
Advantages / Disadvantages
Categories of Web Mining
Web Usage Mining
-
Hype Of The Web
Information
Services
E-mail
Business
Communities
-
Difficulties With Web
Huge volume of information
Semi-structured data
Redundant data
The Web is noisy
Customer behaviour
-
Web Mining
The process of discovering useful, previously unknown information or knowledge from web data.
-
Web Mining - Sub Tasks
Resource Finding
Information Selection
Information Preprocessing
Data Mining Techniques
Analysis
-
Benefits
Business Organization
Banking Sector
Customer Satisfaction
Researchers
Society
-
Drawbacks
Privacy Issues
Security Issues
Misuse of Information
Sometimes Expensive
-
Categories
-
Web Usage Mining
The process of automatically discovering patterns and profiles of users interacting with a web site.
-
Web Usage Mining Model
-
Data Cleaning: clean the raw data
- Missing values
- Redundant data
- Outlier deletion
Transaction Derivation:
- Group the cleaned records into transactions for each individual user
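The two preprocessing steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout (user, timestamp in seconds, URL), the sample data, the function names and the 30-minute idle timeout are all hypothetical and do not come from the slides. Outlier deletion is not shown.

```python
from collections import defaultdict

# Hypothetical log records: (user, timestamp_in_seconds, url).
RAW_LOG = [
    ("u1", 0, "/index.html"),
    ("u1", 60, "/products/"),
    ("u1", 60, "/products/"),        # exact duplicate (redundant data)
    ("u2", 100, "/index.html"),
    ("u2", None, "/products/"),      # missing timestamp (missing value)
    ("u1", 4000, "/contact.html"),   # more than 30 minutes after u1's previous hit
]

def clean(records):
    """Drop records with missing fields and exact duplicates, keeping order."""
    seen = set()
    cleaned = []
    for rec in records:
        if any(field is None for field in rec):
            continue
        if rec in seen:
            continue
        seen.add(rec)
        cleaned.append(rec)
    return cleaned

def derive_transactions(records, timeout=1800):
    """Group each user's requests into transactions split by an idle timeout.

    The 1800-second timeout is an assumed convention, not taken from the slides.
    """
    by_user = defaultdict(list)
    for user, ts, url in sorted(records, key=lambda r: (r[0], r[1])):
        by_user[user].append((ts, url))
    transactions = []
    for user, hits in by_user.items():
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                transactions.append((user, current))
                current = []
            current.append(url)
        transactions.append((user, current))
    return transactions

if __name__ == "__main__":
    for user, urls in derive_transactions(clean(RAW_LOG)):
        print(user, urls)
```

Splitting on idle time is one common heuristic for deriving per-user transactions; other rules, such as a maximum transaction length, would fit the same structure.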
-
Data Integration:
- Combines data from multiple sources into a single data store
Transformation:
- The data are transformed into forms appropriate for mining, for example by generalization and normalization
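A small sketch of what integration and transformation might look like in practice. The two sources (per-user page-view counts and a profile table), the field names and the helper functions are assumptions made for illustration. Min-max scaling is used as one possible normalization, and rolling a URL up to its top-level section as one possible generalization.

```python
def integrate(page_views, profiles):
    """Join two hypothetical sources on user id into one record per user."""
    return {
        user: {"views": views, "country": profiles.get(user, "unknown")}
        for user, views in page_views.items()
    }

def min_max_normalize(values):
    """Rescale numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def generalize_url(url):
    """Replace a page URL by its top-level section."""
    parts = [p for p in url.split("/") if p]
    # The root itself, or a bare file at the root, generalizes to "/".
    if not parts or (len(parts) == 1 and not url.endswith("/")):
        return "/"
    return "/" + parts[0] + "/"

if __name__ == "__main__":
    store = integrate({"u1": 3, "u2": 5}, {"u1": "IN"})
    print(store)
    print(min_max_normalize([10, 20, 30]))
    print(generalize_url("/products/software/webminer.htm"))
```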
-
Pattern Discovery
Association Rule: X ==> Y (support, confidence)
Example: 60% of clients who accessed /products/ also accessed /products/software/webminer.htm (a confidence of 60%)
Sequential Pattern:
Discovery of frequently occurring ordered events or subsequences as patterns.
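To make the (support, confidence) pair concrete, here is a minimal sketch that computes both for the rule /products/ ==> /products/software/webminer.htm. The ten sessions are made-up data, chosen so that the confidence matches the 60% in the example; the function names are hypothetical.

```python
def count_containing(sessions, pages):
    """Number of sessions that contain every page in `pages`."""
    wanted = set(pages)
    return sum(1 for s in sessions if wanted <= set(s))

def support(sessions, antecedent, consequent):
    """Fraction of all sessions that contain both sides of the rule."""
    both = set(antecedent) | set(consequent)
    return count_containing(sessions, both) / len(sessions)

def confidence(sessions, antecedent, consequent):
    """Fraction of sessions containing the antecedent that also contain the consequent."""
    base = count_containing(sessions, antecedent)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return count_containing(sessions, both) / base

X = ["/products/"]
Y = ["/products/software/webminer.htm"]

# Made-up sessions: 5 of 10 visit /products/, and 3 of those 5 also visit webminer.htm.
SESSIONS = (
    [["/products/", "/products/software/webminer.htm"]] * 3
    + [["/products/", "/contact.html"]] * 2
    + [["/index.html"]] * 5
)

if __name__ == "__main__":
    print("support:", support(SESSIONS, X, Y))
    print("confidence:", confidence(SESSIONS, X, Y))
```

With this data the rule holds in 3 of 10 sessions overall and in 3 of the 5 sessions that contain /products/.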
-
Clustering
The process of grouping a set of objects into classes of similar objects
Classification
The process of finding a model that describes and distinguishes data classes or concepts
-
K-means Algorithm
Used for clustering, where each cluster's center is represented by the mean value of the objects in the cluster.
Input:
k: the number of clusters,
D: a data set containing n objects.
Output:
A set of k clusters.
-
Steps:
(1) randomly choose k objects from D as the initial cluster centers;
(2) repeat
(3) (re)assign each object to the cluster to which the object is the most similar, based on the mean value of the objects in the cluster;
(4) update the cluster means, i.e., calculate the mean value of the objects for each cluster;
(5) until no change;
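The steps above can be sketched in plain Python as follows. This is a minimal illustration, not a production implementation. It assumes objects are numeric tuples and that similarity is measured by squared Euclidean distance; the function names, the `seed` and `max_iter` parameters and the sample points are assumptions added here.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(data, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centers = rng.sample(data, k)                 # step (1)
    assignment = None
    for _ in range(max_iter):                     # step (2)
        # Step (3): assign each object to the nearest cluster center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in data
        ]
        if new_assignment == assignment:          # step (5): no change
            break
        assignment = new_assignment
        # Step (4): recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(data, assignment) if a == c]
            if members:                           # keep the old center if a cluster is empty
                centers[c] = mean(members)
    clusters = [[p for p, a in zip(data, assignment) if a == c] for c in range(k)]
    return clusters, centers

POINTS = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

if __name__ == "__main__":
    clusters, centers = k_means(POINTS, 2)
    for center, members in zip(centers, clusters):
        print(center, members)
```

The `max_iter` cap is a safety limit added for the sketch; the slides stop only when the assignment no longer changes.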
-
K-Medoids Algorithm
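- Overcomes limitations of the k-means algorithm, such as its sensitivity to outliers
- Each cluster is represented by one of its own objects (the representative object, or medoid), chosen according to the cost of assigning the cluster's objects to it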
-
Input:
k: the number of clusters,
D: a data set containing n objects.
Output:
A set of k clusters.
Steps:
(1) randomly choose k objects in D as the initial representative objects or seeds;
(2) repeat
(3) assign each remaining object to the cluster with the nearest representative object;
(4) randomly select a non-representative object, o_random;
(5) compute the total cost, S, of swapping representative object, o_j, with o_random;
(6) if S < 0 then swap o_j with o_random to form the new set of k representative objects;
(7) until no change;
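A rough sketch of these steps in Python. Note one deliberate deviation: instead of picking o_random at random in step (4), this version tries every non-representative object against every representative object, so that "until no change" is well defined and the result is deterministic once the seeds are chosen. The Manhattan distance, the function names, the `seed` parameter and the sample points are assumptions; the objects are assumed to be distinct.

```python
import random

def manhattan(a, b):
    """Manhattan distance between two points given as tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))

def total_cost(data, medoids):
    """Sum of distances from each object to its nearest representative object."""
    return sum(min(manhattan(p, m) for m in medoids) for p in data)

def k_medoids(data, k, seed=0):
    rng = random.Random(seed)
    medoids = rng.sample(data, k)                    # step (1)
    improved = True
    while improved:                                  # steps (2) and (7)
        improved = False
        for j in range(k):
            for candidate in data:                   # step (4), exhaustive rather than random
                if candidate in medoids:
                    continue
                trial = medoids[:j] + [candidate] + medoids[j + 1:]
                # Step (5): S is the change in total cost caused by the swap.
                s = total_cost(data, trial) - total_cost(data, medoids)
                if s < 0:                            # step (6)
                    medoids = trial
                    improved = True
    # Step (3): assign every object to its nearest representative object.
    clusters = {m: [] for m in medoids}
    for p in data:
        nearest = min(medoids, key=lambda m: manhattan(p, m))
        clusters[nearest].append(p)
    return clusters

POINTS = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

if __name__ == "__main__":
    for medoid, members in k_medoids(POINTS, 2).items():
        print(medoid, members)
```

Because every accepted swap strictly lowers the total cost and there are finitely many sets of representatives, the loop always terminates.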
-
p: a non-representative object
o_j: the current representative object
o_i: another representative object
o_random: a candidate replacement object (a non-representative object)
-
Thank You