Web Mining Final

  • 8/2/2019 Web Mining Final

    1/21

    Web Mining

    Outline

    - Hype of the web
    - Difficulties with the web
    - Web mining
    - Advantages / disadvantages
    - Categories of web mining
    - Web usage mining

    Hype Of The Web

    - Information services
    - E-mail
    - Business
    - Communities

    Difficulties With Web

    - Huge volume of information
    - Semi-structured data
    - Redundant data
    - The web is noisy
    - Customer behaviour

    Web Mining

    The process of discovering useful, previously unknown information or knowledge from web data.

    Web Mining - Sub Tasks

    - Resource finding
    - Information selection
    - Information preprocessing
    - Data mining techniques
    - Analysis

    Benefits

    - Business organizations
    - Banking sector
    - Customer satisfaction
    - Researchers
    - Society

    Drawbacks

    - Privacy issues
    - Security issues
    - Misuse of information
    - Sometimes expensive

    Categories

    Web Usage Mining

    The process of automatically discovering patterns and profiles of users interacting with a web site.

    Web Usage Mining Model

    Data Cleaning: clean the raw data

    - Handle missing values
    - Remove redundant data
    - Delete outliers

    Transaction Derivation:

    - Derive transactions according to each individual user
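The two steps above can be sketched roughly as follows. The log record layout (user, timestamp in seconds, URL, HTTP status), the list of noise file suffixes, and the 30-minute inactivity timeout are all assumptions made for illustration; none of them come from the slides, and outlier deletion is not shown.

```python
from collections import defaultdict

# Hypothetical raw log records: (user_id, timestamp_seconds, url, http_status).
RAW_LOG = [
    ("u1", 0, "/products/", 200),
    ("u1", 0, "/products/", 200),        # exact duplicate (redundant data)
    ("u1", 60, "/logo.gif", 200),        # image request (noise)
    ("u1", 120, "/products/software/webminer.htm", 200),
    ("u2", 30, None, 200),               # missing value
    ("u2", 90, "/products/", 404),       # failed request
    ("u2", 100, "/products/", 200),
    ("u1", 4000, "/contact.htm", 200),   # long gap after u1's previous hit
]

# Assumed suffixes of embedded resources that are not real page views.
NOISE_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")


def clean(records):
    """Drop records with missing fields, failed requests, noise, and duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        user, ts, url, status = rec
        if user is None or ts is None or url is None:
            continue
        if status != 200 or url.endswith(NOISE_SUFFIXES):
            continue
        if rec in seen:
            continue
        seen.add(rec)
        cleaned.append(rec)
    return cleaned


def derive_transactions(records, timeout=1800):
    """Group each user's page views into transactions split by inactivity gaps."""
    by_user = defaultdict(list)
    for user, ts, url, _ in sorted(records, key=lambda r: (r[0], r[1])):
        by_user[user].append((ts, url))
    transactions = []
    for user, hits in by_user.items():
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:   # gap too long: close the transaction
                transactions.append((user, current))
                current = []
            current.append(url)
        transactions.append((user, current))
    return transactions


transactions = derive_transactions(clean(RAW_LOG))
```

Each resulting transaction is a pair of a user and the ordered list of pages that user viewed in one visit, which is the form the later pattern discovery steps work on.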

    Data Integration:

    - Combines data from multiple sources into a single data store

    Transformation:

    - The data are transformed into forms appropriate for mining, through operations such as generalization and normalization
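As a small illustration of the transformation step, the sketch below shows one common form of normalization (min-max scaling) and one possible form of generalization (collapsing a page URL to its top-level section). The function names, the sample durations, and the choice of these particular techniques are assumptions; the slides do not say which variants were intended.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale numeric values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to the lower bound.
        return [new_min for _ in values]
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


def generalize_url(url, depth=1):
    """Generalize a page URL to its first `depth` path segments."""
    parts = [p for p in url.split("/") if p]
    if not parts:
        return "/"
    return "/" + "/".join(parts[:depth]) + "/"


# Hypothetical session durations in seconds.
durations = [30, 120, 300, 570]
normalized = min_max_normalize(durations)

# A specific page generalized to the site section it belongs to.
section = generalize_url("/products/software/webminer.htm")
```

Normalization keeps attributes with large ranges from dominating distance-based methods such as clustering, while generalization reduces the number of distinct items so that patterns have enough support to be found.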

    Pattern Discovery

    Association Rule: X ==> Y (support, confidence)

    Example: 60% of clients who accessed /products/ also accessed /products/software/webminer.htm (confidence = 60%)
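To make the two measures concrete, here is a minimal sketch that computes support and confidence for a rule X ==> Y over a set of sessions. The ten sessions are invented so that the confidence matches the 60% example; the function name and the extra URLs are hypothetical.

```python
def support_and_confidence(transactions, x, y):
    """Return (support, confidence) of the rule x ==> y.

    support    = fraction of all transactions containing both x and y
    confidence = fraction of transactions containing x that also contain y
    """
    n = len(transactions)
    if n == 0:
        return 0.0, 0.0
    with_x = [t for t in transactions if x in t]
    with_both = [t for t in with_x if y in t]
    support = len(with_both) / n
    confidence = len(with_both) / len(with_x) if with_x else 0.0
    return support, confidence


X = "/products/"
Y = "/products/software/webminer.htm"

# Hypothetical sessions: each is the set of pages one client accessed.
sessions = (
    [{X, Y}] * 3                 # accessed both pages
    + [{X, "/contact.htm"}] * 2  # accessed /products/ only
    + [{"/contact.htm"}] * 5     # never accessed /products/
)

support, confidence = support_and_confidence(sessions, X, Y)
```

Here 3 of the 5 sessions that contain /products/ also contain the webminer page, so the confidence is 60%, while the support is 3 out of all 10 sessions.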

    Sequential Pattern:

    Discovery of frequently occurring ordered events or subsequences as patterns.
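A simplified illustration of the idea: the sketch below counts ordered page pairs (a visited some time before b) and keeps those that occur in at least a minimum number of sessions. It only handles subsequences of length two and is not a full sequential pattern mining algorithm; the sample sessions, URLs, and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_ordered_pairs(sequences, min_support):
    """Return ordered pairs (a, b), with a visited before b, that occur
    in at least `min_support` of the given sequences."""
    counts = Counter()
    for seq in sequences:
        # combinations() preserves input order, so a always precedes b in seq.
        # A set is used so each pair is counted once per sequence.
        pairs = {(a, b) for a, b in combinations(seq, 2) if a != b}
        counts.update(pairs)
    return {pair: c for pair, c in counts.items() if c >= min_support}


# Hypothetical click sequences, one per user session.
sequences = [
    ["/home", "/products/", "/cart"],
    ["/home", "/cart"],
    ["/products/", "/home", "/cart"],
]

patterns = frequent_ordered_pairs(sequences, min_support=2)
```

Unlike an association rule, the order matters here: ("/home", "/products/") and ("/products/", "/home") are counted as different patterns.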

    Clustering

    The process of grouping a set of objects into classes of similar objects

    Classification

    The process of finding a model that describes and distinguishes data classes or concepts

    K-means Algorithm

    Used for clustering, where each cluster's center is represented by the mean value of the objects in the cluster.

    Input:

    k: the number of clusters,

    D: a data set containing n objects.

    Output:

    A set of k clusters.

    Steps:

    (1) randomly choose k objects from D as the initial cluster centers;

    (2) repeat

    (3) (re)assign each object to the cluster to which the object is the most similar, based on the mean value of the objects in the cluster;

    (4) update the cluster means, i.e., calculate the mean value of the objects for each cluster;

    (5) until no change;
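The steps above can be sketched in plain Python as follows. This is a minimal illustration, assuming points are numeric tuples, similarity is measured by squared Euclidean distance, and a cluster that becomes empty keeps its previous center; the `seed` and `max_iter` parameters and the sample data are additions for reproducibility and safety, not part of the slides.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(data, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centers = rng.sample(data, k)                 # step (1)
    assignment = None
    for _ in range(max_iter):                     # step (2): repeat
        # Step (3): assign each object to the nearest cluster center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in data
        ]
        if new_assignment == assignment:          # step (5): until no change
            break
        assignment = new_assignment
        # Step (4): recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(data, assignment) if a == c]
            if members:                           # empty cluster keeps old center
                centers[c] = mean(members)
    clusters = [
        [p for p, a in zip(data, assignment) if a == c] for c in range(k)
    ]
    return centers, clusters


# Hypothetical 2-D data with two well-separated groups.
data = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centers, clusters = k_means(data, k=2)
```

Because the initial centers are chosen at random, the order of the returned clusters can differ between seeds, and on less cleanly separated data the result itself may vary from run to run.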

    K-Medoids Algorithm

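    - Overcomes limitations of the k-means algorithm, such as its sensitivity to outliers

    - Each cluster is represented by one of its actual objects (the medoid), chosen to minimize the total cost (dissimilarity) to the other objects in the cluster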

    Input:

    k: the number of clusters,

    D: a data set containing n objects.

    Output: A set of k clusters.

    Steps:

    (1) randomly choose k objects in D as the initial representative objects or seeds;

    (2) repeat

    (3) assign each remaining object to the cluster with the nearest representative object;

    (4) randomly select a nonrepresentative object, o_random;

    (5) compute the total cost, S, of swapping representative object, o_j, with o_random;

    (6) if S < 0 then swap o_j with o_random to form the new set of k representative objects;

    (7) until no change;
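A rough sketch of the swap-based procedure is given below. It departs from the steps above in one respect, which should be read as an assumption: instead of picking a single random nonrepresentative object per iteration, it tries every nonrepresentative object as a replacement in each pass, so that "until no change" has a well-defined stopping point. Manhattan distance, distinct data points, the `seed` parameter, and the sample data are likewise illustrative choices.

```python
import random


def manhattan(p, q):
    """Manhattan (city-block) distance between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def total_cost(data, medoids):
    """Sum of distances from every object to its nearest representative object."""
    return sum(min(manhattan(p, m) for m in medoids) for p in data)


def k_medoids(data, k, seed=0):
    rng = random.Random(seed)
    medoids = rng.sample(data, k)                  # step (1)
    improved = True
    while improved:                                # step (2): repeat
        improved = False
        for j in range(k):
            for candidate in data:                 # step (4), exhaustive not random
                if candidate in medoids:
                    continue
                trial = medoids[:j] + [candidate] + medoids[j + 1:]
                # Step (5): cost change S of swapping medoids[j] with candidate.
                s = total_cost(data, trial) - total_cost(data, medoids)
                if s < 0:                          # step (6): keep improving swaps
                    medoids = trial
                    improved = True
    # Step (3): assign every object to its nearest representative object.
    clusters = {m: [] for m in medoids}
    for p in data:
        nearest = min(medoids, key=lambda m: manhattan(p, m))
        clusters[nearest].append(p)
    return medoids, clusters                       # step (7): no swap improved


# Hypothetical 2-D data with two well-separated groups.
data = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
medoids, clusters = k_medoids(data, k=2)
```

Every accepted swap strictly lowers the total cost, so the loop must terminate. The representatives are always actual objects from D, which is what makes the method less sensitive to extreme values than a mean.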

    - p: a nonrepresentative object
    - o_j: the current representative object
    - o_i: another representative object
    - o_random: a candidate replacement object (a nonrepresentative object)

    Thank You