Page 1 WEB MINING by NINI P SURESH PROJECT CO-ORDINATOR Kavitha Murugeshan.

Page 1

WEB MINING by NINI P SURESH

PROJECT CO-ORDINATOR

Kavitha Murugeshan

Page 2

OUTLINE

Introduction

Data mining Vs Web mining

Web mining subtasks

Challenges

Taxonomy

Web content mining

Web structure mining

Web usage mining

Applications

Page 3

INTRODUCTION

Nowadays, it has become necessary for users to utilise automated tools to find, extract, filter & evaluate desired information & resources.

Search engines aim only to discover resources on the web.

Page 4

INTRODUCTION

Need for Web Mining

Narrow search scope

Low precision

Page 5

INTRODUCTION

Other Approaches

Database approach (DB)

Information retrieval

Natural language processing (NLP)

Web document community

Page 6

WEB MINING DEFINITION

Web mining refers to the overall process of discovering potentially useful and previously unknown information or knowledge from Web data.

Page 7

DATA MINING VS WEB MINING

Data mining: Extraction of useful patterns from data sources like databases, texts, web, images etc.

Web mining: Extracting relevant information hidden in Web-related data, like hypertext documents on the web.

Page 8

WEB MINING SUBTASKS

Resource finding

Information selection & preprocessing

Generalization

Analysis

Page 9

CHALLENGES

Searching for relevant information on the web

Create knowledge

Personalization of Information

Learn patterns

Uniformity & standardisation

Page 10

CHALLENGES

Redundant Information

Noisy web

Monitoring changes

Sites providing Services

Privacy

Page 11

TAXONOMY

Web Mining

Web Content Mining: Web Text Mining, Web Multimedia Mining

Web Structure Mining: Link Mining, URL Mining, Internal Structure Mining

Web Usage Mining: Personalized Usages Track, General Access Pattern Track

Page 12

WEB CONTENT MINING

Discovers useful information and analyses the content

Automatic process beyond keyword extraction

Approaches to restructure document content

Two groups of mining strategies

Page 13

WEB CONTENT MINING

Agent based Approach

Intelligent search agents

Information filtering/categorization

Personalized web agents

Page 14

WEB CONTENT MINING

Database Approach

Multilevel databases

Web query system

Page 15

WEB STRUCTURE MINING

Discovering structure information from web

Web graph: web pages as nodes & hyperlinks as edges
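
As a rough illustration of the web graph idea, the sketch below stores pages as nodes and hyperlinks as directed edges. The function name build_web_graph and the four pages A to D are hypothetical, chosen only for this example.

```python
from collections import defaultdict

def build_web_graph(links):
    """Build a directed graph from (source_page, target_page) hyperlink pairs.

    Returns two dicts: pages each page links to, and pages linking to each page.
    """
    out_links = defaultdict(set)
    in_links = defaultdict(set)
    for source, target in links:
        out_links[source].add(target)
        in_links[target].add(source)
        # Touch the other side so every page appears as a node in both maps,
        # even if it has no outgoing or no incoming links.
        out_links[target]
        in_links[source]
    return dict(out_links), dict(in_links)

# Hypothetical hyperlinks between four pages.
links = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("D", "C")]
out_links, in_links = build_web_graph(links)

# Degree counts are the simplest structural statistics of the graph.
in_degree = {page: len(sources) for page, sources in in_links.items()}
out_degree = {page: len(targets) for page, targets in out_links.items()}
print(in_degree)
print(out_degree)
```

Keeping both directions of each edge makes it cheap to ask "which pages point to this one", which link analysis algorithms typically need.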

Page 16

WEB STRUCTURE MINING

Two algorithms for handling links

PageRank

HITS

Page 17

WEB STRUCTURE MINING

PageRank

Metric for ranking hypertext documents

Depends on the rank of pages pointing to it

Iterative process

Page 18

WEB STRUCTURE MINING

n : Number of nodes in graph

Outdegree(q) : Number of hyperlinks on page q

d : damping factor
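
The formula on this slide did not survive in the transcript, so the following is only a minimal sketch of iterative PageRank using the symbols defined above. It assumes the widely used form PR(p) = (1 - d)/n + d * sum of PR(q)/Outdegree(q) over pages q linking to p; some texts swap the roles of d and 1 - d. The value d = 0.85, the iteration count, and the example graph are assumptions.

```python
def pagerank(out_links, d=0.85, iterations=50):
    """Iteratively estimate PageRank.

    out_links maps every page to the list of pages it links to; every
    link target must also appear as a key.
    """
    pages = list(out_links)
    n = len(pages)                       # n: number of nodes in the graph
    rank = {p: 1.0 / n for p in pages}   # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives the same base share (1 - d) / n.
        new_rank = {p: (1.0 - d) / n for p in pages}
        for q in pages:
            targets = out_links[q]
            if targets:
                # q passes its rank equally along its Outdegree(q) links.
                share = d * rank[q] / len(targets)
                for p in targets:
                    new_rank[p] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = d * rank[q] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical four-page graph.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(graph)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 4))
```

In this example page C collects links from three pages and ends up ranked highest, while page D, which nothing links to, keeps only the base share.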

Page 19

WEB STRUCTURE MINING

HITS

Iterative algorithm

Identify topic hubs & authorities

Input: search results returned by traditional text indexing technique

Page 20

WEB STRUCTURE MINING

Assigns weight to hub based on authoritativeness

Outputs pages with largest hub & authority weights
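
A minimal sketch of the HITS iteration follows, assuming the usual update rules: a page's authority weight is the sum of the hub weights of pages linking to it, and a page's hub weight is the sum of the authority weights of pages it links to. The example graph stands in for a set of search results and is hypothetical, as is the fixed iteration count.

```python
import math

def hits(out_links, iterations=50):
    """Return (hub, authority) weights for a small link graph.

    out_links maps every page to the list of pages it links to; every
    link target must also appear as a key.
    """
    pages = list(out_links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub weights of pages pointing to p.
        auth = {p: 0.0 for p in pages}
        for q in pages:
            for p in out_links[q]:
                auth[p] += hub[q]
        # Hub update: sum the authority weights of pages q points to.
        hub = {q: sum(auth[p] for p in out_links[q]) for q in pages}
        # Normalise both vectors so the weights stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Hypothetical result set of four pages.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
hub, auth = hits(graph)
best_hub = max(hub, key=hub.get)
best_authority = max(auth, key=auth.get)
print("top hub:", best_hub, "top authority:", best_authority)
```

Here page C is pointed to by the most pages and comes out as the top authority, and page A, which links to both B and C, comes out as the top hub.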

Page 21

WEB USAGE MINING

Extracting information from server logs

Discover user access patterns of Web pages

Decomposed into 3 subtasks

Preprocessing: Raw logs & Site Files -> User session file

Mining algorithms: User session file -> Rules, Patterns & Statistics

Pattern Analysis: Rules, Patterns & Statistics -> Interesting Rules, Patterns & Statistics

Page 22

WEB USAGE MINING

Preprocessing

Data cleaning

User identification

User sessions identification

Access path supplement

Transaction identification
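
The first three preprocessing steps can be sketched roughly as below. Everything specific here is an assumption made for illustration: log records are simplified to (client IP, timestamp in seconds, URL) tuples, users are identified by IP address alone, requests for images and stylesheets are treated as noise, and a gap of more than 30 minutes starts a new session.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; an assumed, commonly used threshold
NOISE_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")  # assumed noise types

def clean(records):
    """Data cleaning: drop requests for embedded resources."""
    return [r for r in records if not r[2].lower().endswith(NOISE_SUFFIXES)]

def identify_sessions(records, timeout=SESSION_TIMEOUT):
    """Group each user's requests into sessions split by inactivity gaps."""
    by_user = defaultdict(list)
    for user, timestamp, url in records:   # user identification by IP (assumed)
        by_user[user].append((timestamp, url))
    sessions = []
    for user, requests in by_user.items():
        requests.sort()                    # order each user's requests by time
        current = [requests[0][1]]
        last_time = requests[0][0]
        for timestamp, url in requests[1:]:
            if timestamp - last_time > timeout:
                sessions.append((user, current))   # close the old session
                current = []
            current.append(url)
            last_time = timestamp
        sessions.append((user, current))
    return sessions

# Hypothetical simplified log records.
raw_log = [
    ("10.0.0.1", 0, "/index.html"),
    ("10.0.0.1", 5, "/logo.png"),
    ("10.0.0.1", 60, "/products.html"),
    ("10.0.0.1", 4000, "/index.html"),
    ("10.0.0.2", 30, "/index.html"),
]
sessions = identify_sessions(clean(raw_log))
for user, pages in sessions:
    print(user, pages)
```

Real server logs usually need more care, for example combining IP address with user agent, since several users can share one address.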

Page 23

WEB USAGE MINING

Pattern discovery

Statistical Analysis

Association Rules

Clustering analysis

Page 24

WEB USAGE MINING

Classification analysis

Sequential Pattern

Dependency Modeling
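
To make one of the pattern discovery techniques concrete, the sketch below derives simple association rules between pairs of pages from user sessions, using the standard support and confidence measures. The function name pair_rules, the thresholds, and the sessions are hypothetical; practical systems typically use an algorithm such as Apriori to handle larger itemsets.

```python
from collections import Counter
from itertools import combinations

def pair_rules(sessions, min_support=0.5, min_confidence=0.7):
    """Find rules 'x -> y' between pages that co-occur in sessions.

    support    = fraction of sessions containing both x and y
    confidence = fraction of sessions containing x that also contain y
    """
    n = len(sessions)
    page_sets = [set(s) for s in sessions]
    page_count = Counter(p for s in page_sets for p in s)
    pair_count = Counter(
        pair for s in page_sets for pair in combinations(sorted(s), 2)
    )
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for x, y in ((a, b), (b, a)):
            confidence = count / page_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules

# Hypothetical user sessions (pages visited together).
sessions = [
    ["/index", "/products", "/cart"],
    ["/index", "/products"],
    ["/index", "/about"],
    ["/products", "/cart"],
]
rules = pair_rules(sessions)
for x, y, support, confidence in rules:
    print(f"{x} -> {y} (support={support:.2f}, confidence={confidence:.2f})")
```

With these sessions the only rule that passes both thresholds is "/cart -> /products": every session that reaches the cart also visited the products page.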

Page 25

WEB USAGE MINING

Pattern Analysis

Eliminates irrelevant rules or patterns

Extracts interesting patterns

Page 26

APPLICATIONS

Personalized Services

Improve website design

System Improvement

Predicting trends

Carry out intelligent business

Page 27

PROS

High trade volumes

Classify threats & fight against Terrorism

Establish better customer relationship

Increase profitability

Page 28

CONS

Invasion of Privacy

Discrimination by controversial attributes

Page 29

CONCLUSION

Rapidly growing area

Promising area of future research

Page 31

WEB MINING

Thank You