Page 1 WEB MINING by NINI P SURESH PROJECT CO-ORDINATOR Kavitha Murugeshan.
Page 1
WEB MINING by NINI P SURESH
PROJECT CO-ORDINATOR
Kavitha Murugeshan
Page 2
OUTLINE
Introduction
Data mining vs Web mining
Web mining subtasks
Challenges
Taxonomy
Web content mining
Web structure mining
Web usage mining
Applications
Page 3
INTRODUCTION
Nowadays, it has become necessary for users to utilise automated tools to find, extract, filter & evaluate desired information & resources.
The target of search engines is only to discover the resources on the web.
Page 4
INTRODUCTION
Needs for Web Mining
Narrow search scope
Low precision
Page 5
INTRODUCTION
Other Approaches
Database approach (DB)
Information retrieval
Natural language processing (NLP)
Web document community
Page 6
WEB MINING DEFINITION
Web mining refers to the overall process of
discovering potentially useful and
previously unknown information or
knowledge from the Web data.
Page 7
DATA MINING VS WEB MINING
Data mining: Extraction of useful patterns from data sources like databases, texts, web, images, etc.
Web mining: Extracting relevant information hidden in Web-related data, like hypertext documents on the web
Page 8
WEB MINING SUBTASKS
Resource finding
Information selection & preprocessing
Generalization
Analysis
Page 9
CHALLENGES
Search relevant information on web
Create knowledge
Personalization of Information
Learn patterns
Uniformity & standardisation
Page 10
CHALLENGES
Redundant Information
Noisy web
Monitoring changes
Sites providing Services
Privacy
Page 11
TAXONOMY
Web Mining
  Web Content Mining
    Web Text Mining
    Web Multimedia Mining
  Web Structure Mining
    Link Mining
    URL Mining
    Internal Structure Mining
  Web Usage Mining
    Personalized Usages Track
    Gen. Access Pattern Track
Page 12
WEB CONTENT MINING
Discovers useful information & analyses the content
Automatic process that goes beyond keyword extraction
Approaches to restructure document content
Two groups of mining strategies
Page 13
WEB CONTENT MINING
Agent based Approach
Intelligent search agents
Information filtering/categorization
Personalized web agents
Page 14
WEB CONTENT MINING
Database Approach
Multilevel databases
Web query system
Page 15
WEB STRUCTURE MINING
Discovering structure information from web
Web graph : web pages as nodes &
hyperlinks as edges
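A minimal sketch of the web graph idea, assuming the hyperlinks have already been extracted as (source page, target page) pairs. The page names and the helper `build_web_graph` are hypothetical, used only for illustration.

```python
from collections import defaultdict


def build_web_graph(links):
    """Build a directed graph: each page maps to the set of pages it links to."""
    graph = defaultdict(set)
    for source, target in links:
        graph[source].add(target)
        graph[target]  # touch the key so pages without out-links still become nodes
    return dict(graph)


# Hypothetical hyperlinks: (page containing the link, page linked to)
links = [
    ("a.html", "b.html"),
    ("a.html", "c.html"),
    ("b.html", "c.html"),
    ("c.html", "a.html"),
]

graph = build_web_graph(links)

# In-degree of a page = number of pages that link to it
in_degree = {page: sum(page in outs for outs in graph.values()) for page in graph}

for page in sorted(graph):
    print(page, "links to", sorted(graph[page]), "| in-degree:", in_degree[page])
```

Storing the out-links per page keeps the structure close to how a crawler sees the web: one page at a time, with the links found on it.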
Page 16
WEB STRUCTURE MINING
Two algorithms for handling of links
PageRank
HITS
Page 17
WEB STRUCTURE MINING
PageRank
Metric for ranking hypertext documents
Depends on rank of pages pointing to it
Iterative process
Page 18
WEB STRUCTURE MINING
n : Number of nodes in graph
Outdegree(q) : Number of hyperlinks on page q
d : damping factor
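The formula itself is not preserved in this transcript, so the sketch below assumes a commonly cited form of PageRank: PR(p) = (1 - d)/n + d * sum of PR(q)/Outdegree(q) over all pages q that link to p. Some papers swap the roles of d and 1 - d. The example graph, the default d = 0.85 and the handling of pages without out-links are illustrative assumptions, not taken from the slides.

```python
def pagerank(graph, d=0.85, iterations=50):
    """Iteratively compute PageRank for a graph given as {page: [pages it links to]}.

    Assumed form: PR(p) = (1 - d)/n + d * sum(PR(q) / Outdegree(q)) over pages q linking to p.
    Every link target must also appear as a key of `graph`.
    """
    pages = list(graph)
    n = len(pages)                      # n: number of nodes in the graph
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        new_rank = {p: (1.0 - d) / n for p in pages}
        for q in pages:
            out_links = graph[q]
            if out_links:
                # q passes an equal share of its rank to every page it links to
                share = d * rank[q] / len(out_links)  # len(out_links) = Outdegree(q)
                for p in out_links:
                    new_rank[p] += share
            else:
                # Assumption: a page without out-links spreads its rank over all pages
                share = d * rank[q] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page web graph
example_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(example_graph)

for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(page, round(score, 4))
```

In this example, page C is linked to by three pages and ends up with the highest rank, while D, which no page links to, keeps only the baseline share.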
Page 19
WEB STRUCTURE MINING
HITS
Iterative algorithm
Identify topic hubs & authorities
Input : search results returned by traditional text
indexing technique
Page 20
WEB STRUCTURE MINING
Assigns a weight to each hub based on the authoritativeness of the pages it points to
Outputs pages with the largest hub & authority weights
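A minimal sketch of the HITS iteration, assuming the search results and the links between them are already available as a small graph. The page names, the fixed iteration count and the normalization by Euclidean length are illustrative assumptions.

```python
import math


def hits(graph, iterations=50):
    """Compute hub and authority weights for a graph given as {page: [pages it links to]}.

    Every link target must also appear as a key of `graph`.
    """
    pages = list(graph)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority weight: sum of the hub weights of the pages pointing to the page
        auth = {p: 0.0 for p in pages}
        for q in pages:
            for p in graph[q]:
                auth[p] += hub[q]

        # Hub weight: sum of the authority weights of the pages it points to
        hub = {q: sum(auth[p] for p in graph[q]) for q in pages}

        # Normalize both weight vectors so the values stay bounded
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm

    return hub, auth


# Hypothetical result set returned by a text search for one topic
result_graph = {
    "portal": ["news", "wiki"],
    "blog": ["news"],
    "news": [],
    "wiki": [],
}

hub, auth = hits(result_graph)
best_hub = max(hub, key=hub.get)
best_authority = max(auth, key=auth.get)
print("best hub:", best_hub, "| best authority:", best_authority)
```

Here "portal" links to both content pages and becomes the strongest hub, while "news", which both linking pages point to, becomes the strongest authority.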
Page 21
WEB USAGE MINING
Extracting information from server logs
Discover user access patterns of Web pages
Decomposed into 3 subtasks
Preprocessing: raw logs & site files -> user session file
Mining algorithms: user session file -> rules, patterns & statistics
Pattern analysis: rules, patterns & statistics -> interesting rules, patterns & statistics
Page 22
WEB USAGE MINING
Preprocessing
Data cleaning
User identification
User sessions identification
Access path supplement
Transaction identification
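The first three preprocessing steps can be sketched as follows. This is a simplified illustration, assuming log records have already been parsed into (IP address, timestamp in seconds, URL) tuples, that each IP address is one user, and that a gap of more than 30 minutes starts a new session. The record layout, the timeout and the file-extension filter are assumptions, not taken from the slides.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; an assumed, commonly used threshold


def is_page_request(url):
    """Data cleaning: ignore embedded resources such as images and stylesheets."""
    return not url.lower().endswith((".gif", ".jpg", ".png", ".css", ".js"))


def build_sessions(records, timeout=SESSION_TIMEOUT):
    """Group cleaned requests per user (here: per IP) and split them into sessions."""
    by_user = defaultdict(list)
    for ip, timestamp, url in records:
        if is_page_request(url):
            by_user[ip].append((timestamp, url))  # user identification by IP

    sessions = []
    for ip, requests in by_user.items():
        requests.sort()  # order each user's requests by time
        current, last_timestamp = [], None
        for timestamp, url in requests:
            # A long pause between two requests closes the current session
            if last_timestamp is not None and timestamp - last_timestamp > timeout:
                sessions.append((ip, current))
                current = []
            current.append(url)
            last_timestamp = timestamp
        if current:
            sessions.append((ip, current))
    return sessions


# Hypothetical, already parsed log records
records = [
    ("10.0.0.1", 0, "/index.html"),
    ("10.0.0.1", 5, "/logo.gif"),
    ("10.0.0.1", 60, "/products.html"),
    ("10.0.0.2", 100, "/index.html"),
    ("10.0.0.1", 4000, "/index.html"),
]

sessions = build_sessions(records)
for ip, pages in sessions:
    print(ip, pages)
```

Identifying users by IP address alone is a rough heuristic, since proxies and shared machines blur the picture; real preprocessing usually combines it with other fields of the log.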
Page 23
WEB USAGE MINING
Pattern discovery
Statistical Analysis
Association Rules
Clustering analysis
Page 24
WEB USAGE MINING
Classification analysis
Sequential Pattern
Dependency Modeling
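As an illustration of one of these techniques, the sketch below derives simple association rules between pairs of pages from user sessions. The sessions, the thresholds and the restriction to page pairs are assumptions made to keep the example short; it is not a full Apriori implementation.

```python
from collections import Counter
from itertools import combinations


def association_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence) between pairs of pages.

    support    = share of sessions containing both pages
    confidence = share of sessions with lhs that also contain rhs
    """
    n = len(sessions)
    page_sets = [set(session) for session in sessions]
    single_counts = Counter(page for pages in page_sets for page in pages)
    pair_counts = Counter(
        frozenset(pair)
        for pages in page_sets
        for pair in combinations(sorted(pages), 2)
    )

    rules = []
    for pair, count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is too rare to be worth reporting
        a, b = sorted(pair)
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / single_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical user sessions (pages visited in one visit)
example_sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]

rules = association_rules(example_sessions)
for lhs, rhs, support, confidence in sorted(rules):
    print(f"{lhs} -> {rhs}  support={support:.2f} confidence={confidence:.2f}")
```

A rule such as /cart -> /products with confidence 1.0 says that, in this sample, every session that reached the cart also visited the products page.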
Page 25
WEB USAGE MINING
Pattern Analysis
Eliminates irrelevant rules or patterns
Extracts interesting patterns
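One possible way to eliminate uninteresting rules is to keep only those whose lift is clearly above 1, meaning the left-hand page makes the right-hand page more likely than its base rate. The slides do not name a specific measure, so the use of lift, the threshold and the sample rules are assumptions.

```python
def filter_interesting(rules, page_support, min_lift=1.1):
    """Keep rules (lhs, rhs, support, confidence) whose lift reaches min_lift.

    lift = confidence / support of the right-hand page
    """
    interesting = []
    for lhs, rhs, support, confidence in rules:
        lift = confidence / page_support[rhs]
        if lift >= min_lift:
            interesting.append((lhs, rhs, round(lift, 2)))
    # Strongest rules first
    return sorted(interesting, key=lambda rule: rule[2], reverse=True)


# Hypothetical rules: (lhs, rhs, support, confidence)
candidate_rules = [
    ("/cart", "/products", 0.5, 1.0),
    ("/home", "/products", 0.5, 0.667),
    ("/products", "/cart", 0.5, 0.667),
]
# Hypothetical share of sessions that contain each page
page_support = {"/home": 0.75, "/products": 0.75, "/cart": 0.5}

interesting = filter_interesting(candidate_rules, page_support)
for lhs, rhs, lift in interesting:
    print(f"{lhs} -> {rhs}  lift={lift}")
```

The rule /home -> /products is dropped because its confidence is lower than the share of all sessions that visit /products anyway.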
Page 26
APPLICATIONS
Personalized Services
Improve website design
System Improvement
Predicting trends
Carry out intelligent business
Page 27
PROS
High trade volumes
Classify threats & fight against Terrorism
Establish better customer relationship
Increase profitability
Page 28
CONS
Invasion of Privacy
Discrimination by controversial attributes
Page 29
CONCLUSION
Rapidly growing area
Promising area of future research
Page 31
WEB MINING
Thank You