Post on 17-Jan-2016
Page Ranking Techniques In Search Engines
Introduction
Need: increasing reliance on search engines.
Search results should be ordered by relevancy and importance.
What is Page Ranking
Algorithms
HITS (Hyperlink Induced Topic Search)
e.g. AltaVista
PageRank
e.g. Google.
Definition – PageRank. We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor, which can be set between 0 and 1. We usually set d to 0.85. ... C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
Ref: Sergey Brin and Lawrence Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine"
http://www-db.stanford.edu/~backrub/google.html
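The formula can be written as a small Python function. This is only a sketch: the function name `page_rank` and the representation of the inbound pages as (PR(T), C(T)) pairs are assumptions made for illustration, not something taken from the cited paper.

```python
def page_rank(inbound, d=0.85):
    """Return PR(A) = (1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

    inbound: list of (pr_t, c_t) pairs, one per page T that links to A,
             where pr_t is PR(T) and c_t is C(T), the number of links
             going out of T. (Hypothetical input format.)
    """
    return (1 - d) + d * sum(pr_t / c_t for pr_t, c_t in inbound)


# A page with no inbound links keeps only the (1 - d) term.
no_inbound = page_rank([])

# One inbound link from a page with PR 1.0 and a single outbound link.
one_inbound = page_rank([(1.0, 1)])

print(no_inbound, one_inbound)
```

With d = 0.85 the first call gives about 0.15 and the second gives about 1.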
How to use formula.
e.g. 2 pages A and B, pointing to each other.
A <-> B
Start with PR(A) = PR(B) = 1
PR(A) = (1-d) + d * (PR(B)/C(B))
= (1-0.85) + 0.85 * (1/1) = 1
PR(B) = (1-d) + d * (PR(A)/C(A))
= (1-0.85) + 0.85 * (1/1) = 1
Now let's start with PR(A) = PR(B) = 10
After 1st iteration:
PR(A) = (1-d) + d*(PR(B)/C(B))
= 0.15 + 0.85 * (10/1)
= 8.65
PR(B) = (1-d) + d*(PR(A)/C(A))
= 0.15 + 0.85 * (8.65/1)
= 7.50
After 2nd iteration:
PR(A) = (1-d) + d*(PR(B)/C(B))
= 0.15 + 0.85 * (7.50/1)
= 6.527
PR(B) = (1-d) + d*(PR(A)/C(A))
= 0.15 + 0.85 * (6.527/1)
= 5.698
And so on... until when?
Answer: Repeat the iterations until the PR values converge.
In this example, until PR(A) = PR(B) = 1.
Thus we can start with any PR values and repeat the iterations until the values converge, i.e. until they no longer change noticeably from one iteration to the next.
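The two-page walk-through can be reproduced with a short loop. This is a sketch, assuming the same update order as the slides: PR(A) is computed first and the new PR(A) is used immediately when computing PR(B). The function name and the `history` list are illustrative choices.

```python
def iterate_two_pages(pr_a, pr_b, iterations, d=0.85):
    """Iterate the PageRank formula for two pages that link to each other.

    Each page has exactly one outbound link, so C(A) = C(B) = 1.
    Returns a list of (PR(A), PR(B)) pairs, one per iteration.
    """
    history = []
    for _ in range(iterations):
        pr_a = (1 - d) + d * (pr_b / 1)  # uses the current PR(B)
        pr_b = (1 - d) + d * (pr_a / 1)  # uses the PR(A) just computed
        history.append((pr_a, pr_b))
    return history


history = iterate_two_pages(10.0, 10.0, iterations=100)

# Show the first two iterations and the final one.
for step in (0, 1, len(history) - 1):
    pr_a, pr_b = history[step]
    print(step + 1, round(pr_a, 3), round(pr_b, 3))
```

The first two iterations match the hand calculation (8.65 and 7.50, then 6.527 and 5.698), and after 100 iterations both values are indistinguishable from 1.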
Difference…
Result of PR calculation.
Google toolbar values
Examples
Assumption: the initial PR value of each page is 1.0.
Example 1
A B
PR(A) = (1-d) + d (0)
= 0.15
PR(B) = (1-d) + d (0)
= 0.15
For practicing examples on PageRank, use the calculator:
www.webworkshop.net/pagerank_calculator.php?lnks=2,10,15&iblprs=0.15,0.15,0.15,0.15&pgnms=&pgs=2&initpr=1&its=100&type=simple
Example 2
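PR(A) = (1-d) + d (PR(B) / C(B))
= 0.15 + 0.85 (1/1) = 1
PR(B) = (1-d) + d (0)
= 0.15
Repeating the calculation with PR(B) = 0.15 gives PR(A) = 0.15 + 0.85 (0.15/1) = 0.2775, after which the values no longer change.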
Dangling links are links that point to pages with no outbound links (here, the link from B to A).
Orphan pages are pages with no inbound links (here, B).
B -> A
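The two definitions can be checked mechanically on a link graph. A minimal sketch, assuming the graph is stored as a dictionary mapping each page to the list of pages it links to; the function name `classify_pages` is hypothetical.

```python
def classify_pages(links):
    """Return (dangling, orphans) for a link graph.

    links: dict mapping each page to the list of pages it links to.
    dangling: pages with no outbound links (targets of dangling links).
    orphans: pages with no inbound links.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    linked_to = {t for targets in links.values() for t in targets}
    dangling = {p for p in pages if not links.get(p)}
    orphans = pages - linked_to
    return dangling, orphans


# Example 2: B links to A, and A links nowhere.
dangling, orphans = classify_pages({"A": [], "B": ["A"]})
print(sorted(dangling), sorted(orphans))
```

For Example 2 this reports A as the page without outbound links and B as the orphan page.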
Example 3
From here onwards, the final PR value reached after a sufficient number of iterations is shown inside each page.
A 1.0
B 1.0
C 1.0
A 1.0
B 1.0
C 1.0
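The link diagrams for this example are not reproduced in the text, so the following is a sketch under an assumption: three pages linked in a cycle (A to B, B to C, C to A), which is one structure where every page ends up at 1.0. The `pagerank` helper is illustrative and updates all pages from the previous iteration's values.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over a whole graph.

    links: dict mapping each page to the list of pages it links to.
           Every link target must also appear as a key.
    """
    pr = {page: 1.0 for page in links}  # initial PR of each page is 1.0
    for _ in range(iterations):
        inbound = {page: 0.0 for page in links}
        for page, targets in links.items():
            for target in targets:
                # Each outbound link passes on PR(page) / C(page).
                inbound[target] += pr[page] / len(targets)
        pr = {page: (1 - d) + d * inbound[page] for page in links}
    return pr


# Assumed structure: a three-page cycle.
cycle = {"A": ["B"], "B": ["C"], "C": ["A"]}
ranks = pagerank(cycle)
print({page: round(value, 3) for page, value in ranks.items()})
```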
Example 4
Observation: We can channel a large proportion of a site's PR to a particular page.
A 1.85
B 0.575
C 0.575
Example 5
Observation: We can reduce PR leakage by strengthening the internal link structure.
C 1.255
A 2.6
B 1.255
External Site 1 1.0
External Site 2 1.215
External Site 1 1.0
A 1.0
B 0.575
C 0.575
External Site 2 0.638
Example 5 (cont.)
External Site 1 1.0
A 2.146
B 1.549
C 1.720
External Site 2 1.215
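The effect of internal linking on PR leakage can be illustrated with a small comparison. This is a sketch on a hypothetical four-page graph, not the graph from the slides (whose diagrams are not available in the text): pages A, B and C form the site, X is an external page that A links to, and the only difference between the two variants is whether B and C also link to each other.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate the PageRank formula; every link target must be a key."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        inbound = {page: 0.0 for page in links}
        for page, targets in links.items():
            for target in targets:
                inbound[target] += pr[page] / len(targets)
        pr = {page: (1 - d) + d * inbound[page] for page in links}
    return pr


def site_total(links):
    """Sum of PR over the site's own pages A, B and C (X is external)."""
    ranks = pagerank(links)
    return ranks["A"] + ranks["B"] + ranks["C"]


# Hypothetical site: A links out to external page X in both variants.
sparse = {"A": ["B", "C", "X"], "B": ["A"], "C": ["A"], "X": []}
dense = {"A": ["B", "C", "X"], "B": ["A", "C"], "C": ["A", "B"], "X": []}

print(round(site_total(sparse), 3), round(site_total(dense), 3))
```

In this hypothetical graph the variant with more internal links keeps a larger total PR inside the site, because less of the site's PR passes through A's outbound link to X.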
How to increase PR?
Adding spam pages.
Joining forums.
Submitting to search engine directories.
Exchanging reciprocal links.
Adding content.
Adding spam pages.
A 331.0
B 281.6
Spam 1 0.39
Spam 2 0.39
Spam 1000 0.39
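The slide's link diagram is not reproduced in the text, so the structure below is an assumption: A links to B, B links to each of 1000 spam pages, and every spam page links back to A. Under that assumption, iterating the formula gives values close to the ones listed. The `pagerank` helper is illustrative.

```python
def pagerank(links, d=0.85, iterations=200):
    """Iterate the PageRank formula; every link target must be a key."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        inbound = {page: 0.0 for page in links}
        for page, targets in links.items():
            for target in targets:
                inbound[target] += pr[page] / len(targets)
        pr = {page: (1 - d) + d * inbound[page] for page in links}
    return pr


# Assumed structure: A -> B, B -> every spam page, every spam page -> A.
spam_pages = [f"Spam {i}" for i in range(1, 1001)]
links = {"A": ["B"], "B": spam_pages}
links.update({page: ["A"] for page in spam_pages})

ranks = pagerank(links)
print(round(ranks["A"], 1), round(ranks["B"], 1), round(ranks["Spam 1"], 2))
```

Each spam page has a small PR of its own, but a thousand of them pointing at A add up to a very large value for A and, through A's single link, for B.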
Conclusion.
Although the formula for calculating PageRank may look difficult, it is easy to understand. But when a simple calculation is applied hundreds of times, the results can seem complicated, and the outcome of the iterations is hard to predict by inspection. More practice with examples yields more observations.
PageRank is an important factor in Google's ranking, but it is only one of several important factors. For example, Google nowadays pays a lot of attention to a link's anchor text when deciding the relevancy of the target page.
Because PageRank is one of these important factors, one should be well aware of it while designing a website.
References.
http://www.webworkshop.net/pagerank.html
http://www.iprcom.com/papers/pagerank/
http://www-db.stanford.edu/~backrub/google.html
http://www.google.com/intl/en/technology/
http://www.google-watch.org/pagerank.html
?
Thanks