“PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a...

56
“PageRank Uncovered” Formerly PageRank Explained ©2002 All Rights Reserved Written and theorised by Chris Ridings and Mike Shishigin Edited by Jill Whalen Technical Editing by Yuri Baranov Profiles: Chris Ridings is the original author of PageRank Explained. A former programmer of custom search engines, search engine expert, creator of the search engine optimization support forums (http://www.supportforums.org) and author of marketing software tools (http://www.iebar.com/ ). Chris uses his programming experience and knowledge of algorithms to get an insight into how search engines work “under the cover.” Mike Shishigin, PhD, is a brilliant search engine expert, whose philosophy is to always back up all his words with hard facts derived from deep knowledge of applied mathematics. As Director of Technology at Radiocom Ltd. Mike uses methods of sampling analysis to get into the nature of things, providing users of WebSite CEO with an outstanding Web site promotion suite (http://www.websiteceo.com); thus providing them with the best possible results in the search engines. Mike controls all R&D and search engine studies within WebSite CEO. His latest "hobby" is the PageRank technology because it is similar to graph theory, his favourite university course. His strongest dislike is search engine spam, which he considers the deadly sin of Internet age. Mike mercilessly fights it by enlightening minds of WebSite CEO users and making regular raids into spammers' camps. Edited by Jill Whalen, owner of High Rankings and moderator of the High Rankings Advisor free weekly email newsletter (http://www.highrankings.com/advisor.htm) . Jill has been practicing successful search engine optimisation since 1995 for a wide variety of clients. Rather than worry about PageRank, Jill prefers to help sites be the best they can be, which in turn garners them lots of inbound links (and thus high PageRank!). Technical Editing by Yuri Baranov: Yuri is Director of Marketing at Radiocom Ltd. He concentrates on general Web site promotion and search engine optimization. Yuri has a groundbreaking insight into search engine marketing, and does all in his power to make it more consistent and comprehensible to ordinary people. He currently coordinates all search engine studies, sets goals, and evaluates results. He is also interested in human engineering and usability issues. Yuri agreed to become the technical editor of this paper because he is head over hills in love with Google, which he considers "the best search engine ever." Reproduction of this document in whole or in part is permitted solely with the prior consent of the authors. “Fair Use” reproduction is allowed, however, you should think carefully if you need to reproduce more than a few paragraphs. Educational Institutions are allowed to copy this document freely. VERSION 3.0 Last amended – September 2002

Transcript of “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a...

Page 1: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

ldquoPageRank UncoveredrdquoFormerly PageRank Explained

copy2002 All Rights Reserved

Written and theorised by Chris Ridings and Mike Shishigin

Edited by Jill Whalen

Technical Editing by Yuri Baranov

Profiles

Chris Ridings is the original author of PageRank Explained A former programmer of custom searchengines search engine expert creator of the search engine optimization support forums(httpwwwsupportforumsorg) and author of marketing software tools (httpwwwiebarcom) Chris useshis programming experience and knowledge of algorithms to get an insight into how search engines workldquounder the coverrdquo

Mike Shishigin PhD is a brilliant search engine expert whose philosophy is to always back up all his wordswith hard facts derived from deep knowledge of applied mathematics As Director of Technology atRadiocom Ltd Mike uses methods of sampling analysis to get into the nature of things providing users ofWebSite CEO with an outstanding Web site promotion suite (httpwwwwebsiteceocom) thus providingthem with the best possible results in the search engines Mike controls all RampD and search engine studieswithin WebSite CEO His latest hobby is the PageRank technology because it is similar to graph theoryhis favourite university course His strongest dislike is search engine spam which he considers the deadlysin of Internet age Mike mercilessly fights it by enlightening minds of WebSite CEO users and makingregular raids into spammers camps

Edited by Jill Whalen owner of High Rankings and moderator of the High Rankings Advisor free weeklyemail newsletter (httpwwwhighrankingscomadvisorhtm) Jill has been practicing successful searchengine optimisation since 1995 for a wide variety of clients Rather than worry about PageRank Jill prefersto help sites be the best they can be which in turn garners them lots of inbound links (and thus highPageRank)

Technical Editing by Yuri Baranov Yuri is Director of Marketing at Radiocom Ltd He concentrates ongeneral Web site promotion and search engine optimization Yuri has a groundbreaking insight into searchengine marketing and does all in his power to make it more consistent and comprehensible to ordinarypeople He currently coordinates all search engine studies sets goals and evaluates results He is alsointerested in human engineering and usability issues Yuri agreed to become the technical editor of thispaper because he is head over hills in love with Google which he considers the best search engine ever

Reproduction of this document in whole or in part is permitted solely with the prior consent of the authorsldquoFair Userdquo reproduction is allowed however you should think carefully if you need to reproduce more than a

few paragraphs Educational Institutions are allowed to copy this document freely

VERSION 30 Last amended ndash September 2002

Introduction 2What is PageRank 2How is PageRank determined 3How can you tell what a pagersquos PageRank is 3How accurate is the Google toolbar 3How significant is PageRank 5Is PageRank a good determination of the quality of a page 6Googlersquos First 1000 Results 7The difference between PageRank and other factors 8Non-PageRank Factor Threshold 8Using the threshold to derive the worth of two ranking strategies 9Really heavy competition 11How PageRank is calculated 12A word on convergence 16Derived specifics of how Google calculates PageRank 18PageRank Feedback how linking sometimes helps 19Influencing the results 22Controlling PageRank 22Links to Your Site 23Links out of your site 24Internal Structures and Linkages 28What Google says 34Final word to non mathematicians 35Advanced PageRank topics 36Speeding Up PageRank 36How PageRank could be more than it seems 42Favouring pages in directories 44Examples 45Mathematical aspects of PageRank for advanced readers 45Further Reading and Resources 54Technical Proof Readers credits 555

2

Introduction

The original PageRank Explained document has existed for a long time It wasthe first of its kind to present many of the ideas in a way that the average personcan understand You may be wondering why there is a need for a new version ofthis document The reasoning is simple everything in the original document wastheoretical and subject to change These changes have occurred through theauthorsrsquo further learning on how Google reacts and how we think about thesubject To this end you will note that an additional author and technical editorhas been added to the paper Mike Shishiginrsquos extensive mathematical abilityenables us to explore some new areas of PageRank not mentioned before aswell as provide a section at the end that contains more advanced informationWe hope to present PageRank in a way that gets across everything you couldpossibly need to know -- without baffling you However if you want to know theextremely technical stuff then wersquoll give you that at the end

Were pleased that the principles in the original PageRank Explained documentseem to have been accepted universally within the search engine optimizationfield The one document that appears to criticise it later explains why each ofthe points it makes are valid Were also pleased that since its publication a vastamount of additional articles have been written about PageRank

Despite all this things have changed and there are still a lot ofmisunderstandings regarding the importance (or unimportance) of PageRankWersquove added new sections to address these and new sections to address somenew things Weve also tried to make the existing sections moreunderstandable This is a long document itrsquos written so that a broad range ofpeople can understand it As itrsquos also used as a major reference in severaluniversity courses we will do our best detail as much as possible If you are ldquoJoeAveragerdquo then you can simply limit your concern to understanding the principlesuntil youre ready to dig deeper

Finally we thought it was time to dedicate some space to Googles thoughtsabout PageRank Obviously Google is not going to comment on exact rankingissues but as it is their technology what they say about it gives us lots of insightWe asked Barry Schnitt spokesperson for Google a few questions about variousPageRank related topics and he was kind enough to give us some answersyoursquoll find those answers later in the document

What is PageRank

PageRank is Googles method of measuring a pages importance When allother factors such as Title tag and keywords are taken into account Google usesPageRank to adjust results so that sites that are deemed more important willmove up in the results page of a users search accordingly

3

A basic overview of how Google ranks pages in their search engine resultspages (SERPS) follows

1) Find all pages matching the keywords of the search2) Rank accordingly using on the page factors such as keywords3) Calculate in the inbound anchor text4) Adjust the results by PageRank scores

In reality itrsquos slightly more complex and wersquoll discuss this in more depth later butfor now the above description serves our purposes Itrsquos worth noting thatPageRank is a multiplier and is not just simply added to the score Thus if yourpage had a PageRank of zero it would rank at the very end of the SERPS

How is PageRank determined

The Google theory goes that if Page A links to Page B then Page A is sayingthat Page B is an important page PageRank also factors in the importance of thelinks pointing to a page If a page has important links pointing to it then its linksto other pages also become important The actual text of the link is irrelevantwhen discussing PageRank

How can you tell what a pagersquos PageRank is

To learn what a pagersquos PageRank is you can download a toolbar for InternetExplorer from httptoolbargooglecom Once installed there will be a bar graphat the top of the browser showing a version of PageRank for the page yoursquorebrowsing When you hold the mouse over the bar you see a number from zero toten (If you donrsquot see the number you may have an older version of the toolbarinstalled You will need to completely uninstall it reboot your computer andreinstall the latest version Once this is done you should be able to see thePageRank number)

How accurate is the Google toolbar

The Google toolbar is not very accurate in showing you the actual PageRank of asite but itrsquos the only thing right now that can give you any idea As long as youknow the toolbarrsquos limitations then at least you know what you are viewing

There are two limitations to the Google toolbar

1 The toolbar sometimes guesses If you enter a page which is not inits index but where there is a page that is very close to it in Googlersquosindex then it will provide a guesstimate of the PageRank This

4

guesstimate is worthless for our purposes because it isnrsquot featured inany of the PageRank calculations The only way to tell if the toolbar isa guesstimate is to type the URL into the Google search box and see ifthe page shows up in the SERPS If it doesnrsquot then the toolbar isguessing

2 The toolbar is just a representation of actual PageRank WhilstPageRank is linear Google has chosen to use a non-linear graph toportray it So on the toolbar to move from a PageRank of 2 to aPageRank of 3 takes less of an increase than to move from aPageRank of 3 to a PageRank of 4 A comparison table best illustratesthis phenomenon The actual figures are kept secret so wersquoll just useany figures for demonstration purposes

If the actualPageRank is

betweenThe Toolbar Shows

000000001 and 5 16 and 25 2

25 and 125 3126 and 625 4

626 and 3125 53126 and 15625 6

15626 and 78125 778126 and 390625 8

390626 and 1953125 91953126 and infinity 10

The PageRank shown in the Google directory (httpdirectorygooglecom) suffers fromthe same problems The PageRank shown in the directory is also on a differentscale There have been attempts to cross-reference these two scales butbecause they are non-linear the results really do not tell you anything more thanyou already know

Also of note is that a programmer managed to generate a tool to look upPageRank without using Internet Explorer This tool has since been withdrawnbut whilst originally the numbers given by this software and Googlersquos toolbarmatched - presently querying with such software sometimes produces differentnumbers than querying with the toolbar This is Googlersquos right to protect theirdata but is the strongest indication that

5

Sometimes what you see on the Google toolbar may not be related to actualPageRank at all (Google can and does assign whatever toolbar PageRankvalue they want to assign to a page)

How significant is PageRank

The significance of any one factor in search engine algorithms depends on thequality of the information it supplies A factorrsquos importance is known as its weightTo demonstrate how weighting is arrived at its easiest to move away fromPageRank for a second and look at Meta tags Originally when the Metakeyword tag was new you could write something like this in your document

ltmeta name=rdquokeywordsrdquo content=rdquopagerank pagerank uncovered algorithmalgorithmsrdquogt

In theory the Meta keyword tag was a very good indicator of what the page wasabout However as most are well aware ndash the weighting for the keywords tag isfast approaching nothing Two things have contributed to this

1 The ease at which Webmasters can manipulate it2 The level of manipulation by Webmasters

These two things are separate factors but with human nature being what it isthe easier something is to influence ndash the more it is manipulated Thecombination of these factors determines the ldquoweightingrdquo ndash ie how much we trustthe information provided by that factor

So it makes sense to look at these factors in relation to PageRank first

PageRank is without doubt one of the hardest things for a Webmaster tomanipulate ethically However it is possible to generate links to your site fromother sites fairly simply through the use of link farms and guestbooks Googlefrowns upon this kind of abuse and many sites that have tried this have had theirPageRank influence blocked But it must be said that the abuse is still rampantand that it can have an influence on PageRank So whilst not easy to doPageRank is still subject to manipulation

The extent to which PageRank is manipulated has also changed Most people nolonger believe Googlersquos old line of people not being able to influence PageRankand the results based on it However there is more information about PageRankavailable than ever and people are more aware of manipulation techniques

So whilst PageRank is valuable you should be careful not to over-estimate itsusage and capabilities Your final ranking in Google is due to a mix of factors of

6

which PageRank is only one Wersquoll get into more details later by discussing howPageRank is different than the other ranking factors and thus when it appliesand when it doesnrsquot Ironically enough PageRankrsquos weighting factor isundeniably declining Since the original version of this paper gave out detailedinformation about PageRank in all likelihood it may also have contributed insome small way to the decline in weighting of the very subject it talks about

Is PageRank a good determination of the quality of a page

To examine the worth of PageRank we need to first look at its premise and howaccurate it is Basically PageRank says

1 If a page links to another page it is casting a vote which indicates that theother page is good

2 If lots of pages link to a page then it has more votes and its worth shouldbe higher

The basic implication here is People only link to pages they think are good

It shouldnt be hard to convince you that this premise is wrong A few of thereasons people link to pages other than ones they think are good are

1) Reciprocal links ndash ldquoLink to me and Irsquoll link to yourdquo

2) Link Requirements ndash ldquoUsing our script requires you to put a link to ourpagerdquo or ldquoWersquoll give you an award solely because you link to our pagerdquo

3) Friends and Family ndash ldquoThis is my friend Petersquos siterdquo or ldquoMy mumrsquos siteis here my dadrsquos site is here My dogrsquos site is hererdquo

4) Free Page Add-ons ndash ldquoThis counter was provided bywwwlinktocountersitecomrdquo

Furthermore anybody who has a top-ranking site will tell you that it tends to getlinks from new sites This is not necessarily because its good (although theygenerally are) Assume a Webmaster is setting up a new site and they arelooking for some outbound links Nowadays one of the first things they do is aGoogle search for similar sites The links they end up with may not necessarilybe the best sites but merely the easiest ones to find If PageRank influencesrankings and if they subsequently link to those pages ndash the new Webmaster willbe adding to the inaccuracies in the judging of the quality of a page The same istrue when these new Webmasters use the Google Toolbar PageRank indicator tochoose whom to link to

7

To put this another way

PageRank is determined by the links pointing to a page But if PageRank itselfhas an influence on the number of links to a page it is influencing (in acircular way) the quality of that page The links are no longer based solely onhuman judgement If a Webmaster picks their outbound links by searching onGoogle or by looking at the Google toolbar (even if this is only part of theirjudgement) then there is a corresponding increase in a pagersquos PageRankThis increase is not solely because it is a good page but because itsPageRank is already high

Googlersquos First 1000 Results

Remember PageRank alone cannot get you high rankings Wersquove mentionedbefore that PageRank is a multiplier so if your score for all other factors is 0 andyour PageRank is twenty billion then you still score 0 (last in the results) This isnot to say PageRank is worthless but there is some confusion over whenPageRank is useful and when it is not This leads to many misinterpretations ofits worth The only way to clear up these misinterpretations is to point out whenPageRank is not worthwhile

If you perform any broad search on Google it will appear as if youve foundseveral thousand results However you can only view the first 1000 of themUnderstanding why this is so explains why you should always concentrate onldquoon the pagerdquo factors and anchor text first and PageRank last

Assume that you perform a search on Google and it returns 200000 results Ifwe were to calculate every factor for each 200000 pages ndash do you think it wouldreally take just 034 seconds to search The answer to speeding up the search isto get a subset of documents that are most likely to be related to the query Thissubset of documents needs to be larger than the number of search results Forexample letrsquos say that number is 2000 What the search engine does is querythe whole database using 2 or 3 factors finding the 2000 documents that rankhighest for them (Remember there were 200000 possible documents andthatrsquos the number that actually gets shown) Then the engine applies all thefactors to those 2000 and ranks them accordingly Because therersquos a drop in thequality of the results (not the pages) at the bottom of this subset the engine justshows the first 1000 PageRank is almost certainly not one of those factorsNotice how before we highlighted the word ldquorelatedrdquo in creating the subset of2000 pages The search engine is looking for pages that are on-topic If weincluded PageRank in that list wersquod get a lot of high PageRank pages with topicsthat are only slightly related (because of the second factor) but thatrsquos not whatwe want

8

Why this is critical

You must do enough on the page work andor anchor text work to get intothat subset of 2000 pages for your chosen key phrase otherwise your highPageRank will be completely in vain PageRank means nothing if you do nothave enough ranking from other factors to make it into the first subset

The difference between PageRank and other factors

To assess when PageRank is important and when it is not we need tounderstand how PageRank is different from all other ranking factors To do thisherersquos a quick table that lists a few other factors and how they add to the rankingscore

Title Tag Can only be listed onceKeywords in Body Text Each successive repetition is

less important Proximity isimportant

Anchor Text Highly weighted but likekeywords in body text there isa cut off point where furtheranchor text is no longerworthwhile

PageRank Potentially infinite You arealways capable of increasingyour PageRank significantlybut it takes work

All other ranking factors have cut off points beyond which they will no longeradd to your ranking score or will not add significantly enough for it to beworthwhile PageRank has no cut off point

Non-PageRank Factor Threshold

Having written about the difference between PageRank and other factors andhow PageRank is harder to get it should now be clear that whilst we could usemany methods to get good rankings there is a threshold which defines whenhigh PageRank is worth striving for and when it is not

With ranking factors other than PageRank there is a score beyond which theslow down in the rate that any factor adds to this score is so insignificant that it is

9

not worthwhile This is the Non-PageRank Factor Threshold To illustrate thisletrsquos put an example figure on this of 1000

If we have a query where the results are Page A and Page B then Page A and Bhave scores for that query which are the total scores for all ranking factors(including PageRank) Letrsquos say Page Arsquos score is 900 and page Brsquos score is500 Obviously Page A will be listed first These are both below our hypotheticalNon-PageRank Factor Threshold thus without any change in PageRank it ispossible for page B to improve their optimization to beat Page A for this particularquery There are lots of queries like this on Google theyre more commonlythought of as less competitive queries

Now assume Page A raises its score to 1100 Suddenly page B cannot competein the SERPs (search engine results pages) without increasing its PageRank Inall probability page B must also improve for all the other ranking factors but anincrease in PageRank is almost certainly necessary There are also lots ofqueries like this on Google which are more commonly thought of as morecompetitive queries

Generally when querying Google the group of pages in the SERPs will containsome pages that have a score above the Non-PageRank Factor Threshold andsome that do not

There is an important point to be made here

To be competitive you must raise your pages search engine ranking scorebeyond the Non-PageRank Factor Threshold To fail to do so means that youcan easily be beaten in the search results for your query terms The quickestway to approach the Non-PageRank Factor Threshold is through ldquoon the pagefactorsrdquo however you cannot move above the Non-PageRank FactorThreshold without PageRank

The obvious question is whatrsquos the numerical value of the Non-PageRank FactorThreshold and how much work do you need to do to get past it The answer isthat it has no value it is a hypothetical line Google could put a value on it butthat would not help us unless we know what the pagersquos individual scores are Weneed only be aware that the threshold exists and that it gives us informationabout principles

Using the threshold to derive the worth of two ranking strategies

The threshold explains principles and the different ways that search enginemarketers work It also demonstrates why some of the misunderstandings aboutPageRank occur Letrsquos consider the strategies of two people Person A considersPageRank to be unimportant and Person B considers PageRank to be veryimportant

10

Person A says ldquoPageRankrdquo is unimportant They have optimised pages foryears and know how to use ldquoon the pagerdquo factors very successfully Theyunderstand the basics of anchor text but they couldnrsquot care at all aboutPageRank

Whatrsquos happening person A is reaching the Non-PageRank Factor Thresholdvery quickly because they are maximising the ldquoon the pagerdquo factors Throughcarefully choosing keywords they jump-start themselves up the SERPs Aslong as their content is good high-ranking sites (over time) tend to get linkedto Whilst they didnrsquot directly ask for it a slow trickle of sites will begin to link tothem and give them PageRank which helps consolidates their position

Person B says ldquoPageRankrdquo is important Wersquove all seen those pages in theresults that have no content but great rankings (With big brands this canoften occur naturally even when they have no idea what PageRank is Thiswould be Person C who is not relevant to the discussion at hand) Person Bunderstands lots about PageRank and concentrates heavily on it

Whatrsquos happening person B is doing the reverse of person A Whilst person Aconcentrated on the Non-PageRank factors and found herself gettingPageRank anyway person B concentrates on the PageRank Factor and findshimself getting Non-PageRank factors The reason for this is that increasingPageRank requires links and links have anchor text Thus through carefullychoosing the anchor text linking to his page person B automatically increaseshis Non-PageRank factor scores whilst obtaining his high PageRank score

Obviously these are two extremes but we can use these to extrapolate theadvantages and disadvantages of each approach

11

Advantages DisadvantagesPerson A bull Quick entry to the

results pagesbull Self-generating links

limit the amount ofwork needed

bull Securing the resultsto make it moredifficult forcompetitors tocompete takes morework

bull Slower to react tonew competitors

Person B bull Solid position Caneasily modify on thepage factors for aquick boost ifrequired

bull Probably highertraffic from non-search enginesources

bull Slower entry toresults pages

bull Difficult to do wellbull Increased likelihood

of tripping spamfilters

It is clear that both strategies can and do work Both strategies are usingPageRank as part the mix of factors that will ultimately improve their ranking inthe SERPS Because there is such a mix we can use them to different degreesdepending on the strategy that best suits your style My personal strategy is touse a combination but to save some of the ldquoon the pagerdquo factors for later in caseI need a quick boost if the competition heats up at a later time

Really heavy competition

No explanation of PageRank strategies would be complete without a finalstatement regarding heavy keyword competition There are some queries wherecompetition is so intense that you must do everything possible to maximize yourranking score (for example ldquoweb hostingrdquo) In such situations it is impossible torank highly through Non-PageRank factors alone (as you will not initially be listedhigh enough to be noticed and linked to) That is not to say that Non-PageRankfactors arent important Consider what your final rank score is

Final Rank Score = (score for all Non-PageRank factors) x (actual PageRankscore)

Improving either side of the equation can have a positive effect Howeverbecause the Non-PageRank factors have a restricted maximum benefit theactual PageRank score must be improved in order to compete successfullyUnder really heavy competition ndash it holds true that you cannot rank well unlessyour actual PageRank score is above a certain level In other words

12

There exists a query specific ldquoMinimum PageRank levelrdquo For queries that do nothave heavy competition this level is easy to achieve without even tryingHowever where heavy competition exists Non-PageRank factors are just asimportant (and easier to get) until they reach the Non-PageRank factor thresholdThis is why careful keyword choice can help you avoid the extensive workassociated with highly competitive search phrases

How PageRank is calculated

On a simple level we can tell quite a lot about how PageRank is calculated Thisis because when Google was just a university research project the creators ofPageRank published a paper that detailed a formula for calculating it Thisformula is now more than a handful of years old and we suspect that it haschanged somewhat since then However for detailing the over-riding principlesof how PageRank works it is as accurate today as the day it was written

PR(A) = (1-d) + d (PR(T1) C(T1) + + PR(Tn) C(Tn) )

Where PR(A) is the PageRank of Page A ( the one we want t owork out )

D is a dam pening factor Nom inally this is set to 085

PR(T1) is the PageRank of a site point ing to Page A

C(T1) is the num ber of links off that page

PR(Tn) C(Tn) m eans we do that for each page point ing toPage A

Source The Anatomy of a Large-Scale Hypertextual Web Search EngineSergey Brin and LawrencePage httpwww-dbstanfordedu~backrubgooglehtml

Couldnrsquot be simpler right Depending on your mathematical competence this iseither remarkably easy to understand or remarkably complex So herersquos the dealndash this isnrsquot a one-time calculation It looks pretty above but you canrsquot just do onesimple calculation and get an answer with the PageRank of a page If you look atit to calculate the PageRank of Page A we need to know the PageRank of allthe pages pointing to it To calculate the PageRank of those pages we need toknow the PageRank of all the pages pointing to them (one of which could and isquite likely to be Page A) If this seems like it could go around in circles forever

13

thatrsquos because it nearly does We have to do a whole lot of little calculationsmany times over before we actually get the answer What this formula tells us isthat however you slice it and however the formula may have changed since itwas written

The PageRank given to Page A by a Page B pointing to it is decreased witheach link to anywhere that exists on Page B This means that a pagersquos

PageRank is essentially a measure of its vote it can split that vote between onelink or two or many more but its overall voting power will always remain the

same

This really is important So to be totally clear letrsquos put some numbers to it (thesenumbers are made up for demonstration purposes only and have no relevancyto any particular page) Say Page B has a PageRank value of 5 and has a singlelink on it pointing to Page A Page Arsquos PageRank is improved by a proportion ofPage Brsquos value of 5 (Page B doesnrsquot lose anything but Page A gains) If Page Bhas two links that PageRank improvement would be split and Page A wouldonly gain half the PageRank that it did before

Now put the formula out of your mind for a moment as its easier to understandhow it works using a diagram Letrsquos say we have a hypothetical set of pagesimaginatively titled Page A Page B Page C and Page D They link to each otheras shown below

To begin with in our example at least we donrsquot know what the pagersquos startingPageRanks are Therersquos nothing special about the number that we pick to startwith (in fact if you read forward to the section on convergence ndash you can see that

14

we can start at any number we want) Since in the last version of this paper weperformed the calculations by setting these values to 1 ndash wersquore going to set themto zero this time around in order to prove that it doesnrsquot matter what the startingvalues are

Next we perform the necessary calculation to obtain the PageRank for eachpage The rules are

1 We take a 085 a pagersquos PageRank and divide it by the number oflinks on the page

2 We add that amount on to a new total for each page itrsquos being passedto

3 We add 015 to each of those totals

The first calculation is easy Because wersquove started at zero ndash 0 085 is always 0So each page gets just 015 + 0 Meaning each page now has a PageRank of015 Clearly wersquore not done ndash we want to show the importance of each pagebased upon links and theyrsquore all the same so we need to run the calculationagain

Page A links to pages B C and D Page Arsquos PageRank is 015 so it will add 085 015 = 01275 to the new PageRank scores of the pages it links to There arethree of them so they each get 00425

Page B links to page C Page Brsquos PageRank is 015 so it will add 085 015 =01275 to the new PageRank score of the pages it links to Since it only links topage C page C will get it all

Page C links to Page A all 01275 passes to page A

Page D links to Page C Again all 01275 passes to page C

The new totals for each page them become

Page A 015 (base) + 01275 (from Page C) = 02775Page B 015 (base) + 00425 (from Page A) = 01925Page C 015 (base) + 00425 (from Page A) + 01275 (from Page B) + 01275(from Page D) = 04475Page D 015 (base) + 00425 (from Page A) = 01925

So wersquove got

15

Pretty neat huh Already wersquore begining to see that Page C is probably the mostimportant page in the system (but we canrsquot be sure yet ndash it could well change)We carry on doing these calculations until the value for each page no longerchanges (this is called convergence ndash therersquos more about this in the nextsection) In practice Google probably doesnt wait for this convergence butinstead run a number of iterations of the calculation which is likely to give themfairly accurate values (more on this later as well) If we carried out all thecalculations for the example given it would take us 143 calculations [ ExcelExample 1] and wersquod reach final values of

16

As suspected Page C is the most important If we take a quick look at these rawvalues we can see something about the number of links pointing out from a pageLook at Page A which has a link from a high PageRank page (Page C) whichhas only one outbound link Then look at Page B and D both share links from ahigh PageRank page (Page A) with three outbound links The number of linkssignificantly alters the way PageRank is distributed

A word on convergence

Convergence is an important mathematical aspect of PageRank which allowsGoogle to provided unprecedented search quality at comparably low costs Thisis a semi-complex topic but it is important to your understanding of how and whyPageRank works Wersquove tried to make it as simple as possible but unless yoursquoreSergey Brin or Larry Page yoursquoll still need to concentrate

Whilst it will take some concentration it isnrsquot that hard to understand

Wersquove shown that the outlet values (final values) of one stage of the calculationbecome the inlet values (starting values) of the next stage and that we keep ondoing this (itrsquos known as a recursive procedure) But the really big question ishow and when does the recursive procedure stop

The answer is ldquoconvergencerdquo Provided the dampening factor (d in our equation)is less than one then convergence will occur Nominally we set it to 085(because thatrsquos the value mentioned in the Stanford papers)

This convergence basically means that whatever values we start at after runningthe calculation a number of times we will end up with the same final values andthat these values will no longer change if we do further iterations of thecalculation These final values are known as limiting values

Once the limiting values have been reached Google no longer needs to expendprocessing power on calculating the PageRank They can finish there

This is easier to understand with an example Letrsquos take a look at the followingstructure

17

PageRank for pages A B C D at various stages of iteration

A B C DIteration 1 1 1 1

1 10000000000 05750000000 22750000000 015000000002 20837500000 05750000000 11912500000 015000000003 11625625000 10355937500 16518437500 01500000000

19 14900124031 07833688246 15766187723 0150000000020 14901259564 07832552713 15766187723 0150000000021 14901259564 07833035315 15765705121 0150000000046 14901074052 07832956473 15765969475 0150000000047 14901074054 07832956472 15765969474 0150000000048 14901074053 07832956473 15765969474 0150000000049 14901074053 07832956473 15765969474 0150000000050 14901074053 07832956473 15765969474 0150000000051 14901074053 07832956473 15765969474 0150000000052 14901074053 07832956473 15765969474 0150000000053 14901074053 07832956473 15765969474 0150000000054 14901074053 07832956473 15765969474 01500000000

No matter how much we try to calculate the value past the 48th iteration thevalues do not change ndash they have converged at the limiting values

In practice theres no need to wait until they do not change at all before we stopcalculations We simply wait until they do not change very much In our exampleabove we have done that to 10 decimal places Ie we have calculated until theydo not change by 00000000001 or more (or mathematically | PR 48 (A) - PR 50(A) | lt 10 ndash10)

18

Derived specifics of how Google calculates PageRank

We started our values from zero and now see that the calculations are fairlycomplicated and resource intensive So two questions arise When Googlecalculates PageRank themselves do they start each month with an arbitraryvalue And secondly do they calculate PageRank across several machines Wecan answer these questions by a couple of simple experiments Assume thathaving calculated the PageRanks in the structure shown earlier we would like toadd a page and change a link to make the structure look like this

If we start each page from an arbitrary starting value of one then how manyiterations of the calculation does it take to achieve convergence [ ExcelExample 2]

75

Now if we start each page from the values we ended up with when we calculatedthe PageRank of the original structure (and set Page Ersquos starting value to 1) thenhow many iterations does it take to achieve convergence [ Excel Example 3]

78

These numbers are really very close Therefore it would appear logical thatGoogle does not reset the values with each calculation as they may simply carryon from where they left off

So do they calculate across several machines or one Obviously there is toomuch work for one machine when you get to billions of pages so several

19

machines must be used Logically how can they do this Assume Pages A andC are on one machine and Pages B C and D are on another If we ignore thelinks between pages across machines (ie A-gtD and D-gtC) then for Pages Aand C it takes

1 calculation to converge at A=1 C=1

Then for Pages B D and E it takes

3 calculations to converge at B=015 D=03954375000 E=03954375000

If we then do calculations for all pages using these as starting values it takes

A huge 146 iterations to converge [ Excel Example 4]

The conclusion is that splitting PageRank calculations up into chunks for differentmachines is not a viable proposition Therefore they must also track PageRanktransfers across links between machines This means that each machine mustnotify the other machines as PageRank is given to a page not on that machineIndeed each machine must also run the iterations in sync with each other Evenif they donrsquot do it exactly this way the calculation of PageRank for so manypages is very very complicated indeed

PageRank Feedback how linking sometimes helps

When the new term PageRank Feedback was presented in the original version ofthis paper it was largely taken up and used ndash although not always correctlyPageRank Feedback is difficult to understand because it is a concept that impliescreation of PageRank However it is really an alteration of distribution ofPageRank on a large scale PageRank Feedback is the principle that explainshow sometimes it is good to link to outside pages Consider we have a Page -gtPage A Its PageRank by definition will be 015 Now assume it links out toPage B which in return links back to it Page Arsquos PageRank now becomes 1Now assume it links out to Page C and Page C links back to it Page ArsquosPageRank now becomes 14594594595

The very fact that Page A links out to a page that links back to it causes its ownPageRank to rise (This could also occur if Page A links to a page that links to apage that links to Page A etchellip) This has occurred not because we aregenerating PageRank but because it is taking it from the system as a whole Butif we view this very small system ldquoPage Ardquo (which is a subsystem of all the pagesin the index) then we can firmly say that linking out from Page A has generatedPageRank in this system ndash that is the very act of linking out has fed PageRankback in

20

Letrsquos put this diagrammatically using a slightly more complicated system

Pages A to E are all of the pages in Googlersquos index Pages A and B are your siteWhen we do the calculations for this set up the final PageRanks for each pageare all 1 [ Excel Example 5] As far as your site is concerned

Page Arsquos PageRank is 1

Page Brsquos PageRank is 1

The sum of the PageRank of all the pages in your site is2

Changing the structure to link A to C and C to A gives the following finalPageRanks [ Excel Example 6]

21

Page Arsquos PageRank is 13599321536

Page Brsquos PageRank is 07279711653

The sum of the PageRank of all the pages in your site is20879033189

This increase is small the amount will of course vary depending oncircumstances But the point was easily proved ndash linking out can sometimesimprove your PageRank through a mechanism known as PageRank Feedback

If we added an extra page into the loop between C -gt D and E then the totalincrease in PageRank in your site would be 21462030505 [ Excel Example 7]This suggests that it is wiser to reciprocal link with larger sites that employ a linkstructure where a higher amount of focus is given to the page on which your linkappears (or will appear) The number of pages and the structure you are linkingto has significant influence on the amount of PageRank feedback that occurs

22

Influencing the results

The question that arises from all this is how and when can or will Googleinfluence the results of the PageRank calculation In this respect the situationhas changed significantly since the original PageRank paper Google has shownthat they can and will modify the data on which PageRank is based The primaryexample of this is what has become known as PageRank Zero (PR0) Basicallyspeculation says that when Google wants to penalize a page it is assigned aPageRank of zero As PageRank is a multiplier this will obviously always listPR0 pages as the very last entry in the SERPS

Your score from other factors multiplied by your PageRank score of zero willalways be zero (the lowest score possible)

How they do this is important Letrsquos say they set the PageRank of the page tozero before doing the calculations ndash would this work From what we know so farwe would think not [ Excel Example 8] And true enough it doesnt Why is thissignificant Because the PageRank must be set to zero at the end of theprocess Having a PageRank of zero in itself does not disqualify a page fromvoting exactly as it would have done in the first place

To stop its voting power the second penalty must also be applied This is thesame penalty that is applied to link farms Google has shown that they arecapable of ignoring links they believe have been artificially created This is simpleenough they merely remove the link entry from the matrix This second penaltywould also need to be applied to a PR0 penalty to stop the links from that pagefrom counting in the PageRank process However there doesnrsquot appear to beany way to test this I suspect that both penalties would be applied

Controlling PageRank

As Webmasters contrary to the suggestions of some we have a fair amount ofcontrol over PageRank It is however the most difficult of ranking factors tocontrol and you might well be able to achieve the desired effects by using othertechniques However a well-worked PageRank added to your optimization mixcan make it difficult for competitors to catch up

There are three fundamental areas to look at when trying to optimize thePageRank for your site

1 The links you choose to have link to you ie which ones you chooseand how much effort you put in to getting them

2 Who you choose to link out to from your site and from which page ofyour site you place their link (Maximising PageRank Feedback andminimising PageRank leakage)

23

3 The internal navigational structure and linkage of your pages in order tobest distribute PageRank within your site

Links to Your Site

When looking for links to your site from a purely PageRank point of view onemight think you should simply look for pages that have the highest ToolbarPageRank (Whilst keeping in mind that every page of a site has its ownPageRank so you must consider the PageRank of the links page or whateverpage the actual link will exist on) However this way of thinking is incorrect Ifyoursquove not just jumped to this section then yoursquoll probably have worked out whythat is The PageRank given by a link is far more complicated than thissimplification There may have been a time when that was an okayapproximationhellipbut no more As more and more people try to get links from onlyhigh PageRank sites it becomes less and less of a winning proposition

The actual PageRank from an individual page is shared out amongst the links onthat page (remember the PageRank calculations) So links from pages thathave the same PageRank arent always created equal It depends on how manyother links your link is sharing the links page with For instance a link from apage with a PageRank of 4 might be better than a link from a page with aPageRank of 6 if there are less total links on the PR 4 page Right now there justisnrsquot enough information available to allow us to know to what extent thisstretches However itrsquos significant enough to make it pointless to just choosehigh PageRanked sites as your main linking strategy Theres also another morematter-of-fact reason why that type of linking strategy might not be the best siteswith high PageRanks are often fussy about which sites they will link out tomaking them harder to get linked from than lower PageRanked sites Howeversites struggling with their own PageRank numbers should be more receptive toexchanging reciprocal links with other like-minded sites

Now lets factor in feedback Letrsquos say for example that there are two separatepages on other peoplersquos sites which both have a PageRank of 4 Both of thesehave ten links to other pages But the page on your site that you want them tolink to already has a link to the page on the second site By getting a link fromthe second site yoursquore generating feedback and getting more PageRank than ifyou had gotten a link from the first site Thatrsquos an over simplification in factfeedback loops can get even more complicated Remember the number of linkson the page linking to you will alter the amount of feedback etc

Can you work it all out for a given pagersquos situation No ndash neither can I Myadvice therefore is this ndash get links from sites that seem appropriate and havegood quality regardless of their current PageRank If they are relevant to yoursite and are high quality sites they will either help your PageRank now or willdo so in the future To really get your PageRank humming get yourself a listing

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 2: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

Introduction 2What is PageRank 2How is PageRank determined 3How can you tell what a pagersquos PageRank is 3How accurate is the Google toolbar 3How significant is PageRank 5Is PageRank a good determination of the quality of a page 6Googlersquos First 1000 Results 7The difference between PageRank and other factors 8Non-PageRank Factor Threshold 8Using the threshold to derive the worth of two ranking strategies 9Really heavy competition 11How PageRank is calculated 12A word on convergence 16Derived specifics of how Google calculates PageRank 18PageRank Feedback how linking sometimes helps 19Influencing the results 22Controlling PageRank 22Links to Your Site 23Links out of your site 24Internal Structures and Linkages 28What Google says 34Final word to non mathematicians 35Advanced PageRank topics 36Speeding Up PageRank 36How PageRank could be more than it seems 42Favouring pages in directories 44Examples 45Mathematical aspects of PageRank for advanced readers 45Further Reading and Resources 54Technical Proof Readers credits 555

2

Introduction

The original PageRank Explained document has existed for a long time It wasthe first of its kind to present many of the ideas in a way that the average personcan understand You may be wondering why there is a need for a new version ofthis document The reasoning is simple everything in the original document wastheoretical and subject to change These changes have occurred through theauthorsrsquo further learning on how Google reacts and how we think about thesubject To this end you will note that an additional author and technical editorhas been added to the paper Mike Shishiginrsquos extensive mathematical abilityenables us to explore some new areas of PageRank not mentioned before aswell as provide a section at the end that contains more advanced informationWe hope to present PageRank in a way that gets across everything you couldpossibly need to know -- without baffling you However if you want to know theextremely technical stuff then wersquoll give you that at the end

Were pleased that the principles in the original PageRank Explained documentseem to have been accepted universally within the search engine optimizationfield The one document that appears to criticise it later explains why each ofthe points it makes are valid Were also pleased that since its publication a vastamount of additional articles have been written about PageRank

Despite all this things have changed and there are still a lot ofmisunderstandings regarding the importance (or unimportance) of PageRankWersquove added new sections to address these and new sections to address somenew things Weve also tried to make the existing sections moreunderstandable This is a long document itrsquos written so that a broad range ofpeople can understand it As itrsquos also used as a major reference in severaluniversity courses we will do our best detail as much as possible If you are ldquoJoeAveragerdquo then you can simply limit your concern to understanding the principlesuntil youre ready to dig deeper

Finally we thought it was time to dedicate some space to Googles thoughtsabout PageRank Obviously Google is not going to comment on exact rankingissues but as it is their technology what they say about it gives us lots of insightWe asked Barry Schnitt spokesperson for Google a few questions about variousPageRank related topics and he was kind enough to give us some answersyoursquoll find those answers later in the document

What is PageRank

PageRank is Googles method of measuring a pages importance When allother factors such as Title tag and keywords are taken into account Google usesPageRank to adjust results so that sites that are deemed more important willmove up in the results page of a users search accordingly

3

A basic overview of how Google ranks pages in their search engine resultspages (SERPS) follows

1) Find all pages matching the keywords of the search2) Rank accordingly using on the page factors such as keywords3) Calculate in the inbound anchor text4) Adjust the results by PageRank scores

In reality itrsquos slightly more complex and wersquoll discuss this in more depth later butfor now the above description serves our purposes Itrsquos worth noting thatPageRank is a multiplier and is not just simply added to the score Thus if yourpage had a PageRank of zero it would rank at the very end of the SERPS

How is PageRank determined

The Google theory goes that if Page A links to Page B then Page A is sayingthat Page B is an important page PageRank also factors in the importance of thelinks pointing to a page If a page has important links pointing to it then its linksto other pages also become important The actual text of the link is irrelevantwhen discussing PageRank

How can you tell what a pagersquos PageRank is

To learn what a pagersquos PageRank is you can download a toolbar for InternetExplorer from httptoolbargooglecom Once installed there will be a bar graphat the top of the browser showing a version of PageRank for the page yoursquorebrowsing When you hold the mouse over the bar you see a number from zero toten (If you donrsquot see the number you may have an older version of the toolbarinstalled You will need to completely uninstall it reboot your computer andreinstall the latest version Once this is done you should be able to see thePageRank number)

How accurate is the Google toolbar

The Google toolbar is not very accurate in showing you the actual PageRank of asite but itrsquos the only thing right now that can give you any idea As long as youknow the toolbarrsquos limitations then at least you know what you are viewing

There are two limitations to the Google toolbar

1 The toolbar sometimes guesses If you enter a page which is not inits index but where there is a page that is very close to it in Googlersquosindex then it will provide a guesstimate of the PageRank This

4

guesstimate is worthless for our purposes because it isnrsquot featured inany of the PageRank calculations The only way to tell if the toolbar isa guesstimate is to type the URL into the Google search box and see ifthe page shows up in the SERPS If it doesnrsquot then the toolbar isguessing

2 The toolbar is just a representation of actual PageRank WhilstPageRank is linear Google has chosen to use a non-linear graph toportray it So on the toolbar to move from a PageRank of 2 to aPageRank of 3 takes less of an increase than to move from aPageRank of 3 to a PageRank of 4 A comparison table best illustratesthis phenomenon The actual figures are kept secret so wersquoll just useany figures for demonstration purposes

If the actualPageRank is

betweenThe Toolbar Shows

000000001 and 5 16 and 25 2

25 and 125 3126 and 625 4

626 and 3125 53126 and 15625 6

15626 and 78125 778126 and 390625 8

390626 and 1953125 91953126 and infinity 10

The PageRank shown in the Google directory (httpdirectorygooglecom) suffers fromthe same problems The PageRank shown in the directory is also on a differentscale There have been attempts to cross-reference these two scales butbecause they are non-linear the results really do not tell you anything more thanyou already know

Also of note is that a programmer managed to generate a tool to look upPageRank without using Internet Explorer This tool has since been withdrawnbut whilst originally the numbers given by this software and Googlersquos toolbarmatched - presently querying with such software sometimes produces differentnumbers than querying with the toolbar This is Googlersquos right to protect theirdata but is the strongest indication that

5

Sometimes what you see on the Google toolbar may not be related to actualPageRank at all (Google can and does assign whatever toolbar PageRankvalue they want to assign to a page)

How significant is PageRank

The significance of any one factor in search engine algorithms depends on thequality of the information it supplies A factorrsquos importance is known as its weightTo demonstrate how weighting is arrived at its easiest to move away fromPageRank for a second and look at Meta tags Originally when the Metakeyword tag was new you could write something like this in your document

ltmeta name=rdquokeywordsrdquo content=rdquopagerank pagerank uncovered algorithmalgorithmsrdquogt

In theory the Meta keyword tag was a very good indicator of what the page wasabout However as most are well aware ndash the weighting for the keywords tag isfast approaching nothing Two things have contributed to this

1 The ease at which Webmasters can manipulate it2 The level of manipulation by Webmasters

These two things are separate factors but with human nature being what it isthe easier something is to influence ndash the more it is manipulated Thecombination of these factors determines the ldquoweightingrdquo ndash ie how much we trustthe information provided by that factor

So it makes sense to look at these factors in relation to PageRank first

PageRank is without doubt one of the hardest things for a Webmaster tomanipulate ethically However it is possible to generate links to your site fromother sites fairly simply through the use of link farms and guestbooks Googlefrowns upon this kind of abuse and many sites that have tried this have had theirPageRank influence blocked But it must be said that the abuse is still rampantand that it can have an influence on PageRank So whilst not easy to doPageRank is still subject to manipulation

The extent to which PageRank is manipulated has also changed Most people nolonger believe Googlersquos old line of people not being able to influence PageRankand the results based on it However there is more information about PageRankavailable than ever and people are more aware of manipulation techniques

So whilst PageRank is valuable you should be careful not to over-estimate itsusage and capabilities Your final ranking in Google is due to a mix of factors of

6

which PageRank is only one Wersquoll get into more details later by discussing howPageRank is different than the other ranking factors and thus when it appliesand when it doesnrsquot Ironically enough PageRankrsquos weighting factor isundeniably declining Since the original version of this paper gave out detailedinformation about PageRank in all likelihood it may also have contributed insome small way to the decline in weighting of the very subject it talks about

Is PageRank a good determination of the quality of a page

To examine the worth of PageRank we need to first look at its premise and howaccurate it is Basically PageRank says

1 If a page links to another page it is casting a vote which indicates that theother page is good

2 If lots of pages link to a page then it has more votes and its worth shouldbe higher

The basic implication here is People only link to pages they think are good

It shouldnt be hard to convince you that this premise is wrong A few of thereasons people link to pages other than ones they think are good are

1) Reciprocal links ndash ldquoLink to me and Irsquoll link to yourdquo

2) Link Requirements ndash ldquoUsing our script requires you to put a link to ourpagerdquo or ldquoWersquoll give you an award solely because you link to our pagerdquo

3) Friends and Family ndash ldquoThis is my friend Petersquos siterdquo or ldquoMy mumrsquos siteis here my dadrsquos site is here My dogrsquos site is hererdquo

4) Free Page Add-ons ndash ldquoThis counter was provided bywwwlinktocountersitecomrdquo

Furthermore anybody who has a top-ranking site will tell you that it tends to getlinks from new sites This is not necessarily because its good (although theygenerally are) Assume a Webmaster is setting up a new site and they arelooking for some outbound links Nowadays one of the first things they do is aGoogle search for similar sites The links they end up with may not necessarilybe the best sites but merely the easiest ones to find If PageRank influencesrankings and if they subsequently link to those pages ndash the new Webmaster willbe adding to the inaccuracies in the judging of the quality of a page The same istrue when these new Webmasters use the Google Toolbar PageRank indicator tochoose whom to link to

7

To put this another way

PageRank is determined by the links pointing to a page But if PageRank itselfhas an influence on the number of links to a page it is influencing (in acircular way) the quality of that page The links are no longer based solely onhuman judgement If a Webmaster picks their outbound links by searching onGoogle or by looking at the Google toolbar (even if this is only part of theirjudgement) then there is a corresponding increase in a pagersquos PageRankThis increase is not solely because it is a good page but because itsPageRank is already high

Googlersquos First 1000 Results

Remember PageRank alone cannot get you high rankings Wersquove mentionedbefore that PageRank is a multiplier so if your score for all other factors is 0 andyour PageRank is twenty billion then you still score 0 (last in the results) This isnot to say PageRank is worthless but there is some confusion over whenPageRank is useful and when it is not This leads to many misinterpretations ofits worth The only way to clear up these misinterpretations is to point out whenPageRank is not worthwhile

If you perform any broad search on Google it will appear as if youve foundseveral thousand results However you can only view the first 1000 of themUnderstanding why this is so explains why you should always concentrate onldquoon the pagerdquo factors and anchor text first and PageRank last

Assume that you perform a search on Google and it returns 200000 results Ifwe were to calculate every factor for each 200000 pages ndash do you think it wouldreally take just 034 seconds to search The answer to speeding up the search isto get a subset of documents that are most likely to be related to the query Thissubset of documents needs to be larger than the number of search results Forexample letrsquos say that number is 2000 What the search engine does is querythe whole database using 2 or 3 factors finding the 2000 documents that rankhighest for them (Remember there were 200000 possible documents andthatrsquos the number that actually gets shown) Then the engine applies all thefactors to those 2000 and ranks them accordingly Because therersquos a drop in thequality of the results (not the pages) at the bottom of this subset the engine justshows the first 1000 PageRank is almost certainly not one of those factorsNotice how before we highlighted the word ldquorelatedrdquo in creating the subset of2000 pages The search engine is looking for pages that are on-topic If weincluded PageRank in that list wersquod get a lot of high PageRank pages with topicsthat are only slightly related (because of the second factor) but thatrsquos not whatwe want

8

Why this is critical

You must do enough on the page work andor anchor text work to get intothat subset of 2000 pages for your chosen key phrase otherwise your highPageRank will be completely in vain PageRank means nothing if you do nothave enough ranking from other factors to make it into the first subset

The difference between PageRank and other factors

To assess when PageRank is important and when it is not we need tounderstand how PageRank is different from all other ranking factors To do thisherersquos a quick table that lists a few other factors and how they add to the rankingscore

Title Tag Can only be listed onceKeywords in Body Text Each successive repetition is

less important Proximity isimportant

Anchor Text Highly weighted but likekeywords in body text there isa cut off point where furtheranchor text is no longerworthwhile

PageRank Potentially infinite You arealways capable of increasingyour PageRank significantlybut it takes work

All other ranking factors have cut off points beyond which they will no longeradd to your ranking score or will not add significantly enough for it to beworthwhile PageRank has no cut off point

Non-PageRank Factor Threshold

Having written about the difference between PageRank and other factors andhow PageRank is harder to get it should now be clear that whilst we could usemany methods to get good rankings there is a threshold which defines whenhigh PageRank is worth striving for and when it is not

With ranking factors other than PageRank there is a score beyond which theslow down in the rate that any factor adds to this score is so insignificant that it is

9

not worthwhile This is the Non-PageRank Factor Threshold To illustrate thisletrsquos put an example figure on this of 1000

If we have a query where the results are Page A and Page B then Page A and Bhave scores for that query which are the total scores for all ranking factors(including PageRank) Letrsquos say Page Arsquos score is 900 and page Brsquos score is500 Obviously Page A will be listed first These are both below our hypotheticalNon-PageRank Factor Threshold thus without any change in PageRank it ispossible for page B to improve their optimization to beat Page A for this particularquery There are lots of queries like this on Google theyre more commonlythought of as less competitive queries

Now assume Page A raises its score to 1100 Suddenly page B cannot competein the SERPs (search engine results pages) without increasing its PageRank Inall probability page B must also improve for all the other ranking factors but anincrease in PageRank is almost certainly necessary There are also lots ofqueries like this on Google which are more commonly thought of as morecompetitive queries

Generally when querying Google the group of pages in the SERPs will containsome pages that have a score above the Non-PageRank Factor Threshold andsome that do not

There is an important point to be made here

To be competitive you must raise your pages search engine ranking scorebeyond the Non-PageRank Factor Threshold To fail to do so means that youcan easily be beaten in the search results for your query terms The quickestway to approach the Non-PageRank Factor Threshold is through ldquoon the pagefactorsrdquo however you cannot move above the Non-PageRank FactorThreshold without PageRank

The obvious question is whatrsquos the numerical value of the Non-PageRank FactorThreshold and how much work do you need to do to get past it The answer isthat it has no value it is a hypothetical line Google could put a value on it butthat would not help us unless we know what the pagersquos individual scores are Weneed only be aware that the threshold exists and that it gives us informationabout principles

Using the threshold to derive the worth of two ranking strategies

The threshold explains principles and the different ways that search enginemarketers work It also demonstrates why some of the misunderstandings aboutPageRank occur Letrsquos consider the strategies of two people Person A considersPageRank to be unimportant and Person B considers PageRank to be veryimportant

10

Person A says ldquoPageRankrdquo is unimportant They have optimised pages foryears and know how to use ldquoon the pagerdquo factors very successfully Theyunderstand the basics of anchor text but they couldnrsquot care at all aboutPageRank

Whatrsquos happening person A is reaching the Non-PageRank Factor Thresholdvery quickly because they are maximising the ldquoon the pagerdquo factors Throughcarefully choosing keywords they jump-start themselves up the SERPs Aslong as their content is good high-ranking sites (over time) tend to get linkedto Whilst they didnrsquot directly ask for it a slow trickle of sites will begin to link tothem and give them PageRank which helps consolidates their position

Person B says ldquoPageRankrdquo is important Wersquove all seen those pages in theresults that have no content but great rankings (With big brands this canoften occur naturally even when they have no idea what PageRank is Thiswould be Person C who is not relevant to the discussion at hand) Person Bunderstands lots about PageRank and concentrates heavily on it

Whatrsquos happening person B is doing the reverse of person A Whilst person Aconcentrated on the Non-PageRank factors and found herself gettingPageRank anyway person B concentrates on the PageRank Factor and findshimself getting Non-PageRank factors The reason for this is that increasingPageRank requires links and links have anchor text Thus through carefullychoosing the anchor text linking to his page person B automatically increaseshis Non-PageRank factor scores whilst obtaining his high PageRank score

Obviously these are two extremes but we can use these to extrapolate theadvantages and disadvantages of each approach

11

Advantages DisadvantagesPerson A bull Quick entry to the

results pagesbull Self-generating links

limit the amount ofwork needed

bull Securing the resultsto make it moredifficult forcompetitors tocompete takes morework

bull Slower to react tonew competitors

Person B bull Solid position Caneasily modify on thepage factors for aquick boost ifrequired

bull Probably highertraffic from non-search enginesources

bull Slower entry toresults pages

bull Difficult to do wellbull Increased likelihood

of tripping spamfilters

It is clear that both strategies can and do work Both strategies are usingPageRank as part the mix of factors that will ultimately improve their ranking inthe SERPS Because there is such a mix we can use them to different degreesdepending on the strategy that best suits your style My personal strategy is touse a combination but to save some of the ldquoon the pagerdquo factors for later in caseI need a quick boost if the competition heats up at a later time

Really heavy competition

No explanation of PageRank strategies would be complete without a finalstatement regarding heavy keyword competition There are some queries wherecompetition is so intense that you must do everything possible to maximize yourranking score (for example ldquoweb hostingrdquo) In such situations it is impossible torank highly through Non-PageRank factors alone (as you will not initially be listedhigh enough to be noticed and linked to) That is not to say that Non-PageRankfactors arent important Consider what your final rank score is

Final Rank Score = (score for all Non-PageRank factors) x (actual PageRankscore)

Improving either side of the equation can have a positive effect Howeverbecause the Non-PageRank factors have a restricted maximum benefit theactual PageRank score must be improved in order to compete successfullyUnder really heavy competition ndash it holds true that you cannot rank well unlessyour actual PageRank score is above a certain level In other words

12

There exists a query specific ldquoMinimum PageRank levelrdquo For queries that do nothave heavy competition this level is easy to achieve without even tryingHowever where heavy competition exists Non-PageRank factors are just asimportant (and easier to get) until they reach the Non-PageRank factor thresholdThis is why careful keyword choice can help you avoid the extensive workassociated with highly competitive search phrases

How PageRank is calculated

On a simple level we can tell quite a lot about how PageRank is calculated Thisis because when Google was just a university research project the creators ofPageRank published a paper that detailed a formula for calculating it Thisformula is now more than a handful of years old and we suspect that it haschanged somewhat since then However for detailing the over-riding principlesof how PageRank works it is as accurate today as the day it was written

PR(A) = (1-d) + d (PR(T1) C(T1) + + PR(Tn) C(Tn) )

Where PR(A) is the PageRank of Page A ( the one we want t owork out )

D is a dam pening factor Nom inally this is set to 085

PR(T1) is the PageRank of a site point ing to Page A

C(T1) is the num ber of links off that page

PR(Tn) C(Tn) m eans we do that for each page point ing toPage A

Source The Anatomy of a Large-Scale Hypertextual Web Search EngineSergey Brin and LawrencePage httpwww-dbstanfordedu~backrubgooglehtml

Couldnrsquot be simpler right Depending on your mathematical competence this iseither remarkably easy to understand or remarkably complex So herersquos the dealndash this isnrsquot a one-time calculation It looks pretty above but you canrsquot just do onesimple calculation and get an answer with the PageRank of a page If you look atit to calculate the PageRank of Page A we need to know the PageRank of allthe pages pointing to it To calculate the PageRank of those pages we need toknow the PageRank of all the pages pointing to them (one of which could and isquite likely to be Page A) If this seems like it could go around in circles forever

13

thatrsquos because it nearly does We have to do a whole lot of little calculationsmany times over before we actually get the answer What this formula tells us isthat however you slice it and however the formula may have changed since itwas written

The PageRank given to Page A by a Page B pointing to it is decreased witheach link to anywhere that exists on Page B This means that a pagersquos

PageRank is essentially a measure of its vote it can split that vote between onelink or two or many more but its overall voting power will always remain the

same

This really is important So to be totally clear letrsquos put some numbers to it (thesenumbers are made up for demonstration purposes only and have no relevancyto any particular page) Say Page B has a PageRank value of 5 and has a singlelink on it pointing to Page A Page Arsquos PageRank is improved by a proportion ofPage Brsquos value of 5 (Page B doesnrsquot lose anything but Page A gains) If Page Bhas two links that PageRank improvement would be split and Page A wouldonly gain half the PageRank that it did before

Now put the formula out of your mind for a moment as its easier to understandhow it works using a diagram Letrsquos say we have a hypothetical set of pagesimaginatively titled Page A Page B Page C and Page D They link to each otheras shown below

To begin with in our example at least we donrsquot know what the pagersquos startingPageRanks are Therersquos nothing special about the number that we pick to startwith (in fact if you read forward to the section on convergence ndash you can see that

14

we can start at any number we want) Since in the last version of this paper weperformed the calculations by setting these values to 1 ndash wersquore going to set themto zero this time around in order to prove that it doesnrsquot matter what the startingvalues are

Next we perform the necessary calculation to obtain the PageRank for eachpage The rules are

1 We take a 085 a pagersquos PageRank and divide it by the number oflinks on the page

2 We add that amount on to a new total for each page itrsquos being passedto

3 We add 015 to each of those totals

The first calculation is easy Because wersquove started at zero ndash 0 085 is always 0So each page gets just 015 + 0 Meaning each page now has a PageRank of015 Clearly wersquore not done ndash we want to show the importance of each pagebased upon links and theyrsquore all the same so we need to run the calculationagain

Page A links to pages B C and D Page Arsquos PageRank is 015 so it will add 085 015 = 01275 to the new PageRank scores of the pages it links to There arethree of them so they each get 00425

Page B links to page C Page Brsquos PageRank is 015 so it will add 085 015 =01275 to the new PageRank score of the pages it links to Since it only links topage C page C will get it all

Page C links to Page A all 01275 passes to page A

Page D links to Page C Again all 01275 passes to page C

The new totals for each page them become

Page A 015 (base) + 01275 (from Page C) = 02775Page B 015 (base) + 00425 (from Page A) = 01925Page C 015 (base) + 00425 (from Page A) + 01275 (from Page B) + 01275(from Page D) = 04475Page D 015 (base) + 00425 (from Page A) = 01925

So wersquove got

15

Pretty neat huh Already wersquore begining to see that Page C is probably the mostimportant page in the system (but we canrsquot be sure yet ndash it could well change)We carry on doing these calculations until the value for each page no longerchanges (this is called convergence ndash therersquos more about this in the nextsection) In practice Google probably doesnt wait for this convergence butinstead run a number of iterations of the calculation which is likely to give themfairly accurate values (more on this later as well) If we carried out all thecalculations for the example given it would take us 143 calculations [ ExcelExample 1] and wersquod reach final values of

16

As suspected Page C is the most important If we take a quick look at these rawvalues we can see something about the number of links pointing out from a pageLook at Page A which has a link from a high PageRank page (Page C) whichhas only one outbound link Then look at Page B and D both share links from ahigh PageRank page (Page A) with three outbound links The number of linkssignificantly alters the way PageRank is distributed

A word on convergence

Convergence is an important mathematical aspect of PageRank which allowsGoogle to provided unprecedented search quality at comparably low costs Thisis a semi-complex topic but it is important to your understanding of how and whyPageRank works Wersquove tried to make it as simple as possible but unless yoursquoreSergey Brin or Larry Page yoursquoll still need to concentrate

Whilst it will take some concentration it isnrsquot that hard to understand

Wersquove shown that the outlet values (final values) of one stage of the calculationbecome the inlet values (starting values) of the next stage and that we keep ondoing this (itrsquos known as a recursive procedure) But the really big question ishow and when does the recursive procedure stop

The answer is ldquoconvergencerdquo Provided the dampening factor (d in our equation)is less than one then convergence will occur Nominally we set it to 085(because thatrsquos the value mentioned in the Stanford papers)

This convergence basically means that whatever values we start at after runningthe calculation a number of times we will end up with the same final values andthat these values will no longer change if we do further iterations of thecalculation These final values are known as limiting values

Once the limiting values have been reached Google no longer needs to expendprocessing power on calculating the PageRank They can finish there

This is easier to understand with an example Letrsquos take a look at the followingstructure

17

PageRank for pages A B C D at various stages of iteration

A B C DIteration 1 1 1 1

1 10000000000 05750000000 22750000000 015000000002 20837500000 05750000000 11912500000 015000000003 11625625000 10355937500 16518437500 01500000000

19 14900124031 07833688246 15766187723 0150000000020 14901259564 07832552713 15766187723 0150000000021 14901259564 07833035315 15765705121 0150000000046 14901074052 07832956473 15765969475 0150000000047 14901074054 07832956472 15765969474 0150000000048 14901074053 07832956473 15765969474 0150000000049 14901074053 07832956473 15765969474 0150000000050 14901074053 07832956473 15765969474 0150000000051 14901074053 07832956473 15765969474 0150000000052 14901074053 07832956473 15765969474 0150000000053 14901074053 07832956473 15765969474 0150000000054 14901074053 07832956473 15765969474 01500000000

No matter how much we try to calculate the value past the 48th iteration thevalues do not change ndash they have converged at the limiting values

In practice theres no need to wait until they do not change at all before we stopcalculations We simply wait until they do not change very much In our exampleabove we have done that to 10 decimal places Ie we have calculated until theydo not change by 00000000001 or more (or mathematically | PR 48 (A) - PR 50(A) | lt 10 ndash10)

18

Derived specifics of how Google calculates PageRank

We started our values from zero and now see that the calculations are fairlycomplicated and resource intensive So two questions arise When Googlecalculates PageRank themselves do they start each month with an arbitraryvalue And secondly do they calculate PageRank across several machines Wecan answer these questions by a couple of simple experiments Assume thathaving calculated the PageRanks in the structure shown earlier we would like toadd a page and change a link to make the structure look like this

If we start each page from an arbitrary starting value of one then how manyiterations of the calculation does it take to achieve convergence [ ExcelExample 2]

75

Now if we start each page from the values we ended up with when we calculatedthe PageRank of the original structure (and set Page Ersquos starting value to 1) thenhow many iterations does it take to achieve convergence [ Excel Example 3]

78

These numbers are really very close Therefore it would appear logical thatGoogle does not reset the values with each calculation as they may simply carryon from where they left off

So do they calculate across several machines or one Obviously there is toomuch work for one machine when you get to billions of pages so several

19

machines must be used Logically how can they do this Assume Pages A andC are on one machine and Pages B C and D are on another If we ignore thelinks between pages across machines (ie A-gtD and D-gtC) then for Pages Aand C it takes

1 calculation to converge at A=1 C=1

Then for Pages B D and E it takes

3 calculations to converge at B=015 D=03954375000 E=03954375000

If we then do calculations for all pages using these as starting values it takes

A huge 146 iterations to converge [ Excel Example 4]

The conclusion is that splitting PageRank calculations up into chunks for differentmachines is not a viable proposition Therefore they must also track PageRanktransfers across links between machines This means that each machine mustnotify the other machines as PageRank is given to a page not on that machineIndeed each machine must also run the iterations in sync with each other Evenif they donrsquot do it exactly this way the calculation of PageRank for so manypages is very very complicated indeed

PageRank Feedback how linking sometimes helps

When the new term PageRank Feedback was presented in the original version ofthis paper it was largely taken up and used ndash although not always correctlyPageRank Feedback is difficult to understand because it is a concept that impliescreation of PageRank However it is really an alteration of distribution ofPageRank on a large scale PageRank Feedback is the principle that explainshow sometimes it is good to link to outside pages Consider we have a Page -gtPage A Its PageRank by definition will be 015 Now assume it links out toPage B which in return links back to it Page Arsquos PageRank now becomes 1Now assume it links out to Page C and Page C links back to it Page ArsquosPageRank now becomes 14594594595

The very fact that Page A links out to a page that links back to it causes its ownPageRank to rise (This could also occur if Page A links to a page that links to apage that links to Page A etchellip) This has occurred not because we aregenerating PageRank but because it is taking it from the system as a whole Butif we view this very small system ldquoPage Ardquo (which is a subsystem of all the pagesin the index) then we can firmly say that linking out from Page A has generatedPageRank in this system ndash that is the very act of linking out has fed PageRankback in

20

Letrsquos put this diagrammatically using a slightly more complicated system

Pages A to E are all of the pages in Googlersquos index Pages A and B are your siteWhen we do the calculations for this set up the final PageRanks for each pageare all 1 [ Excel Example 5] As far as your site is concerned

Page Arsquos PageRank is 1

Page Brsquos PageRank is 1

The sum of the PageRank of all the pages in your site is2

Changing the structure to link A to C and C to A gives the following finalPageRanks [ Excel Example 6]

21

Page Arsquos PageRank is 13599321536

Page Brsquos PageRank is 07279711653

The sum of the PageRank of all the pages in your site is20879033189

This increase is small the amount will of course vary depending oncircumstances But the point was easily proved ndash linking out can sometimesimprove your PageRank through a mechanism known as PageRank Feedback

If we added an extra page into the loop between C -gt D and E then the totalincrease in PageRank in your site would be 21462030505 [ Excel Example 7]This suggests that it is wiser to reciprocal link with larger sites that employ a linkstructure where a higher amount of focus is given to the page on which your linkappears (or will appear) The number of pages and the structure you are linkingto has significant influence on the amount of PageRank feedback that occurs

22

Influencing the results

The question that arises from all this is how and when can or will Googleinfluence the results of the PageRank calculation In this respect the situationhas changed significantly since the original PageRank paper Google has shownthat they can and will modify the data on which PageRank is based The primaryexample of this is what has become known as PageRank Zero (PR0) Basicallyspeculation says that when Google wants to penalize a page it is assigned aPageRank of zero As PageRank is a multiplier this will obviously always listPR0 pages as the very last entry in the SERPS

Your score from other factors multiplied by your PageRank score of zero willalways be zero (the lowest score possible)

How they do this is important Letrsquos say they set the PageRank of the page tozero before doing the calculations ndash would this work From what we know so farwe would think not [ Excel Example 8] And true enough it doesnt Why is thissignificant Because the PageRank must be set to zero at the end of theprocess Having a PageRank of zero in itself does not disqualify a page fromvoting exactly as it would have done in the first place

To stop its voting power the second penalty must also be applied This is thesame penalty that is applied to link farms Google has shown that they arecapable of ignoring links they believe have been artificially created This is simpleenough they merely remove the link entry from the matrix This second penaltywould also need to be applied to a PR0 penalty to stop the links from that pagefrom counting in the PageRank process However there doesnrsquot appear to beany way to test this I suspect that both penalties would be applied

Controlling PageRank

As Webmasters contrary to the suggestions of some we have a fair amount ofcontrol over PageRank It is however the most difficult of ranking factors tocontrol and you might well be able to achieve the desired effects by using othertechniques However a well-worked PageRank added to your optimization mixcan make it difficult for competitors to catch up

There are three fundamental areas to look at when trying to optimize thePageRank for your site

1 The links you choose to have link to you ie which ones you chooseand how much effort you put in to getting them

2 Who you choose to link out to from your site and from which page ofyour site you place their link (Maximising PageRank Feedback andminimising PageRank leakage)

23

3 The internal navigational structure and linkage of your pages in order tobest distribute PageRank within your site

Links to Your Site

When looking for links to your site from a purely PageRank point of view onemight think you should simply look for pages that have the highest ToolbarPageRank (Whilst keeping in mind that every page of a site has its ownPageRank so you must consider the PageRank of the links page or whateverpage the actual link will exist on) However this way of thinking is incorrect Ifyoursquove not just jumped to this section then yoursquoll probably have worked out whythat is The PageRank given by a link is far more complicated than thissimplification There may have been a time when that was an okayapproximationhellipbut no more As more and more people try to get links from onlyhigh PageRank sites it becomes less and less of a winning proposition

The actual PageRank from an individual page is shared out amongst the links onthat page (remember the PageRank calculations) So links from pages thathave the same PageRank arent always created equal It depends on how manyother links your link is sharing the links page with For instance a link from apage with a PageRank of 4 might be better than a link from a page with aPageRank of 6 if there are less total links on the PR 4 page Right now there justisnrsquot enough information available to allow us to know to what extent thisstretches However itrsquos significant enough to make it pointless to just choosehigh PageRanked sites as your main linking strategy Theres also another morematter-of-fact reason why that type of linking strategy might not be the best siteswith high PageRanks are often fussy about which sites they will link out tomaking them harder to get linked from than lower PageRanked sites Howeversites struggling with their own PageRank numbers should be more receptive toexchanging reciprocal links with other like-minded sites

Now lets factor in feedback Letrsquos say for example that there are two separatepages on other peoplersquos sites which both have a PageRank of 4 Both of thesehave ten links to other pages But the page on your site that you want them tolink to already has a link to the page on the second site By getting a link fromthe second site yoursquore generating feedback and getting more PageRank than ifyou had gotten a link from the first site Thatrsquos an over simplification in factfeedback loops can get even more complicated Remember the number of linkson the page linking to you will alter the amount of feedback etc

Can you work it all out for a given pagersquos situation No ndash neither can I Myadvice therefore is this ndash get links from sites that seem appropriate and havegood quality regardless of their current PageRank If they are relevant to yoursite and are high quality sites they will either help your PageRank now or willdo so in the future To really get your PageRank humming get yourself a listing

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 3: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

2

Introduction

The original PageRank Explained document has existed for a long time It wasthe first of its kind to present many of the ideas in a way that the average personcan understand You may be wondering why there is a need for a new version ofthis document The reasoning is simple everything in the original document wastheoretical and subject to change These changes have occurred through theauthorsrsquo further learning on how Google reacts and how we think about thesubject To this end you will note that an additional author and technical editorhas been added to the paper Mike Shishiginrsquos extensive mathematical abilityenables us to explore some new areas of PageRank not mentioned before aswell as provide a section at the end that contains more advanced informationWe hope to present PageRank in a way that gets across everything you couldpossibly need to know -- without baffling you However if you want to know theextremely technical stuff then wersquoll give you that at the end

Were pleased that the principles in the original PageRank Explained documentseem to have been accepted universally within the search engine optimizationfield The one document that appears to criticise it later explains why each ofthe points it makes are valid Were also pleased that since its publication a vastamount of additional articles have been written about PageRank

Despite all this things have changed and there are still a lot ofmisunderstandings regarding the importance (or unimportance) of PageRankWersquove added new sections to address these and new sections to address somenew things Weve also tried to make the existing sections moreunderstandable This is a long document itrsquos written so that a broad range ofpeople can understand it As itrsquos also used as a major reference in severaluniversity courses we will do our best detail as much as possible If you are ldquoJoeAveragerdquo then you can simply limit your concern to understanding the principlesuntil youre ready to dig deeper

Finally we thought it was time to dedicate some space to Googles thoughtsabout PageRank Obviously Google is not going to comment on exact rankingissues but as it is their technology what they say about it gives us lots of insightWe asked Barry Schnitt spokesperson for Google a few questions about variousPageRank related topics and he was kind enough to give us some answersyoursquoll find those answers later in the document

What is PageRank

PageRank is Googles method of measuring a pages importance When allother factors such as Title tag and keywords are taken into account Google usesPageRank to adjust results so that sites that are deemed more important willmove up in the results page of a users search accordingly

3

A basic overview of how Google ranks pages in their search engine resultspages (SERPS) follows

1) Find all pages matching the keywords of the search2) Rank accordingly using on the page factors such as keywords3) Calculate in the inbound anchor text4) Adjust the results by PageRank scores

In reality itrsquos slightly more complex and wersquoll discuss this in more depth later butfor now the above description serves our purposes Itrsquos worth noting thatPageRank is a multiplier and is not just simply added to the score Thus if yourpage had a PageRank of zero it would rank at the very end of the SERPS

How is PageRank determined

The Google theory goes that if Page A links to Page B then Page A is sayingthat Page B is an important page PageRank also factors in the importance of thelinks pointing to a page If a page has important links pointing to it then its linksto other pages also become important The actual text of the link is irrelevantwhen discussing PageRank

How can you tell what a pagersquos PageRank is

To learn what a pagersquos PageRank is you can download a toolbar for InternetExplorer from httptoolbargooglecom Once installed there will be a bar graphat the top of the browser showing a version of PageRank for the page yoursquorebrowsing When you hold the mouse over the bar you see a number from zero toten (If you donrsquot see the number you may have an older version of the toolbarinstalled You will need to completely uninstall it reboot your computer andreinstall the latest version Once this is done you should be able to see thePageRank number)

How accurate is the Google toolbar

The Google toolbar is not very accurate in showing you the actual PageRank of asite but itrsquos the only thing right now that can give you any idea As long as youknow the toolbarrsquos limitations then at least you know what you are viewing

There are two limitations to the Google toolbar

1 The toolbar sometimes guesses If you enter a page which is not inits index but where there is a page that is very close to it in Googlersquosindex then it will provide a guesstimate of the PageRank This

4

guesstimate is worthless for our purposes because it isnrsquot featured inany of the PageRank calculations The only way to tell if the toolbar isa guesstimate is to type the URL into the Google search box and see ifthe page shows up in the SERPS If it doesnrsquot then the toolbar isguessing

2 The toolbar is just a representation of actual PageRank WhilstPageRank is linear Google has chosen to use a non-linear graph toportray it So on the toolbar to move from a PageRank of 2 to aPageRank of 3 takes less of an increase than to move from aPageRank of 3 to a PageRank of 4 A comparison table best illustratesthis phenomenon The actual figures are kept secret so wersquoll just useany figures for demonstration purposes

If the actualPageRank is

betweenThe Toolbar Shows

000000001 and 5 16 and 25 2

25 and 125 3126 and 625 4

626 and 3125 53126 and 15625 6

15626 and 78125 778126 and 390625 8

390626 and 1953125 91953126 and infinity 10

The PageRank shown in the Google directory (httpdirectorygooglecom) suffers fromthe same problems The PageRank shown in the directory is also on a differentscale There have been attempts to cross-reference these two scales butbecause they are non-linear the results really do not tell you anything more thanyou already know

Also of note is that a programmer managed to generate a tool to look upPageRank without using Internet Explorer This tool has since been withdrawnbut whilst originally the numbers given by this software and Googlersquos toolbarmatched - presently querying with such software sometimes produces differentnumbers than querying with the toolbar This is Googlersquos right to protect theirdata but is the strongest indication that

5

Sometimes what you see on the Google toolbar may not be related to actualPageRank at all (Google can and does assign whatever toolbar PageRankvalue they want to assign to a page)

How significant is PageRank

The significance of any one factor in search engine algorithms depends on thequality of the information it supplies A factorrsquos importance is known as its weightTo demonstrate how weighting is arrived at its easiest to move away fromPageRank for a second and look at Meta tags Originally when the Metakeyword tag was new you could write something like this in your document

ltmeta name=rdquokeywordsrdquo content=rdquopagerank pagerank uncovered algorithmalgorithmsrdquogt

In theory the Meta keyword tag was a very good indicator of what the page wasabout However as most are well aware ndash the weighting for the keywords tag isfast approaching nothing Two things have contributed to this

1 The ease at which Webmasters can manipulate it2 The level of manipulation by Webmasters

These two things are separate factors but with human nature being what it isthe easier something is to influence ndash the more it is manipulated Thecombination of these factors determines the ldquoweightingrdquo ndash ie how much we trustthe information provided by that factor

So it makes sense to look at these factors in relation to PageRank first

PageRank is without doubt one of the hardest things for a Webmaster tomanipulate ethically However it is possible to generate links to your site fromother sites fairly simply through the use of link farms and guestbooks Googlefrowns upon this kind of abuse and many sites that have tried this have had theirPageRank influence blocked But it must be said that the abuse is still rampantand that it can have an influence on PageRank So whilst not easy to doPageRank is still subject to manipulation

The extent to which PageRank is manipulated has also changed Most people nolonger believe Googlersquos old line of people not being able to influence PageRankand the results based on it However there is more information about PageRankavailable than ever and people are more aware of manipulation techniques

So whilst PageRank is valuable you should be careful not to over-estimate itsusage and capabilities Your final ranking in Google is due to a mix of factors of

6

which PageRank is only one Wersquoll get into more details later by discussing howPageRank is different than the other ranking factors and thus when it appliesand when it doesnrsquot Ironically enough PageRankrsquos weighting factor isundeniably declining Since the original version of this paper gave out detailedinformation about PageRank in all likelihood it may also have contributed insome small way to the decline in weighting of the very subject it talks about

Is PageRank a good determination of the quality of a page

To examine the worth of PageRank we need to first look at its premise and howaccurate it is Basically PageRank says

1 If a page links to another page it is casting a vote which indicates that theother page is good

2 If lots of pages link to a page then it has more votes and its worth shouldbe higher

The basic implication here is People only link to pages they think are good

It shouldnt be hard to convince you that this premise is wrong A few of thereasons people link to pages other than ones they think are good are

1) Reciprocal links ndash ldquoLink to me and Irsquoll link to yourdquo

2) Link Requirements ndash ldquoUsing our script requires you to put a link to ourpagerdquo or ldquoWersquoll give you an award solely because you link to our pagerdquo

3) Friends and Family ndash ldquoThis is my friend Petersquos siterdquo or ldquoMy mumrsquos siteis here my dadrsquos site is here My dogrsquos site is hererdquo

4) Free Page Add-ons ndash ldquoThis counter was provided bywwwlinktocountersitecomrdquo

Furthermore anybody who has a top-ranking site will tell you that it tends to getlinks from new sites This is not necessarily because its good (although theygenerally are) Assume a Webmaster is setting up a new site and they arelooking for some outbound links Nowadays one of the first things they do is aGoogle search for similar sites The links they end up with may not necessarilybe the best sites but merely the easiest ones to find If PageRank influencesrankings and if they subsequently link to those pages ndash the new Webmaster willbe adding to the inaccuracies in the judging of the quality of a page The same istrue when these new Webmasters use the Google Toolbar PageRank indicator tochoose whom to link to

7

To put this another way

PageRank is determined by the links pointing to a page But if PageRank itselfhas an influence on the number of links to a page it is influencing (in acircular way) the quality of that page The links are no longer based solely onhuman judgement If a Webmaster picks their outbound links by searching onGoogle or by looking at the Google toolbar (even if this is only part of theirjudgement) then there is a corresponding increase in a pagersquos PageRankThis increase is not solely because it is a good page but because itsPageRank is already high

Googlersquos First 1000 Results

Remember PageRank alone cannot get you high rankings Wersquove mentionedbefore that PageRank is a multiplier so if your score for all other factors is 0 andyour PageRank is twenty billion then you still score 0 (last in the results) This isnot to say PageRank is worthless but there is some confusion over whenPageRank is useful and when it is not This leads to many misinterpretations ofits worth The only way to clear up these misinterpretations is to point out whenPageRank is not worthwhile

If you perform any broad search on Google it will appear as if youve foundseveral thousand results However you can only view the first 1000 of themUnderstanding why this is so explains why you should always concentrate onldquoon the pagerdquo factors and anchor text first and PageRank last

Assume that you perform a search on Google and it returns 200000 results Ifwe were to calculate every factor for each 200000 pages ndash do you think it wouldreally take just 034 seconds to search The answer to speeding up the search isto get a subset of documents that are most likely to be related to the query Thissubset of documents needs to be larger than the number of search results Forexample letrsquos say that number is 2000 What the search engine does is querythe whole database using 2 or 3 factors finding the 2000 documents that rankhighest for them (Remember there were 200000 possible documents andthatrsquos the number that actually gets shown) Then the engine applies all thefactors to those 2000 and ranks them accordingly Because therersquos a drop in thequality of the results (not the pages) at the bottom of this subset the engine justshows the first 1000 PageRank is almost certainly not one of those factorsNotice how before we highlighted the word ldquorelatedrdquo in creating the subset of2000 pages The search engine is looking for pages that are on-topic If weincluded PageRank in that list wersquod get a lot of high PageRank pages with topicsthat are only slightly related (because of the second factor) but thatrsquos not whatwe want

8

Why this is critical

You must do enough on the page work andor anchor text work to get intothat subset of 2000 pages for your chosen key phrase otherwise your highPageRank will be completely in vain PageRank means nothing if you do nothave enough ranking from other factors to make it into the first subset

The difference between PageRank and other factors

To assess when PageRank is important and when it is not we need tounderstand how PageRank is different from all other ranking factors To do thisherersquos a quick table that lists a few other factors and how they add to the rankingscore

Title Tag Can only be listed onceKeywords in Body Text Each successive repetition is

less important Proximity isimportant

Anchor Text Highly weighted but likekeywords in body text there isa cut off point where furtheranchor text is no longerworthwhile

PageRank Potentially infinite You arealways capable of increasingyour PageRank significantlybut it takes work

All other ranking factors have cut off points beyond which they will no longeradd to your ranking score or will not add significantly enough for it to beworthwhile PageRank has no cut off point

Non-PageRank Factor Threshold

Having written about the difference between PageRank and other factors andhow PageRank is harder to get it should now be clear that whilst we could usemany methods to get good rankings there is a threshold which defines whenhigh PageRank is worth striving for and when it is not

With ranking factors other than PageRank there is a score beyond which theslow down in the rate that any factor adds to this score is so insignificant that it is

9

not worthwhile This is the Non-PageRank Factor Threshold To illustrate thisletrsquos put an example figure on this of 1000

If we have a query where the results are Page A and Page B then Page A and Bhave scores for that query which are the total scores for all ranking factors(including PageRank) Letrsquos say Page Arsquos score is 900 and page Brsquos score is500 Obviously Page A will be listed first These are both below our hypotheticalNon-PageRank Factor Threshold thus without any change in PageRank it ispossible for page B to improve their optimization to beat Page A for this particularquery There are lots of queries like this on Google theyre more commonlythought of as less competitive queries

Now assume Page A raises its score to 1100 Suddenly page B cannot competein the SERPs (search engine results pages) without increasing its PageRank Inall probability page B must also improve for all the other ranking factors but anincrease in PageRank is almost certainly necessary There are also lots ofqueries like this on Google which are more commonly thought of as morecompetitive queries

Generally when querying Google the group of pages in the SERPs will containsome pages that have a score above the Non-PageRank Factor Threshold andsome that do not

There is an important point to be made here

To be competitive you must raise your pages search engine ranking scorebeyond the Non-PageRank Factor Threshold To fail to do so means that youcan easily be beaten in the search results for your query terms The quickestway to approach the Non-PageRank Factor Threshold is through ldquoon the pagefactorsrdquo however you cannot move above the Non-PageRank FactorThreshold without PageRank

The obvious question is whatrsquos the numerical value of the Non-PageRank FactorThreshold and how much work do you need to do to get past it The answer isthat it has no value it is a hypothetical line Google could put a value on it butthat would not help us unless we know what the pagersquos individual scores are Weneed only be aware that the threshold exists and that it gives us informationabout principles

Using the threshold to derive the worth of two ranking strategies

The threshold explains principles and the different ways that search enginemarketers work It also demonstrates why some of the misunderstandings aboutPageRank occur Letrsquos consider the strategies of two people Person A considersPageRank to be unimportant and Person B considers PageRank to be veryimportant

10

Person A says ldquoPageRankrdquo is unimportant They have optimised pages foryears and know how to use ldquoon the pagerdquo factors very successfully Theyunderstand the basics of anchor text but they couldnrsquot care at all aboutPageRank

Whatrsquos happening person A is reaching the Non-PageRank Factor Thresholdvery quickly because they are maximising the ldquoon the pagerdquo factors Throughcarefully choosing keywords they jump-start themselves up the SERPs Aslong as their content is good high-ranking sites (over time) tend to get linkedto Whilst they didnrsquot directly ask for it a slow trickle of sites will begin to link tothem and give them PageRank which helps consolidates their position

Person B says ldquoPageRankrdquo is important Wersquove all seen those pages in theresults that have no content but great rankings (With big brands this canoften occur naturally even when they have no idea what PageRank is Thiswould be Person C who is not relevant to the discussion at hand) Person Bunderstands lots about PageRank and concentrates heavily on it

Whatrsquos happening person B is doing the reverse of person A Whilst person Aconcentrated on the Non-PageRank factors and found herself gettingPageRank anyway person B concentrates on the PageRank Factor and findshimself getting Non-PageRank factors The reason for this is that increasingPageRank requires links and links have anchor text Thus through carefullychoosing the anchor text linking to his page person B automatically increaseshis Non-PageRank factor scores whilst obtaining his high PageRank score

Obviously these are two extremes but we can use these to extrapolate theadvantages and disadvantages of each approach

11

Advantages DisadvantagesPerson A bull Quick entry to the

results pagesbull Self-generating links

limit the amount ofwork needed

bull Securing the resultsto make it moredifficult forcompetitors tocompete takes morework

bull Slower to react tonew competitors

Person B bull Solid position Caneasily modify on thepage factors for aquick boost ifrequired

bull Probably highertraffic from non-search enginesources

bull Slower entry toresults pages

bull Difficult to do wellbull Increased likelihood

of tripping spamfilters

It is clear that both strategies can and do work Both strategies are usingPageRank as part the mix of factors that will ultimately improve their ranking inthe SERPS Because there is such a mix we can use them to different degreesdepending on the strategy that best suits your style My personal strategy is touse a combination but to save some of the ldquoon the pagerdquo factors for later in caseI need a quick boost if the competition heats up at a later time

Really heavy competition

No explanation of PageRank strategies would be complete without a finalstatement regarding heavy keyword competition There are some queries wherecompetition is so intense that you must do everything possible to maximize yourranking score (for example ldquoweb hostingrdquo) In such situations it is impossible torank highly through Non-PageRank factors alone (as you will not initially be listedhigh enough to be noticed and linked to) That is not to say that Non-PageRankfactors arent important Consider what your final rank score is

Final Rank Score = (score for all Non-PageRank factors) x (actual PageRankscore)

Improving either side of the equation can have a positive effect Howeverbecause the Non-PageRank factors have a restricted maximum benefit theactual PageRank score must be improved in order to compete successfullyUnder really heavy competition ndash it holds true that you cannot rank well unlessyour actual PageRank score is above a certain level In other words

12

There exists a query specific ldquoMinimum PageRank levelrdquo For queries that do nothave heavy competition this level is easy to achieve without even tryingHowever where heavy competition exists Non-PageRank factors are just asimportant (and easier to get) until they reach the Non-PageRank factor thresholdThis is why careful keyword choice can help you avoid the extensive workassociated with highly competitive search phrases

How PageRank is calculated

On a simple level we can tell quite a lot about how PageRank is calculated Thisis because when Google was just a university research project the creators ofPageRank published a paper that detailed a formula for calculating it Thisformula is now more than a handful of years old and we suspect that it haschanged somewhat since then However for detailing the over-riding principlesof how PageRank works it is as accurate today as the day it was written

PR(A) = (1-d) + d (PR(T1) C(T1) + + PR(Tn) C(Tn) )

Where PR(A) is the PageRank of Page A ( the one we want t owork out )

D is a dam pening factor Nom inally this is set to 085

PR(T1) is the PageRank of a site point ing to Page A

C(T1) is the num ber of links off that page

PR(Tn) C(Tn) m eans we do that for each page point ing toPage A

Source The Anatomy of a Large-Scale Hypertextual Web Search EngineSergey Brin and LawrencePage httpwww-dbstanfordedu~backrubgooglehtml

Couldnrsquot be simpler right Depending on your mathematical competence this iseither remarkably easy to understand or remarkably complex So herersquos the dealndash this isnrsquot a one-time calculation It looks pretty above but you canrsquot just do onesimple calculation and get an answer with the PageRank of a page If you look atit to calculate the PageRank of Page A we need to know the PageRank of allthe pages pointing to it To calculate the PageRank of those pages we need toknow the PageRank of all the pages pointing to them (one of which could and isquite likely to be Page A) If this seems like it could go around in circles forever

13

thatrsquos because it nearly does We have to do a whole lot of little calculationsmany times over before we actually get the answer What this formula tells us isthat however you slice it and however the formula may have changed since itwas written

The PageRank given to Page A by a Page B pointing to it is decreased witheach link to anywhere that exists on Page B This means that a pagersquos

PageRank is essentially a measure of its vote it can split that vote between onelink or two or many more but its overall voting power will always remain the

same

This really is important So to be totally clear letrsquos put some numbers to it (thesenumbers are made up for demonstration purposes only and have no relevancyto any particular page) Say Page B has a PageRank value of 5 and has a singlelink on it pointing to Page A Page Arsquos PageRank is improved by a proportion ofPage Brsquos value of 5 (Page B doesnrsquot lose anything but Page A gains) If Page Bhas two links that PageRank improvement would be split and Page A wouldonly gain half the PageRank that it did before

Now put the formula out of your mind for a moment as its easier to understandhow it works using a diagram Letrsquos say we have a hypothetical set of pagesimaginatively titled Page A Page B Page C and Page D They link to each otheras shown below

To begin with in our example at least we donrsquot know what the pagersquos startingPageRanks are Therersquos nothing special about the number that we pick to startwith (in fact if you read forward to the section on convergence ndash you can see that

14

we can start at any number we want) Since in the last version of this paper weperformed the calculations by setting these values to 1 ndash wersquore going to set themto zero this time around in order to prove that it doesnrsquot matter what the startingvalues are

Next we perform the necessary calculation to obtain the PageRank for eachpage The rules are

1 We take a 085 a pagersquos PageRank and divide it by the number oflinks on the page

2 We add that amount on to a new total for each page itrsquos being passedto

3 We add 015 to each of those totals

The first calculation is easy Because wersquove started at zero ndash 0 085 is always 0So each page gets just 015 + 0 Meaning each page now has a PageRank of015 Clearly wersquore not done ndash we want to show the importance of each pagebased upon links and theyrsquore all the same so we need to run the calculationagain

Page A links to pages B C and D Page Arsquos PageRank is 015 so it will add 085 015 = 01275 to the new PageRank scores of the pages it links to There arethree of them so they each get 00425

Page B links to page C Page Brsquos PageRank is 015 so it will add 085 015 =01275 to the new PageRank score of the pages it links to Since it only links topage C page C will get it all

Page C links to Page A all 01275 passes to page A

Page D links to Page C Again all 01275 passes to page C

The new totals for each page them become

Page A 015 (base) + 01275 (from Page C) = 02775Page B 015 (base) + 00425 (from Page A) = 01925Page C 015 (base) + 00425 (from Page A) + 01275 (from Page B) + 01275(from Page D) = 04475Page D 015 (base) + 00425 (from Page A) = 01925

So wersquove got

15

Pretty neat huh Already wersquore begining to see that Page C is probably the mostimportant page in the system (but we canrsquot be sure yet ndash it could well change)We carry on doing these calculations until the value for each page no longerchanges (this is called convergence ndash therersquos more about this in the nextsection) In practice Google probably doesnt wait for this convergence butinstead run a number of iterations of the calculation which is likely to give themfairly accurate values (more on this later as well) If we carried out all thecalculations for the example given it would take us 143 calculations [ ExcelExample 1] and wersquod reach final values of

16

As suspected Page C is the most important If we take a quick look at these rawvalues we can see something about the number of links pointing out from a pageLook at Page A which has a link from a high PageRank page (Page C) whichhas only one outbound link Then look at Page B and D both share links from ahigh PageRank page (Page A) with three outbound links The number of linkssignificantly alters the way PageRank is distributed

A word on convergence

Convergence is an important mathematical aspect of PageRank which allowsGoogle to provided unprecedented search quality at comparably low costs Thisis a semi-complex topic but it is important to your understanding of how and whyPageRank works Wersquove tried to make it as simple as possible but unless yoursquoreSergey Brin or Larry Page yoursquoll still need to concentrate

Whilst it will take some concentration it isnrsquot that hard to understand

Wersquove shown that the outlet values (final values) of one stage of the calculationbecome the inlet values (starting values) of the next stage and that we keep ondoing this (itrsquos known as a recursive procedure) But the really big question ishow and when does the recursive procedure stop

The answer is ldquoconvergencerdquo Provided the dampening factor (d in our equation)is less than one then convergence will occur Nominally we set it to 085(because thatrsquos the value mentioned in the Stanford papers)

This convergence basically means that whatever values we start at after runningthe calculation a number of times we will end up with the same final values andthat these values will no longer change if we do further iterations of thecalculation These final values are known as limiting values

Once the limiting values have been reached Google no longer needs to expendprocessing power on calculating the PageRank They can finish there

This is easier to understand with an example Letrsquos take a look at the followingstructure

17

PageRank for pages A B C D at various stages of iteration

A B C DIteration 1 1 1 1

1 10000000000 05750000000 22750000000 015000000002 20837500000 05750000000 11912500000 015000000003 11625625000 10355937500 16518437500 01500000000

19 14900124031 07833688246 15766187723 0150000000020 14901259564 07832552713 15766187723 0150000000021 14901259564 07833035315 15765705121 0150000000046 14901074052 07832956473 15765969475 0150000000047 14901074054 07832956472 15765969474 0150000000048 14901074053 07832956473 15765969474 0150000000049 14901074053 07832956473 15765969474 0150000000050 14901074053 07832956473 15765969474 0150000000051 14901074053 07832956473 15765969474 0150000000052 14901074053 07832956473 15765969474 0150000000053 14901074053 07832956473 15765969474 0150000000054 14901074053 07832956473 15765969474 01500000000

No matter how much we try to calculate the value past the 48th iteration thevalues do not change ndash they have converged at the limiting values

In practice theres no need to wait until they do not change at all before we stopcalculations We simply wait until they do not change very much In our exampleabove we have done that to 10 decimal places Ie we have calculated until theydo not change by 00000000001 or more (or mathematically | PR 48 (A) - PR 50(A) | lt 10 ndash10)

18

Derived specifics of how Google calculates PageRank

We started our values from zero and now see that the calculations are fairlycomplicated and resource intensive So two questions arise When Googlecalculates PageRank themselves do they start each month with an arbitraryvalue And secondly do they calculate PageRank across several machines Wecan answer these questions by a couple of simple experiments Assume thathaving calculated the PageRanks in the structure shown earlier we would like toadd a page and change a link to make the structure look like this

If we start each page from an arbitrary starting value of one then how manyiterations of the calculation does it take to achieve convergence [ ExcelExample 2]

75

Now if we start each page from the values we ended up with when we calculatedthe PageRank of the original structure (and set Page Ersquos starting value to 1) thenhow many iterations does it take to achieve convergence [ Excel Example 3]

78

These numbers are really very close Therefore it would appear logical thatGoogle does not reset the values with each calculation as they may simply carryon from where they left off

So do they calculate across several machines or one Obviously there is toomuch work for one machine when you get to billions of pages so several

19

machines must be used Logically how can they do this Assume Pages A andC are on one machine and Pages B C and D are on another If we ignore thelinks between pages across machines (ie A-gtD and D-gtC) then for Pages Aand C it takes

1 calculation to converge at A=1 C=1

Then for Pages B D and E it takes

3 calculations to converge at B=015 D=03954375000 E=03954375000

If we then do calculations for all pages using these as starting values it takes

A huge 146 iterations to converge [ Excel Example 4]

The conclusion is that splitting PageRank calculations up into chunks for differentmachines is not a viable proposition Therefore they must also track PageRanktransfers across links between machines This means that each machine mustnotify the other machines as PageRank is given to a page not on that machineIndeed each machine must also run the iterations in sync with each other Evenif they donrsquot do it exactly this way the calculation of PageRank for so manypages is very very complicated indeed

PageRank Feedback how linking sometimes helps

When the new term PageRank Feedback was presented in the original version ofthis paper it was largely taken up and used ndash although not always correctlyPageRank Feedback is difficult to understand because it is a concept that impliescreation of PageRank However it is really an alteration of distribution ofPageRank on a large scale PageRank Feedback is the principle that explainshow sometimes it is good to link to outside pages Consider we have a Page -gtPage A Its PageRank by definition will be 015 Now assume it links out toPage B which in return links back to it Page Arsquos PageRank now becomes 1Now assume it links out to Page C and Page C links back to it Page ArsquosPageRank now becomes 14594594595

The very fact that Page A links out to a page that links back to it causes its ownPageRank to rise (This could also occur if Page A links to a page that links to apage that links to Page A etchellip) This has occurred not because we aregenerating PageRank but because it is taking it from the system as a whole Butif we view this very small system ldquoPage Ardquo (which is a subsystem of all the pagesin the index) then we can firmly say that linking out from Page A has generatedPageRank in this system ndash that is the very act of linking out has fed PageRankback in

20

Letrsquos put this diagrammatically using a slightly more complicated system

Pages A to E are all of the pages in Googlersquos index Pages A and B are your siteWhen we do the calculations for this set up the final PageRanks for each pageare all 1 [ Excel Example 5] As far as your site is concerned

Page Arsquos PageRank is 1

Page Brsquos PageRank is 1

The sum of the PageRank of all the pages in your site is2

Changing the structure to link A to C and C to A gives the following finalPageRanks [ Excel Example 6]

21

Page Arsquos PageRank is 13599321536

Page Brsquos PageRank is 07279711653

The sum of the PageRank of all the pages in your site is20879033189

This increase is small the amount will of course vary depending oncircumstances But the point was easily proved ndash linking out can sometimesimprove your PageRank through a mechanism known as PageRank Feedback

If we added an extra page into the loop between C -gt D and E then the totalincrease in PageRank in your site would be 21462030505 [ Excel Example 7]This suggests that it is wiser to reciprocal link with larger sites that employ a linkstructure where a higher amount of focus is given to the page on which your linkappears (or will appear) The number of pages and the structure you are linkingto has significant influence on the amount of PageRank feedback that occurs

22

Influencing the results

The question that arises from all this is how and when can or will Googleinfluence the results of the PageRank calculation In this respect the situationhas changed significantly since the original PageRank paper Google has shownthat they can and will modify the data on which PageRank is based The primaryexample of this is what has become known as PageRank Zero (PR0) Basicallyspeculation says that when Google wants to penalize a page it is assigned aPageRank of zero As PageRank is a multiplier this will obviously always listPR0 pages as the very last entry in the SERPS

Your score from other factors multiplied by your PageRank score of zero willalways be zero (the lowest score possible)

How they do this is important Letrsquos say they set the PageRank of the page tozero before doing the calculations ndash would this work From what we know so farwe would think not [ Excel Example 8] And true enough it doesnt Why is thissignificant Because the PageRank must be set to zero at the end of theprocess Having a PageRank of zero in itself does not disqualify a page fromvoting exactly as it would have done in the first place

To stop its voting power the second penalty must also be applied This is thesame penalty that is applied to link farms Google has shown that they arecapable of ignoring links they believe have been artificially created This is simpleenough they merely remove the link entry from the matrix This second penaltywould also need to be applied to a PR0 penalty to stop the links from that pagefrom counting in the PageRank process However there doesnrsquot appear to beany way to test this I suspect that both penalties would be applied

Controlling PageRank

As Webmasters contrary to the suggestions of some we have a fair amount ofcontrol over PageRank It is however the most difficult of ranking factors tocontrol and you might well be able to achieve the desired effects by using othertechniques However a well-worked PageRank added to your optimization mixcan make it difficult for competitors to catch up

There are three fundamental areas to look at when trying to optimize thePageRank for your site

1 The links you choose to have link to you ie which ones you chooseand how much effort you put in to getting them

2 Who you choose to link out to from your site and from which page ofyour site you place their link (Maximising PageRank Feedback andminimising PageRank leakage)

23

3 The internal navigational structure and linkage of your pages in order tobest distribute PageRank within your site

Links to Your Site

When looking for links to your site from a purely PageRank point of view onemight think you should simply look for pages that have the highest ToolbarPageRank (Whilst keeping in mind that every page of a site has its ownPageRank so you must consider the PageRank of the links page or whateverpage the actual link will exist on) However this way of thinking is incorrect Ifyoursquove not just jumped to this section then yoursquoll probably have worked out whythat is The PageRank given by a link is far more complicated than thissimplification There may have been a time when that was an okayapproximationhellipbut no more As more and more people try to get links from onlyhigh PageRank sites it becomes less and less of a winning proposition

The actual PageRank from an individual page is shared out amongst the links onthat page (remember the PageRank calculations) So links from pages thathave the same PageRank arent always created equal It depends on how manyother links your link is sharing the links page with For instance a link from apage with a PageRank of 4 might be better than a link from a page with aPageRank of 6 if there are less total links on the PR 4 page Right now there justisnrsquot enough information available to allow us to know to what extent thisstretches However itrsquos significant enough to make it pointless to just choosehigh PageRanked sites as your main linking strategy Theres also another morematter-of-fact reason why that type of linking strategy might not be the best siteswith high PageRanks are often fussy about which sites they will link out tomaking them harder to get linked from than lower PageRanked sites Howeversites struggling with their own PageRank numbers should be more receptive toexchanging reciprocal links with other like-minded sites

Now lets factor in feedback Letrsquos say for example that there are two separatepages on other peoplersquos sites which both have a PageRank of 4 Both of thesehave ten links to other pages But the page on your site that you want them tolink to already has a link to the page on the second site By getting a link fromthe second site yoursquore generating feedback and getting more PageRank than ifyou had gotten a link from the first site Thatrsquos an over simplification in factfeedback loops can get even more complicated Remember the number of linkson the page linking to you will alter the amount of feedback etc

Can you work it all out for a given pagersquos situation No ndash neither can I Myadvice therefore is this ndash get links from sites that seem appropriate and havegood quality regardless of their current PageRank If they are relevant to yoursite and are high quality sites they will either help your PageRank now or willdo so in the future To really get your PageRank humming get yourself a listing

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 4: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

3

A basic overview of how Google ranks pages in their search engine resultspages (SERPS) follows

1) Find all pages matching the keywords of the search2) Rank accordingly using on the page factors such as keywords3) Calculate in the inbound anchor text4) Adjust the results by PageRank scores

In reality itrsquos slightly more complex and wersquoll discuss this in more depth later butfor now the above description serves our purposes Itrsquos worth noting thatPageRank is a multiplier and is not just simply added to the score Thus if yourpage had a PageRank of zero it would rank at the very end of the SERPS

How is PageRank determined

The Google theory goes that if Page A links to Page B then Page A is sayingthat Page B is an important page PageRank also factors in the importance of thelinks pointing to a page If a page has important links pointing to it then its linksto other pages also become important The actual text of the link is irrelevantwhen discussing PageRank

How can you tell what a pagersquos PageRank is

To learn what a pagersquos PageRank is you can download a toolbar for InternetExplorer from httptoolbargooglecom Once installed there will be a bar graphat the top of the browser showing a version of PageRank for the page yoursquorebrowsing When you hold the mouse over the bar you see a number from zero toten (If you donrsquot see the number you may have an older version of the toolbarinstalled You will need to completely uninstall it reboot your computer andreinstall the latest version Once this is done you should be able to see thePageRank number)

How accurate is the Google toolbar

The Google toolbar is not very accurate in showing you the actual PageRank of asite but itrsquos the only thing right now that can give you any idea As long as youknow the toolbarrsquos limitations then at least you know what you are viewing

There are two limitations to the Google toolbar

1 The toolbar sometimes guesses If you enter a page which is not inits index but where there is a page that is very close to it in Googlersquosindex then it will provide a guesstimate of the PageRank This

4

guesstimate is worthless for our purposes because it isnrsquot featured inany of the PageRank calculations The only way to tell if the toolbar isa guesstimate is to type the URL into the Google search box and see ifthe page shows up in the SERPS If it doesnrsquot then the toolbar isguessing

2 The toolbar is just a representation of actual PageRank WhilstPageRank is linear Google has chosen to use a non-linear graph toportray it So on the toolbar to move from a PageRank of 2 to aPageRank of 3 takes less of an increase than to move from aPageRank of 3 to a PageRank of 4 A comparison table best illustratesthis phenomenon The actual figures are kept secret so wersquoll just useany figures for demonstration purposes

If the actualPageRank is

betweenThe Toolbar Shows

000000001 and 5 16 and 25 2

25 and 125 3126 and 625 4

626 and 3125 53126 and 15625 6

15626 and 78125 778126 and 390625 8

390626 and 1953125 91953126 and infinity 10

The PageRank shown in the Google directory (httpdirectorygooglecom) suffers fromthe same problems The PageRank shown in the directory is also on a differentscale There have been attempts to cross-reference these two scales butbecause they are non-linear the results really do not tell you anything more thanyou already know

Also of note is that a programmer managed to generate a tool to look upPageRank without using Internet Explorer This tool has since been withdrawnbut whilst originally the numbers given by this software and Googlersquos toolbarmatched - presently querying with such software sometimes produces differentnumbers than querying with the toolbar This is Googlersquos right to protect theirdata but is the strongest indication that

5

Sometimes what you see on the Google toolbar may not be related to actualPageRank at all (Google can and does assign whatever toolbar PageRankvalue they want to assign to a page)

How significant is PageRank

The significance of any one factor in search engine algorithms depends on thequality of the information it supplies A factorrsquos importance is known as its weightTo demonstrate how weighting is arrived at its easiest to move away fromPageRank for a second and look at Meta tags Originally when the Metakeyword tag was new you could write something like this in your document

ltmeta name=rdquokeywordsrdquo content=rdquopagerank pagerank uncovered algorithmalgorithmsrdquogt

In theory the Meta keyword tag was a very good indicator of what the page wasabout However as most are well aware ndash the weighting for the keywords tag isfast approaching nothing Two things have contributed to this

1 The ease at which Webmasters can manipulate it2 The level of manipulation by Webmasters

These two things are separate factors but with human nature being what it isthe easier something is to influence ndash the more it is manipulated Thecombination of these factors determines the ldquoweightingrdquo ndash ie how much we trustthe information provided by that factor

So it makes sense to look at these factors in relation to PageRank first

PageRank is without doubt one of the hardest things for a Webmaster tomanipulate ethically However it is possible to generate links to your site fromother sites fairly simply through the use of link farms and guestbooks Googlefrowns upon this kind of abuse and many sites that have tried this have had theirPageRank influence blocked But it must be said that the abuse is still rampantand that it can have an influence on PageRank So whilst not easy to doPageRank is still subject to manipulation

The extent to which PageRank is manipulated has also changed Most people nolonger believe Googlersquos old line of people not being able to influence PageRankand the results based on it However there is more information about PageRankavailable than ever and people are more aware of manipulation techniques

So whilst PageRank is valuable you should be careful not to over-estimate itsusage and capabilities Your final ranking in Google is due to a mix of factors of

6

which PageRank is only one Wersquoll get into more details later by discussing howPageRank is different than the other ranking factors and thus when it appliesand when it doesnrsquot Ironically enough PageRankrsquos weighting factor isundeniably declining Since the original version of this paper gave out detailedinformation about PageRank in all likelihood it may also have contributed insome small way to the decline in weighting of the very subject it talks about

Is PageRank a good determination of the quality of a page

To examine the worth of PageRank we need to first look at its premise and howaccurate it is Basically PageRank says

1 If a page links to another page it is casting a vote which indicates that theother page is good

2 If lots of pages link to a page then it has more votes and its worth shouldbe higher

The basic implication here is People only link to pages they think are good

It shouldnt be hard to convince you that this premise is wrong A few of thereasons people link to pages other than ones they think are good are

1) Reciprocal links ndash ldquoLink to me and Irsquoll link to yourdquo

2) Link Requirements ndash ldquoUsing our script requires you to put a link to ourpagerdquo or ldquoWersquoll give you an award solely because you link to our pagerdquo

3) Friends and Family ndash ldquoThis is my friend Petersquos siterdquo or ldquoMy mumrsquos siteis here my dadrsquos site is here My dogrsquos site is hererdquo

4) Free Page Add-ons ndash ldquoThis counter was provided bywwwlinktocountersitecomrdquo

Furthermore anybody who has a top-ranking site will tell you that it tends to getlinks from new sites This is not necessarily because its good (although theygenerally are) Assume a Webmaster is setting up a new site and they arelooking for some outbound links Nowadays one of the first things they do is aGoogle search for similar sites The links they end up with may not necessarilybe the best sites but merely the easiest ones to find If PageRank influencesrankings and if they subsequently link to those pages ndash the new Webmaster willbe adding to the inaccuracies in the judging of the quality of a page The same istrue when these new Webmasters use the Google Toolbar PageRank indicator tochoose whom to link to

7

To put this another way

PageRank is determined by the links pointing to a page But if PageRank itselfhas an influence on the number of links to a page it is influencing (in acircular way) the quality of that page The links are no longer based solely onhuman judgement If a Webmaster picks their outbound links by searching onGoogle or by looking at the Google toolbar (even if this is only part of theirjudgement) then there is a corresponding increase in a pagersquos PageRankThis increase is not solely because it is a good page but because itsPageRank is already high

Googlersquos First 1000 Results

Remember PageRank alone cannot get you high rankings Wersquove mentionedbefore that PageRank is a multiplier so if your score for all other factors is 0 andyour PageRank is twenty billion then you still score 0 (last in the results) This isnot to say PageRank is worthless but there is some confusion over whenPageRank is useful and when it is not This leads to many misinterpretations ofits worth The only way to clear up these misinterpretations is to point out whenPageRank is not worthwhile

If you perform any broad search on Google it will appear as if youve foundseveral thousand results However you can only view the first 1000 of themUnderstanding why this is so explains why you should always concentrate onldquoon the pagerdquo factors and anchor text first and PageRank last

Assume that you perform a search on Google and it returns 200000 results Ifwe were to calculate every factor for each 200000 pages ndash do you think it wouldreally take just 034 seconds to search The answer to speeding up the search isto get a subset of documents that are most likely to be related to the query Thissubset of documents needs to be larger than the number of search results Forexample letrsquos say that number is 2000 What the search engine does is querythe whole database using 2 or 3 factors finding the 2000 documents that rankhighest for them (Remember there were 200000 possible documents andthatrsquos the number that actually gets shown) Then the engine applies all thefactors to those 2000 and ranks them accordingly Because therersquos a drop in thequality of the results (not the pages) at the bottom of this subset the engine justshows the first 1000 PageRank is almost certainly not one of those factorsNotice how before we highlighted the word ldquorelatedrdquo in creating the subset of2000 pages The search engine is looking for pages that are on-topic If weincluded PageRank in that list wersquod get a lot of high PageRank pages with topicsthat are only slightly related (because of the second factor) but thatrsquos not whatwe want

8

Why this is critical

You must do enough on the page work andor anchor text work to get intothat subset of 2000 pages for your chosen key phrase otherwise your highPageRank will be completely in vain PageRank means nothing if you do nothave enough ranking from other factors to make it into the first subset

The difference between PageRank and other factors

To assess when PageRank is important and when it is not we need tounderstand how PageRank is different from all other ranking factors To do thisherersquos a quick table that lists a few other factors and how they add to the rankingscore

Title Tag Can only be listed onceKeywords in Body Text Each successive repetition is

less important Proximity isimportant

Anchor Text Highly weighted but likekeywords in body text there isa cut off point where furtheranchor text is no longerworthwhile

PageRank Potentially infinite You arealways capable of increasingyour PageRank significantlybut it takes work

All other ranking factors have cut off points beyond which they will no longeradd to your ranking score or will not add significantly enough for it to beworthwhile PageRank has no cut off point

Non-PageRank Factor Threshold

Having written about the difference between PageRank and other factors andhow PageRank is harder to get it should now be clear that whilst we could usemany methods to get good rankings there is a threshold which defines whenhigh PageRank is worth striving for and when it is not

With ranking factors other than PageRank there is a score beyond which theslow down in the rate that any factor adds to this score is so insignificant that it is

9

not worthwhile This is the Non-PageRank Factor Threshold To illustrate thisletrsquos put an example figure on this of 1000

If we have a query where the results are Page A and Page B then Page A and Bhave scores for that query which are the total scores for all ranking factors(including PageRank) Letrsquos say Page Arsquos score is 900 and page Brsquos score is500 Obviously Page A will be listed first These are both below our hypotheticalNon-PageRank Factor Threshold thus without any change in PageRank it ispossible for page B to improve their optimization to beat Page A for this particularquery There are lots of queries like this on Google theyre more commonlythought of as less competitive queries

Now assume Page A raises its score to 1100 Suddenly page B cannot competein the SERPs (search engine results pages) without increasing its PageRank Inall probability page B must also improve for all the other ranking factors but anincrease in PageRank is almost certainly necessary There are also lots ofqueries like this on Google which are more commonly thought of as morecompetitive queries

Generally when querying Google the group of pages in the SERPs will containsome pages that have a score above the Non-PageRank Factor Threshold andsome that do not

There is an important point to be made here

To be competitive you must raise your pages search engine ranking scorebeyond the Non-PageRank Factor Threshold To fail to do so means that youcan easily be beaten in the search results for your query terms The quickestway to approach the Non-PageRank Factor Threshold is through ldquoon the pagefactorsrdquo however you cannot move above the Non-PageRank FactorThreshold without PageRank

The obvious question is whatrsquos the numerical value of the Non-PageRank FactorThreshold and how much work do you need to do to get past it The answer isthat it has no value it is a hypothetical line Google could put a value on it butthat would not help us unless we know what the pagersquos individual scores are Weneed only be aware that the threshold exists and that it gives us informationabout principles

Using the threshold to derive the worth of two ranking strategies

The threshold explains principles and the different ways that search enginemarketers work It also demonstrates why some of the misunderstandings aboutPageRank occur Letrsquos consider the strategies of two people Person A considersPageRank to be unimportant and Person B considers PageRank to be veryimportant

10

Person A says ldquoPageRankrdquo is unimportant They have optimised pages foryears and know how to use ldquoon the pagerdquo factors very successfully Theyunderstand the basics of anchor text but they couldnrsquot care at all aboutPageRank

Whatrsquos happening person A is reaching the Non-PageRank Factor Thresholdvery quickly because they are maximising the ldquoon the pagerdquo factors Throughcarefully choosing keywords they jump-start themselves up the SERPs Aslong as their content is good high-ranking sites (over time) tend to get linkedto Whilst they didnrsquot directly ask for it a slow trickle of sites will begin to link tothem and give them PageRank which helps consolidates their position

Person B says ldquoPageRankrdquo is important Wersquove all seen those pages in theresults that have no content but great rankings (With big brands this canoften occur naturally even when they have no idea what PageRank is Thiswould be Person C who is not relevant to the discussion at hand) Person Bunderstands lots about PageRank and concentrates heavily on it

Whatrsquos happening person B is doing the reverse of person A Whilst person Aconcentrated on the Non-PageRank factors and found herself gettingPageRank anyway person B concentrates on the PageRank Factor and findshimself getting Non-PageRank factors The reason for this is that increasingPageRank requires links and links have anchor text Thus through carefullychoosing the anchor text linking to his page person B automatically increaseshis Non-PageRank factor scores whilst obtaining his high PageRank score

Obviously these are two extremes but we can use these to extrapolate theadvantages and disadvantages of each approach

11

Advantages DisadvantagesPerson A bull Quick entry to the

results pagesbull Self-generating links

limit the amount ofwork needed

bull Securing the resultsto make it moredifficult forcompetitors tocompete takes morework

bull Slower to react tonew competitors

Person B bull Solid position Caneasily modify on thepage factors for aquick boost ifrequired

bull Probably highertraffic from non-search enginesources

bull Slower entry toresults pages

bull Difficult to do wellbull Increased likelihood

of tripping spamfilters

It is clear that both strategies can and do work Both strategies are usingPageRank as part the mix of factors that will ultimately improve their ranking inthe SERPS Because there is such a mix we can use them to different degreesdepending on the strategy that best suits your style My personal strategy is touse a combination but to save some of the ldquoon the pagerdquo factors for later in caseI need a quick boost if the competition heats up at a later time

Really heavy competition

No explanation of PageRank strategies would be complete without a finalstatement regarding heavy keyword competition There are some queries wherecompetition is so intense that you must do everything possible to maximize yourranking score (for example ldquoweb hostingrdquo) In such situations it is impossible torank highly through Non-PageRank factors alone (as you will not initially be listedhigh enough to be noticed and linked to) That is not to say that Non-PageRankfactors arent important Consider what your final rank score is

Final Rank Score = (score for all Non-PageRank factors) x (actual PageRankscore)

Improving either side of the equation can have a positive effect Howeverbecause the Non-PageRank factors have a restricted maximum benefit theactual PageRank score must be improved in order to compete successfullyUnder really heavy competition ndash it holds true that you cannot rank well unlessyour actual PageRank score is above a certain level In other words

12

There exists a query specific ldquoMinimum PageRank levelrdquo For queries that do nothave heavy competition this level is easy to achieve without even tryingHowever where heavy competition exists Non-PageRank factors are just asimportant (and easier to get) until they reach the Non-PageRank factor thresholdThis is why careful keyword choice can help you avoid the extensive workassociated with highly competitive search phrases

How PageRank is calculated

On a simple level we can tell quite a lot about how PageRank is calculated Thisis because when Google was just a university research project the creators ofPageRank published a paper that detailed a formula for calculating it Thisformula is now more than a handful of years old and we suspect that it haschanged somewhat since then However for detailing the over-riding principlesof how PageRank works it is as accurate today as the day it was written

PR(A) = (1-d) + d (PR(T1) C(T1) + + PR(Tn) C(Tn) )

Where PR(A) is the PageRank of Page A ( the one we want t owork out )

D is a dam pening factor Nom inally this is set to 085

PR(T1) is the PageRank of a site point ing to Page A

C(T1) is the num ber of links off that page

PR(Tn) C(Tn) m eans we do that for each page point ing toPage A

Source The Anatomy of a Large-Scale Hypertextual Web Search EngineSergey Brin and LawrencePage httpwww-dbstanfordedu~backrubgooglehtml

Couldnrsquot be simpler right Depending on your mathematical competence this iseither remarkably easy to understand or remarkably complex So herersquos the dealndash this isnrsquot a one-time calculation It looks pretty above but you canrsquot just do onesimple calculation and get an answer with the PageRank of a page If you look atit to calculate the PageRank of Page A we need to know the PageRank of allthe pages pointing to it To calculate the PageRank of those pages we need toknow the PageRank of all the pages pointing to them (one of which could and isquite likely to be Page A) If this seems like it could go around in circles forever

13

thatrsquos because it nearly does We have to do a whole lot of little calculationsmany times over before we actually get the answer What this formula tells us isthat however you slice it and however the formula may have changed since itwas written

The PageRank given to Page A by a Page B pointing to it is decreased witheach link to anywhere that exists on Page B This means that a pagersquos

PageRank is essentially a measure of its vote it can split that vote between onelink or two or many more but its overall voting power will always remain the

same

This really is important So to be totally clear letrsquos put some numbers to it (thesenumbers are made up for demonstration purposes only and have no relevancyto any particular page) Say Page B has a PageRank value of 5 and has a singlelink on it pointing to Page A Page Arsquos PageRank is improved by a proportion ofPage Brsquos value of 5 (Page B doesnrsquot lose anything but Page A gains) If Page Bhas two links that PageRank improvement would be split and Page A wouldonly gain half the PageRank that it did before

Now put the formula out of your mind for a moment as its easier to understandhow it works using a diagram Letrsquos say we have a hypothetical set of pagesimaginatively titled Page A Page B Page C and Page D They link to each otheras shown below

To begin with in our example at least we donrsquot know what the pagersquos startingPageRanks are Therersquos nothing special about the number that we pick to startwith (in fact if you read forward to the section on convergence ndash you can see that

14

we can start at any number we want) Since in the last version of this paper weperformed the calculations by setting these values to 1 ndash wersquore going to set themto zero this time around in order to prove that it doesnrsquot matter what the startingvalues are

Next we perform the necessary calculation to obtain the PageRank for eachpage The rules are

1 We take a 085 a pagersquos PageRank and divide it by the number oflinks on the page

2 We add that amount on to a new total for each page itrsquos being passedto

3 We add 015 to each of those totals

The first calculation is easy Because wersquove started at zero ndash 0 085 is always 0So each page gets just 015 + 0 Meaning each page now has a PageRank of015 Clearly wersquore not done ndash we want to show the importance of each pagebased upon links and theyrsquore all the same so we need to run the calculationagain

Page A links to pages B C and D Page Arsquos PageRank is 015 so it will add 085 015 = 01275 to the new PageRank scores of the pages it links to There arethree of them so they each get 00425

Page B links to page C Page Brsquos PageRank is 015 so it will add 085 015 =01275 to the new PageRank score of the pages it links to Since it only links topage C page C will get it all

Page C links to Page A all 01275 passes to page A

Page D links to Page C Again all 01275 passes to page C

The new totals for each page them become

Page A 015 (base) + 01275 (from Page C) = 02775Page B 015 (base) + 00425 (from Page A) = 01925Page C 015 (base) + 00425 (from Page A) + 01275 (from Page B) + 01275(from Page D) = 04475Page D 015 (base) + 00425 (from Page A) = 01925

So wersquove got

15

Pretty neat huh Already wersquore begining to see that Page C is probably the mostimportant page in the system (but we canrsquot be sure yet ndash it could well change)We carry on doing these calculations until the value for each page no longerchanges (this is called convergence ndash therersquos more about this in the nextsection) In practice Google probably doesnt wait for this convergence butinstead run a number of iterations of the calculation which is likely to give themfairly accurate values (more on this later as well) If we carried out all thecalculations for the example given it would take us 143 calculations [ ExcelExample 1] and wersquod reach final values of

16

As suspected Page C is the most important If we take a quick look at these rawvalues we can see something about the number of links pointing out from a pageLook at Page A which has a link from a high PageRank page (Page C) whichhas only one outbound link Then look at Page B and D both share links from ahigh PageRank page (Page A) with three outbound links The number of linkssignificantly alters the way PageRank is distributed

A word on convergence

Convergence is an important mathematical aspect of PageRank which allowsGoogle to provided unprecedented search quality at comparably low costs Thisis a semi-complex topic but it is important to your understanding of how and whyPageRank works Wersquove tried to make it as simple as possible but unless yoursquoreSergey Brin or Larry Page yoursquoll still need to concentrate

Whilst it will take some concentration it isnrsquot that hard to understand

Wersquove shown that the outlet values (final values) of one stage of the calculationbecome the inlet values (starting values) of the next stage and that we keep ondoing this (itrsquos known as a recursive procedure) But the really big question ishow and when does the recursive procedure stop

The answer is ldquoconvergencerdquo Provided the dampening factor (d in our equation)is less than one then convergence will occur Nominally we set it to 085(because thatrsquos the value mentioned in the Stanford papers)

This convergence basically means that whatever values we start at after runningthe calculation a number of times we will end up with the same final values andthat these values will no longer change if we do further iterations of thecalculation These final values are known as limiting values

Once the limiting values have been reached Google no longer needs to expendprocessing power on calculating the PageRank They can finish there

This is easier to understand with an example Letrsquos take a look at the followingstructure

17

PageRank for pages A B C D at various stages of iteration

A B C DIteration 1 1 1 1

1 10000000000 05750000000 22750000000 015000000002 20837500000 05750000000 11912500000 015000000003 11625625000 10355937500 16518437500 01500000000

19 14900124031 07833688246 15766187723 0150000000020 14901259564 07832552713 15766187723 0150000000021 14901259564 07833035315 15765705121 0150000000046 14901074052 07832956473 15765969475 0150000000047 14901074054 07832956472 15765969474 0150000000048 14901074053 07832956473 15765969474 0150000000049 14901074053 07832956473 15765969474 0150000000050 14901074053 07832956473 15765969474 0150000000051 14901074053 07832956473 15765969474 0150000000052 14901074053 07832956473 15765969474 0150000000053 14901074053 07832956473 15765969474 0150000000054 14901074053 07832956473 15765969474 01500000000

No matter how much we try to calculate the value past the 48th iteration thevalues do not change ndash they have converged at the limiting values

In practice theres no need to wait until they do not change at all before we stopcalculations We simply wait until they do not change very much In our exampleabove we have done that to 10 decimal places Ie we have calculated until theydo not change by 00000000001 or more (or mathematically | PR 48 (A) - PR 50(A) | lt 10 ndash10)

18

Derived specifics of how Google calculates PageRank

We started our values from zero and now see that the calculations are fairlycomplicated and resource intensive So two questions arise When Googlecalculates PageRank themselves do they start each month with an arbitraryvalue And secondly do they calculate PageRank across several machines Wecan answer these questions by a couple of simple experiments Assume thathaving calculated the PageRanks in the structure shown earlier we would like toadd a page and change a link to make the structure look like this

If we start each page from an arbitrary starting value of one then how manyiterations of the calculation does it take to achieve convergence [ ExcelExample 2]

75

Now if we start each page from the values we ended up with when we calculatedthe PageRank of the original structure (and set Page Ersquos starting value to 1) thenhow many iterations does it take to achieve convergence [ Excel Example 3]

78

These numbers are really very close Therefore it would appear logical thatGoogle does not reset the values with each calculation as they may simply carryon from where they left off

So do they calculate across several machines or one Obviously there is toomuch work for one machine when you get to billions of pages so several

19

machines must be used Logically how can they do this Assume Pages A andC are on one machine and Pages B C and D are on another If we ignore thelinks between pages across machines (ie A-gtD and D-gtC) then for Pages Aand C it takes

1 calculation to converge at A=1 C=1

Then for Pages B D and E it takes

3 calculations to converge at B=015 D=03954375000 E=03954375000

If we then do calculations for all pages using these as starting values it takes

A huge 146 iterations to converge [ Excel Example 4]

The conclusion is that splitting PageRank calculations up into chunks for differentmachines is not a viable proposition Therefore they must also track PageRanktransfers across links between machines This means that each machine mustnotify the other machines as PageRank is given to a page not on that machineIndeed each machine must also run the iterations in sync with each other Evenif they donrsquot do it exactly this way the calculation of PageRank for so manypages is very very complicated indeed

PageRank Feedback how linking sometimes helps

When the new term PageRank Feedback was presented in the original version ofthis paper it was largely taken up and used ndash although not always correctlyPageRank Feedback is difficult to understand because it is a concept that impliescreation of PageRank However it is really an alteration of distribution ofPageRank on a large scale PageRank Feedback is the principle that explainshow sometimes it is good to link to outside pages Consider we have a Page -gtPage A Its PageRank by definition will be 015 Now assume it links out toPage B which in return links back to it Page Arsquos PageRank now becomes 1Now assume it links out to Page C and Page C links back to it Page ArsquosPageRank now becomes 14594594595

The very fact that Page A links out to a page that links back to it causes its ownPageRank to rise (This could also occur if Page A links to a page that links to apage that links to Page A etchellip) This has occurred not because we aregenerating PageRank but because it is taking it from the system as a whole Butif we view this very small system ldquoPage Ardquo (which is a subsystem of all the pagesin the index) then we can firmly say that linking out from Page A has generatedPageRank in this system ndash that is the very act of linking out has fed PageRankback in

20

Letrsquos put this diagrammatically using a slightly more complicated system

Pages A to E are all of the pages in Googlersquos index Pages A and B are your siteWhen we do the calculations for this set up the final PageRanks for each pageare all 1 [ Excel Example 5] As far as your site is concerned

Page Arsquos PageRank is 1

Page Brsquos PageRank is 1

The sum of the PageRank of all the pages in your site is2

Changing the structure to link A to C and C to A gives the following finalPageRanks [ Excel Example 6]

21

Page Arsquos PageRank is 13599321536

Page Brsquos PageRank is 07279711653

The sum of the PageRank of all the pages in your site is20879033189

This increase is small the amount will of course vary depending oncircumstances But the point was easily proved ndash linking out can sometimesimprove your PageRank through a mechanism known as PageRank Feedback

If we added an extra page into the loop between C -gt D and E then the totalincrease in PageRank in your site would be 21462030505 [ Excel Example 7]This suggests that it is wiser to reciprocal link with larger sites that employ a linkstructure where a higher amount of focus is given to the page on which your linkappears (or will appear) The number of pages and the structure you are linkingto has significant influence on the amount of PageRank feedback that occurs

22

Influencing the results

The question that arises from all this is how and when can or will Googleinfluence the results of the PageRank calculation In this respect the situationhas changed significantly since the original PageRank paper Google has shownthat they can and will modify the data on which PageRank is based The primaryexample of this is what has become known as PageRank Zero (PR0) Basicallyspeculation says that when Google wants to penalize a page it is assigned aPageRank of zero As PageRank is a multiplier this will obviously always listPR0 pages as the very last entry in the SERPS

Your score from other factors multiplied by your PageRank score of zero willalways be zero (the lowest score possible)

How they do this is important Letrsquos say they set the PageRank of the page tozero before doing the calculations ndash would this work From what we know so farwe would think not [ Excel Example 8] And true enough it doesnt Why is thissignificant Because the PageRank must be set to zero at the end of theprocess Having a PageRank of zero in itself does not disqualify a page fromvoting exactly as it would have done in the first place

To stop its voting power the second penalty must also be applied This is thesame penalty that is applied to link farms Google has shown that they arecapable of ignoring links they believe have been artificially created This is simpleenough they merely remove the link entry from the matrix This second penaltywould also need to be applied to a PR0 penalty to stop the links from that pagefrom counting in the PageRank process However there doesnrsquot appear to beany way to test this I suspect that both penalties would be applied

Controlling PageRank

As Webmasters contrary to the suggestions of some we have a fair amount ofcontrol over PageRank It is however the most difficult of ranking factors tocontrol and you might well be able to achieve the desired effects by using othertechniques However a well-worked PageRank added to your optimization mixcan make it difficult for competitors to catch up

There are three fundamental areas to look at when trying to optimize thePageRank for your site

1 The links you choose to have link to you ie which ones you chooseand how much effort you put in to getting them

2 Who you choose to link out to from your site and from which page ofyour site you place their link (Maximising PageRank Feedback andminimising PageRank leakage)

23

3 The internal navigational structure and linkage of your pages in order tobest distribute PageRank within your site

Links to Your Site

When looking for links to your site from a purely PageRank point of view onemight think you should simply look for pages that have the highest ToolbarPageRank (Whilst keeping in mind that every page of a site has its ownPageRank so you must consider the PageRank of the links page or whateverpage the actual link will exist on) However this way of thinking is incorrect Ifyoursquove not just jumped to this section then yoursquoll probably have worked out whythat is The PageRank given by a link is far more complicated than thissimplification There may have been a time when that was an okayapproximationhellipbut no more As more and more people try to get links from onlyhigh PageRank sites it becomes less and less of a winning proposition

The actual PageRank from an individual page is shared out amongst the links onthat page (remember the PageRank calculations) So links from pages thathave the same PageRank arent always created equal It depends on how manyother links your link is sharing the links page with For instance a link from apage with a PageRank of 4 might be better than a link from a page with aPageRank of 6 if there are less total links on the PR 4 page Right now there justisnrsquot enough information available to allow us to know to what extent thisstretches However itrsquos significant enough to make it pointless to just choosehigh PageRanked sites as your main linking strategy Theres also another morematter-of-fact reason why that type of linking strategy might not be the best siteswith high PageRanks are often fussy about which sites they will link out tomaking them harder to get linked from than lower PageRanked sites Howeversites struggling with their own PageRank numbers should be more receptive toexchanging reciprocal links with other like-minded sites

Now lets factor in feedback Letrsquos say for example that there are two separatepages on other peoplersquos sites which both have a PageRank of 4 Both of thesehave ten links to other pages But the page on your site that you want them tolink to already has a link to the page on the second site By getting a link fromthe second site yoursquore generating feedback and getting more PageRank than ifyou had gotten a link from the first site Thatrsquos an over simplification in factfeedback loops can get even more complicated Remember the number of linkson the page linking to you will alter the amount of feedback etc

Can you work it all out for a given pagersquos situation No ndash neither can I Myadvice therefore is this ndash get links from sites that seem appropriate and havegood quality regardless of their current PageRank If they are relevant to yoursite and are high quality sites they will either help your PageRank now or willdo so in the future To really get your PageRank humming get yourself a listing

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 5: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

4

guesstimate is worthless for our purposes because it isnrsquot featured inany of the PageRank calculations The only way to tell if the toolbar isa guesstimate is to type the URL into the Google search box and see ifthe page shows up in the SERPS If it doesnrsquot then the toolbar isguessing

2 The toolbar is just a representation of actual PageRank WhilstPageRank is linear Google has chosen to use a non-linear graph toportray it So on the toolbar to move from a PageRank of 2 to aPageRank of 3 takes less of an increase than to move from aPageRank of 3 to a PageRank of 4 A comparison table best illustratesthis phenomenon The actual figures are kept secret so wersquoll just useany figures for demonstration purposes

If the actualPageRank is

betweenThe Toolbar Shows

000000001 and 5 16 and 25 2

25 and 125 3126 and 625 4

626 and 3125 53126 and 15625 6

15626 and 78125 778126 and 390625 8

390626 and 1953125 91953126 and infinity 10

The PageRank shown in the Google directory (httpdirectorygooglecom) suffers fromthe same problems The PageRank shown in the directory is also on a differentscale There have been attempts to cross-reference these two scales butbecause they are non-linear the results really do not tell you anything more thanyou already know

Also of note is that a programmer managed to generate a tool to look upPageRank without using Internet Explorer This tool has since been withdrawnbut whilst originally the numbers given by this software and Googlersquos toolbarmatched - presently querying with such software sometimes produces differentnumbers than querying with the toolbar This is Googlersquos right to protect theirdata but is the strongest indication that

5

Sometimes what you see on the Google toolbar may not be related to actualPageRank at all (Google can and does assign whatever toolbar PageRankvalue they want to assign to a page)

How significant is PageRank

The significance of any one factor in search engine algorithms depends on thequality of the information it supplies A factorrsquos importance is known as its weightTo demonstrate how weighting is arrived at its easiest to move away fromPageRank for a second and look at Meta tags Originally when the Metakeyword tag was new you could write something like this in your document

ltmeta name=rdquokeywordsrdquo content=rdquopagerank pagerank uncovered algorithmalgorithmsrdquogt

In theory the Meta keyword tag was a very good indicator of what the page wasabout However as most are well aware ndash the weighting for the keywords tag isfast approaching nothing Two things have contributed to this

1 The ease at which Webmasters can manipulate it2 The level of manipulation by Webmasters

These two things are separate factors but with human nature being what it isthe easier something is to influence ndash the more it is manipulated Thecombination of these factors determines the ldquoweightingrdquo ndash ie how much we trustthe information provided by that factor

So it makes sense to look at these factors in relation to PageRank first

PageRank is without doubt one of the hardest things for a Webmaster tomanipulate ethically However it is possible to generate links to your site fromother sites fairly simply through the use of link farms and guestbooks Googlefrowns upon this kind of abuse and many sites that have tried this have had theirPageRank influence blocked But it must be said that the abuse is still rampantand that it can have an influence on PageRank So whilst not easy to doPageRank is still subject to manipulation

The extent to which PageRank is manipulated has also changed Most people nolonger believe Googlersquos old line of people not being able to influence PageRankand the results based on it However there is more information about PageRankavailable than ever and people are more aware of manipulation techniques

So whilst PageRank is valuable you should be careful not to over-estimate itsusage and capabilities Your final ranking in Google is due to a mix of factors of

6

which PageRank is only one Wersquoll get into more details later by discussing howPageRank is different than the other ranking factors and thus when it appliesand when it doesnrsquot Ironically enough PageRankrsquos weighting factor isundeniably declining Since the original version of this paper gave out detailedinformation about PageRank in all likelihood it may also have contributed insome small way to the decline in weighting of the very subject it talks about

Is PageRank a good determination of the quality of a page

To examine the worth of PageRank we need to first look at its premise and howaccurate it is Basically PageRank says

1 If a page links to another page it is casting a vote which indicates that theother page is good

2 If lots of pages link to a page then it has more votes and its worth shouldbe higher

The basic implication here is People only link to pages they think are good

It shouldnt be hard to convince you that this premise is wrong A few of thereasons people link to pages other than ones they think are good are

1) Reciprocal links ndash ldquoLink to me and Irsquoll link to yourdquo

2) Link Requirements ndash ldquoUsing our script requires you to put a link to ourpagerdquo or ldquoWersquoll give you an award solely because you link to our pagerdquo

3) Friends and Family ndash ldquoThis is my friend Petersquos siterdquo or ldquoMy mumrsquos siteis here my dadrsquos site is here My dogrsquos site is hererdquo

4) Free Page Add-ons ndash ldquoThis counter was provided bywwwlinktocountersitecomrdquo

Furthermore anybody who has a top-ranking site will tell you that it tends to getlinks from new sites This is not necessarily because its good (although theygenerally are) Assume a Webmaster is setting up a new site and they arelooking for some outbound links Nowadays one of the first things they do is aGoogle search for similar sites The links they end up with may not necessarilybe the best sites but merely the easiest ones to find If PageRank influencesrankings and if they subsequently link to those pages ndash the new Webmaster willbe adding to the inaccuracies in the judging of the quality of a page The same istrue when these new Webmasters use the Google Toolbar PageRank indicator tochoose whom to link to

7

To put this another way

PageRank is determined by the links pointing to a page But if PageRank itselfhas an influence on the number of links to a page it is influencing (in acircular way) the quality of that page The links are no longer based solely onhuman judgement If a Webmaster picks their outbound links by searching onGoogle or by looking at the Google toolbar (even if this is only part of theirjudgement) then there is a corresponding increase in a pagersquos PageRankThis increase is not solely because it is a good page but because itsPageRank is already high

Googlersquos First 1000 Results

Remember PageRank alone cannot get you high rankings Wersquove mentionedbefore that PageRank is a multiplier so if your score for all other factors is 0 andyour PageRank is twenty billion then you still score 0 (last in the results) This isnot to say PageRank is worthless but there is some confusion over whenPageRank is useful and when it is not This leads to many misinterpretations ofits worth The only way to clear up these misinterpretations is to point out whenPageRank is not worthwhile

If you perform any broad search on Google it will appear as if youve foundseveral thousand results However you can only view the first 1000 of themUnderstanding why this is so explains why you should always concentrate onldquoon the pagerdquo factors and anchor text first and PageRank last

Assume that you perform a search on Google and it returns 200000 results Ifwe were to calculate every factor for each 200000 pages ndash do you think it wouldreally take just 034 seconds to search The answer to speeding up the search isto get a subset of documents that are most likely to be related to the query Thissubset of documents needs to be larger than the number of search results Forexample letrsquos say that number is 2000 What the search engine does is querythe whole database using 2 or 3 factors finding the 2000 documents that rankhighest for them (Remember there were 200000 possible documents andthatrsquos the number that actually gets shown) Then the engine applies all thefactors to those 2000 and ranks them accordingly Because therersquos a drop in thequality of the results (not the pages) at the bottom of this subset the engine justshows the first 1000 PageRank is almost certainly not one of those factorsNotice how before we highlighted the word ldquorelatedrdquo in creating the subset of2000 pages The search engine is looking for pages that are on-topic If weincluded PageRank in that list wersquod get a lot of high PageRank pages with topicsthat are only slightly related (because of the second factor) but thatrsquos not whatwe want

8

Why this is critical

You must do enough on the page work andor anchor text work to get intothat subset of 2000 pages for your chosen key phrase otherwise your highPageRank will be completely in vain PageRank means nothing if you do nothave enough ranking from other factors to make it into the first subset

The difference between PageRank and other factors

To assess when PageRank is important and when it is not we need tounderstand how PageRank is different from all other ranking factors To do thisherersquos a quick table that lists a few other factors and how they add to the rankingscore

Title Tag Can only be listed onceKeywords in Body Text Each successive repetition is

less important Proximity isimportant

Anchor Text Highly weighted but likekeywords in body text there isa cut off point where furtheranchor text is no longerworthwhile

PageRank Potentially infinite You arealways capable of increasingyour PageRank significantlybut it takes work

All other ranking factors have cut off points beyond which they will no longeradd to your ranking score or will not add significantly enough for it to beworthwhile PageRank has no cut off point

Non-PageRank Factor Threshold

Having written about the difference between PageRank and other factors andhow PageRank is harder to get it should now be clear that whilst we could usemany methods to get good rankings there is a threshold which defines whenhigh PageRank is worth striving for and when it is not

With ranking factors other than PageRank there is a score beyond which theslow down in the rate that any factor adds to this score is so insignificant that it is

9

not worthwhile This is the Non-PageRank Factor Threshold To illustrate thisletrsquos put an example figure on this of 1000

If we have a query where the results are Page A and Page B then Page A and Bhave scores for that query which are the total scores for all ranking factors(including PageRank) Letrsquos say Page Arsquos score is 900 and page Brsquos score is500 Obviously Page A will be listed first These are both below our hypotheticalNon-PageRank Factor Threshold thus without any change in PageRank it ispossible for page B to improve their optimization to beat Page A for this particularquery There are lots of queries like this on Google theyre more commonlythought of as less competitive queries

Now assume Page A raises its score to 1100 Suddenly page B cannot competein the SERPs (search engine results pages) without increasing its PageRank Inall probability page B must also improve for all the other ranking factors but anincrease in PageRank is almost certainly necessary There are also lots ofqueries like this on Google which are more commonly thought of as morecompetitive queries

Generally when querying Google the group of pages in the SERPs will containsome pages that have a score above the Non-PageRank Factor Threshold andsome that do not

There is an important point to be made here

To be competitive you must raise your pages search engine ranking scorebeyond the Non-PageRank Factor Threshold To fail to do so means that youcan easily be beaten in the search results for your query terms The quickestway to approach the Non-PageRank Factor Threshold is through ldquoon the pagefactorsrdquo however you cannot move above the Non-PageRank FactorThreshold without PageRank

The obvious question is whatrsquos the numerical value of the Non-PageRank FactorThreshold and how much work do you need to do to get past it The answer isthat it has no value it is a hypothetical line Google could put a value on it butthat would not help us unless we know what the pagersquos individual scores are Weneed only be aware that the threshold exists and that it gives us informationabout principles

Using the threshold to derive the worth of two ranking strategies

The threshold explains principles and the different ways that search enginemarketers work It also demonstrates why some of the misunderstandings aboutPageRank occur Letrsquos consider the strategies of two people Person A considersPageRank to be unimportant and Person B considers PageRank to be veryimportant

10

Person A says ldquoPageRankrdquo is unimportant They have optimised pages foryears and know how to use ldquoon the pagerdquo factors very successfully Theyunderstand the basics of anchor text but they couldnrsquot care at all aboutPageRank

Whatrsquos happening person A is reaching the Non-PageRank Factor Thresholdvery quickly because they are maximising the ldquoon the pagerdquo factors Throughcarefully choosing keywords they jump-start themselves up the SERPs Aslong as their content is good high-ranking sites (over time) tend to get linkedto Whilst they didnrsquot directly ask for it a slow trickle of sites will begin to link tothem and give them PageRank which helps consolidates their position

Person B says ldquoPageRankrdquo is important Wersquove all seen those pages in theresults that have no content but great rankings (With big brands this canoften occur naturally even when they have no idea what PageRank is Thiswould be Person C who is not relevant to the discussion at hand) Person Bunderstands lots about PageRank and concentrates heavily on it

Whatrsquos happening person B is doing the reverse of person A Whilst person Aconcentrated on the Non-PageRank factors and found herself gettingPageRank anyway person B concentrates on the PageRank Factor and findshimself getting Non-PageRank factors The reason for this is that increasingPageRank requires links and links have anchor text Thus through carefullychoosing the anchor text linking to his page person B automatically increaseshis Non-PageRank factor scores whilst obtaining his high PageRank score

Obviously these are two extremes but we can use these to extrapolate theadvantages and disadvantages of each approach

11

Advantages DisadvantagesPerson A bull Quick entry to the

results pagesbull Self-generating links

limit the amount ofwork needed

bull Securing the resultsto make it moredifficult forcompetitors tocompete takes morework

bull Slower to react tonew competitors

Person B bull Solid position Caneasily modify on thepage factors for aquick boost ifrequired

bull Probably highertraffic from non-search enginesources

bull Slower entry toresults pages

bull Difficult to do wellbull Increased likelihood

of tripping spamfilters

It is clear that both strategies can and do work Both strategies are usingPageRank as part the mix of factors that will ultimately improve their ranking inthe SERPS Because there is such a mix we can use them to different degreesdepending on the strategy that best suits your style My personal strategy is touse a combination but to save some of the ldquoon the pagerdquo factors for later in caseI need a quick boost if the competition heats up at a later time

Really heavy competition

No explanation of PageRank strategies would be complete without a finalstatement regarding heavy keyword competition There are some queries wherecompetition is so intense that you must do everything possible to maximize yourranking score (for example ldquoweb hostingrdquo) In such situations it is impossible torank highly through Non-PageRank factors alone (as you will not initially be listedhigh enough to be noticed and linked to) That is not to say that Non-PageRankfactors arent important Consider what your final rank score is

Final Rank Score = (score for all Non-PageRank factors) x (actual PageRankscore)

Improving either side of the equation can have a positive effect Howeverbecause the Non-PageRank factors have a restricted maximum benefit theactual PageRank score must be improved in order to compete successfullyUnder really heavy competition ndash it holds true that you cannot rank well unlessyour actual PageRank score is above a certain level In other words

12

There exists a query specific ldquoMinimum PageRank levelrdquo For queries that do nothave heavy competition this level is easy to achieve without even tryingHowever where heavy competition exists Non-PageRank factors are just asimportant (and easier to get) until they reach the Non-PageRank factor thresholdThis is why careful keyword choice can help you avoid the extensive workassociated with highly competitive search phrases

How PageRank is calculated

On a simple level we can tell quite a lot about how PageRank is calculated Thisis because when Google was just a university research project the creators ofPageRank published a paper that detailed a formula for calculating it Thisformula is now more than a handful of years old and we suspect that it haschanged somewhat since then However for detailing the over-riding principlesof how PageRank works it is as accurate today as the day it was written

PR(A) = (1-d) + d (PR(T1) C(T1) + + PR(Tn) C(Tn) )

Where PR(A) is the PageRank of Page A ( the one we want t owork out )

D is a dam pening factor Nom inally this is set to 085

PR(T1) is the PageRank of a site point ing to Page A

C(T1) is the num ber of links off that page

PR(Tn) C(Tn) m eans we do that for each page point ing toPage A

Source The Anatomy of a Large-Scale Hypertextual Web Search EngineSergey Brin and LawrencePage httpwww-dbstanfordedu~backrubgooglehtml

Couldnrsquot be simpler right Depending on your mathematical competence this iseither remarkably easy to understand or remarkably complex So herersquos the dealndash this isnrsquot a one-time calculation It looks pretty above but you canrsquot just do onesimple calculation and get an answer with the PageRank of a page If you look atit to calculate the PageRank of Page A we need to know the PageRank of allthe pages pointing to it To calculate the PageRank of those pages we need toknow the PageRank of all the pages pointing to them (one of which could and isquite likely to be Page A) If this seems like it could go around in circles forever

13

thatrsquos because it nearly does We have to do a whole lot of little calculationsmany times over before we actually get the answer What this formula tells us isthat however you slice it and however the formula may have changed since itwas written

The PageRank given to Page A by a Page B pointing to it is decreased witheach link to anywhere that exists on Page B This means that a pagersquos

PageRank is essentially a measure of its vote it can split that vote between onelink or two or many more but its overall voting power will always remain the

same

This really is important So to be totally clear letrsquos put some numbers to it (thesenumbers are made up for demonstration purposes only and have no relevancyto any particular page) Say Page B has a PageRank value of 5 and has a singlelink on it pointing to Page A Page Arsquos PageRank is improved by a proportion ofPage Brsquos value of 5 (Page B doesnrsquot lose anything but Page A gains) If Page Bhas two links that PageRank improvement would be split and Page A wouldonly gain half the PageRank that it did before

Now put the formula out of your mind for a moment as its easier to understandhow it works using a diagram Letrsquos say we have a hypothetical set of pagesimaginatively titled Page A Page B Page C and Page D They link to each otheras shown below

To begin with in our example at least we donrsquot know what the pagersquos startingPageRanks are Therersquos nothing special about the number that we pick to startwith (in fact if you read forward to the section on convergence ndash you can see that

14

we can start at any number we want) Since in the last version of this paper weperformed the calculations by setting these values to 1 ndash wersquore going to set themto zero this time around in order to prove that it doesnrsquot matter what the startingvalues are

Next we perform the necessary calculation to obtain the PageRank for eachpage The rules are

1 We take a 085 a pagersquos PageRank and divide it by the number oflinks on the page

2 We add that amount on to a new total for each page itrsquos being passedto

3 We add 015 to each of those totals

The first calculation is easy Because wersquove started at zero ndash 0 085 is always 0So each page gets just 015 + 0 Meaning each page now has a PageRank of015 Clearly wersquore not done ndash we want to show the importance of each pagebased upon links and theyrsquore all the same so we need to run the calculationagain

Page A links to pages B C and D Page Arsquos PageRank is 015 so it will add 085 015 = 01275 to the new PageRank scores of the pages it links to There arethree of them so they each get 00425

Page B links to page C Page Brsquos PageRank is 015 so it will add 085 015 =01275 to the new PageRank score of the pages it links to Since it only links topage C page C will get it all

Page C links to Page A all 01275 passes to page A

Page D links to Page C Again all 01275 passes to page C

The new totals for each page them become

Page A 015 (base) + 01275 (from Page C) = 02775Page B 015 (base) + 00425 (from Page A) = 01925Page C 015 (base) + 00425 (from Page A) + 01275 (from Page B) + 01275(from Page D) = 04475Page D 015 (base) + 00425 (from Page A) = 01925

So wersquove got

15

Pretty neat huh Already wersquore begining to see that Page C is probably the mostimportant page in the system (but we canrsquot be sure yet ndash it could well change)We carry on doing these calculations until the value for each page no longerchanges (this is called convergence ndash therersquos more about this in the nextsection) In practice Google probably doesnt wait for this convergence butinstead run a number of iterations of the calculation which is likely to give themfairly accurate values (more on this later as well) If we carried out all thecalculations for the example given it would take us 143 calculations [ ExcelExample 1] and wersquod reach final values of

16

As suspected Page C is the most important If we take a quick look at these rawvalues we can see something about the number of links pointing out from a pageLook at Page A which has a link from a high PageRank page (Page C) whichhas only one outbound link Then look at Page B and D both share links from ahigh PageRank page (Page A) with three outbound links The number of linkssignificantly alters the way PageRank is distributed

A word on convergence

Convergence is an important mathematical aspect of PageRank which allowsGoogle to provided unprecedented search quality at comparably low costs Thisis a semi-complex topic but it is important to your understanding of how and whyPageRank works Wersquove tried to make it as simple as possible but unless yoursquoreSergey Brin or Larry Page yoursquoll still need to concentrate

Whilst it will take some concentration it isnrsquot that hard to understand

Wersquove shown that the outlet values (final values) of one stage of the calculationbecome the inlet values (starting values) of the next stage and that we keep ondoing this (itrsquos known as a recursive procedure) But the really big question ishow and when does the recursive procedure stop

The answer is ldquoconvergencerdquo Provided the dampening factor (d in our equation)is less than one then convergence will occur Nominally we set it to 085(because thatrsquos the value mentioned in the Stanford papers)

This convergence basically means that whatever values we start at after runningthe calculation a number of times we will end up with the same final values andthat these values will no longer change if we do further iterations of thecalculation These final values are known as limiting values

Once the limiting values have been reached Google no longer needs to expendprocessing power on calculating the PageRank They can finish there

This is easier to understand with an example Letrsquos take a look at the followingstructure

17

PageRank for pages A B C D at various stages of iteration

A B C DIteration 1 1 1 1

1 10000000000 05750000000 22750000000 015000000002 20837500000 05750000000 11912500000 015000000003 11625625000 10355937500 16518437500 01500000000

19 14900124031 07833688246 15766187723 0150000000020 14901259564 07832552713 15766187723 0150000000021 14901259564 07833035315 15765705121 0150000000046 14901074052 07832956473 15765969475 0150000000047 14901074054 07832956472 15765969474 0150000000048 14901074053 07832956473 15765969474 0150000000049 14901074053 07832956473 15765969474 0150000000050 14901074053 07832956473 15765969474 0150000000051 14901074053 07832956473 15765969474 0150000000052 14901074053 07832956473 15765969474 0150000000053 14901074053 07832956473 15765969474 0150000000054 14901074053 07832956473 15765969474 01500000000

No matter how much we try to calculate the value past the 48th iteration thevalues do not change ndash they have converged at the limiting values

In practice theres no need to wait until they do not change at all before we stopcalculations We simply wait until they do not change very much In our exampleabove we have done that to 10 decimal places Ie we have calculated until theydo not change by 00000000001 or more (or mathematically | PR 48 (A) - PR 50(A) | lt 10 ndash10)

18

Derived specifics of how Google calculates PageRank

We started our values from zero and now see that the calculations are fairlycomplicated and resource intensive So two questions arise When Googlecalculates PageRank themselves do they start each month with an arbitraryvalue And secondly do they calculate PageRank across several machines Wecan answer these questions by a couple of simple experiments Assume thathaving calculated the PageRanks in the structure shown earlier we would like toadd a page and change a link to make the structure look like this

If we start each page from an arbitrary starting value of one then how manyiterations of the calculation does it take to achieve convergence [ ExcelExample 2]

75

Now if we start each page from the values we ended up with when we calculatedthe PageRank of the original structure (and set Page Ersquos starting value to 1) thenhow many iterations does it take to achieve convergence [ Excel Example 3]

78

These numbers are really very close Therefore it would appear logical thatGoogle does not reset the values with each calculation as they may simply carryon from where they left off

So do they calculate across several machines or one Obviously there is toomuch work for one machine when you get to billions of pages so several

19

machines must be used Logically how can they do this Assume Pages A andC are on one machine and Pages B C and D are on another If we ignore thelinks between pages across machines (ie A-gtD and D-gtC) then for Pages Aand C it takes

1 calculation to converge at A=1 C=1

Then for Pages B D and E it takes

3 calculations to converge at B=015 D=03954375000 E=03954375000

If we then do calculations for all pages using these as starting values it takes

A huge 146 iterations to converge [ Excel Example 4]

The conclusion is that splitting PageRank calculations up into chunks for differentmachines is not a viable proposition Therefore they must also track PageRanktransfers across links between machines This means that each machine mustnotify the other machines as PageRank is given to a page not on that machineIndeed each machine must also run the iterations in sync with each other Evenif they donrsquot do it exactly this way the calculation of PageRank for so manypages is very very complicated indeed

PageRank Feedback how linking sometimes helps

When the new term PageRank Feedback was presented in the original version ofthis paper it was largely taken up and used ndash although not always correctlyPageRank Feedback is difficult to understand because it is a concept that impliescreation of PageRank However it is really an alteration of distribution ofPageRank on a large scale PageRank Feedback is the principle that explainshow sometimes it is good to link to outside pages Consider we have a Page -gtPage A Its PageRank by definition will be 015 Now assume it links out toPage B which in return links back to it Page Arsquos PageRank now becomes 1Now assume it links out to Page C and Page C links back to it Page ArsquosPageRank now becomes 14594594595

The very fact that Page A links out to a page that links back to it causes its ownPageRank to rise (This could also occur if Page A links to a page that links to apage that links to Page A etchellip) This has occurred not because we aregenerating PageRank but because it is taking it from the system as a whole Butif we view this very small system ldquoPage Ardquo (which is a subsystem of all the pagesin the index) then we can firmly say that linking out from Page A has generatedPageRank in this system ndash that is the very act of linking out has fed PageRankback in

20

Letrsquos put this diagrammatically using a slightly more complicated system

Pages A to E are all of the pages in Googlersquos index Pages A and B are your siteWhen we do the calculations for this set up the final PageRanks for each pageare all 1 [ Excel Example 5] As far as your site is concerned

Page Arsquos PageRank is 1

Page Brsquos PageRank is 1

The sum of the PageRank of all the pages in your site is2

Changing the structure to link A to C and C to A gives the following finalPageRanks [ Excel Example 6]

21

Page Arsquos PageRank is 13599321536

Page Brsquos PageRank is 07279711653

The sum of the PageRank of all the pages in your site is20879033189

This increase is small the amount will of course vary depending oncircumstances But the point was easily proved ndash linking out can sometimesimprove your PageRank through a mechanism known as PageRank Feedback

If we added an extra page into the loop between C -gt D and E then the totalincrease in PageRank in your site would be 21462030505 [ Excel Example 7]This suggests that it is wiser to reciprocal link with larger sites that employ a linkstructure where a higher amount of focus is given to the page on which your linkappears (or will appear) The number of pages and the structure you are linkingto has significant influence on the amount of PageRank feedback that occurs

22

Influencing the results

The question that arises from all this is how and when can or will Googleinfluence the results of the PageRank calculation In this respect the situationhas changed significantly since the original PageRank paper Google has shownthat they can and will modify the data on which PageRank is based The primaryexample of this is what has become known as PageRank Zero (PR0) Basicallyspeculation says that when Google wants to penalize a page it is assigned aPageRank of zero As PageRank is a multiplier this will obviously always listPR0 pages as the very last entry in the SERPS

Your score from other factors multiplied by your PageRank score of zero willalways be zero (the lowest score possible)

How they do this is important Letrsquos say they set the PageRank of the page tozero before doing the calculations ndash would this work From what we know so farwe would think not [ Excel Example 8] And true enough it doesnt Why is thissignificant Because the PageRank must be set to zero at the end of theprocess Having a PageRank of zero in itself does not disqualify a page fromvoting exactly as it would have done in the first place

To stop its voting power the second penalty must also be applied This is thesame penalty that is applied to link farms Google has shown that they arecapable of ignoring links they believe have been artificially created This is simpleenough they merely remove the link entry from the matrix This second penaltywould also need to be applied to a PR0 penalty to stop the links from that pagefrom counting in the PageRank process However there doesnrsquot appear to beany way to test this I suspect that both penalties would be applied

Controlling PageRank

As Webmasters contrary to the suggestions of some we have a fair amount ofcontrol over PageRank It is however the most difficult of ranking factors tocontrol and you might well be able to achieve the desired effects by using othertechniques However a well-worked PageRank added to your optimization mixcan make it difficult for competitors to catch up

There are three fundamental areas to look at when trying to optimize thePageRank for your site

1 The links you choose to have link to you ie which ones you chooseand how much effort you put in to getting them

2 Who you choose to link out to from your site and from which page ofyour site you place their link (Maximising PageRank Feedback andminimising PageRank leakage)

23

3 The internal navigational structure and linkage of your pages in order tobest distribute PageRank within your site

Links to Your Site

When looking for links to your site from a purely PageRank point of view onemight think you should simply look for pages that have the highest ToolbarPageRank (Whilst keeping in mind that every page of a site has its ownPageRank so you must consider the PageRank of the links page or whateverpage the actual link will exist on) However this way of thinking is incorrect Ifyoursquove not just jumped to this section then yoursquoll probably have worked out whythat is The PageRank given by a link is far more complicated than thissimplification There may have been a time when that was an okayapproximationhellipbut no more As more and more people try to get links from onlyhigh PageRank sites it becomes less and less of a winning proposition

The actual PageRank from an individual page is shared out amongst the links onthat page (remember the PageRank calculations) So links from pages thathave the same PageRank arent always created equal It depends on how manyother links your link is sharing the links page with For instance a link from apage with a PageRank of 4 might be better than a link from a page with aPageRank of 6 if there are less total links on the PR 4 page Right now there justisnrsquot enough information available to allow us to know to what extent thisstretches However itrsquos significant enough to make it pointless to just choosehigh PageRanked sites as your main linking strategy Theres also another morematter-of-fact reason why that type of linking strategy might not be the best siteswith high PageRanks are often fussy about which sites they will link out tomaking them harder to get linked from than lower PageRanked sites Howeversites struggling with their own PageRank numbers should be more receptive toexchanging reciprocal links with other like-minded sites

Now lets factor in feedback Letrsquos say for example that there are two separatepages on other peoplersquos sites which both have a PageRank of 4 Both of thesehave ten links to other pages But the page on your site that you want them tolink to already has a link to the page on the second site By getting a link fromthe second site yoursquore generating feedback and getting more PageRank than ifyou had gotten a link from the first site Thatrsquos an over simplification in factfeedback loops can get even more complicated Remember the number of linkson the page linking to you will alter the amount of feedback etc

Can you work it all out for a given pagersquos situation No ndash neither can I Myadvice therefore is this ndash get links from sites that seem appropriate and havegood quality regardless of their current PageRank If they are relevant to yoursite and are high quality sites they will either help your PageRank now or willdo so in the future To really get your PageRank humming get yourself a listing

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 6: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

5

Sometimes what you see on the Google toolbar may not be related to actualPageRank at all (Google can and does assign whatever toolbar PageRankvalue they want to assign to a page)

How significant is PageRank

The significance of any one factor in search engine algorithms depends on thequality of the information it supplies A factorrsquos importance is known as its weightTo demonstrate how weighting is arrived at its easiest to move away fromPageRank for a second and look at Meta tags Originally when the Metakeyword tag was new you could write something like this in your document

ltmeta name=rdquokeywordsrdquo content=rdquopagerank pagerank uncovered algorithmalgorithmsrdquogt

In theory the Meta keyword tag was a very good indicator of what the page wasabout However as most are well aware ndash the weighting for the keywords tag isfast approaching nothing Two things have contributed to this

1 The ease at which Webmasters can manipulate it2 The level of manipulation by Webmasters

These two things are separate factors but with human nature being what it isthe easier something is to influence ndash the more it is manipulated Thecombination of these factors determines the ldquoweightingrdquo ndash ie how much we trustthe information provided by that factor

So it makes sense to look at these factors in relation to PageRank first

PageRank is without doubt one of the hardest things for a Webmaster tomanipulate ethically However it is possible to generate links to your site fromother sites fairly simply through the use of link farms and guestbooks Googlefrowns upon this kind of abuse and many sites that have tried this have had theirPageRank influence blocked But it must be said that the abuse is still rampantand that it can have an influence on PageRank So whilst not easy to doPageRank is still subject to manipulation

The extent to which PageRank is manipulated has also changed Most people nolonger believe Googlersquos old line of people not being able to influence PageRankand the results based on it However there is more information about PageRankavailable than ever and people are more aware of manipulation techniques

So whilst PageRank is valuable you should be careful not to over-estimate itsusage and capabilities Your final ranking in Google is due to a mix of factors of

6

which PageRank is only one Wersquoll get into more details later by discussing howPageRank is different than the other ranking factors and thus when it appliesand when it doesnrsquot Ironically enough PageRankrsquos weighting factor isundeniably declining Since the original version of this paper gave out detailedinformation about PageRank in all likelihood it may also have contributed insome small way to the decline in weighting of the very subject it talks about

Is PageRank a good determination of the quality of a page

To examine the worth of PageRank we need to first look at its premise and howaccurate it is Basically PageRank says

1 If a page links to another page it is casting a vote which indicates that theother page is good

2 If lots of pages link to a page then it has more votes and its worth shouldbe higher

The basic implication here is People only link to pages they think are good

It shouldnt be hard to convince you that this premise is wrong A few of thereasons people link to pages other than ones they think are good are

1) Reciprocal links ndash ldquoLink to me and Irsquoll link to yourdquo

2) Link Requirements ndash ldquoUsing our script requires you to put a link to ourpagerdquo or ldquoWersquoll give you an award solely because you link to our pagerdquo

3) Friends and Family ndash ldquoThis is my friend Petersquos siterdquo or ldquoMy mumrsquos siteis here my dadrsquos site is here My dogrsquos site is hererdquo

4) Free Page Add-ons ndash ldquoThis counter was provided bywwwlinktocountersitecomrdquo

Furthermore anybody who has a top-ranking site will tell you that it tends to getlinks from new sites This is not necessarily because its good (although theygenerally are) Assume a Webmaster is setting up a new site and they arelooking for some outbound links Nowadays one of the first things they do is aGoogle search for similar sites The links they end up with may not necessarilybe the best sites but merely the easiest ones to find If PageRank influencesrankings and if they subsequently link to those pages ndash the new Webmaster willbe adding to the inaccuracies in the judging of the quality of a page The same istrue when these new Webmasters use the Google Toolbar PageRank indicator tochoose whom to link to

7

To put this another way

PageRank is determined by the links pointing to a page But if PageRank itselfhas an influence on the number of links to a page it is influencing (in acircular way) the quality of that page The links are no longer based solely onhuman judgement If a Webmaster picks their outbound links by searching onGoogle or by looking at the Google toolbar (even if this is only part of theirjudgement) then there is a corresponding increase in a pagersquos PageRankThis increase is not solely because it is a good page but because itsPageRank is already high

Googlersquos First 1000 Results

Remember PageRank alone cannot get you high rankings Wersquove mentionedbefore that PageRank is a multiplier so if your score for all other factors is 0 andyour PageRank is twenty billion then you still score 0 (last in the results) This isnot to say PageRank is worthless but there is some confusion over whenPageRank is useful and when it is not This leads to many misinterpretations ofits worth The only way to clear up these misinterpretations is to point out whenPageRank is not worthwhile

If you perform any broad search on Google it will appear as if youve foundseveral thousand results However you can only view the first 1000 of themUnderstanding why this is so explains why you should always concentrate onldquoon the pagerdquo factors and anchor text first and PageRank last

Assume that you perform a search on Google and it returns 200000 results Ifwe were to calculate every factor for each 200000 pages ndash do you think it wouldreally take just 034 seconds to search The answer to speeding up the search isto get a subset of documents that are most likely to be related to the query Thissubset of documents needs to be larger than the number of search results Forexample letrsquos say that number is 2000 What the search engine does is querythe whole database using 2 or 3 factors finding the 2000 documents that rankhighest for them (Remember there were 200000 possible documents andthatrsquos the number that actually gets shown) Then the engine applies all thefactors to those 2000 and ranks them accordingly Because therersquos a drop in thequality of the results (not the pages) at the bottom of this subset the engine justshows the first 1000 PageRank is almost certainly not one of those factorsNotice how before we highlighted the word ldquorelatedrdquo in creating the subset of2000 pages The search engine is looking for pages that are on-topic If weincluded PageRank in that list wersquod get a lot of high PageRank pages with topicsthat are only slightly related (because of the second factor) but thatrsquos not whatwe want

8

Why this is critical

You must do enough on the page work andor anchor text work to get intothat subset of 2000 pages for your chosen key phrase otherwise your highPageRank will be completely in vain PageRank means nothing if you do nothave enough ranking from other factors to make it into the first subset

The difference between PageRank and other factors

To assess when PageRank is important and when it is not we need tounderstand how PageRank is different from all other ranking factors To do thisherersquos a quick table that lists a few other factors and how they add to the rankingscore

Title Tag Can only be listed onceKeywords in Body Text Each successive repetition is

less important Proximity isimportant

Anchor Text Highly weighted but likekeywords in body text there isa cut off point where furtheranchor text is no longerworthwhile

PageRank Potentially infinite You arealways capable of increasingyour PageRank significantlybut it takes work

All other ranking factors have cut off points beyond which they will no longeradd to your ranking score or will not add significantly enough for it to beworthwhile PageRank has no cut off point

Non-PageRank Factor Threshold

Having written about the difference between PageRank and other factors andhow PageRank is harder to get it should now be clear that whilst we could usemany methods to get good rankings there is a threshold which defines whenhigh PageRank is worth striving for and when it is not

With ranking factors other than PageRank there is a score beyond which theslow down in the rate that any factor adds to this score is so insignificant that it is

9

not worthwhile This is the Non-PageRank Factor Threshold To illustrate thisletrsquos put an example figure on this of 1000

If we have a query where the results are Page A and Page B then Page A and Bhave scores for that query which are the total scores for all ranking factors(including PageRank) Letrsquos say Page Arsquos score is 900 and page Brsquos score is500 Obviously Page A will be listed first These are both below our hypotheticalNon-PageRank Factor Threshold thus without any change in PageRank it ispossible for page B to improve their optimization to beat Page A for this particularquery There are lots of queries like this on Google theyre more commonlythought of as less competitive queries

Now assume Page A raises its score to 1100 Suddenly page B cannot competein the SERPs (search engine results pages) without increasing its PageRank Inall probability page B must also improve for all the other ranking factors but anincrease in PageRank is almost certainly necessary There are also lots ofqueries like this on Google which are more commonly thought of as morecompetitive queries

Generally when querying Google the group of pages in the SERPs will containsome pages that have a score above the Non-PageRank Factor Threshold andsome that do not

There is an important point to be made here

To be competitive you must raise your pages search engine ranking scorebeyond the Non-PageRank Factor Threshold To fail to do so means that youcan easily be beaten in the search results for your query terms The quickestway to approach the Non-PageRank Factor Threshold is through ldquoon the pagefactorsrdquo however you cannot move above the Non-PageRank FactorThreshold without PageRank

The obvious question is whatrsquos the numerical value of the Non-PageRank FactorThreshold and how much work do you need to do to get past it The answer isthat it has no value it is a hypothetical line Google could put a value on it butthat would not help us unless we know what the pagersquos individual scores are Weneed only be aware that the threshold exists and that it gives us informationabout principles

Using the threshold to derive the worth of two ranking strategies

The threshold explains principles and the different ways that search enginemarketers work It also demonstrates why some of the misunderstandings aboutPageRank occur Letrsquos consider the strategies of two people Person A considersPageRank to be unimportant and Person B considers PageRank to be veryimportant

10

Person A says ldquoPageRankrdquo is unimportant They have optimised pages foryears and know how to use ldquoon the pagerdquo factors very successfully Theyunderstand the basics of anchor text but they couldnrsquot care at all aboutPageRank

Whatrsquos happening person A is reaching the Non-PageRank Factor Thresholdvery quickly because they are maximising the ldquoon the pagerdquo factors Throughcarefully choosing keywords they jump-start themselves up the SERPs Aslong as their content is good high-ranking sites (over time) tend to get linkedto Whilst they didnrsquot directly ask for it a slow trickle of sites will begin to link tothem and give them PageRank which helps consolidates their position

Person B says ldquoPageRankrdquo is important Wersquove all seen those pages in theresults that have no content but great rankings (With big brands this canoften occur naturally even when they have no idea what PageRank is Thiswould be Person C who is not relevant to the discussion at hand) Person Bunderstands lots about PageRank and concentrates heavily on it

Whatrsquos happening person B is doing the reverse of person A Whilst person Aconcentrated on the Non-PageRank factors and found herself gettingPageRank anyway person B concentrates on the PageRank Factor and findshimself getting Non-PageRank factors The reason for this is that increasingPageRank requires links and links have anchor text Thus through carefullychoosing the anchor text linking to his page person B automatically increaseshis Non-PageRank factor scores whilst obtaining his high PageRank score

Obviously these are two extremes but we can use these to extrapolate theadvantages and disadvantages of each approach

11

Advantages DisadvantagesPerson A bull Quick entry to the

results pagesbull Self-generating links

limit the amount ofwork needed

bull Securing the resultsto make it moredifficult forcompetitors tocompete takes morework

bull Slower to react tonew competitors

Person B bull Solid position Caneasily modify on thepage factors for aquick boost ifrequired

bull Probably highertraffic from non-search enginesources

bull Slower entry toresults pages

bull Difficult to do wellbull Increased likelihood

of tripping spamfilters

It is clear that both strategies can and do work Both strategies are usingPageRank as part the mix of factors that will ultimately improve their ranking inthe SERPS Because there is such a mix we can use them to different degreesdepending on the strategy that best suits your style My personal strategy is touse a combination but to save some of the ldquoon the pagerdquo factors for later in caseI need a quick boost if the competition heats up at a later time

Really heavy competition

No explanation of PageRank strategies would be complete without a finalstatement regarding heavy keyword competition There are some queries wherecompetition is so intense that you must do everything possible to maximize yourranking score (for example ldquoweb hostingrdquo) In such situations it is impossible torank highly through Non-PageRank factors alone (as you will not initially be listedhigh enough to be noticed and linked to) That is not to say that Non-PageRankfactors arent important Consider what your final rank score is

Final Rank Score = (score for all Non-PageRank factors) x (actual PageRankscore)

Improving either side of the equation can have a positive effect Howeverbecause the Non-PageRank factors have a restricted maximum benefit theactual PageRank score must be improved in order to compete successfullyUnder really heavy competition ndash it holds true that you cannot rank well unlessyour actual PageRank score is above a certain level In other words

12

There exists a query specific ldquoMinimum PageRank levelrdquo For queries that do nothave heavy competition this level is easy to achieve without even tryingHowever where heavy competition exists Non-PageRank factors are just asimportant (and easier to get) until they reach the Non-PageRank factor thresholdThis is why careful keyword choice can help you avoid the extensive workassociated with highly competitive search phrases

How PageRank is calculated

On a simple level we can tell quite a lot about how PageRank is calculated Thisis because when Google was just a university research project the creators ofPageRank published a paper that detailed a formula for calculating it Thisformula is now more than a handful of years old and we suspect that it haschanged somewhat since then However for detailing the over-riding principlesof how PageRank works it is as accurate today as the day it was written

PR(A) = (1-d) + d (PR(T1) C(T1) + + PR(Tn) C(Tn) )

Where PR(A) is the PageRank of Page A ( the one we want t owork out )

D is a dam pening factor Nom inally this is set to 085

PR(T1) is the PageRank of a site point ing to Page A

C(T1) is the num ber of links off that page

PR(Tn) C(Tn) m eans we do that for each page point ing toPage A

Source The Anatomy of a Large-Scale Hypertextual Web Search EngineSergey Brin and LawrencePage httpwww-dbstanfordedu~backrubgooglehtml

Couldnrsquot be simpler right Depending on your mathematical competence this iseither remarkably easy to understand or remarkably complex So herersquos the dealndash this isnrsquot a one-time calculation It looks pretty above but you canrsquot just do onesimple calculation and get an answer with the PageRank of a page If you look atit to calculate the PageRank of Page A we need to know the PageRank of allthe pages pointing to it To calculate the PageRank of those pages we need toknow the PageRank of all the pages pointing to them (one of which could and isquite likely to be Page A) If this seems like it could go around in circles forever

13

thatrsquos because it nearly does We have to do a whole lot of little calculationsmany times over before we actually get the answer What this formula tells us isthat however you slice it and however the formula may have changed since itwas written

The PageRank given to Page A by a Page B pointing to it is decreased witheach link to anywhere that exists on Page B This means that a pagersquos

PageRank is essentially a measure of its vote it can split that vote between onelink or two or many more but its overall voting power will always remain the

same

This really is important So to be totally clear letrsquos put some numbers to it (thesenumbers are made up for demonstration purposes only and have no relevancyto any particular page) Say Page B has a PageRank value of 5 and has a singlelink on it pointing to Page A Page Arsquos PageRank is improved by a proportion ofPage Brsquos value of 5 (Page B doesnrsquot lose anything but Page A gains) If Page Bhas two links that PageRank improvement would be split and Page A wouldonly gain half the PageRank that it did before

Now put the formula out of your mind for a moment as its easier to understandhow it works using a diagram Letrsquos say we have a hypothetical set of pagesimaginatively titled Page A Page B Page C and Page D They link to each otheras shown below

To begin with in our example at least we donrsquot know what the pagersquos startingPageRanks are Therersquos nothing special about the number that we pick to startwith (in fact if you read forward to the section on convergence ndash you can see that

14

we can start at any number we want) Since in the last version of this paper weperformed the calculations by setting these values to 1 ndash wersquore going to set themto zero this time around in order to prove that it doesnrsquot matter what the startingvalues are

Next we perform the necessary calculation to obtain the PageRank for eachpage The rules are

1 We take a 085 a pagersquos PageRank and divide it by the number oflinks on the page

2 We add that amount on to a new total for each page itrsquos being passedto

3 We add 015 to each of those totals

The first calculation is easy Because wersquove started at zero ndash 0 085 is always 0So each page gets just 015 + 0 Meaning each page now has a PageRank of015 Clearly wersquore not done ndash we want to show the importance of each pagebased upon links and theyrsquore all the same so we need to run the calculationagain

Page A links to pages B C and D Page Arsquos PageRank is 015 so it will add 085 015 = 01275 to the new PageRank scores of the pages it links to There arethree of them so they each get 00425

Page B links to page C Page Brsquos PageRank is 015 so it will add 085 015 =01275 to the new PageRank score of the pages it links to Since it only links topage C page C will get it all

Page C links to Page A all 01275 passes to page A

Page D links to Page C Again all 01275 passes to page C

The new totals for each page them become

Page A 015 (base) + 01275 (from Page C) = 02775Page B 015 (base) + 00425 (from Page A) = 01925Page C 015 (base) + 00425 (from Page A) + 01275 (from Page B) + 01275(from Page D) = 04475Page D 015 (base) + 00425 (from Page A) = 01925

So wersquove got

15

Pretty neat huh Already wersquore begining to see that Page C is probably the mostimportant page in the system (but we canrsquot be sure yet ndash it could well change)We carry on doing these calculations until the value for each page no longerchanges (this is called convergence ndash therersquos more about this in the nextsection) In practice Google probably doesnt wait for this convergence butinstead run a number of iterations of the calculation which is likely to give themfairly accurate values (more on this later as well) If we carried out all thecalculations for the example given it would take us 143 calculations [ ExcelExample 1] and wersquod reach final values of

16

As suspected Page C is the most important If we take a quick look at these rawvalues we can see something about the number of links pointing out from a pageLook at Page A which has a link from a high PageRank page (Page C) whichhas only one outbound link Then look at Page B and D both share links from ahigh PageRank page (Page A) with three outbound links The number of linkssignificantly alters the way PageRank is distributed

A word on convergence

Convergence is an important mathematical aspect of PageRank which allowsGoogle to provided unprecedented search quality at comparably low costs Thisis a semi-complex topic but it is important to your understanding of how and whyPageRank works Wersquove tried to make it as simple as possible but unless yoursquoreSergey Brin or Larry Page yoursquoll still need to concentrate

Whilst it will take some concentration it isnrsquot that hard to understand

Wersquove shown that the outlet values (final values) of one stage of the calculationbecome the inlet values (starting values) of the next stage and that we keep ondoing this (itrsquos known as a recursive procedure) But the really big question ishow and when does the recursive procedure stop

The answer is ldquoconvergencerdquo Provided the dampening factor (d in our equation)is less than one then convergence will occur Nominally we set it to 085(because thatrsquos the value mentioned in the Stanford papers)

This convergence basically means that whatever values we start at after runningthe calculation a number of times we will end up with the same final values andthat these values will no longer change if we do further iterations of thecalculation These final values are known as limiting values

Once the limiting values have been reached Google no longer needs to expendprocessing power on calculating the PageRank They can finish there

This is easier to understand with an example Letrsquos take a look at the followingstructure

17

PageRank for pages A B C D at various stages of iteration

A B C DIteration 1 1 1 1

1 10000000000 05750000000 22750000000 015000000002 20837500000 05750000000 11912500000 015000000003 11625625000 10355937500 16518437500 01500000000

19 14900124031 07833688246 15766187723 0150000000020 14901259564 07832552713 15766187723 0150000000021 14901259564 07833035315 15765705121 0150000000046 14901074052 07832956473 15765969475 0150000000047 14901074054 07832956472 15765969474 0150000000048 14901074053 07832956473 15765969474 0150000000049 14901074053 07832956473 15765969474 0150000000050 14901074053 07832956473 15765969474 0150000000051 14901074053 07832956473 15765969474 0150000000052 14901074053 07832956473 15765969474 0150000000053 14901074053 07832956473 15765969474 0150000000054 14901074053 07832956473 15765969474 01500000000

No matter how much we try to calculate the value past the 48th iteration thevalues do not change ndash they have converged at the limiting values

In practice theres no need to wait until they do not change at all before we stopcalculations We simply wait until they do not change very much In our exampleabove we have done that to 10 decimal places Ie we have calculated until theydo not change by 00000000001 or more (or mathematically | PR 48 (A) - PR 50(A) | lt 10 ndash10)

18

Derived specifics of how Google calculates PageRank

We started our values from zero and now see that the calculations are fairlycomplicated and resource intensive So two questions arise When Googlecalculates PageRank themselves do they start each month with an arbitraryvalue And secondly do they calculate PageRank across several machines Wecan answer these questions by a couple of simple experiments Assume thathaving calculated the PageRanks in the structure shown earlier we would like toadd a page and change a link to make the structure look like this

If we start each page from an arbitrary starting value of one then how manyiterations of the calculation does it take to achieve convergence [ ExcelExample 2]

75

Now if we start each page from the values we ended up with when we calculatedthe PageRank of the original structure (and set Page Ersquos starting value to 1) thenhow many iterations does it take to achieve convergence [ Excel Example 3]

78

These numbers are really very close Therefore it would appear logical thatGoogle does not reset the values with each calculation as they may simply carryon from where they left off

So do they calculate across several machines or one Obviously there is toomuch work for one machine when you get to billions of pages so several

19

machines must be used Logically how can they do this Assume Pages A andC are on one machine and Pages B C and D are on another If we ignore thelinks between pages across machines (ie A-gtD and D-gtC) then for Pages Aand C it takes

1 calculation to converge at A=1 C=1

Then for Pages B D and E it takes

3 calculations to converge at B=015 D=03954375000 E=03954375000

If we then do calculations for all pages using these as starting values it takes

A huge 146 iterations to converge [ Excel Example 4]

The conclusion is that splitting PageRank calculations up into chunks for differentmachines is not a viable proposition Therefore they must also track PageRanktransfers across links between machines This means that each machine mustnotify the other machines as PageRank is given to a page not on that machineIndeed each machine must also run the iterations in sync with each other Evenif they donrsquot do it exactly this way the calculation of PageRank for so manypages is very very complicated indeed

PageRank Feedback how linking sometimes helps

When the new term PageRank Feedback was presented in the original version ofthis paper it was largely taken up and used ndash although not always correctlyPageRank Feedback is difficult to understand because it is a concept that impliescreation of PageRank However it is really an alteration of distribution ofPageRank on a large scale PageRank Feedback is the principle that explainshow sometimes it is good to link to outside pages Consider we have a Page -gtPage A Its PageRank by definition will be 015 Now assume it links out toPage B which in return links back to it Page Arsquos PageRank now becomes 1Now assume it links out to Page C and Page C links back to it Page ArsquosPageRank now becomes 14594594595

The very fact that Page A links out to a page that links back to it causes its ownPageRank to rise (This could also occur if Page A links to a page that links to apage that links to Page A etchellip) This has occurred not because we aregenerating PageRank but because it is taking it from the system as a whole Butif we view this very small system ldquoPage Ardquo (which is a subsystem of all the pagesin the index) then we can firmly say that linking out from Page A has generatedPageRank in this system ndash that is the very act of linking out has fed PageRankback in

20

Letrsquos put this diagrammatically using a slightly more complicated system

Pages A to E are all of the pages in Googlersquos index Pages A and B are your siteWhen we do the calculations for this set up the final PageRanks for each pageare all 1 [ Excel Example 5] As far as your site is concerned

Page Arsquos PageRank is 1

Page Brsquos PageRank is 1

The sum of the PageRank of all the pages in your site is2

Changing the structure to link A to C and C to A gives the following finalPageRanks [ Excel Example 6]

21

Page Arsquos PageRank is 13599321536

Page Brsquos PageRank is 07279711653

The sum of the PageRank of all the pages in your site is20879033189

This increase is small the amount will of course vary depending oncircumstances But the point was easily proved ndash linking out can sometimesimprove your PageRank through a mechanism known as PageRank Feedback

If we added an extra page into the loop between C -gt D and E then the totalincrease in PageRank in your site would be 21462030505 [ Excel Example 7]This suggests that it is wiser to reciprocal link with larger sites that employ a linkstructure where a higher amount of focus is given to the page on which your linkappears (or will appear) The number of pages and the structure you are linkingto has significant influence on the amount of PageRank feedback that occurs

22

Influencing the results

The question that arises from all this is how and when can or will Googleinfluence the results of the PageRank calculation In this respect the situationhas changed significantly since the original PageRank paper Google has shownthat they can and will modify the data on which PageRank is based The primaryexample of this is what has become known as PageRank Zero (PR0) Basicallyspeculation says that when Google wants to penalize a page it is assigned aPageRank of zero As PageRank is a multiplier this will obviously always listPR0 pages as the very last entry in the SERPS

Your score from other factors multiplied by your PageRank score of zero willalways be zero (the lowest score possible)

How they do this is important Letrsquos say they set the PageRank of the page tozero before doing the calculations ndash would this work From what we know so farwe would think not [ Excel Example 8] And true enough it doesnt Why is thissignificant Because the PageRank must be set to zero at the end of theprocess Having a PageRank of zero in itself does not disqualify a page fromvoting exactly as it would have done in the first place

To stop its voting power the second penalty must also be applied This is thesame penalty that is applied to link farms Google has shown that they arecapable of ignoring links they believe have been artificially created This is simpleenough they merely remove the link entry from the matrix This second penaltywould also need to be applied to a PR0 penalty to stop the links from that pagefrom counting in the PageRank process However there doesnrsquot appear to beany way to test this I suspect that both penalties would be applied

Controlling PageRank

As Webmasters contrary to the suggestions of some we have a fair amount ofcontrol over PageRank It is however the most difficult of ranking factors tocontrol and you might well be able to achieve the desired effects by using othertechniques However a well-worked PageRank added to your optimization mixcan make it difficult for competitors to catch up

There are three fundamental areas to look at when trying to optimize thePageRank for your site

1 The links you choose to have link to you ie which ones you chooseand how much effort you put in to getting them

2 Who you choose to link out to from your site and from which page ofyour site you place their link (Maximising PageRank Feedback andminimising PageRank leakage)

23

3 The internal navigational structure and linkage of your pages in order tobest distribute PageRank within your site

Links to Your Site

When looking for links to your site from a purely PageRank point of view onemight think you should simply look for pages that have the highest ToolbarPageRank (Whilst keeping in mind that every page of a site has its ownPageRank so you must consider the PageRank of the links page or whateverpage the actual link will exist on) However this way of thinking is incorrect Ifyoursquove not just jumped to this section then yoursquoll probably have worked out whythat is The PageRank given by a link is far more complicated than thissimplification There may have been a time when that was an okayapproximationhellipbut no more As more and more people try to get links from onlyhigh PageRank sites it becomes less and less of a winning proposition

The actual PageRank from an individual page is shared out amongst the links onthat page (remember the PageRank calculations) So links from pages thathave the same PageRank arent always created equal It depends on how manyother links your link is sharing the links page with For instance a link from apage with a PageRank of 4 might be better than a link from a page with aPageRank of 6 if there are less total links on the PR 4 page Right now there justisnrsquot enough information available to allow us to know to what extent thisstretches However itrsquos significant enough to make it pointless to just choosehigh PageRanked sites as your main linking strategy Theres also another morematter-of-fact reason why that type of linking strategy might not be the best siteswith high PageRanks are often fussy about which sites they will link out tomaking them harder to get linked from than lower PageRanked sites Howeversites struggling with their own PageRank numbers should be more receptive toexchanging reciprocal links with other like-minded sites

Now lets factor in feedback Letrsquos say for example that there are two separatepages on other peoplersquos sites which both have a PageRank of 4 Both of thesehave ten links to other pages But the page on your site that you want them tolink to already has a link to the page on the second site By getting a link fromthe second site yoursquore generating feedback and getting more PageRank than ifyou had gotten a link from the first site Thatrsquos an over simplification in factfeedback loops can get even more complicated Remember the number of linkson the page linking to you will alter the amount of feedback etc

Can you work it all out for a given pagersquos situation No ndash neither can I Myadvice therefore is this ndash get links from sites that seem appropriate and havegood quality regardless of their current PageRank If they are relevant to yoursite and are high quality sites they will either help your PageRank now or willdo so in the future To really get your PageRank humming get yourself a listing

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 7: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

6

which PageRank is only one Wersquoll get into more details later by discussing howPageRank is different than the other ranking factors and thus when it appliesand when it doesnrsquot Ironically enough PageRankrsquos weighting factor isundeniably declining Since the original version of this paper gave out detailedinformation about PageRank in all likelihood it may also have contributed insome small way to the decline in weighting of the very subject it talks about

Is PageRank a good determination of the quality of a page

To examine the worth of PageRank we need to first look at its premise and howaccurate it is Basically PageRank says

1 If a page links to another page it is casting a vote which indicates that theother page is good

2 If lots of pages link to a page then it has more votes and its worth shouldbe higher

The basic implication here is People only link to pages they think are good

It shouldnt be hard to convince you that this premise is wrong A few of thereasons people link to pages other than ones they think are good are

1) Reciprocal links ndash ldquoLink to me and Irsquoll link to yourdquo

2) Link Requirements ndash ldquoUsing our script requires you to put a link to ourpagerdquo or ldquoWersquoll give you an award solely because you link to our pagerdquo

3) Friends and Family ndash ldquoThis is my friend Petersquos siterdquo or ldquoMy mumrsquos siteis here my dadrsquos site is here My dogrsquos site is hererdquo

4) Free Page Add-ons ndash ldquoThis counter was provided bywwwlinktocountersitecomrdquo

Furthermore anybody who has a top-ranking site will tell you that it tends to getlinks from new sites This is not necessarily because its good (although theygenerally are) Assume a Webmaster is setting up a new site and they arelooking for some outbound links Nowadays one of the first things they do is aGoogle search for similar sites The links they end up with may not necessarilybe the best sites but merely the easiest ones to find If PageRank influencesrankings and if they subsequently link to those pages ndash the new Webmaster willbe adding to the inaccuracies in the judging of the quality of a page The same istrue when these new Webmasters use the Google Toolbar PageRank indicator tochoose whom to link to

7

To put this another way

PageRank is determined by the links pointing to a page But if PageRank itselfhas an influence on the number of links to a page it is influencing (in acircular way) the quality of that page The links are no longer based solely onhuman judgement If a Webmaster picks their outbound links by searching onGoogle or by looking at the Google toolbar (even if this is only part of theirjudgement) then there is a corresponding increase in a pagersquos PageRankThis increase is not solely because it is a good page but because itsPageRank is already high

Googlersquos First 1000 Results

Remember PageRank alone cannot get you high rankings Wersquove mentionedbefore that PageRank is a multiplier so if your score for all other factors is 0 andyour PageRank is twenty billion then you still score 0 (last in the results) This isnot to say PageRank is worthless but there is some confusion over whenPageRank is useful and when it is not This leads to many misinterpretations ofits worth The only way to clear up these misinterpretations is to point out whenPageRank is not worthwhile

If you perform any broad search on Google it will appear as if youve foundseveral thousand results However you can only view the first 1000 of themUnderstanding why this is so explains why you should always concentrate onldquoon the pagerdquo factors and anchor text first and PageRank last

Assume that you perform a search on Google and it returns 200000 results Ifwe were to calculate every factor for each 200000 pages ndash do you think it wouldreally take just 034 seconds to search The answer to speeding up the search isto get a subset of documents that are most likely to be related to the query Thissubset of documents needs to be larger than the number of search results Forexample letrsquos say that number is 2000 What the search engine does is querythe whole database using 2 or 3 factors finding the 2000 documents that rankhighest for them (Remember there were 200000 possible documents andthatrsquos the number that actually gets shown) Then the engine applies all thefactors to those 2000 and ranks them accordingly Because therersquos a drop in thequality of the results (not the pages) at the bottom of this subset the engine justshows the first 1000 PageRank is almost certainly not one of those factorsNotice how before we highlighted the word ldquorelatedrdquo in creating the subset of2000 pages The search engine is looking for pages that are on-topic If weincluded PageRank in that list wersquod get a lot of high PageRank pages with topicsthat are only slightly related (because of the second factor) but thatrsquos not whatwe want

8

Why this is critical

You must do enough on the page work andor anchor text work to get intothat subset of 2000 pages for your chosen key phrase otherwise your highPageRank will be completely in vain PageRank means nothing if you do nothave enough ranking from other factors to make it into the first subset

The difference between PageRank and other factors

To assess when PageRank is important and when it is not we need tounderstand how PageRank is different from all other ranking factors To do thisherersquos a quick table that lists a few other factors and how they add to the rankingscore

Title Tag Can only be listed onceKeywords in Body Text Each successive repetition is

less important Proximity isimportant

Anchor Text Highly weighted but likekeywords in body text there isa cut off point where furtheranchor text is no longerworthwhile

PageRank Potentially infinite You arealways capable of increasingyour PageRank significantlybut it takes work

All other ranking factors have cut off points beyond which they will no longeradd to your ranking score or will not add significantly enough for it to beworthwhile PageRank has no cut off point

Non-PageRank Factor Threshold

Having written about the difference between PageRank and other factors andhow PageRank is harder to get it should now be clear that whilst we could usemany methods to get good rankings there is a threshold which defines whenhigh PageRank is worth striving for and when it is not

With ranking factors other than PageRank there is a score beyond which theslow down in the rate that any factor adds to this score is so insignificant that it is

9

not worthwhile This is the Non-PageRank Factor Threshold To illustrate thisletrsquos put an example figure on this of 1000

If we have a query where the results are Page A and Page B then Page A and Bhave scores for that query which are the total scores for all ranking factors(including PageRank) Letrsquos say Page Arsquos score is 900 and page Brsquos score is500 Obviously Page A will be listed first These are both below our hypotheticalNon-PageRank Factor Threshold thus without any change in PageRank it ispossible for page B to improve their optimization to beat Page A for this particularquery There are lots of queries like this on Google theyre more commonlythought of as less competitive queries

Now assume Page A raises its score to 1100 Suddenly page B cannot competein the SERPs (search engine results pages) without increasing its PageRank Inall probability page B must also improve for all the other ranking factors but anincrease in PageRank is almost certainly necessary There are also lots ofqueries like this on Google which are more commonly thought of as morecompetitive queries

Generally when querying Google the group of pages in the SERPs will containsome pages that have a score above the Non-PageRank Factor Threshold andsome that do not

There is an important point to be made here

To be competitive you must raise your pages search engine ranking scorebeyond the Non-PageRank Factor Threshold To fail to do so means that youcan easily be beaten in the search results for your query terms The quickestway to approach the Non-PageRank Factor Threshold is through ldquoon the pagefactorsrdquo however you cannot move above the Non-PageRank FactorThreshold without PageRank

The obvious question is whatrsquos the numerical value of the Non-PageRank FactorThreshold and how much work do you need to do to get past it The answer isthat it has no value it is a hypothetical line Google could put a value on it butthat would not help us unless we know what the pagersquos individual scores are Weneed only be aware that the threshold exists and that it gives us informationabout principles

Using the threshold to derive the worth of two ranking strategies

The threshold explains principles and the different ways that search enginemarketers work It also demonstrates why some of the misunderstandings aboutPageRank occur Letrsquos consider the strategies of two people Person A considersPageRank to be unimportant and Person B considers PageRank to be veryimportant

10

Person A says ldquoPageRankrdquo is unimportant They have optimised pages foryears and know how to use ldquoon the pagerdquo factors very successfully Theyunderstand the basics of anchor text but they couldnrsquot care at all aboutPageRank

Whatrsquos happening person A is reaching the Non-PageRank Factor Thresholdvery quickly because they are maximising the ldquoon the pagerdquo factors Throughcarefully choosing keywords they jump-start themselves up the SERPs Aslong as their content is good high-ranking sites (over time) tend to get linkedto Whilst they didnrsquot directly ask for it a slow trickle of sites will begin to link tothem and give them PageRank which helps consolidates their position

Person B says ldquoPageRankrdquo is important Wersquove all seen those pages in theresults that have no content but great rankings (With big brands this canoften occur naturally even when they have no idea what PageRank is Thiswould be Person C who is not relevant to the discussion at hand) Person Bunderstands lots about PageRank and concentrates heavily on it

Whatrsquos happening person B is doing the reverse of person A Whilst person Aconcentrated on the Non-PageRank factors and found herself gettingPageRank anyway person B concentrates on the PageRank Factor and findshimself getting Non-PageRank factors The reason for this is that increasingPageRank requires links and links have anchor text Thus through carefullychoosing the anchor text linking to his page person B automatically increaseshis Non-PageRank factor scores whilst obtaining his high PageRank score

Obviously these are two extremes but we can use these to extrapolate theadvantages and disadvantages of each approach

11

Advantages DisadvantagesPerson A bull Quick entry to the

results pagesbull Self-generating links

limit the amount ofwork needed

bull Securing the resultsto make it moredifficult forcompetitors tocompete takes morework

bull Slower to react tonew competitors

Person B bull Solid position Caneasily modify on thepage factors for aquick boost ifrequired

bull Probably highertraffic from non-search enginesources

bull Slower entry toresults pages

bull Difficult to do wellbull Increased likelihood

of tripping spamfilters

It is clear that both strategies can and do work Both strategies are usingPageRank as part the mix of factors that will ultimately improve their ranking inthe SERPS Because there is such a mix we can use them to different degreesdepending on the strategy that best suits your style My personal strategy is touse a combination but to save some of the ldquoon the pagerdquo factors for later in caseI need a quick boost if the competition heats up at a later time

Really heavy competition

No explanation of PageRank strategies would be complete without a finalstatement regarding heavy keyword competition There are some queries wherecompetition is so intense that you must do everything possible to maximize yourranking score (for example ldquoweb hostingrdquo) In such situations it is impossible torank highly through Non-PageRank factors alone (as you will not initially be listedhigh enough to be noticed and linked to) That is not to say that Non-PageRankfactors arent important Consider what your final rank score is

Final Rank Score = (score for all Non-PageRank factors) x (actual PageRankscore)

Improving either side of the equation can have a positive effect Howeverbecause the Non-PageRank factors have a restricted maximum benefit theactual PageRank score must be improved in order to compete successfullyUnder really heavy competition ndash it holds true that you cannot rank well unlessyour actual PageRank score is above a certain level In other words

12

There exists a query specific ldquoMinimum PageRank levelrdquo For queries that do nothave heavy competition this level is easy to achieve without even tryingHowever where heavy competition exists Non-PageRank factors are just asimportant (and easier to get) until they reach the Non-PageRank factor thresholdThis is why careful keyword choice can help you avoid the extensive workassociated with highly competitive search phrases

How PageRank is calculated

On a simple level we can tell quite a lot about how PageRank is calculated Thisis because when Google was just a university research project the creators ofPageRank published a paper that detailed a formula for calculating it Thisformula is now more than a handful of years old and we suspect that it haschanged somewhat since then However for detailing the over-riding principlesof how PageRank works it is as accurate today as the day it was written

PR(A) = (1-d) + d (PR(T1) C(T1) + + PR(Tn) C(Tn) )

Where PR(A) is the PageRank of Page A ( the one we want t owork out )

D is a dam pening factor Nom inally this is set to 085

PR(T1) is the PageRank of a site point ing to Page A

C(T1) is the num ber of links off that page

PR(Tn) C(Tn) m eans we do that for each page point ing toPage A

Source The Anatomy of a Large-Scale Hypertextual Web Search EngineSergey Brin and LawrencePage httpwww-dbstanfordedu~backrubgooglehtml

Couldnrsquot be simpler right Depending on your mathematical competence this iseither remarkably easy to understand or remarkably complex So herersquos the dealndash this isnrsquot a one-time calculation It looks pretty above but you canrsquot just do onesimple calculation and get an answer with the PageRank of a page If you look atit to calculate the PageRank of Page A we need to know the PageRank of allthe pages pointing to it To calculate the PageRank of those pages we need toknow the PageRank of all the pages pointing to them (one of which could and isquite likely to be Page A) If this seems like it could go around in circles forever

13

thatrsquos because it nearly does We have to do a whole lot of little calculationsmany times over before we actually get the answer What this formula tells us isthat however you slice it and however the formula may have changed since itwas written

The PageRank given to Page A by a Page B pointing to it is decreased witheach link to anywhere that exists on Page B This means that a pagersquos

PageRank is essentially a measure of its vote it can split that vote between onelink or two or many more but its overall voting power will always remain the

same

This really is important So to be totally clear letrsquos put some numbers to it (thesenumbers are made up for demonstration purposes only and have no relevancyto any particular page) Say Page B has a PageRank value of 5 and has a singlelink on it pointing to Page A Page Arsquos PageRank is improved by a proportion ofPage Brsquos value of 5 (Page B doesnrsquot lose anything but Page A gains) If Page Bhas two links that PageRank improvement would be split and Page A wouldonly gain half the PageRank that it did before

Now put the formula out of your mind for a moment as its easier to understandhow it works using a diagram Letrsquos say we have a hypothetical set of pagesimaginatively titled Page A Page B Page C and Page D They link to each otheras shown below

To begin with in our example at least we donrsquot know what the pagersquos startingPageRanks are Therersquos nothing special about the number that we pick to startwith (in fact if you read forward to the section on convergence ndash you can see that

14

we can start at any number we want) Since in the last version of this paper weperformed the calculations by setting these values to 1 ndash wersquore going to set themto zero this time around in order to prove that it doesnrsquot matter what the startingvalues are

Next we perform the necessary calculation to obtain the PageRank for eachpage The rules are

1 We take a 085 a pagersquos PageRank and divide it by the number oflinks on the page

2 We add that amount on to a new total for each page itrsquos being passedto

3 We add 015 to each of those totals

The first calculation is easy Because wersquove started at zero ndash 0 085 is always 0So each page gets just 015 + 0 Meaning each page now has a PageRank of015 Clearly wersquore not done ndash we want to show the importance of each pagebased upon links and theyrsquore all the same so we need to run the calculationagain

Page A links to pages B C and D Page Arsquos PageRank is 015 so it will add 085 015 = 01275 to the new PageRank scores of the pages it links to There arethree of them so they each get 00425

Page B links to page C Page Brsquos PageRank is 015 so it will add 085 015 =01275 to the new PageRank score of the pages it links to Since it only links topage C page C will get it all

Page C links to Page A all 01275 passes to page A

Page D links to Page C Again all 01275 passes to page C

The new totals for each page them become

Page A 015 (base) + 01275 (from Page C) = 02775Page B 015 (base) + 00425 (from Page A) = 01925Page C 015 (base) + 00425 (from Page A) + 01275 (from Page B) + 01275(from Page D) = 04475Page D 015 (base) + 00425 (from Page A) = 01925

So wersquove got

15

Pretty neat huh Already wersquore begining to see that Page C is probably the mostimportant page in the system (but we canrsquot be sure yet ndash it could well change)We carry on doing these calculations until the value for each page no longerchanges (this is called convergence ndash therersquos more about this in the nextsection) In practice Google probably doesnt wait for this convergence butinstead run a number of iterations of the calculation which is likely to give themfairly accurate values (more on this later as well) If we carried out all thecalculations for the example given it would take us 143 calculations [ ExcelExample 1] and wersquod reach final values of

16

As suspected Page C is the most important If we take a quick look at these rawvalues we can see something about the number of links pointing out from a pageLook at Page A which has a link from a high PageRank page (Page C) whichhas only one outbound link Then look at Page B and D both share links from ahigh PageRank page (Page A) with three outbound links The number of linkssignificantly alters the way PageRank is distributed

A word on convergence

Convergence is an important mathematical aspect of PageRank which allowsGoogle to provided unprecedented search quality at comparably low costs Thisis a semi-complex topic but it is important to your understanding of how and whyPageRank works Wersquove tried to make it as simple as possible but unless yoursquoreSergey Brin or Larry Page yoursquoll still need to concentrate

Whilst it will take some concentration it isnrsquot that hard to understand

Wersquove shown that the outlet values (final values) of one stage of the calculationbecome the inlet values (starting values) of the next stage and that we keep ondoing this (itrsquos known as a recursive procedure) But the really big question ishow and when does the recursive procedure stop

The answer is ldquoconvergencerdquo Provided the dampening factor (d in our equation)is less than one then convergence will occur Nominally we set it to 085(because thatrsquos the value mentioned in the Stanford papers)

This convergence basically means that whatever values we start at after runningthe calculation a number of times we will end up with the same final values andthat these values will no longer change if we do further iterations of thecalculation These final values are known as limiting values

Once the limiting values have been reached Google no longer needs to expendprocessing power on calculating the PageRank They can finish there

This is easier to understand with an example Letrsquos take a look at the followingstructure

17

PageRank for pages A B C D at various stages of iteration

A B C DIteration 1 1 1 1

1 10000000000 05750000000 22750000000 015000000002 20837500000 05750000000 11912500000 015000000003 11625625000 10355937500 16518437500 01500000000

19 14900124031 07833688246 15766187723 0150000000020 14901259564 07832552713 15766187723 0150000000021 14901259564 07833035315 15765705121 0150000000046 14901074052 07832956473 15765969475 0150000000047 14901074054 07832956472 15765969474 0150000000048 14901074053 07832956473 15765969474 0150000000049 14901074053 07832956473 15765969474 0150000000050 14901074053 07832956473 15765969474 0150000000051 14901074053 07832956473 15765969474 0150000000052 14901074053 07832956473 15765969474 0150000000053 14901074053 07832956473 15765969474 0150000000054 14901074053 07832956473 15765969474 01500000000

No matter how much we try to calculate the value past the 48th iteration thevalues do not change ndash they have converged at the limiting values

In practice theres no need to wait until they do not change at all before we stopcalculations We simply wait until they do not change very much In our exampleabove we have done that to 10 decimal places Ie we have calculated until theydo not change by 00000000001 or more (or mathematically | PR 48 (A) - PR 50(A) | lt 10 ndash10)

18

Derived specifics of how Google calculates PageRank

We started our values from zero and now see that the calculations are fairlycomplicated and resource intensive So two questions arise When Googlecalculates PageRank themselves do they start each month with an arbitraryvalue And secondly do they calculate PageRank across several machines Wecan answer these questions by a couple of simple experiments Assume thathaving calculated the PageRanks in the structure shown earlier we would like toadd a page and change a link to make the structure look like this

If we start each page from an arbitrary starting value of one then how manyiterations of the calculation does it take to achieve convergence [ ExcelExample 2]

75

Now if we start each page from the values we ended up with when we calculatedthe PageRank of the original structure (and set Page Ersquos starting value to 1) thenhow many iterations does it take to achieve convergence [ Excel Example 3]

78

These numbers are really very close Therefore it would appear logical thatGoogle does not reset the values with each calculation as they may simply carryon from where they left off

So do they calculate across several machines or one Obviously there is toomuch work for one machine when you get to billions of pages so several

19

machines must be used Logically how can they do this Assume Pages A andC are on one machine and Pages B C and D are on another If we ignore thelinks between pages across machines (ie A-gtD and D-gtC) then for Pages Aand C it takes

1 calculation to converge at A=1 C=1

Then for Pages B D and E it takes

3 calculations to converge at B=015 D=03954375000 E=03954375000

If we then do calculations for all pages using these as starting values it takes

A huge 146 iterations to converge [ Excel Example 4]

The conclusion is that splitting PageRank calculations up into chunks for differentmachines is not a viable proposition Therefore they must also track PageRanktransfers across links between machines This means that each machine mustnotify the other machines as PageRank is given to a page not on that machineIndeed each machine must also run the iterations in sync with each other Evenif they donrsquot do it exactly this way the calculation of PageRank for so manypages is very very complicated indeed

PageRank Feedback how linking sometimes helps

When the new term PageRank Feedback was presented in the original version ofthis paper it was largely taken up and used ndash although not always correctlyPageRank Feedback is difficult to understand because it is a concept that impliescreation of PageRank However it is really an alteration of distribution ofPageRank on a large scale PageRank Feedback is the principle that explainshow sometimes it is good to link to outside pages Consider we have a Page -gtPage A Its PageRank by definition will be 015 Now assume it links out toPage B which in return links back to it Page Arsquos PageRank now becomes 1Now assume it links out to Page C and Page C links back to it Page ArsquosPageRank now becomes 14594594595

The very fact that Page A links out to a page that links back to it causes its ownPageRank to rise (This could also occur if Page A links to a page that links to apage that links to Page A etchellip) This has occurred not because we aregenerating PageRank but because it is taking it from the system as a whole Butif we view this very small system ldquoPage Ardquo (which is a subsystem of all the pagesin the index) then we can firmly say that linking out from Page A has generatedPageRank in this system ndash that is the very act of linking out has fed PageRankback in

20

Letrsquos put this diagrammatically using a slightly more complicated system

Pages A to E are all of the pages in Googlersquos index Pages A and B are your siteWhen we do the calculations for this set up the final PageRanks for each pageare all 1 [ Excel Example 5] As far as your site is concerned

Page Arsquos PageRank is 1

Page Brsquos PageRank is 1

The sum of the PageRank of all the pages in your site is2

Changing the structure to link A to C and C to A gives the following finalPageRanks [ Excel Example 6]

21

Page Arsquos PageRank is 13599321536

Page Brsquos PageRank is 07279711653

The sum of the PageRank of all the pages in your site is20879033189

This increase is small the amount will of course vary depending oncircumstances But the point was easily proved ndash linking out can sometimesimprove your PageRank through a mechanism known as PageRank Feedback

If we added an extra page into the loop between C -gt D and E then the totalincrease in PageRank in your site would be 21462030505 [ Excel Example 7]This suggests that it is wiser to reciprocal link with larger sites that employ a linkstructure where a higher amount of focus is given to the page on which your linkappears (or will appear) The number of pages and the structure you are linkingto has significant influence on the amount of PageRank feedback that occurs

22

Influencing the results

The question that arises from all this is how and when can or will Googleinfluence the results of the PageRank calculation In this respect the situationhas changed significantly since the original PageRank paper Google has shownthat they can and will modify the data on which PageRank is based The primaryexample of this is what has become known as PageRank Zero (PR0) Basicallyspeculation says that when Google wants to penalize a page it is assigned aPageRank of zero As PageRank is a multiplier this will obviously always listPR0 pages as the very last entry in the SERPS

Your score from other factors multiplied by your PageRank score of zero willalways be zero (the lowest score possible)

How they do this is important Letrsquos say they set the PageRank of the page tozero before doing the calculations ndash would this work From what we know so farwe would think not [ Excel Example 8] And true enough it doesnt Why is thissignificant Because the PageRank must be set to zero at the end of theprocess Having a PageRank of zero in itself does not disqualify a page fromvoting exactly as it would have done in the first place

To stop its voting power the second penalty must also be applied This is thesame penalty that is applied to link farms Google has shown that they arecapable of ignoring links they believe have been artificially created This is simpleenough they merely remove the link entry from the matrix This second penaltywould also need to be applied to a PR0 penalty to stop the links from that pagefrom counting in the PageRank process However there doesnrsquot appear to beany way to test this I suspect that both penalties would be applied

Controlling PageRank

As Webmasters contrary to the suggestions of some we have a fair amount ofcontrol over PageRank It is however the most difficult of ranking factors tocontrol and you might well be able to achieve the desired effects by using othertechniques However a well-worked PageRank added to your optimization mixcan make it difficult for competitors to catch up

There are three fundamental areas to look at when trying to optimize thePageRank for your site

1 The links you choose to have link to you ie which ones you chooseand how much effort you put in to getting them

2 Who you choose to link out to from your site and from which page ofyour site you place their link (Maximising PageRank Feedback andminimising PageRank leakage)

23

3 The internal navigational structure and linkage of your pages in order tobest distribute PageRank within your site

Links to Your Site

When looking for links to your site from a purely PageRank point of view onemight think you should simply look for pages that have the highest ToolbarPageRank (Whilst keeping in mind that every page of a site has its ownPageRank so you must consider the PageRank of the links page or whateverpage the actual link will exist on) However this way of thinking is incorrect Ifyoursquove not just jumped to this section then yoursquoll probably have worked out whythat is The PageRank given by a link is far more complicated than thissimplification There may have been a time when that was an okayapproximationhellipbut no more As more and more people try to get links from onlyhigh PageRank sites it becomes less and less of a winning proposition

The actual PageRank from an individual page is shared out amongst the links onthat page (remember the PageRank calculations) So links from pages thathave the same PageRank arent always created equal It depends on how manyother links your link is sharing the links page with For instance a link from apage with a PageRank of 4 might be better than a link from a page with aPageRank of 6 if there are less total links on the PR 4 page Right now there justisnrsquot enough information available to allow us to know to what extent thisstretches However itrsquos significant enough to make it pointless to just choosehigh PageRanked sites as your main linking strategy Theres also another morematter-of-fact reason why that type of linking strategy might not be the best siteswith high PageRanks are often fussy about which sites they will link out tomaking them harder to get linked from than lower PageRanked sites Howeversites struggling with their own PageRank numbers should be more receptive toexchanging reciprocal links with other like-minded sites

Now lets factor in feedback Letrsquos say for example that there are two separatepages on other peoplersquos sites which both have a PageRank of 4 Both of thesehave ten links to other pages But the page on your site that you want them tolink to already has a link to the page on the second site By getting a link fromthe second site yoursquore generating feedback and getting more PageRank than ifyou had gotten a link from the first site Thatrsquos an over simplification in factfeedback loops can get even more complicated Remember the number of linkson the page linking to you will alter the amount of feedback etc

Can you work it all out for a given pagersquos situation No ndash neither can I Myadvice therefore is this ndash get links from sites that seem appropriate and havegood quality regardless of their current PageRank If they are relevant to yoursite and are high quality sites they will either help your PageRank now or willdo so in the future To really get your PageRank humming get yourself a listing

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 8: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

7

To put this another way

PageRank is determined by the links pointing to a page But if PageRank itselfhas an influence on the number of links to a page it is influencing (in acircular way) the quality of that page The links are no longer based solely onhuman judgement If a Webmaster picks their outbound links by searching onGoogle or by looking at the Google toolbar (even if this is only part of theirjudgement) then there is a corresponding increase in a pagersquos PageRankThis increase is not solely because it is a good page but because itsPageRank is already high

Googlersquos First 1000 Results

Remember PageRank alone cannot get you high rankings Wersquove mentionedbefore that PageRank is a multiplier so if your score for all other factors is 0 andyour PageRank is twenty billion then you still score 0 (last in the results) This isnot to say PageRank is worthless but there is some confusion over whenPageRank is useful and when it is not This leads to many misinterpretations ofits worth The only way to clear up these misinterpretations is to point out whenPageRank is not worthwhile

If you perform any broad search on Google it will appear as if youve foundseveral thousand results However you can only view the first 1000 of themUnderstanding why this is so explains why you should always concentrate onldquoon the pagerdquo factors and anchor text first and PageRank last

Assume that you perform a search on Google and it returns 200000 results Ifwe were to calculate every factor for each 200000 pages ndash do you think it wouldreally take just 034 seconds to search The answer to speeding up the search isto get a subset of documents that are most likely to be related to the query Thissubset of documents needs to be larger than the number of search results Forexample letrsquos say that number is 2000 What the search engine does is querythe whole database using 2 or 3 factors finding the 2000 documents that rankhighest for them (Remember there were 200000 possible documents andthatrsquos the number that actually gets shown) Then the engine applies all thefactors to those 2000 and ranks them accordingly Because therersquos a drop in thequality of the results (not the pages) at the bottom of this subset the engine justshows the first 1000 PageRank is almost certainly not one of those factorsNotice how before we highlighted the word ldquorelatedrdquo in creating the subset of2000 pages The search engine is looking for pages that are on-topic If weincluded PageRank in that list wersquod get a lot of high PageRank pages with topicsthat are only slightly related (because of the second factor) but thatrsquos not whatwe want

8

Why this is critical

You must do enough on the page work andor anchor text work to get intothat subset of 2000 pages for your chosen key phrase otherwise your highPageRank will be completely in vain PageRank means nothing if you do nothave enough ranking from other factors to make it into the first subset

The difference between PageRank and other factors

To assess when PageRank is important and when it is not we need tounderstand how PageRank is different from all other ranking factors To do thisherersquos a quick table that lists a few other factors and how they add to the rankingscore

Title Tag Can only be listed onceKeywords in Body Text Each successive repetition is

less important Proximity isimportant

Anchor Text Highly weighted but likekeywords in body text there isa cut off point where furtheranchor text is no longerworthwhile

PageRank Potentially infinite You arealways capable of increasingyour PageRank significantlybut it takes work

All other ranking factors have cut off points beyond which they will no longeradd to your ranking score or will not add significantly enough for it to beworthwhile PageRank has no cut off point

Non-PageRank Factor Threshold

Having written about the difference between PageRank and other factors andhow PageRank is harder to get it should now be clear that whilst we could usemany methods to get good rankings there is a threshold which defines whenhigh PageRank is worth striving for and when it is not

With ranking factors other than PageRank there is a score beyond which theslow down in the rate that any factor adds to this score is so insignificant that it is

9

not worthwhile This is the Non-PageRank Factor Threshold To illustrate thisletrsquos put an example figure on this of 1000

If we have a query where the results are Page A and Page B then Page A and Bhave scores for that query which are the total scores for all ranking factors(including PageRank) Letrsquos say Page Arsquos score is 900 and page Brsquos score is500 Obviously Page A will be listed first These are both below our hypotheticalNon-PageRank Factor Threshold thus without any change in PageRank it ispossible for page B to improve their optimization to beat Page A for this particularquery There are lots of queries like this on Google theyre more commonlythought of as less competitive queries

Now assume Page A raises its score to 1100 Suddenly page B cannot competein the SERPs (search engine results pages) without increasing its PageRank Inall probability page B must also improve for all the other ranking factors but anincrease in PageRank is almost certainly necessary There are also lots ofqueries like this on Google which are more commonly thought of as morecompetitive queries

Generally when querying Google the group of pages in the SERPs will containsome pages that have a score above the Non-PageRank Factor Threshold andsome that do not

There is an important point to be made here

To be competitive you must raise your pages search engine ranking scorebeyond the Non-PageRank Factor Threshold To fail to do so means that youcan easily be beaten in the search results for your query terms The quickestway to approach the Non-PageRank Factor Threshold is through ldquoon the pagefactorsrdquo however you cannot move above the Non-PageRank FactorThreshold without PageRank

The obvious question is whatrsquos the numerical value of the Non-PageRank FactorThreshold and how much work do you need to do to get past it The answer isthat it has no value it is a hypothetical line Google could put a value on it butthat would not help us unless we know what the pagersquos individual scores are Weneed only be aware that the threshold exists and that it gives us informationabout principles

Using the threshold to derive the worth of two ranking strategies

The threshold explains principles and the different ways that search enginemarketers work It also demonstrates why some of the misunderstandings aboutPageRank occur Letrsquos consider the strategies of two people Person A considersPageRank to be unimportant and Person B considers PageRank to be veryimportant

10

Person A says ldquoPageRankrdquo is unimportant They have optimised pages foryears and know how to use ldquoon the pagerdquo factors very successfully Theyunderstand the basics of anchor text but they couldnrsquot care at all aboutPageRank

Whatrsquos happening person A is reaching the Non-PageRank Factor Thresholdvery quickly because they are maximising the ldquoon the pagerdquo factors Throughcarefully choosing keywords they jump-start themselves up the SERPs Aslong as their content is good high-ranking sites (over time) tend to get linkedto Whilst they didnrsquot directly ask for it a slow trickle of sites will begin to link tothem and give them PageRank which helps consolidates their position

Person B says ldquoPageRankrdquo is important Wersquove all seen those pages in theresults that have no content but great rankings (With big brands this canoften occur naturally even when they have no idea what PageRank is Thiswould be Person C who is not relevant to the discussion at hand) Person Bunderstands lots about PageRank and concentrates heavily on it

Whatrsquos happening person B is doing the reverse of person A Whilst person Aconcentrated on the Non-PageRank factors and found herself gettingPageRank anyway person B concentrates on the PageRank Factor and findshimself getting Non-PageRank factors The reason for this is that increasingPageRank requires links and links have anchor text Thus through carefullychoosing the anchor text linking to his page person B automatically increaseshis Non-PageRank factor scores whilst obtaining his high PageRank score

Obviously these are two extremes but we can use these to extrapolate theadvantages and disadvantages of each approach

11

Advantages DisadvantagesPerson A bull Quick entry to the

results pagesbull Self-generating links

limit the amount ofwork needed

bull Securing the resultsto make it moredifficult forcompetitors tocompete takes morework

bull Slower to react tonew competitors

Person B bull Solid position Caneasily modify on thepage factors for aquick boost ifrequired

bull Probably highertraffic from non-search enginesources

bull Slower entry toresults pages

bull Difficult to do wellbull Increased likelihood

of tripping spamfilters

It is clear that both strategies can and do work Both strategies are usingPageRank as part the mix of factors that will ultimately improve their ranking inthe SERPS Because there is such a mix we can use them to different degreesdepending on the strategy that best suits your style My personal strategy is touse a combination but to save some of the ldquoon the pagerdquo factors for later in caseI need a quick boost if the competition heats up at a later time

Really heavy competition

No explanation of PageRank strategies would be complete without a finalstatement regarding heavy keyword competition There are some queries wherecompetition is so intense that you must do everything possible to maximize yourranking score (for example ldquoweb hostingrdquo) In such situations it is impossible torank highly through Non-PageRank factors alone (as you will not initially be listedhigh enough to be noticed and linked to) That is not to say that Non-PageRankfactors arent important Consider what your final rank score is

Final Rank Score = (score for all Non-PageRank factors) x (actual PageRankscore)

Improving either side of the equation can have a positive effect Howeverbecause the Non-PageRank factors have a restricted maximum benefit theactual PageRank score must be improved in order to compete successfullyUnder really heavy competition ndash it holds true that you cannot rank well unlessyour actual PageRank score is above a certain level In other words

12

There exists a query specific ldquoMinimum PageRank levelrdquo For queries that do nothave heavy competition this level is easy to achieve without even tryingHowever where heavy competition exists Non-PageRank factors are just asimportant (and easier to get) until they reach the Non-PageRank factor thresholdThis is why careful keyword choice can help you avoid the extensive workassociated with highly competitive search phrases

How PageRank is calculated

On a simple level we can tell quite a lot about how PageRank is calculated Thisis because when Google was just a university research project the creators ofPageRank published a paper that detailed a formula for calculating it Thisformula is now more than a handful of years old and we suspect that it haschanged somewhat since then However for detailing the over-riding principlesof how PageRank works it is as accurate today as the day it was written

PR(A) = (1-d) + d (PR(T1) C(T1) + + PR(Tn) C(Tn) )

Where PR(A) is the PageRank of Page A ( the one we want t owork out )

D is a dam pening factor Nom inally this is set to 085

PR(T1) is the PageRank of a site point ing to Page A

C(T1) is the num ber of links off that page

PR(Tn) C(Tn) m eans we do that for each page point ing toPage A

Source The Anatomy of a Large-Scale Hypertextual Web Search EngineSergey Brin and LawrencePage httpwww-dbstanfordedu~backrubgooglehtml

Couldnrsquot be simpler right Depending on your mathematical competence this iseither remarkably easy to understand or remarkably complex So herersquos the dealndash this isnrsquot a one-time calculation It looks pretty above but you canrsquot just do onesimple calculation and get an answer with the PageRank of a page If you look atit to calculate the PageRank of Page A we need to know the PageRank of allthe pages pointing to it To calculate the PageRank of those pages we need toknow the PageRank of all the pages pointing to them (one of which could and isquite likely to be Page A) If this seems like it could go around in circles forever

13

thatrsquos because it nearly does We have to do a whole lot of little calculationsmany times over before we actually get the answer What this formula tells us isthat however you slice it and however the formula may have changed since itwas written

The PageRank given to Page A by a Page B pointing to it is decreased witheach link to anywhere that exists on Page B This means that a pagersquos

PageRank is essentially a measure of its vote it can split that vote between onelink or two or many more but its overall voting power will always remain the

same

This really is important So to be totally clear letrsquos put some numbers to it (thesenumbers are made up for demonstration purposes only and have no relevancyto any particular page) Say Page B has a PageRank value of 5 and has a singlelink on it pointing to Page A Page Arsquos PageRank is improved by a proportion ofPage Brsquos value of 5 (Page B doesnrsquot lose anything but Page A gains) If Page Bhas two links that PageRank improvement would be split and Page A wouldonly gain half the PageRank that it did before

Now put the formula out of your mind for a moment as its easier to understandhow it works using a diagram Letrsquos say we have a hypothetical set of pagesimaginatively titled Page A Page B Page C and Page D They link to each otheras shown below

To begin with in our example at least we donrsquot know what the pagersquos startingPageRanks are Therersquos nothing special about the number that we pick to startwith (in fact if you read forward to the section on convergence ndash you can see that

14

we can start at any number we want) Since in the last version of this paper weperformed the calculations by setting these values to 1 ndash wersquore going to set themto zero this time around in order to prove that it doesnrsquot matter what the startingvalues are

Next we perform the necessary calculation to obtain the PageRank for eachpage The rules are

1 We take a 085 a pagersquos PageRank and divide it by the number oflinks on the page

2 We add that amount on to a new total for each page itrsquos being passedto

3 We add 015 to each of those totals

The first calculation is easy Because wersquove started at zero ndash 0 085 is always 0So each page gets just 015 + 0 Meaning each page now has a PageRank of015 Clearly wersquore not done ndash we want to show the importance of each pagebased upon links and theyrsquore all the same so we need to run the calculationagain

Page A links to pages B C and D Page Arsquos PageRank is 015 so it will add 085 015 = 01275 to the new PageRank scores of the pages it links to There arethree of them so they each get 00425

Page B links to page C Page Brsquos PageRank is 015 so it will add 085 015 =01275 to the new PageRank score of the pages it links to Since it only links topage C page C will get it all

Page C links to Page A all 01275 passes to page A

Page D links to Page C Again all 01275 passes to page C

The new totals for each page them become

Page A 015 (base) + 01275 (from Page C) = 02775Page B 015 (base) + 00425 (from Page A) = 01925Page C 015 (base) + 00425 (from Page A) + 01275 (from Page B) + 01275(from Page D) = 04475Page D 015 (base) + 00425 (from Page A) = 01925

So wersquove got

15

Pretty neat huh Already wersquore begining to see that Page C is probably the mostimportant page in the system (but we canrsquot be sure yet ndash it could well change)We carry on doing these calculations until the value for each page no longerchanges (this is called convergence ndash therersquos more about this in the nextsection) In practice Google probably doesnt wait for this convergence butinstead run a number of iterations of the calculation which is likely to give themfairly accurate values (more on this later as well) If we carried out all thecalculations for the example given it would take us 143 calculations [ ExcelExample 1] and wersquod reach final values of

16

As suspected Page C is the most important If we take a quick look at these rawvalues we can see something about the number of links pointing out from a pageLook at Page A which has a link from a high PageRank page (Page C) whichhas only one outbound link Then look at Page B and D both share links from ahigh PageRank page (Page A) with three outbound links The number of linkssignificantly alters the way PageRank is distributed

A word on convergence

Convergence is an important mathematical aspect of PageRank which allowsGoogle to provided unprecedented search quality at comparably low costs Thisis a semi-complex topic but it is important to your understanding of how and whyPageRank works Wersquove tried to make it as simple as possible but unless yoursquoreSergey Brin or Larry Page yoursquoll still need to concentrate

Whilst it will take some concentration it isnrsquot that hard to understand

Wersquove shown that the outlet values (final values) of one stage of the calculationbecome the inlet values (starting values) of the next stage and that we keep ondoing this (itrsquos known as a recursive procedure) But the really big question ishow and when does the recursive procedure stop

The answer is ldquoconvergencerdquo Provided the dampening factor (d in our equation)is less than one then convergence will occur Nominally we set it to 085(because thatrsquos the value mentioned in the Stanford papers)

This convergence basically means that whatever values we start at after runningthe calculation a number of times we will end up with the same final values andthat these values will no longer change if we do further iterations of thecalculation These final values are known as limiting values

Once the limiting values have been reached Google no longer needs to expendprocessing power on calculating the PageRank They can finish there

This is easier to understand with an example Letrsquos take a look at the followingstructure

17

PageRank for pages A B C D at various stages of iteration

A B C DIteration 1 1 1 1

1 10000000000 05750000000 22750000000 015000000002 20837500000 05750000000 11912500000 015000000003 11625625000 10355937500 16518437500 01500000000

19 14900124031 07833688246 15766187723 0150000000020 14901259564 07832552713 15766187723 0150000000021 14901259564 07833035315 15765705121 0150000000046 14901074052 07832956473 15765969475 0150000000047 14901074054 07832956472 15765969474 0150000000048 14901074053 07832956473 15765969474 0150000000049 14901074053 07832956473 15765969474 0150000000050 14901074053 07832956473 15765969474 0150000000051 14901074053 07832956473 15765969474 0150000000052 14901074053 07832956473 15765969474 0150000000053 14901074053 07832956473 15765969474 0150000000054 14901074053 07832956473 15765969474 01500000000

No matter how much we try to calculate the value past the 48th iteration thevalues do not change ndash they have converged at the limiting values

In practice theres no need to wait until they do not change at all before we stopcalculations We simply wait until they do not change very much In our exampleabove we have done that to 10 decimal places Ie we have calculated until theydo not change by 00000000001 or more (or mathematically | PR 48 (A) - PR 50(A) | lt 10 ndash10)

18

Derived specifics of how Google calculates PageRank

We started our values from zero and now see that the calculations are fairlycomplicated and resource intensive So two questions arise When Googlecalculates PageRank themselves do they start each month with an arbitraryvalue And secondly do they calculate PageRank across several machines Wecan answer these questions by a couple of simple experiments Assume thathaving calculated the PageRanks in the structure shown earlier we would like toadd a page and change a link to make the structure look like this

If we start each page from an arbitrary starting value of one then how manyiterations of the calculation does it take to achieve convergence [ ExcelExample 2]

75

Now if we start each page from the values we ended up with when we calculatedthe PageRank of the original structure (and set Page Ersquos starting value to 1) thenhow many iterations does it take to achieve convergence [ Excel Example 3]

78

These numbers are really very close Therefore it would appear logical thatGoogle does not reset the values with each calculation as they may simply carryon from where they left off

So do they calculate across several machines or one Obviously there is toomuch work for one machine when you get to billions of pages so several

19

machines must be used Logically how can they do this Assume Pages A andC are on one machine and Pages B C and D are on another If we ignore thelinks between pages across machines (ie A-gtD and D-gtC) then for Pages Aand C it takes

1 calculation to converge at A=1 C=1

Then for Pages B D and E it takes

3 calculations to converge at B=015 D=03954375000 E=03954375000

If we then do calculations for all pages using these as starting values it takes

A huge 146 iterations to converge [ Excel Example 4]

The conclusion is that splitting PageRank calculations up into chunks for differentmachines is not a viable proposition Therefore they must also track PageRanktransfers across links between machines This means that each machine mustnotify the other machines as PageRank is given to a page not on that machineIndeed each machine must also run the iterations in sync with each other Evenif they donrsquot do it exactly this way the calculation of PageRank for so manypages is very very complicated indeed

PageRank Feedback how linking sometimes helps

When the new term PageRank Feedback was presented in the original version ofthis paper it was largely taken up and used ndash although not always correctlyPageRank Feedback is difficult to understand because it is a concept that impliescreation of PageRank However it is really an alteration of distribution ofPageRank on a large scale PageRank Feedback is the principle that explainshow sometimes it is good to link to outside pages Consider we have a Page -gtPage A Its PageRank by definition will be 015 Now assume it links out toPage B which in return links back to it Page Arsquos PageRank now becomes 1Now assume it links out to Page C and Page C links back to it Page ArsquosPageRank now becomes 14594594595

The very fact that Page A links out to a page that links back to it causes its ownPageRank to rise (This could also occur if Page A links to a page that links to apage that links to Page A etchellip) This has occurred not because we aregenerating PageRank but because it is taking it from the system as a whole Butif we view this very small system ldquoPage Ardquo (which is a subsystem of all the pagesin the index) then we can firmly say that linking out from Page A has generatedPageRank in this system ndash that is the very act of linking out has fed PageRankback in

20

Letrsquos put this diagrammatically using a slightly more complicated system

Pages A to E are all of the pages in Googlersquos index Pages A and B are your siteWhen we do the calculations for this set up the final PageRanks for each pageare all 1 [ Excel Example 5] As far as your site is concerned

Page Arsquos PageRank is 1

Page Brsquos PageRank is 1

The sum of the PageRank of all the pages in your site is2

Changing the structure to link A to C and C to A gives the following finalPageRanks [ Excel Example 6]

21

Page Arsquos PageRank is 13599321536

Page Brsquos PageRank is 07279711653

The sum of the PageRank of all the pages in your site is20879033189

This increase is small the amount will of course vary depending oncircumstances But the point was easily proved ndash linking out can sometimesimprove your PageRank through a mechanism known as PageRank Feedback

If we added an extra page into the loop between C -gt D and E then the totalincrease in PageRank in your site would be 21462030505 [ Excel Example 7]This suggests that it is wiser to reciprocal link with larger sites that employ a linkstructure where a higher amount of focus is given to the page on which your linkappears (or will appear) The number of pages and the structure you are linkingto has significant influence on the amount of PageRank feedback that occurs

22

Influencing the results

The question that arises from all this is how and when can or will Googleinfluence the results of the PageRank calculation In this respect the situationhas changed significantly since the original PageRank paper Google has shownthat they can and will modify the data on which PageRank is based The primaryexample of this is what has become known as PageRank Zero (PR0) Basicallyspeculation says that when Google wants to penalize a page it is assigned aPageRank of zero As PageRank is a multiplier this will obviously always listPR0 pages as the very last entry in the SERPS

Your score from other factors multiplied by your PageRank score of zero willalways be zero (the lowest score possible)

How they do this is important Letrsquos say they set the PageRank of the page tozero before doing the calculations ndash would this work From what we know so farwe would think not [ Excel Example 8] And true enough it doesnt Why is thissignificant Because the PageRank must be set to zero at the end of theprocess Having a PageRank of zero in itself does not disqualify a page fromvoting exactly as it would have done in the first place

To stop its voting power the second penalty must also be applied This is thesame penalty that is applied to link farms Google has shown that they arecapable of ignoring links they believe have been artificially created This is simpleenough they merely remove the link entry from the matrix This second penaltywould also need to be applied to a PR0 penalty to stop the links from that pagefrom counting in the PageRank process However there doesnrsquot appear to beany way to test this I suspect that both penalties would be applied

Controlling PageRank

As Webmasters contrary to the suggestions of some we have a fair amount ofcontrol over PageRank It is however the most difficult of ranking factors tocontrol and you might well be able to achieve the desired effects by using othertechniques However a well-worked PageRank added to your optimization mixcan make it difficult for competitors to catch up

There are three fundamental areas to look at when trying to optimize thePageRank for your site

1 The links you choose to have link to you ie which ones you chooseand how much effort you put in to getting them

2 Who you choose to link out to from your site and from which page ofyour site you place their link (Maximising PageRank Feedback andminimising PageRank leakage)

23

3 The internal navigational structure and linkage of your pages in order tobest distribute PageRank within your site

Links to Your Site

When looking for links to your site from a purely PageRank point of view onemight think you should simply look for pages that have the highest ToolbarPageRank (Whilst keeping in mind that every page of a site has its ownPageRank so you must consider the PageRank of the links page or whateverpage the actual link will exist on) However this way of thinking is incorrect Ifyoursquove not just jumped to this section then yoursquoll probably have worked out whythat is The PageRank given by a link is far more complicated than thissimplification There may have been a time when that was an okayapproximationhellipbut no more As more and more people try to get links from onlyhigh PageRank sites it becomes less and less of a winning proposition

The actual PageRank from an individual page is shared out amongst the links onthat page (remember the PageRank calculations) So links from pages thathave the same PageRank arent always created equal It depends on how manyother links your link is sharing the links page with For instance a link from apage with a PageRank of 4 might be better than a link from a page with aPageRank of 6 if there are less total links on the PR 4 page Right now there justisnrsquot enough information available to allow us to know to what extent thisstretches However itrsquos significant enough to make it pointless to just choosehigh PageRanked sites as your main linking strategy Theres also another morematter-of-fact reason why that type of linking strategy might not be the best siteswith high PageRanks are often fussy about which sites they will link out tomaking them harder to get linked from than lower PageRanked sites Howeversites struggling with their own PageRank numbers should be more receptive toexchanging reciprocal links with other like-minded sites

Now lets factor in feedback Letrsquos say for example that there are two separatepages on other peoplersquos sites which both have a PageRank of 4 Both of thesehave ten links to other pages But the page on your site that you want them tolink to already has a link to the page on the second site By getting a link fromthe second site yoursquore generating feedback and getting more PageRank than ifyou had gotten a link from the first site Thatrsquos an over simplification in factfeedback loops can get even more complicated Remember the number of linkson the page linking to you will alter the amount of feedback etc

Can you work it all out for a given pagersquos situation No ndash neither can I Myadvice therefore is this ndash get links from sites that seem appropriate and havegood quality regardless of their current PageRank If they are relevant to yoursite and are high quality sites they will either help your PageRank now or willdo so in the future To really get your PageRank humming get yourself a listing

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 9: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

8

Why this is critical

You must do enough on the page work andor anchor text work to get intothat subset of 2000 pages for your chosen key phrase otherwise your highPageRank will be completely in vain PageRank means nothing if you do nothave enough ranking from other factors to make it into the first subset

The difference between PageRank and other factors

To assess when PageRank is important and when it is not we need tounderstand how PageRank is different from all other ranking factors To do thisherersquos a quick table that lists a few other factors and how they add to the rankingscore

Title Tag Can only be listed onceKeywords in Body Text Each successive repetition is

less important Proximity isimportant

Anchor Text Highly weighted but likekeywords in body text there isa cut off point where furtheranchor text is no longerworthwhile

PageRank Potentially infinite You arealways capable of increasingyour PageRank significantlybut it takes work

All other ranking factors have cut off points beyond which they will no longeradd to your ranking score or will not add significantly enough for it to beworthwhile PageRank has no cut off point

Non-PageRank Factor Threshold

Having written about the difference between PageRank and other factors andhow PageRank is harder to get it should now be clear that whilst we could usemany methods to get good rankings there is a threshold which defines whenhigh PageRank is worth striving for and when it is not

With ranking factors other than PageRank there is a score beyond which theslow down in the rate that any factor adds to this score is so insignificant that it is

9

not worthwhile This is the Non-PageRank Factor Threshold To illustrate thisletrsquos put an example figure on this of 1000

If we have a query where the results are Page A and Page B then Page A and Bhave scores for that query which are the total scores for all ranking factors(including PageRank) Letrsquos say Page Arsquos score is 900 and page Brsquos score is500 Obviously Page A will be listed first These are both below our hypotheticalNon-PageRank Factor Threshold thus without any change in PageRank it ispossible for page B to improve their optimization to beat Page A for this particularquery There are lots of queries like this on Google theyre more commonlythought of as less competitive queries

Now assume Page A raises its score to 1100 Suddenly page B cannot competein the SERPs (search engine results pages) without increasing its PageRank Inall probability page B must also improve for all the other ranking factors but anincrease in PageRank is almost certainly necessary There are also lots ofqueries like this on Google which are more commonly thought of as morecompetitive queries

Generally when querying Google the group of pages in the SERPs will containsome pages that have a score above the Non-PageRank Factor Threshold andsome that do not

There is an important point to be made here

To be competitive you must raise your pages search engine ranking scorebeyond the Non-PageRank Factor Threshold To fail to do so means that youcan easily be beaten in the search results for your query terms The quickestway to approach the Non-PageRank Factor Threshold is through ldquoon the pagefactorsrdquo however you cannot move above the Non-PageRank FactorThreshold without PageRank

The obvious question is whatrsquos the numerical value of the Non-PageRank FactorThreshold and how much work do you need to do to get past it The answer isthat it has no value it is a hypothetical line Google could put a value on it butthat would not help us unless we know what the pagersquos individual scores are Weneed only be aware that the threshold exists and that it gives us informationabout principles

Using the threshold to derive the worth of two ranking strategies

The threshold explains principles and the different ways that search enginemarketers work It also demonstrates why some of the misunderstandings aboutPageRank occur Letrsquos consider the strategies of two people Person A considersPageRank to be unimportant and Person B considers PageRank to be veryimportant

10

Person A says ldquoPageRankrdquo is unimportant They have optimised pages foryears and know how to use ldquoon the pagerdquo factors very successfully Theyunderstand the basics of anchor text but they couldnrsquot care at all aboutPageRank

Whatrsquos happening person A is reaching the Non-PageRank Factor Thresholdvery quickly because they are maximising the ldquoon the pagerdquo factors Throughcarefully choosing keywords they jump-start themselves up the SERPs Aslong as their content is good high-ranking sites (over time) tend to get linkedto Whilst they didnrsquot directly ask for it a slow trickle of sites will begin to link tothem and give them PageRank which helps consolidates their position

Person B says ldquoPageRankrdquo is important Wersquove all seen those pages in theresults that have no content but great rankings (With big brands this canoften occur naturally even when they have no idea what PageRank is Thiswould be Person C who is not relevant to the discussion at hand) Person Bunderstands lots about PageRank and concentrates heavily on it

Whatrsquos happening person B is doing the reverse of person A Whilst person Aconcentrated on the Non-PageRank factors and found herself gettingPageRank anyway person B concentrates on the PageRank Factor and findshimself getting Non-PageRank factors The reason for this is that increasingPageRank requires links and links have anchor text Thus through carefullychoosing the anchor text linking to his page person B automatically increaseshis Non-PageRank factor scores whilst obtaining his high PageRank score

Obviously these are two extremes but we can use these to extrapolate theadvantages and disadvantages of each approach

11

Advantages DisadvantagesPerson A bull Quick entry to the

results pagesbull Self-generating links

limit the amount ofwork needed

bull Securing the resultsto make it moredifficult forcompetitors tocompete takes morework

bull Slower to react tonew competitors

Person B bull Solid position Caneasily modify on thepage factors for aquick boost ifrequired

bull Probably highertraffic from non-search enginesources

bull Slower entry toresults pages

bull Difficult to do wellbull Increased likelihood

of tripping spamfilters

It is clear that both strategies can and do work Both strategies are usingPageRank as part the mix of factors that will ultimately improve their ranking inthe SERPS Because there is such a mix we can use them to different degreesdepending on the strategy that best suits your style My personal strategy is touse a combination but to save some of the ldquoon the pagerdquo factors for later in caseI need a quick boost if the competition heats up at a later time

Really heavy competition

No explanation of PageRank strategies would be complete without a finalstatement regarding heavy keyword competition There are some queries wherecompetition is so intense that you must do everything possible to maximize yourranking score (for example ldquoweb hostingrdquo) In such situations it is impossible torank highly through Non-PageRank factors alone (as you will not initially be listedhigh enough to be noticed and linked to) That is not to say that Non-PageRankfactors arent important Consider what your final rank score is

Final Rank Score = (score for all Non-PageRank factors) x (actual PageRankscore)

Improving either side of the equation can have a positive effect Howeverbecause the Non-PageRank factors have a restricted maximum benefit theactual PageRank score must be improved in order to compete successfullyUnder really heavy competition ndash it holds true that you cannot rank well unlessyour actual PageRank score is above a certain level In other words

12

There exists a query specific ldquoMinimum PageRank levelrdquo For queries that do nothave heavy competition this level is easy to achieve without even tryingHowever where heavy competition exists Non-PageRank factors are just asimportant (and easier to get) until they reach the Non-PageRank factor thresholdThis is why careful keyword choice can help you avoid the extensive workassociated with highly competitive search phrases

How PageRank is calculated

On a simple level we can tell quite a lot about how PageRank is calculated Thisis because when Google was just a university research project the creators ofPageRank published a paper that detailed a formula for calculating it Thisformula is now more than a handful of years old and we suspect that it haschanged somewhat since then However for detailing the over-riding principlesof how PageRank works it is as accurate today as the day it was written

PR(A) = (1-d) + d (PR(T1) C(T1) + + PR(Tn) C(Tn) )

Where PR(A) is the PageRank of Page A ( the one we want t owork out )

D is a dam pening factor Nom inally this is set to 085

PR(T1) is the PageRank of a site point ing to Page A

C(T1) is the num ber of links off that page

PR(Tn) C(Tn) m eans we do that for each page point ing toPage A

Source The Anatomy of a Large-Scale Hypertextual Web Search EngineSergey Brin and LawrencePage httpwww-dbstanfordedu~backrubgooglehtml

Couldnrsquot be simpler right Depending on your mathematical competence this iseither remarkably easy to understand or remarkably complex So herersquos the dealndash this isnrsquot a one-time calculation It looks pretty above but you canrsquot just do onesimple calculation and get an answer with the PageRank of a page If you look atit to calculate the PageRank of Page A we need to know the PageRank of allthe pages pointing to it To calculate the PageRank of those pages we need toknow the PageRank of all the pages pointing to them (one of which could and isquite likely to be Page A) If this seems like it could go around in circles forever

13

thatrsquos because it nearly does We have to do a whole lot of little calculationsmany times over before we actually get the answer What this formula tells us isthat however you slice it and however the formula may have changed since itwas written

The PageRank given to Page A by a Page B pointing to it is decreased witheach link to anywhere that exists on Page B This means that a pagersquos

PageRank is essentially a measure of its vote it can split that vote between onelink or two or many more but its overall voting power will always remain the

same

This really is important So to be totally clear letrsquos put some numbers to it (thesenumbers are made up for demonstration purposes only and have no relevancyto any particular page) Say Page B has a PageRank value of 5 and has a singlelink on it pointing to Page A Page Arsquos PageRank is improved by a proportion ofPage Brsquos value of 5 (Page B doesnrsquot lose anything but Page A gains) If Page Bhas two links that PageRank improvement would be split and Page A wouldonly gain half the PageRank that it did before

Now put the formula out of your mind for a moment as its easier to understandhow it works using a diagram Letrsquos say we have a hypothetical set of pagesimaginatively titled Page A Page B Page C and Page D They link to each otheras shown below

To begin with in our example at least we donrsquot know what the pagersquos startingPageRanks are Therersquos nothing special about the number that we pick to startwith (in fact if you read forward to the section on convergence ndash you can see that

14

we can start at any number we want) Since in the last version of this paper weperformed the calculations by setting these values to 1 ndash wersquore going to set themto zero this time around in order to prove that it doesnrsquot matter what the startingvalues are

Next we perform the necessary calculation to obtain the PageRank for eachpage The rules are

1 We take a 085 a pagersquos PageRank and divide it by the number oflinks on the page

2 We add that amount on to a new total for each page itrsquos being passedto

3 We add 015 to each of those totals

The first calculation is easy Because wersquove started at zero ndash 0 085 is always 0So each page gets just 015 + 0 Meaning each page now has a PageRank of015 Clearly wersquore not done ndash we want to show the importance of each pagebased upon links and theyrsquore all the same so we need to run the calculationagain

Page A links to pages B C and D Page Arsquos PageRank is 015 so it will add 085 015 = 01275 to the new PageRank scores of the pages it links to There arethree of them so they each get 00425

Page B links to page C Page Brsquos PageRank is 015 so it will add 085 015 =01275 to the new PageRank score of the pages it links to Since it only links topage C page C will get it all

Page C links to Page A all 01275 passes to page A

Page D links to Page C Again all 01275 passes to page C

The new totals for each page them become

Page A 015 (base) + 01275 (from Page C) = 02775Page B 015 (base) + 00425 (from Page A) = 01925Page C 015 (base) + 00425 (from Page A) + 01275 (from Page B) + 01275(from Page D) = 04475Page D 015 (base) + 00425 (from Page A) = 01925

So wersquove got

15

Pretty neat huh Already wersquore begining to see that Page C is probably the mostimportant page in the system (but we canrsquot be sure yet ndash it could well change)We carry on doing these calculations until the value for each page no longerchanges (this is called convergence ndash therersquos more about this in the nextsection) In practice Google probably doesnt wait for this convergence butinstead run a number of iterations of the calculation which is likely to give themfairly accurate values (more on this later as well) If we carried out all thecalculations for the example given it would take us 143 calculations [ ExcelExample 1] and wersquod reach final values of

16

As suspected Page C is the most important If we take a quick look at these rawvalues we can see something about the number of links pointing out from a pageLook at Page A which has a link from a high PageRank page (Page C) whichhas only one outbound link Then look at Page B and D both share links from ahigh PageRank page (Page A) with three outbound links The number of linkssignificantly alters the way PageRank is distributed

A word on convergence

Convergence is an important mathematical aspect of PageRank which allowsGoogle to provided unprecedented search quality at comparably low costs Thisis a semi-complex topic but it is important to your understanding of how and whyPageRank works Wersquove tried to make it as simple as possible but unless yoursquoreSergey Brin or Larry Page yoursquoll still need to concentrate

Whilst it will take some concentration it isnrsquot that hard to understand

Wersquove shown that the outlet values (final values) of one stage of the calculationbecome the inlet values (starting values) of the next stage and that we keep ondoing this (itrsquos known as a recursive procedure) But the really big question ishow and when does the recursive procedure stop

The answer is ldquoconvergencerdquo Provided the dampening factor (d in our equation)is less than one then convergence will occur Nominally we set it to 085(because thatrsquos the value mentioned in the Stanford papers)

This convergence basically means that whatever values we start at after runningthe calculation a number of times we will end up with the same final values andthat these values will no longer change if we do further iterations of thecalculation These final values are known as limiting values

Once the limiting values have been reached Google no longer needs to expendprocessing power on calculating the PageRank They can finish there

This is easier to understand with an example Letrsquos take a look at the followingstructure

17

PageRank for pages A B C D at various stages of iteration

A B C DIteration 1 1 1 1

1 10000000000 05750000000 22750000000 015000000002 20837500000 05750000000 11912500000 015000000003 11625625000 10355937500 16518437500 01500000000

19 14900124031 07833688246 15766187723 0150000000020 14901259564 07832552713 15766187723 0150000000021 14901259564 07833035315 15765705121 0150000000046 14901074052 07832956473 15765969475 0150000000047 14901074054 07832956472 15765969474 0150000000048 14901074053 07832956473 15765969474 0150000000049 14901074053 07832956473 15765969474 0150000000050 14901074053 07832956473 15765969474 0150000000051 14901074053 07832956473 15765969474 0150000000052 14901074053 07832956473 15765969474 0150000000053 14901074053 07832956473 15765969474 0150000000054 14901074053 07832956473 15765969474 01500000000

No matter how much we try to calculate the value past the 48th iteration thevalues do not change ndash they have converged at the limiting values

In practice theres no need to wait until they do not change at all before we stopcalculations We simply wait until they do not change very much In our exampleabove we have done that to 10 decimal places Ie we have calculated until theydo not change by 00000000001 or more (or mathematically | PR 48 (A) - PR 50(A) | lt 10 ndash10)

18

Derived specifics of how Google calculates PageRank

We started our values from zero and now see that the calculations are fairlycomplicated and resource intensive So two questions arise When Googlecalculates PageRank themselves do they start each month with an arbitraryvalue And secondly do they calculate PageRank across several machines Wecan answer these questions by a couple of simple experiments Assume thathaving calculated the PageRanks in the structure shown earlier we would like toadd a page and change a link to make the structure look like this

If we start each page from an arbitrary starting value of one then how manyiterations of the calculation does it take to achieve convergence [ ExcelExample 2]

75

Now if we start each page from the values we ended up with when we calculatedthe PageRank of the original structure (and set Page Ersquos starting value to 1) thenhow many iterations does it take to achieve convergence [ Excel Example 3]

78

These numbers are really very close Therefore it would appear logical thatGoogle does not reset the values with each calculation as they may simply carryon from where they left off

So do they calculate across several machines or one Obviously there is toomuch work for one machine when you get to billions of pages so several

19

machines must be used Logically how can they do this Assume Pages A andC are on one machine and Pages B C and D are on another If we ignore thelinks between pages across machines (ie A-gtD and D-gtC) then for Pages Aand C it takes

1 calculation to converge at A=1 C=1

Then for Pages B D and E it takes

3 calculations to converge at B=015 D=03954375000 E=03954375000

If we then do calculations for all pages using these as starting values it takes

A huge 146 iterations to converge [ Excel Example 4]

The conclusion is that splitting PageRank calculations up into chunks for differentmachines is not a viable proposition Therefore they must also track PageRanktransfers across links between machines This means that each machine mustnotify the other machines as PageRank is given to a page not on that machineIndeed each machine must also run the iterations in sync with each other Evenif they donrsquot do it exactly this way the calculation of PageRank for so manypages is very very complicated indeed

PageRank Feedback how linking sometimes helps

When the new term PageRank Feedback was presented in the original version ofthis paper it was largely taken up and used ndash although not always correctlyPageRank Feedback is difficult to understand because it is a concept that impliescreation of PageRank However it is really an alteration of distribution ofPageRank on a large scale PageRank Feedback is the principle that explainshow sometimes it is good to link to outside pages Consider we have a Page -gtPage A Its PageRank by definition will be 015 Now assume it links out toPage B which in return links back to it Page Arsquos PageRank now becomes 1Now assume it links out to Page C and Page C links back to it Page ArsquosPageRank now becomes 14594594595

The very fact that Page A links out to a page that links back to it causes its ownPageRank to rise (This could also occur if Page A links to a page that links to apage that links to Page A etchellip) This has occurred not because we aregenerating PageRank but because it is taking it from the system as a whole Butif we view this very small system ldquoPage Ardquo (which is a subsystem of all the pagesin the index) then we can firmly say that linking out from Page A has generatedPageRank in this system ndash that is the very act of linking out has fed PageRankback in

20

Letrsquos put this diagrammatically using a slightly more complicated system

Pages A to E are all of the pages in Googlersquos index Pages A and B are your siteWhen we do the calculations for this set up the final PageRanks for each pageare all 1 [ Excel Example 5] As far as your site is concerned

Page Arsquos PageRank is 1

Page Brsquos PageRank is 1

The sum of the PageRank of all the pages in your site is2

Changing the structure to link A to C and C to A gives the following finalPageRanks [ Excel Example 6]

21

Page Arsquos PageRank is 13599321536

Page Brsquos PageRank is 07279711653

The sum of the PageRank of all the pages in your site is20879033189

This increase is small the amount will of course vary depending oncircumstances But the point was easily proved ndash linking out can sometimesimprove your PageRank through a mechanism known as PageRank Feedback

If we added an extra page into the loop between C -gt D and E then the totalincrease in PageRank in your site would be 21462030505 [ Excel Example 7]This suggests that it is wiser to reciprocal link with larger sites that employ a linkstructure where a higher amount of focus is given to the page on which your linkappears (or will appear) The number of pages and the structure you are linkingto has significant influence on the amount of PageRank feedback that occurs

22

Influencing the results

The question that arises from all this is how and when can or will Googleinfluence the results of the PageRank calculation In this respect the situationhas changed significantly since the original PageRank paper Google has shownthat they can and will modify the data on which PageRank is based The primaryexample of this is what has become known as PageRank Zero (PR0) Basicallyspeculation says that when Google wants to penalize a page it is assigned aPageRank of zero As PageRank is a multiplier this will obviously always listPR0 pages as the very last entry in the SERPS

Your score from other factors multiplied by your PageRank score of zero willalways be zero (the lowest score possible)

How they do this is important Letrsquos say they set the PageRank of the page tozero before doing the calculations ndash would this work From what we know so farwe would think not [ Excel Example 8] And true enough it doesnt Why is thissignificant Because the PageRank must be set to zero at the end of theprocess Having a PageRank of zero in itself does not disqualify a page fromvoting exactly as it would have done in the first place

To stop its voting power the second penalty must also be applied This is thesame penalty that is applied to link farms Google has shown that they arecapable of ignoring links they believe have been artificially created This is simpleenough they merely remove the link entry from the matrix This second penaltywould also need to be applied to a PR0 penalty to stop the links from that pagefrom counting in the PageRank process However there doesnrsquot appear to beany way to test this I suspect that both penalties would be applied

Controlling PageRank

As Webmasters contrary to the suggestions of some we have a fair amount ofcontrol over PageRank It is however the most difficult of ranking factors tocontrol and you might well be able to achieve the desired effects by using othertechniques However a well-worked PageRank added to your optimization mixcan make it difficult for competitors to catch up

There are three fundamental areas to look at when trying to optimize thePageRank for your site

1 The links you choose to have link to you ie which ones you chooseand how much effort you put in to getting them

2 Who you choose to link out to from your site and from which page ofyour site you place their link (Maximising PageRank Feedback andminimising PageRank leakage)

23

3 The internal navigational structure and linkage of your pages in order tobest distribute PageRank within your site

Links to Your Site

When looking for links to your site from a purely PageRank point of view onemight think you should simply look for pages that have the highest ToolbarPageRank (Whilst keeping in mind that every page of a site has its ownPageRank so you must consider the PageRank of the links page or whateverpage the actual link will exist on) However this way of thinking is incorrect Ifyoursquove not just jumped to this section then yoursquoll probably have worked out whythat is The PageRank given by a link is far more complicated than thissimplification There may have been a time when that was an okayapproximationhellipbut no more As more and more people try to get links from onlyhigh PageRank sites it becomes less and less of a winning proposition

The actual PageRank from an individual page is shared out amongst the links onthat page (remember the PageRank calculations) So links from pages thathave the same PageRank arent always created equal It depends on how manyother links your link is sharing the links page with For instance a link from apage with a PageRank of 4 might be better than a link from a page with aPageRank of 6 if there are less total links on the PR 4 page Right now there justisnrsquot enough information available to allow us to know to what extent thisstretches However itrsquos significant enough to make it pointless to just choosehigh PageRanked sites as your main linking strategy Theres also another morematter-of-fact reason why that type of linking strategy might not be the best siteswith high PageRanks are often fussy about which sites they will link out tomaking them harder to get linked from than lower PageRanked sites Howeversites struggling with their own PageRank numbers should be more receptive toexchanging reciprocal links with other like-minded sites

Now lets factor in feedback Letrsquos say for example that there are two separatepages on other peoplersquos sites which both have a PageRank of 4 Both of thesehave ten links to other pages But the page on your site that you want them tolink to already has a link to the page on the second site By getting a link fromthe second site yoursquore generating feedback and getting more PageRank than ifyou had gotten a link from the first site Thatrsquos an over simplification in factfeedback loops can get even more complicated Remember the number of linkson the page linking to you will alter the amount of feedback etc

Can you work it all out for a given pagersquos situation No ndash neither can I Myadvice therefore is this ndash get links from sites that seem appropriate and havegood quality regardless of their current PageRank If they are relevant to yoursite and are high quality sites they will either help your PageRank now or willdo so in the future To really get your PageRank humming get yourself a listing

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 10: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

9

not worthwhile This is the Non-PageRank Factor Threshold To illustrate thisletrsquos put an example figure on this of 1000

If we have a query where the results are Page A and Page B then Page A and Bhave scores for that query which are the total scores for all ranking factors(including PageRank) Letrsquos say Page Arsquos score is 900 and page Brsquos score is500 Obviously Page A will be listed first These are both below our hypotheticalNon-PageRank Factor Threshold thus without any change in PageRank it ispossible for page B to improve their optimization to beat Page A for this particularquery There are lots of queries like this on Google theyre more commonlythought of as less competitive queries

Now assume Page A raises its score to 1100 Suddenly page B cannot competein the SERPs (search engine results pages) without increasing its PageRank Inall probability page B must also improve for all the other ranking factors but anincrease in PageRank is almost certainly necessary There are also lots ofqueries like this on Google which are more commonly thought of as morecompetitive queries

Generally when querying Google the group of pages in the SERPs will containsome pages that have a score above the Non-PageRank Factor Threshold andsome that do not

There is an important point to be made here

To be competitive you must raise your pages search engine ranking scorebeyond the Non-PageRank Factor Threshold To fail to do so means that youcan easily be beaten in the search results for your query terms The quickestway to approach the Non-PageRank Factor Threshold is through ldquoon the pagefactorsrdquo however you cannot move above the Non-PageRank FactorThreshold without PageRank

The obvious question is whatrsquos the numerical value of the Non-PageRank FactorThreshold and how much work do you need to do to get past it The answer isthat it has no value it is a hypothetical line Google could put a value on it butthat would not help us unless we know what the pagersquos individual scores are Weneed only be aware that the threshold exists and that it gives us informationabout principles

Using the threshold to derive the worth of two ranking strategies

The threshold explains principles and the different ways that search enginemarketers work It also demonstrates why some of the misunderstandings aboutPageRank occur Letrsquos consider the strategies of two people Person A considersPageRank to be unimportant and Person B considers PageRank to be veryimportant

10

Person A says ldquoPageRankrdquo is unimportant They have optimised pages foryears and know how to use ldquoon the pagerdquo factors very successfully Theyunderstand the basics of anchor text but they couldnrsquot care at all aboutPageRank

Whatrsquos happening person A is reaching the Non-PageRank Factor Thresholdvery quickly because they are maximising the ldquoon the pagerdquo factors Throughcarefully choosing keywords they jump-start themselves up the SERPs Aslong as their content is good high-ranking sites (over time) tend to get linkedto Whilst they didnrsquot directly ask for it a slow trickle of sites will begin to link tothem and give them PageRank which helps consolidates their position

Person B says ldquoPageRankrdquo is important Wersquove all seen those pages in theresults that have no content but great rankings (With big brands this canoften occur naturally even when they have no idea what PageRank is Thiswould be Person C who is not relevant to the discussion at hand) Person Bunderstands lots about PageRank and concentrates heavily on it

Whatrsquos happening person B is doing the reverse of person A Whilst person Aconcentrated on the Non-PageRank factors and found herself gettingPageRank anyway person B concentrates on the PageRank Factor and findshimself getting Non-PageRank factors The reason for this is that increasingPageRank requires links and links have anchor text Thus through carefullychoosing the anchor text linking to his page person B automatically increaseshis Non-PageRank factor scores whilst obtaining his high PageRank score

Obviously these are two extremes but we can use these to extrapolate theadvantages and disadvantages of each approach

11

Advantages DisadvantagesPerson A bull Quick entry to the

results pagesbull Self-generating links

limit the amount ofwork needed

bull Securing the resultsto make it moredifficult forcompetitors tocompete takes morework

bull Slower to react tonew competitors

Person B bull Solid position Caneasily modify on thepage factors for aquick boost ifrequired

bull Probably highertraffic from non-search enginesources

bull Slower entry toresults pages

bull Difficult to do wellbull Increased likelihood

of tripping spamfilters

It is clear that both strategies can and do work Both strategies are usingPageRank as part the mix of factors that will ultimately improve their ranking inthe SERPS Because there is such a mix we can use them to different degreesdepending on the strategy that best suits your style My personal strategy is touse a combination but to save some of the ldquoon the pagerdquo factors for later in caseI need a quick boost if the competition heats up at a later time

Really heavy competition

No explanation of PageRank strategies would be complete without a finalstatement regarding heavy keyword competition There are some queries wherecompetition is so intense that you must do everything possible to maximize yourranking score (for example ldquoweb hostingrdquo) In such situations it is impossible torank highly through Non-PageRank factors alone (as you will not initially be listedhigh enough to be noticed and linked to) That is not to say that Non-PageRankfactors arent important Consider what your final rank score is

Final Rank Score = (score for all Non-PageRank factors) x (actual PageRankscore)

Improving either side of the equation can have a positive effect Howeverbecause the Non-PageRank factors have a restricted maximum benefit theactual PageRank score must be improved in order to compete successfullyUnder really heavy competition ndash it holds true that you cannot rank well unlessyour actual PageRank score is above a certain level In other words

12

There exists a query specific ldquoMinimum PageRank levelrdquo For queries that do nothave heavy competition this level is easy to achieve without even tryingHowever where heavy competition exists Non-PageRank factors are just asimportant (and easier to get) until they reach the Non-PageRank factor thresholdThis is why careful keyword choice can help you avoid the extensive workassociated with highly competitive search phrases

How PageRank is calculated

On a simple level we can tell quite a lot about how PageRank is calculated Thisis because when Google was just a university research project the creators ofPageRank published a paper that detailed a formula for calculating it Thisformula is now more than a handful of years old and we suspect that it haschanged somewhat since then However for detailing the over-riding principlesof how PageRank works it is as accurate today as the day it was written

PR(A) = (1-d) + d (PR(T1) C(T1) + + PR(Tn) C(Tn) )

Where PR(A) is the PageRank of Page A ( the one we want t owork out )

D is a dam pening factor Nom inally this is set to 085

PR(T1) is the PageRank of a site point ing to Page A

C(T1) is the num ber of links off that page

PR(Tn) C(Tn) m eans we do that for each page point ing toPage A

Source The Anatomy of a Large-Scale Hypertextual Web Search EngineSergey Brin and LawrencePage httpwww-dbstanfordedu~backrubgooglehtml

Couldnrsquot be simpler right Depending on your mathematical competence this iseither remarkably easy to understand or remarkably complex So herersquos the dealndash this isnrsquot a one-time calculation It looks pretty above but you canrsquot just do onesimple calculation and get an answer with the PageRank of a page If you look atit to calculate the PageRank of Page A we need to know the PageRank of allthe pages pointing to it To calculate the PageRank of those pages we need toknow the PageRank of all the pages pointing to them (one of which could and isquite likely to be Page A) If this seems like it could go around in circles forever

13

thatrsquos because it nearly does We have to do a whole lot of little calculationsmany times over before we actually get the answer What this formula tells us isthat however you slice it and however the formula may have changed since itwas written

The PageRank given to Page A by a Page B pointing to it is decreased witheach link to anywhere that exists on Page B This means that a pagersquos

PageRank is essentially a measure of its vote it can split that vote between onelink or two or many more but its overall voting power will always remain the

same

This really is important So to be totally clear letrsquos put some numbers to it (thesenumbers are made up for demonstration purposes only and have no relevancyto any particular page) Say Page B has a PageRank value of 5 and has a singlelink on it pointing to Page A Page Arsquos PageRank is improved by a proportion ofPage Brsquos value of 5 (Page B doesnrsquot lose anything but Page A gains) If Page Bhas two links that PageRank improvement would be split and Page A wouldonly gain half the PageRank that it did before

Now put the formula out of your mind for a moment as its easier to understandhow it works using a diagram Letrsquos say we have a hypothetical set of pagesimaginatively titled Page A Page B Page C and Page D They link to each otheras shown below

To begin with in our example at least we donrsquot know what the pagersquos startingPageRanks are Therersquos nothing special about the number that we pick to startwith (in fact if you read forward to the section on convergence ndash you can see that

14

we can start at any number we want) Since in the last version of this paper weperformed the calculations by setting these values to 1 ndash wersquore going to set themto zero this time around in order to prove that it doesnrsquot matter what the startingvalues are

Next we perform the necessary calculation to obtain the PageRank for eachpage The rules are

1 We take a 085 a pagersquos PageRank and divide it by the number oflinks on the page

2 We add that amount on to a new total for each page itrsquos being passedto

3 We add 015 to each of those totals

The first calculation is easy Because wersquove started at zero ndash 0 085 is always 0So each page gets just 015 + 0 Meaning each page now has a PageRank of015 Clearly wersquore not done ndash we want to show the importance of each pagebased upon links and theyrsquore all the same so we need to run the calculationagain

Page A links to pages B C and D Page Arsquos PageRank is 015 so it will add 085 015 = 01275 to the new PageRank scores of the pages it links to There arethree of them so they each get 00425

Page B links to page C Page Brsquos PageRank is 015 so it will add 085 015 =01275 to the new PageRank score of the pages it links to Since it only links topage C page C will get it all

Page C links to Page A all 01275 passes to page A

Page D links to Page C Again all 01275 passes to page C

The new totals for each page them become

Page A 015 (base) + 01275 (from Page C) = 02775Page B 015 (base) + 00425 (from Page A) = 01925Page C 015 (base) + 00425 (from Page A) + 01275 (from Page B) + 01275(from Page D) = 04475Page D 015 (base) + 00425 (from Page A) = 01925

So wersquove got

15

Pretty neat huh Already wersquore begining to see that Page C is probably the mostimportant page in the system (but we canrsquot be sure yet ndash it could well change)We carry on doing these calculations until the value for each page no longerchanges (this is called convergence ndash therersquos more about this in the nextsection) In practice Google probably doesnt wait for this convergence butinstead run a number of iterations of the calculation which is likely to give themfairly accurate values (more on this later as well) If we carried out all thecalculations for the example given it would take us 143 calculations [ ExcelExample 1] and wersquod reach final values of

16

As suspected Page C is the most important If we take a quick look at these rawvalues we can see something about the number of links pointing out from a pageLook at Page A which has a link from a high PageRank page (Page C) whichhas only one outbound link Then look at Page B and D both share links from ahigh PageRank page (Page A) with three outbound links The number of linkssignificantly alters the way PageRank is distributed

A word on convergence

Convergence is an important mathematical aspect of PageRank which allowsGoogle to provided unprecedented search quality at comparably low costs Thisis a semi-complex topic but it is important to your understanding of how and whyPageRank works Wersquove tried to make it as simple as possible but unless yoursquoreSergey Brin or Larry Page yoursquoll still need to concentrate

Whilst it will take some concentration it isnrsquot that hard to understand

Wersquove shown that the outlet values (final values) of one stage of the calculationbecome the inlet values (starting values) of the next stage and that we keep ondoing this (itrsquos known as a recursive procedure) But the really big question ishow and when does the recursive procedure stop

The answer is ldquoconvergencerdquo Provided the dampening factor (d in our equation)is less than one then convergence will occur Nominally we set it to 085(because thatrsquos the value mentioned in the Stanford papers)

This convergence basically means that whatever values we start at after runningthe calculation a number of times we will end up with the same final values andthat these values will no longer change if we do further iterations of thecalculation These final values are known as limiting values

Once the limiting values have been reached Google no longer needs to expendprocessing power on calculating the PageRank They can finish there

This is easier to understand with an example Letrsquos take a look at the followingstructure

17

PageRank for pages A B C D at various stages of iteration

A B C DIteration 1 1 1 1

1 10000000000 05750000000 22750000000 015000000002 20837500000 05750000000 11912500000 015000000003 11625625000 10355937500 16518437500 01500000000

19 14900124031 07833688246 15766187723 0150000000020 14901259564 07832552713 15766187723 0150000000021 14901259564 07833035315 15765705121 0150000000046 14901074052 07832956473 15765969475 0150000000047 14901074054 07832956472 15765969474 0150000000048 14901074053 07832956473 15765969474 0150000000049 14901074053 07832956473 15765969474 0150000000050 14901074053 07832956473 15765969474 0150000000051 14901074053 07832956473 15765969474 0150000000052 14901074053 07832956473 15765969474 0150000000053 14901074053 07832956473 15765969474 0150000000054 14901074053 07832956473 15765969474 01500000000

No matter how much we try to calculate the value past the 48th iteration thevalues do not change ndash they have converged at the limiting values

In practice theres no need to wait until they do not change at all before we stopcalculations We simply wait until they do not change very much In our exampleabove we have done that to 10 decimal places Ie we have calculated until theydo not change by 00000000001 or more (or mathematically | PR 48 (A) - PR 50(A) | lt 10 ndash10)

18

Derived specifics of how Google calculates PageRank

We started our values from zero and now see that the calculations are fairlycomplicated and resource intensive So two questions arise When Googlecalculates PageRank themselves do they start each month with an arbitraryvalue And secondly do they calculate PageRank across several machines Wecan answer these questions by a couple of simple experiments Assume thathaving calculated the PageRanks in the structure shown earlier we would like toadd a page and change a link to make the structure look like this

If we start each page from an arbitrary starting value of one then how manyiterations of the calculation does it take to achieve convergence [ ExcelExample 2]

75

Now if we start each page from the values we ended up with when we calculatedthe PageRank of the original structure (and set Page Ersquos starting value to 1) thenhow many iterations does it take to achieve convergence [ Excel Example 3]

78

These numbers are really very close Therefore it would appear logical thatGoogle does not reset the values with each calculation as they may simply carryon from where they left off

So do they calculate across several machines or one Obviously there is toomuch work for one machine when you get to billions of pages so several

19

machines must be used Logically how can they do this Assume Pages A andC are on one machine and Pages B C and D are on another If we ignore thelinks between pages across machines (ie A-gtD and D-gtC) then for Pages Aand C it takes

1 calculation to converge at A=1 C=1

Then for Pages B D and E it takes

3 calculations to converge at B=015 D=03954375000 E=03954375000

If we then do calculations for all pages using these as starting values it takes

A huge 146 iterations to converge [ Excel Example 4]

The conclusion is that splitting PageRank calculations up into chunks for differentmachines is not a viable proposition Therefore they must also track PageRanktransfers across links between machines This means that each machine mustnotify the other machines as PageRank is given to a page not on that machineIndeed each machine must also run the iterations in sync with each other Evenif they donrsquot do it exactly this way the calculation of PageRank for so manypages is very very complicated indeed

PageRank Feedback how linking sometimes helps

When the new term PageRank Feedback was presented in the original version ofthis paper it was largely taken up and used ndash although not always correctlyPageRank Feedback is difficult to understand because it is a concept that impliescreation of PageRank However it is really an alteration of distribution ofPageRank on a large scale PageRank Feedback is the principle that explainshow sometimes it is good to link to outside pages Consider we have a Page -gtPage A Its PageRank by definition will be 015 Now assume it links out toPage B which in return links back to it Page Arsquos PageRank now becomes 1Now assume it links out to Page C and Page C links back to it Page ArsquosPageRank now becomes 14594594595

The very fact that Page A links out to a page that links back to it causes its ownPageRank to rise (This could also occur if Page A links to a page that links to apage that links to Page A etchellip) This has occurred not because we aregenerating PageRank but because it is taking it from the system as a whole Butif we view this very small system ldquoPage Ardquo (which is a subsystem of all the pagesin the index) then we can firmly say that linking out from Page A has generatedPageRank in this system ndash that is the very act of linking out has fed PageRankback in

20

Letrsquos put this diagrammatically using a slightly more complicated system

Pages A to E are all of the pages in Googlersquos index Pages A and B are your siteWhen we do the calculations for this set up the final PageRanks for each pageare all 1 [ Excel Example 5] As far as your site is concerned

Page Arsquos PageRank is 1

Page Brsquos PageRank is 1

The sum of the PageRank of all the pages in your site is2

Changing the structure to link A to C and C to A gives the following finalPageRanks [ Excel Example 6]

21

Page Arsquos PageRank is 13599321536

Page Brsquos PageRank is 07279711653

The sum of the PageRank of all the pages in your site is20879033189

This increase is small the amount will of course vary depending oncircumstances But the point was easily proved ndash linking out can sometimesimprove your PageRank through a mechanism known as PageRank Feedback

If we added an extra page into the loop between C -gt D and E then the totalincrease in PageRank in your site would be 21462030505 [ Excel Example 7]This suggests that it is wiser to reciprocal link with larger sites that employ a linkstructure where a higher amount of focus is given to the page on which your linkappears (or will appear) The number of pages and the structure you are linkingto has significant influence on the amount of PageRank feedback that occurs

22

Influencing the results

The question that arises from all this is how and when can or will Googleinfluence the results of the PageRank calculation In this respect the situationhas changed significantly since the original PageRank paper Google has shownthat they can and will modify the data on which PageRank is based The primaryexample of this is what has become known as PageRank Zero (PR0) Basicallyspeculation says that when Google wants to penalize a page it is assigned aPageRank of zero As PageRank is a multiplier this will obviously always listPR0 pages as the very last entry in the SERPS

Your score from other factors multiplied by your PageRank score of zero willalways be zero (the lowest score possible)

How they do this is important Letrsquos say they set the PageRank of the page tozero before doing the calculations ndash would this work From what we know so farwe would think not [ Excel Example 8] And true enough it doesnt Why is thissignificant Because the PageRank must be set to zero at the end of theprocess Having a PageRank of zero in itself does not disqualify a page fromvoting exactly as it would have done in the first place

To stop its voting power the second penalty must also be applied This is thesame penalty that is applied to link farms Google has shown that they arecapable of ignoring links they believe have been artificially created This is simpleenough they merely remove the link entry from the matrix This second penaltywould also need to be applied to a PR0 penalty to stop the links from that pagefrom counting in the PageRank process However there doesnrsquot appear to beany way to test this I suspect that both penalties would be applied

Controlling PageRank

As Webmasters contrary to the suggestions of some we have a fair amount ofcontrol over PageRank It is however the most difficult of ranking factors tocontrol and you might well be able to achieve the desired effects by using othertechniques However a well-worked PageRank added to your optimization mixcan make it difficult for competitors to catch up

There are three fundamental areas to look at when trying to optimize thePageRank for your site

1 The links you choose to have link to you ie which ones you chooseand how much effort you put in to getting them

2 Who you choose to link out to from your site and from which page ofyour site you place their link (Maximising PageRank Feedback andminimising PageRank leakage)

23

3 The internal navigational structure and linkage of your pages in order tobest distribute PageRank within your site

Links to Your Site

When looking for links to your site from a purely PageRank point of view onemight think you should simply look for pages that have the highest ToolbarPageRank (Whilst keeping in mind that every page of a site has its ownPageRank so you must consider the PageRank of the links page or whateverpage the actual link will exist on) However this way of thinking is incorrect Ifyoursquove not just jumped to this section then yoursquoll probably have worked out whythat is The PageRank given by a link is far more complicated than thissimplification There may have been a time when that was an okayapproximationhellipbut no more As more and more people try to get links from onlyhigh PageRank sites it becomes less and less of a winning proposition

The actual PageRank from an individual page is shared out amongst the links onthat page (remember the PageRank calculations) So links from pages thathave the same PageRank arent always created equal It depends on how manyother links your link is sharing the links page with For instance a link from apage with a PageRank of 4 might be better than a link from a page with aPageRank of 6 if there are less total links on the PR 4 page Right now there justisnrsquot enough information available to allow us to know to what extent thisstretches However itrsquos significant enough to make it pointless to just choosehigh PageRanked sites as your main linking strategy Theres also another morematter-of-fact reason why that type of linking strategy might not be the best siteswith high PageRanks are often fussy about which sites they will link out tomaking them harder to get linked from than lower PageRanked sites Howeversites struggling with their own PageRank numbers should be more receptive toexchanging reciprocal links with other like-minded sites

Now lets factor in feedback Letrsquos say for example that there are two separatepages on other peoplersquos sites which both have a PageRank of 4 Both of thesehave ten links to other pages But the page on your site that you want them tolink to already has a link to the page on the second site By getting a link fromthe second site yoursquore generating feedback and getting more PageRank than ifyou had gotten a link from the first site Thatrsquos an over simplification in factfeedback loops can get even more complicated Remember the number of linkson the page linking to you will alter the amount of feedback etc

Can you work it all out for a given pagersquos situation No ndash neither can I Myadvice therefore is this ndash get links from sites that seem appropriate and havegood quality regardless of their current PageRank If they are relevant to yoursite and are high quality sites they will either help your PageRank now or willdo so in the future To really get your PageRank humming get yourself a listing

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 11: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

10

Person A says ldquoPageRankrdquo is unimportant They have optimised pages foryears and know how to use ldquoon the pagerdquo factors very successfully Theyunderstand the basics of anchor text but they couldnrsquot care at all aboutPageRank

Whatrsquos happening person A is reaching the Non-PageRank Factor Thresholdvery quickly because they are maximising the ldquoon the pagerdquo factors Throughcarefully choosing keywords they jump-start themselves up the SERPs Aslong as their content is good high-ranking sites (over time) tend to get linkedto Whilst they didnrsquot directly ask for it a slow trickle of sites will begin to link tothem and give them PageRank which helps consolidates their position

Person B says ldquoPageRankrdquo is important Wersquove all seen those pages in theresults that have no content but great rankings (With big brands this canoften occur naturally even when they have no idea what PageRank is Thiswould be Person C who is not relevant to the discussion at hand) Person Bunderstands lots about PageRank and concentrates heavily on it

Whatrsquos happening person B is doing the reverse of person A Whilst person Aconcentrated on the Non-PageRank factors and found herself gettingPageRank anyway person B concentrates on the PageRank Factor and findshimself getting Non-PageRank factors The reason for this is that increasingPageRank requires links and links have anchor text Thus through carefullychoosing the anchor text linking to his page person B automatically increaseshis Non-PageRank factor scores whilst obtaining his high PageRank score

Obviously these are two extremes but we can use these to extrapolate theadvantages and disadvantages of each approach

11

Advantages DisadvantagesPerson A bull Quick entry to the

results pagesbull Self-generating links

limit the amount ofwork needed

bull Securing the resultsto make it moredifficult forcompetitors tocompete takes morework

bull Slower to react tonew competitors

Person B bull Solid position Caneasily modify on thepage factors for aquick boost ifrequired

bull Probably highertraffic from non-search enginesources

bull Slower entry toresults pages

bull Difficult to do wellbull Increased likelihood

of tripping spamfilters

It is clear that both strategies can and do work Both strategies are usingPageRank as part the mix of factors that will ultimately improve their ranking inthe SERPS Because there is such a mix we can use them to different degreesdepending on the strategy that best suits your style My personal strategy is touse a combination but to save some of the ldquoon the pagerdquo factors for later in caseI need a quick boost if the competition heats up at a later time

Really heavy competition

No explanation of PageRank strategies would be complete without a finalstatement regarding heavy keyword competition There are some queries wherecompetition is so intense that you must do everything possible to maximize yourranking score (for example ldquoweb hostingrdquo) In such situations it is impossible torank highly through Non-PageRank factors alone (as you will not initially be listedhigh enough to be noticed and linked to) That is not to say that Non-PageRankfactors arent important Consider what your final rank score is

Final Rank Score = (score for all Non-PageRank factors) x (actual PageRankscore)

Improving either side of the equation can have a positive effect Howeverbecause the Non-PageRank factors have a restricted maximum benefit theactual PageRank score must be improved in order to compete successfullyUnder really heavy competition ndash it holds true that you cannot rank well unlessyour actual PageRank score is above a certain level In other words

12

There exists a query specific ldquoMinimum PageRank levelrdquo For queries that do nothave heavy competition this level is easy to achieve without even tryingHowever where heavy competition exists Non-PageRank factors are just asimportant (and easier to get) until they reach the Non-PageRank factor thresholdThis is why careful keyword choice can help you avoid the extensive workassociated with highly competitive search phrases

How PageRank is calculated

On a simple level we can tell quite a lot about how PageRank is calculated Thisis because when Google was just a university research project the creators ofPageRank published a paper that detailed a formula for calculating it Thisformula is now more than a handful of years old and we suspect that it haschanged somewhat since then However for detailing the over-riding principlesof how PageRank works it is as accurate today as the day it was written

PR(A) = (1-d) + d (PR(T1) C(T1) + + PR(Tn) C(Tn) )

Where PR(A) is the PageRank of Page A ( the one we want t owork out )

D is a dam pening factor Nom inally this is set to 085

PR(T1) is the PageRank of a site point ing to Page A

C(T1) is the num ber of links off that page

PR(Tn) C(Tn) m eans we do that for each page point ing toPage A

Source The Anatomy of a Large-Scale Hypertextual Web Search EngineSergey Brin and LawrencePage httpwww-dbstanfordedu~backrubgooglehtml

Couldnrsquot be simpler right Depending on your mathematical competence this iseither remarkably easy to understand or remarkably complex So herersquos the dealndash this isnrsquot a one-time calculation It looks pretty above but you canrsquot just do onesimple calculation and get an answer with the PageRank of a page If you look atit to calculate the PageRank of Page A we need to know the PageRank of allthe pages pointing to it To calculate the PageRank of those pages we need toknow the PageRank of all the pages pointing to them (one of which could and isquite likely to be Page A) If this seems like it could go around in circles forever

13

thatrsquos because it nearly does We have to do a whole lot of little calculationsmany times over before we actually get the answer What this formula tells us isthat however you slice it and however the formula may have changed since itwas written

The PageRank given to Page A by a Page B pointing to it is decreased witheach link to anywhere that exists on Page B This means that a pagersquos

PageRank is essentially a measure of its vote it can split that vote between onelink or two or many more but its overall voting power will always remain the

same

This really is important So to be totally clear letrsquos put some numbers to it (thesenumbers are made up for demonstration purposes only and have no relevancyto any particular page) Say Page B has a PageRank value of 5 and has a singlelink on it pointing to Page A Page Arsquos PageRank is improved by a proportion ofPage Brsquos value of 5 (Page B doesnrsquot lose anything but Page A gains) If Page Bhas two links that PageRank improvement would be split and Page A wouldonly gain half the PageRank that it did before

Now put the formula out of your mind for a moment as its easier to understandhow it works using a diagram Letrsquos say we have a hypothetical set of pagesimaginatively titled Page A Page B Page C and Page D They link to each otheras shown below

To begin with in our example at least we donrsquot know what the pagersquos startingPageRanks are Therersquos nothing special about the number that we pick to startwith (in fact if you read forward to the section on convergence ndash you can see that

14

we can start at any number we want) Since in the last version of this paper weperformed the calculations by setting these values to 1 ndash wersquore going to set themto zero this time around in order to prove that it doesnrsquot matter what the startingvalues are

Next we perform the necessary calculation to obtain the PageRank for eachpage The rules are

1 We take a 085 a pagersquos PageRank and divide it by the number oflinks on the page

2 We add that amount on to a new total for each page itrsquos being passedto

3 We add 015 to each of those totals

The first calculation is easy Because wersquove started at zero ndash 0 085 is always 0So each page gets just 015 + 0 Meaning each page now has a PageRank of015 Clearly wersquore not done ndash we want to show the importance of each pagebased upon links and theyrsquore all the same so we need to run the calculationagain

Page A links to pages B C and D Page Arsquos PageRank is 015 so it will add 085 015 = 01275 to the new PageRank scores of the pages it links to There arethree of them so they each get 00425

Page B links to page C Page Brsquos PageRank is 015 so it will add 085 015 =01275 to the new PageRank score of the pages it links to Since it only links topage C page C will get it all

Page C links to Page A all 01275 passes to page A

Page D links to Page C Again all 01275 passes to page C

The new totals for each page them become

Page A 015 (base) + 01275 (from Page C) = 02775Page B 015 (base) + 00425 (from Page A) = 01925Page C 015 (base) + 00425 (from Page A) + 01275 (from Page B) + 01275(from Page D) = 04475Page D 015 (base) + 00425 (from Page A) = 01925

So wersquove got

15

Pretty neat huh Already wersquore begining to see that Page C is probably the mostimportant page in the system (but we canrsquot be sure yet ndash it could well change)We carry on doing these calculations until the value for each page no longerchanges (this is called convergence ndash therersquos more about this in the nextsection) In practice Google probably doesnt wait for this convergence butinstead run a number of iterations of the calculation which is likely to give themfairly accurate values (more on this later as well) If we carried out all thecalculations for the example given it would take us 143 calculations [ ExcelExample 1] and wersquod reach final values of

16

As suspected Page C is the most important If we take a quick look at these rawvalues we can see something about the number of links pointing out from a pageLook at Page A which has a link from a high PageRank page (Page C) whichhas only one outbound link Then look at Page B and D both share links from ahigh PageRank page (Page A) with three outbound links The number of linkssignificantly alters the way PageRank is distributed

A word on convergence

Convergence is an important mathematical aspect of PageRank which allowsGoogle to provided unprecedented search quality at comparably low costs Thisis a semi-complex topic but it is important to your understanding of how and whyPageRank works Wersquove tried to make it as simple as possible but unless yoursquoreSergey Brin or Larry Page yoursquoll still need to concentrate

Whilst it will take some concentration it isnrsquot that hard to understand

Wersquove shown that the outlet values (final values) of one stage of the calculationbecome the inlet values (starting values) of the next stage and that we keep ondoing this (itrsquos known as a recursive procedure) But the really big question ishow and when does the recursive procedure stop

The answer is ldquoconvergencerdquo Provided the dampening factor (d in our equation)is less than one then convergence will occur Nominally we set it to 085(because thatrsquos the value mentioned in the Stanford papers)

This convergence basically means that whatever values we start at after runningthe calculation a number of times we will end up with the same final values andthat these values will no longer change if we do further iterations of thecalculation These final values are known as limiting values

Once the limiting values have been reached Google no longer needs to expendprocessing power on calculating the PageRank They can finish there

This is easier to understand with an example Letrsquos take a look at the followingstructure

17

PageRank for pages A B C D at various stages of iteration

A B C DIteration 1 1 1 1

1 10000000000 05750000000 22750000000 015000000002 20837500000 05750000000 11912500000 015000000003 11625625000 10355937500 16518437500 01500000000

19 14900124031 07833688246 15766187723 0150000000020 14901259564 07832552713 15766187723 0150000000021 14901259564 07833035315 15765705121 0150000000046 14901074052 07832956473 15765969475 0150000000047 14901074054 07832956472 15765969474 0150000000048 14901074053 07832956473 15765969474 0150000000049 14901074053 07832956473 15765969474 0150000000050 14901074053 07832956473 15765969474 0150000000051 14901074053 07832956473 15765969474 0150000000052 14901074053 07832956473 15765969474 0150000000053 14901074053 07832956473 15765969474 0150000000054 14901074053 07832956473 15765969474 01500000000

No matter how much we try to calculate the value past the 48th iteration thevalues do not change ndash they have converged at the limiting values

In practice theres no need to wait until they do not change at all before we stopcalculations We simply wait until they do not change very much In our exampleabove we have done that to 10 decimal places Ie we have calculated until theydo not change by 00000000001 or more (or mathematically | PR 48 (A) - PR 50(A) | lt 10 ndash10)

18

Derived specifics of how Google calculates PageRank

We started our values from zero and now see that the calculations are fairlycomplicated and resource intensive So two questions arise When Googlecalculates PageRank themselves do they start each month with an arbitraryvalue And secondly do they calculate PageRank across several machines Wecan answer these questions by a couple of simple experiments Assume thathaving calculated the PageRanks in the structure shown earlier we would like toadd a page and change a link to make the structure look like this

If we start each page from an arbitrary starting value of one then how manyiterations of the calculation does it take to achieve convergence [ ExcelExample 2]

75

Now if we start each page from the values we ended up with when we calculatedthe PageRank of the original structure (and set Page Ersquos starting value to 1) thenhow many iterations does it take to achieve convergence [ Excel Example 3]

78

These numbers are really very close Therefore it would appear logical thatGoogle does not reset the values with each calculation as they may simply carryon from where they left off

So do they calculate across several machines or one Obviously there is toomuch work for one machine when you get to billions of pages so several

19

machines must be used Logically how can they do this Assume Pages A andC are on one machine and Pages B C and D are on another If we ignore thelinks between pages across machines (ie A-gtD and D-gtC) then for Pages Aand C it takes

1 calculation to converge at A=1 C=1

Then for Pages B D and E it takes

3 calculations to converge at B=015 D=03954375000 E=03954375000

If we then do calculations for all pages using these as starting values it takes

A huge 146 iterations to converge [ Excel Example 4]

The conclusion is that splitting PageRank calculations up into chunks for differentmachines is not a viable proposition Therefore they must also track PageRanktransfers across links between machines This means that each machine mustnotify the other machines as PageRank is given to a page not on that machineIndeed each machine must also run the iterations in sync with each other Evenif they donrsquot do it exactly this way the calculation of PageRank for so manypages is very very complicated indeed

PageRank Feedback how linking sometimes helps

When the new term PageRank Feedback was presented in the original version ofthis paper it was largely taken up and used ndash although not always correctlyPageRank Feedback is difficult to understand because it is a concept that impliescreation of PageRank However it is really an alteration of distribution ofPageRank on a large scale PageRank Feedback is the principle that explainshow sometimes it is good to link to outside pages Consider we have a Page -gtPage A Its PageRank by definition will be 015 Now assume it links out toPage B which in return links back to it Page Arsquos PageRank now becomes 1Now assume it links out to Page C and Page C links back to it Page ArsquosPageRank now becomes 14594594595

The very fact that Page A links out to a page that links back to it causes its ownPageRank to rise (This could also occur if Page A links to a page that links to apage that links to Page A etchellip) This has occurred not because we aregenerating PageRank but because it is taking it from the system as a whole Butif we view this very small system ldquoPage Ardquo (which is a subsystem of all the pagesin the index) then we can firmly say that linking out from Page A has generatedPageRank in this system ndash that is the very act of linking out has fed PageRankback in

20

Letrsquos put this diagrammatically using a slightly more complicated system

Pages A to E are all of the pages in Googlersquos index Pages A and B are your siteWhen we do the calculations for this set up the final PageRanks for each pageare all 1 [ Excel Example 5] As far as your site is concerned

Page Arsquos PageRank is 1

Page Brsquos PageRank is 1

The sum of the PageRank of all the pages in your site is2

Changing the structure to link A to C and C to A gives the following finalPageRanks [ Excel Example 6]

21

Page Arsquos PageRank is 13599321536

Page Brsquos PageRank is 07279711653

The sum of the PageRank of all the pages in your site is20879033189

This increase is small the amount will of course vary depending oncircumstances But the point was easily proved ndash linking out can sometimesimprove your PageRank through a mechanism known as PageRank Feedback

If we added an extra page into the loop between C -gt D and E then the totalincrease in PageRank in your site would be 21462030505 [ Excel Example 7]This suggests that it is wiser to reciprocal link with larger sites that employ a linkstructure where a higher amount of focus is given to the page on which your linkappears (or will appear) The number of pages and the structure you are linkingto has significant influence on the amount of PageRank feedback that occurs

22

Influencing the results

The question that arises from all this is how and when can or will Googleinfluence the results of the PageRank calculation In this respect the situationhas changed significantly since the original PageRank paper Google has shownthat they can and will modify the data on which PageRank is based The primaryexample of this is what has become known as PageRank Zero (PR0) Basicallyspeculation says that when Google wants to penalize a page it is assigned aPageRank of zero As PageRank is a multiplier this will obviously always listPR0 pages as the very last entry in the SERPS

Your score from other factors multiplied by your PageRank score of zero willalways be zero (the lowest score possible)

How they do this is important Letrsquos say they set the PageRank of the page tozero before doing the calculations ndash would this work From what we know so farwe would think not [ Excel Example 8] And true enough it doesnt Why is thissignificant Because the PageRank must be set to zero at the end of theprocess Having a PageRank of zero in itself does not disqualify a page fromvoting exactly as it would have done in the first place

To stop its voting power the second penalty must also be applied This is thesame penalty that is applied to link farms Google has shown that they arecapable of ignoring links they believe have been artificially created This is simpleenough they merely remove the link entry from the matrix This second penaltywould also need to be applied to a PR0 penalty to stop the links from that pagefrom counting in the PageRank process However there doesnrsquot appear to beany way to test this I suspect that both penalties would be applied

Controlling PageRank

As Webmasters contrary to the suggestions of some we have a fair amount ofcontrol over PageRank It is however the most difficult of ranking factors tocontrol and you might well be able to achieve the desired effects by using othertechniques However a well-worked PageRank added to your optimization mixcan make it difficult for competitors to catch up

There are three fundamental areas to look at when trying to optimize thePageRank for your site

1 The links you choose to have link to you ie which ones you chooseand how much effort you put in to getting them

2 Who you choose to link out to from your site and from which page ofyour site you place their link (Maximising PageRank Feedback andminimising PageRank leakage)

23

3 The internal navigational structure and linkage of your pages in order tobest distribute PageRank within your site

Links to Your Site

When looking for links to your site from a purely PageRank point of view onemight think you should simply look for pages that have the highest ToolbarPageRank (Whilst keeping in mind that every page of a site has its ownPageRank so you must consider the PageRank of the links page or whateverpage the actual link will exist on) However this way of thinking is incorrect Ifyoursquove not just jumped to this section then yoursquoll probably have worked out whythat is The PageRank given by a link is far more complicated than thissimplification There may have been a time when that was an okayapproximationhellipbut no more As more and more people try to get links from onlyhigh PageRank sites it becomes less and less of a winning proposition

The actual PageRank from an individual page is shared out amongst the links onthat page (remember the PageRank calculations) So links from pages thathave the same PageRank arent always created equal It depends on how manyother links your link is sharing the links page with For instance a link from apage with a PageRank of 4 might be better than a link from a page with aPageRank of 6 if there are less total links on the PR 4 page Right now there justisnrsquot enough information available to allow us to know to what extent thisstretches However itrsquos significant enough to make it pointless to just choosehigh PageRanked sites as your main linking strategy Theres also another morematter-of-fact reason why that type of linking strategy might not be the best siteswith high PageRanks are often fussy about which sites they will link out tomaking them harder to get linked from than lower PageRanked sites Howeversites struggling with their own PageRank numbers should be more receptive toexchanging reciprocal links with other like-minded sites

Now lets factor in feedback Letrsquos say for example that there are two separatepages on other peoplersquos sites which both have a PageRank of 4 Both of thesehave ten links to other pages But the page on your site that you want them tolink to already has a link to the page on the second site By getting a link fromthe second site yoursquore generating feedback and getting more PageRank than ifyou had gotten a link from the first site Thatrsquos an over simplification in factfeedback loops can get even more complicated Remember the number of linkson the page linking to you will alter the amount of feedback etc

Can you work it all out for a given pagersquos situation No ndash neither can I Myadvice therefore is this ndash get links from sites that seem appropriate and havegood quality regardless of their current PageRank If they are relevant to yoursite and are high quality sites they will either help your PageRank now or willdo so in the future To really get your PageRank humming get yourself a listing

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 12: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

11

Advantages DisadvantagesPerson A bull Quick entry to the

results pagesbull Self-generating links

limit the amount ofwork needed

bull Securing the resultsto make it moredifficult forcompetitors tocompete takes morework

bull Slower to react tonew competitors

Person B bull Solid position Caneasily modify on thepage factors for aquick boost ifrequired

bull Probably highertraffic from non-search enginesources

bull Slower entry toresults pages

bull Difficult to do wellbull Increased likelihood

of tripping spamfilters

It is clear that both strategies can and do work Both strategies are usingPageRank as part the mix of factors that will ultimately improve their ranking inthe SERPS Because there is such a mix we can use them to different degreesdepending on the strategy that best suits your style My personal strategy is touse a combination but to save some of the ldquoon the pagerdquo factors for later in caseI need a quick boost if the competition heats up at a later time

Really heavy competition

No explanation of PageRank strategies would be complete without a finalstatement regarding heavy keyword competition There are some queries wherecompetition is so intense that you must do everything possible to maximize yourranking score (for example ldquoweb hostingrdquo) In such situations it is impossible torank highly through Non-PageRank factors alone (as you will not initially be listedhigh enough to be noticed and linked to) That is not to say that Non-PageRankfactors arent important Consider what your final rank score is

Final Rank Score = (score for all Non-PageRank factors) x (actual PageRankscore)

Improving either side of the equation can have a positive effect Howeverbecause the Non-PageRank factors have a restricted maximum benefit theactual PageRank score must be improved in order to compete successfullyUnder really heavy competition ndash it holds true that you cannot rank well unlessyour actual PageRank score is above a certain level In other words

12

There exists a query specific ldquoMinimum PageRank levelrdquo For queries that do nothave heavy competition this level is easy to achieve without even tryingHowever where heavy competition exists Non-PageRank factors are just asimportant (and easier to get) until they reach the Non-PageRank factor thresholdThis is why careful keyword choice can help you avoid the extensive workassociated with highly competitive search phrases

How PageRank is calculated

On a simple level we can tell quite a lot about how PageRank is calculated Thisis because when Google was just a university research project the creators ofPageRank published a paper that detailed a formula for calculating it Thisformula is now more than a handful of years old and we suspect that it haschanged somewhat since then However for detailing the over-riding principlesof how PageRank works it is as accurate today as the day it was written

PR(A) = (1-d) + d (PR(T1) C(T1) + + PR(Tn) C(Tn) )

Where PR(A) is the PageRank of Page A ( the one we want t owork out )

D is a dam pening factor Nom inally this is set to 085

PR(T1) is the PageRank of a site point ing to Page A

C(T1) is the num ber of links off that page

PR(Tn) C(Tn) m eans we do that for each page point ing toPage A

Source The Anatomy of a Large-Scale Hypertextual Web Search EngineSergey Brin and LawrencePage httpwww-dbstanfordedu~backrubgooglehtml

Couldnrsquot be simpler right Depending on your mathematical competence this iseither remarkably easy to understand or remarkably complex So herersquos the dealndash this isnrsquot a one-time calculation It looks pretty above but you canrsquot just do onesimple calculation and get an answer with the PageRank of a page If you look atit to calculate the PageRank of Page A we need to know the PageRank of allthe pages pointing to it To calculate the PageRank of those pages we need toknow the PageRank of all the pages pointing to them (one of which could and isquite likely to be Page A) If this seems like it could go around in circles forever

13

thatrsquos because it nearly does We have to do a whole lot of little calculationsmany times over before we actually get the answer What this formula tells us isthat however you slice it and however the formula may have changed since itwas written

The PageRank given to Page A by a Page B pointing to it is decreased witheach link to anywhere that exists on Page B This means that a pagersquos

PageRank is essentially a measure of its vote it can split that vote between onelink or two or many more but its overall voting power will always remain the

same

This really is important So to be totally clear letrsquos put some numbers to it (thesenumbers are made up for demonstration purposes only and have no relevancyto any particular page) Say Page B has a PageRank value of 5 and has a singlelink on it pointing to Page A Page Arsquos PageRank is improved by a proportion ofPage Brsquos value of 5 (Page B doesnrsquot lose anything but Page A gains) If Page Bhas two links that PageRank improvement would be split and Page A wouldonly gain half the PageRank that it did before

Now put the formula out of your mind for a moment as its easier to understandhow it works using a diagram Letrsquos say we have a hypothetical set of pagesimaginatively titled Page A Page B Page C and Page D They link to each otheras shown below

To begin with in our example at least we donrsquot know what the pagersquos startingPageRanks are Therersquos nothing special about the number that we pick to startwith (in fact if you read forward to the section on convergence ndash you can see that

14

we can start at any number we want) Since in the last version of this paper weperformed the calculations by setting these values to 1 ndash wersquore going to set themto zero this time around in order to prove that it doesnrsquot matter what the startingvalues are

Next we perform the necessary calculation to obtain the PageRank for eachpage The rules are

1 We take a 085 a pagersquos PageRank and divide it by the number oflinks on the page

2 We add that amount on to a new total for each page itrsquos being passedto

3 We add 015 to each of those totals

The first calculation is easy Because wersquove started at zero ndash 0 085 is always 0So each page gets just 015 + 0 Meaning each page now has a PageRank of015 Clearly wersquore not done ndash we want to show the importance of each pagebased upon links and theyrsquore all the same so we need to run the calculationagain

Page A links to pages B C and D Page Arsquos PageRank is 015 so it will add 085 015 = 01275 to the new PageRank scores of the pages it links to There arethree of them so they each get 00425

Page B links to page C Page Brsquos PageRank is 015 so it will add 085 015 =01275 to the new PageRank score of the pages it links to Since it only links topage C page C will get it all

Page C links to Page A all 01275 passes to page A

Page D links to Page C Again all 01275 passes to page C

The new totals for each page them become

Page A 015 (base) + 01275 (from Page C) = 02775Page B 015 (base) + 00425 (from Page A) = 01925Page C 015 (base) + 00425 (from Page A) + 01275 (from Page B) + 01275(from Page D) = 04475Page D 015 (base) + 00425 (from Page A) = 01925

So wersquove got

15

Pretty neat huh Already wersquore begining to see that Page C is probably the mostimportant page in the system (but we canrsquot be sure yet ndash it could well change)We carry on doing these calculations until the value for each page no longerchanges (this is called convergence ndash therersquos more about this in the nextsection) In practice Google probably doesnt wait for this convergence butinstead run a number of iterations of the calculation which is likely to give themfairly accurate values (more on this later as well) If we carried out all thecalculations for the example given it would take us 143 calculations [ ExcelExample 1] and wersquod reach final values of

16

As suspected Page C is the most important If we take a quick look at these rawvalues we can see something about the number of links pointing out from a pageLook at Page A which has a link from a high PageRank page (Page C) whichhas only one outbound link Then look at Page B and D both share links from ahigh PageRank page (Page A) with three outbound links The number of linkssignificantly alters the way PageRank is distributed

A word on convergence

Convergence is an important mathematical aspect of PageRank which allowsGoogle to provided unprecedented search quality at comparably low costs Thisis a semi-complex topic but it is important to your understanding of how and whyPageRank works Wersquove tried to make it as simple as possible but unless yoursquoreSergey Brin or Larry Page yoursquoll still need to concentrate

Whilst it will take some concentration it isnrsquot that hard to understand

Wersquove shown that the outlet values (final values) of one stage of the calculationbecome the inlet values (starting values) of the next stage and that we keep ondoing this (itrsquos known as a recursive procedure) But the really big question ishow and when does the recursive procedure stop

The answer is ldquoconvergencerdquo Provided the dampening factor (d in our equation)is less than one then convergence will occur Nominally we set it to 085(because thatrsquos the value mentioned in the Stanford papers)

This convergence basically means that whatever values we start at after runningthe calculation a number of times we will end up with the same final values andthat these values will no longer change if we do further iterations of thecalculation These final values are known as limiting values

Once the limiting values have been reached Google no longer needs to expendprocessing power on calculating the PageRank They can finish there

This is easier to understand with an example Letrsquos take a look at the followingstructure

17

PageRank for pages A B C D at various stages of iteration

A B C DIteration 1 1 1 1

1 10000000000 05750000000 22750000000 015000000002 20837500000 05750000000 11912500000 015000000003 11625625000 10355937500 16518437500 01500000000

19 14900124031 07833688246 15766187723 0150000000020 14901259564 07832552713 15766187723 0150000000021 14901259564 07833035315 15765705121 0150000000046 14901074052 07832956473 15765969475 0150000000047 14901074054 07832956472 15765969474 0150000000048 14901074053 07832956473 15765969474 0150000000049 14901074053 07832956473 15765969474 0150000000050 14901074053 07832956473 15765969474 0150000000051 14901074053 07832956473 15765969474 0150000000052 14901074053 07832956473 15765969474 0150000000053 14901074053 07832956473 15765969474 0150000000054 14901074053 07832956473 15765969474 01500000000

No matter how much we try to calculate the value past the 48th iteration thevalues do not change ndash they have converged at the limiting values

In practice theres no need to wait until they do not change at all before we stopcalculations We simply wait until they do not change very much In our exampleabove we have done that to 10 decimal places Ie we have calculated until theydo not change by 00000000001 or more (or mathematically | PR 48 (A) - PR 50(A) | lt 10 ndash10)

18

Derived specifics of how Google calculates PageRank

We started our values from zero and now see that the calculations are fairlycomplicated and resource intensive So two questions arise When Googlecalculates PageRank themselves do they start each month with an arbitraryvalue And secondly do they calculate PageRank across several machines Wecan answer these questions by a couple of simple experiments Assume thathaving calculated the PageRanks in the structure shown earlier we would like toadd a page and change a link to make the structure look like this

If we start each page from an arbitrary starting value of one then how manyiterations of the calculation does it take to achieve convergence [ ExcelExample 2]

75

Now if we start each page from the values we ended up with when we calculatedthe PageRank of the original structure (and set Page Ersquos starting value to 1) thenhow many iterations does it take to achieve convergence [ Excel Example 3]

78

These numbers are really very close Therefore it would appear logical thatGoogle does not reset the values with each calculation as they may simply carryon from where they left off

So do they calculate across several machines or one Obviously there is toomuch work for one machine when you get to billions of pages so several

19

machines must be used Logically how can they do this Assume Pages A andC are on one machine and Pages B C and D are on another If we ignore thelinks between pages across machines (ie A-gtD and D-gtC) then for Pages Aand C it takes

1 calculation to converge at A=1 C=1

Then for Pages B D and E it takes

3 calculations to converge at B=015 D=03954375000 E=03954375000

If we then do calculations for all pages using these as starting values it takes

A huge 146 iterations to converge [ Excel Example 4]

The conclusion is that splitting PageRank calculations up into chunks for differentmachines is not a viable proposition Therefore they must also track PageRanktransfers across links between machines This means that each machine mustnotify the other machines as PageRank is given to a page not on that machineIndeed each machine must also run the iterations in sync with each other Evenif they donrsquot do it exactly this way the calculation of PageRank for so manypages is very very complicated indeed

PageRank Feedback how linking sometimes helps

When the new term PageRank Feedback was presented in the original version ofthis paper it was largely taken up and used ndash although not always correctlyPageRank Feedback is difficult to understand because it is a concept that impliescreation of PageRank However it is really an alteration of distribution ofPageRank on a large scale PageRank Feedback is the principle that explainshow sometimes it is good to link to outside pages Consider we have a Page -gtPage A Its PageRank by definition will be 015 Now assume it links out toPage B which in return links back to it Page Arsquos PageRank now becomes 1Now assume it links out to Page C and Page C links back to it Page ArsquosPageRank now becomes 14594594595

The very fact that Page A links out to a page that links back to it causes its ownPageRank to rise (This could also occur if Page A links to a page that links to apage that links to Page A etchellip) This has occurred not because we aregenerating PageRank but because it is taking it from the system as a whole Butif we view this very small system ldquoPage Ardquo (which is a subsystem of all the pagesin the index) then we can firmly say that linking out from Page A has generatedPageRank in this system ndash that is the very act of linking out has fed PageRankback in

20

Letrsquos put this diagrammatically using a slightly more complicated system

Pages A to E are all of the pages in Googlersquos index Pages A and B are your siteWhen we do the calculations for this set up the final PageRanks for each pageare all 1 [ Excel Example 5] As far as your site is concerned

Page Arsquos PageRank is 1

Page Brsquos PageRank is 1

The sum of the PageRank of all the pages in your site is2

Changing the structure to link A to C and C to A gives the following finalPageRanks [ Excel Example 6]

21

Page Arsquos PageRank is 13599321536

Page Brsquos PageRank is 07279711653

The sum of the PageRank of all the pages in your site is20879033189

This increase is small the amount will of course vary depending oncircumstances But the point was easily proved ndash linking out can sometimesimprove your PageRank through a mechanism known as PageRank Feedback

If we added an extra page into the loop between C -gt D and E then the totalincrease in PageRank in your site would be 21462030505 [ Excel Example 7]This suggests that it is wiser to reciprocal link with larger sites that employ a linkstructure where a higher amount of focus is given to the page on which your linkappears (or will appear) The number of pages and the structure you are linkingto has significant influence on the amount of PageRank feedback that occurs

22

Influencing the results

The question that arises from all this is how and when can or will Googleinfluence the results of the PageRank calculation In this respect the situationhas changed significantly since the original PageRank paper Google has shownthat they can and will modify the data on which PageRank is based The primaryexample of this is what has become known as PageRank Zero (PR0) Basicallyspeculation says that when Google wants to penalize a page it is assigned aPageRank of zero As PageRank is a multiplier this will obviously always listPR0 pages as the very last entry in the SERPS

Your score from other factors multiplied by your PageRank score of zero willalways be zero (the lowest score possible)

How they do this is important Letrsquos say they set the PageRank of the page tozero before doing the calculations ndash would this work From what we know so farwe would think not [ Excel Example 8] And true enough it doesnt Why is thissignificant Because the PageRank must be set to zero at the end of theprocess Having a PageRank of zero in itself does not disqualify a page fromvoting exactly as it would have done in the first place

To stop its voting power the second penalty must also be applied This is thesame penalty that is applied to link farms Google has shown that they arecapable of ignoring links they believe have been artificially created This is simpleenough they merely remove the link entry from the matrix This second penaltywould also need to be applied to a PR0 penalty to stop the links from that pagefrom counting in the PageRank process However there doesnrsquot appear to beany way to test this I suspect that both penalties would be applied

Controlling PageRank

As Webmasters contrary to the suggestions of some we have a fair amount ofcontrol over PageRank It is however the most difficult of ranking factors tocontrol and you might well be able to achieve the desired effects by using othertechniques However a well-worked PageRank added to your optimization mixcan make it difficult for competitors to catch up

There are three fundamental areas to look at when trying to optimize thePageRank for your site

1 The links you choose to have link to you ie which ones you chooseand how much effort you put in to getting them

2 Who you choose to link out to from your site and from which page ofyour site you place their link (Maximising PageRank Feedback andminimising PageRank leakage)

23

3 The internal navigational structure and linkage of your pages in order tobest distribute PageRank within your site

Links to Your Site

When looking for links to your site from a purely PageRank point of view onemight think you should simply look for pages that have the highest ToolbarPageRank (Whilst keeping in mind that every page of a site has its ownPageRank so you must consider the PageRank of the links page or whateverpage the actual link will exist on) However this way of thinking is incorrect Ifyoursquove not just jumped to this section then yoursquoll probably have worked out whythat is The PageRank given by a link is far more complicated than thissimplification There may have been a time when that was an okayapproximationhellipbut no more As more and more people try to get links from onlyhigh PageRank sites it becomes less and less of a winning proposition

The actual PageRank from an individual page is shared out amongst the links onthat page (remember the PageRank calculations) So links from pages thathave the same PageRank arent always created equal It depends on how manyother links your link is sharing the links page with For instance a link from apage with a PageRank of 4 might be better than a link from a page with aPageRank of 6 if there are less total links on the PR 4 page Right now there justisnrsquot enough information available to allow us to know to what extent thisstretches However itrsquos significant enough to make it pointless to just choosehigh PageRanked sites as your main linking strategy Theres also another morematter-of-fact reason why that type of linking strategy might not be the best siteswith high PageRanks are often fussy about which sites they will link out tomaking them harder to get linked from than lower PageRanked sites Howeversites struggling with their own PageRank numbers should be more receptive toexchanging reciprocal links with other like-minded sites

Now lets factor in feedback Letrsquos say for example that there are two separatepages on other peoplersquos sites which both have a PageRank of 4 Both of thesehave ten links to other pages But the page on your site that you want them tolink to already has a link to the page on the second site By getting a link fromthe second site yoursquore generating feedback and getting more PageRank than ifyou had gotten a link from the first site Thatrsquos an over simplification in factfeedback loops can get even more complicated Remember the number of linkson the page linking to you will alter the amount of feedback etc

Can you work it all out for a given pagersquos situation No ndash neither can I Myadvice therefore is this ndash get links from sites that seem appropriate and havegood quality regardless of their current PageRank If they are relevant to yoursite and are high quality sites they will either help your PageRank now or willdo so in the future To really get your PageRank humming get yourself a listing

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 13: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

12

There exists a query specific ldquoMinimum PageRank levelrdquo For queries that do nothave heavy competition this level is easy to achieve without even tryingHowever where heavy competition exists Non-PageRank factors are just asimportant (and easier to get) until they reach the Non-PageRank factor thresholdThis is why careful keyword choice can help you avoid the extensive workassociated with highly competitive search phrases

How PageRank is calculated

On a simple level we can tell quite a lot about how PageRank is calculated Thisis because when Google was just a university research project the creators ofPageRank published a paper that detailed a formula for calculating it Thisformula is now more than a handful of years old and we suspect that it haschanged somewhat since then However for detailing the over-riding principlesof how PageRank works it is as accurate today as the day it was written

PR(A) = (1-d) + d (PR(T1) C(T1) + + PR(Tn) C(Tn) )

Where PR(A) is the PageRank of Page A ( the one we want t owork out )

D is a dam pening factor Nom inally this is set to 085

PR(T1) is the PageRank of a site point ing to Page A

C(T1) is the num ber of links off that page

PR(Tn) C(Tn) m eans we do that for each page point ing toPage A

Source The Anatomy of a Large-Scale Hypertextual Web Search EngineSergey Brin and LawrencePage httpwww-dbstanfordedu~backrubgooglehtml

Couldnrsquot be simpler right Depending on your mathematical competence this iseither remarkably easy to understand or remarkably complex So herersquos the dealndash this isnrsquot a one-time calculation It looks pretty above but you canrsquot just do onesimple calculation and get an answer with the PageRank of a page If you look atit to calculate the PageRank of Page A we need to know the PageRank of allthe pages pointing to it To calculate the PageRank of those pages we need toknow the PageRank of all the pages pointing to them (one of which could and isquite likely to be Page A) If this seems like it could go around in circles forever

13

thatrsquos because it nearly does We have to do a whole lot of little calculationsmany times over before we actually get the answer What this formula tells us isthat however you slice it and however the formula may have changed since itwas written

The PageRank given to Page A by a Page B pointing to it is decreased witheach link to anywhere that exists on Page B This means that a pagersquos

PageRank is essentially a measure of its vote it can split that vote between onelink or two or many more but its overall voting power will always remain the

same

This really is important So to be totally clear letrsquos put some numbers to it (thesenumbers are made up for demonstration purposes only and have no relevancyto any particular page) Say Page B has a PageRank value of 5 and has a singlelink on it pointing to Page A Page Arsquos PageRank is improved by a proportion ofPage Brsquos value of 5 (Page B doesnrsquot lose anything but Page A gains) If Page Bhas two links that PageRank improvement would be split and Page A wouldonly gain half the PageRank that it did before

Now put the formula out of your mind for a moment as its easier to understandhow it works using a diagram Letrsquos say we have a hypothetical set of pagesimaginatively titled Page A Page B Page C and Page D They link to each otheras shown below

To begin with in our example at least we donrsquot know what the pagersquos startingPageRanks are Therersquos nothing special about the number that we pick to startwith (in fact if you read forward to the section on convergence ndash you can see that

14

we can start at any number we want) Since in the last version of this paper weperformed the calculations by setting these values to 1 ndash wersquore going to set themto zero this time around in order to prove that it doesnrsquot matter what the startingvalues are

Next we perform the necessary calculation to obtain the PageRank for eachpage The rules are

1 We take a 085 a pagersquos PageRank and divide it by the number oflinks on the page

2 We add that amount on to a new total for each page itrsquos being passedto

3 We add 015 to each of those totals

The first calculation is easy Because wersquove started at zero ndash 0 085 is always 0So each page gets just 015 + 0 Meaning each page now has a PageRank of015 Clearly wersquore not done ndash we want to show the importance of each pagebased upon links and theyrsquore all the same so we need to run the calculationagain

Page A links to pages B C and D Page Arsquos PageRank is 015 so it will add 085 015 = 01275 to the new PageRank scores of the pages it links to There arethree of them so they each get 00425

Page B links to page C Page Brsquos PageRank is 015 so it will add 085 015 =01275 to the new PageRank score of the pages it links to Since it only links topage C page C will get it all

Page C links to Page A all 01275 passes to page A

Page D links to Page C Again all 01275 passes to page C

The new totals for each page them become

Page A 015 (base) + 01275 (from Page C) = 02775Page B 015 (base) + 00425 (from Page A) = 01925Page C 015 (base) + 00425 (from Page A) + 01275 (from Page B) + 01275(from Page D) = 04475Page D 015 (base) + 00425 (from Page A) = 01925

So wersquove got

15

Pretty neat huh Already wersquore begining to see that Page C is probably the mostimportant page in the system (but we canrsquot be sure yet ndash it could well change)We carry on doing these calculations until the value for each page no longerchanges (this is called convergence ndash therersquos more about this in the nextsection) In practice Google probably doesnt wait for this convergence butinstead run a number of iterations of the calculation which is likely to give themfairly accurate values (more on this later as well) If we carried out all thecalculations for the example given it would take us 143 calculations [ ExcelExample 1] and wersquod reach final values of

16

As suspected Page C is the most important If we take a quick look at these rawvalues we can see something about the number of links pointing out from a pageLook at Page A which has a link from a high PageRank page (Page C) whichhas only one outbound link Then look at Page B and D both share links from ahigh PageRank page (Page A) with three outbound links The number of linkssignificantly alters the way PageRank is distributed

A word on convergence

Convergence is an important mathematical aspect of PageRank which allowsGoogle to provided unprecedented search quality at comparably low costs Thisis a semi-complex topic but it is important to your understanding of how and whyPageRank works Wersquove tried to make it as simple as possible but unless yoursquoreSergey Brin or Larry Page yoursquoll still need to concentrate

Whilst it will take some concentration it isnrsquot that hard to understand

Wersquove shown that the outlet values (final values) of one stage of the calculationbecome the inlet values (starting values) of the next stage and that we keep ondoing this (itrsquos known as a recursive procedure) But the really big question ishow and when does the recursive procedure stop

The answer is ldquoconvergencerdquo Provided the dampening factor (d in our equation)is less than one then convergence will occur Nominally we set it to 085(because thatrsquos the value mentioned in the Stanford papers)

This convergence basically means that whatever values we start at after runningthe calculation a number of times we will end up with the same final values andthat these values will no longer change if we do further iterations of thecalculation These final values are known as limiting values

Once the limiting values have been reached Google no longer needs to expendprocessing power on calculating the PageRank They can finish there

This is easier to understand with an example Letrsquos take a look at the followingstructure

17

PageRank for pages A B C D at various stages of iteration

A B C DIteration 1 1 1 1

1 10000000000 05750000000 22750000000 015000000002 20837500000 05750000000 11912500000 015000000003 11625625000 10355937500 16518437500 01500000000

19 14900124031 07833688246 15766187723 0150000000020 14901259564 07832552713 15766187723 0150000000021 14901259564 07833035315 15765705121 0150000000046 14901074052 07832956473 15765969475 0150000000047 14901074054 07832956472 15765969474 0150000000048 14901074053 07832956473 15765969474 0150000000049 14901074053 07832956473 15765969474 0150000000050 14901074053 07832956473 15765969474 0150000000051 14901074053 07832956473 15765969474 0150000000052 14901074053 07832956473 15765969474 0150000000053 14901074053 07832956473 15765969474 0150000000054 14901074053 07832956473 15765969474 01500000000

No matter how much we try to calculate the value past the 48th iteration thevalues do not change ndash they have converged at the limiting values

In practice theres no need to wait until they do not change at all before we stopcalculations We simply wait until they do not change very much In our exampleabove we have done that to 10 decimal places Ie we have calculated until theydo not change by 00000000001 or more (or mathematically | PR 48 (A) - PR 50(A) | lt 10 ndash10)

18

Derived specifics of how Google calculates PageRank

We started our values from zero and now see that the calculations are fairlycomplicated and resource intensive So two questions arise When Googlecalculates PageRank themselves do they start each month with an arbitraryvalue And secondly do they calculate PageRank across several machines Wecan answer these questions by a couple of simple experiments Assume thathaving calculated the PageRanks in the structure shown earlier we would like toadd a page and change a link to make the structure look like this

If we start each page from an arbitrary starting value of one then how manyiterations of the calculation does it take to achieve convergence [ ExcelExample 2]

75

Now if we start each page from the values we ended up with when we calculatedthe PageRank of the original structure (and set Page Ersquos starting value to 1) thenhow many iterations does it take to achieve convergence [ Excel Example 3]

78

These numbers are really very close Therefore it would appear logical thatGoogle does not reset the values with each calculation as they may simply carryon from where they left off

So do they calculate across several machines or one Obviously there is toomuch work for one machine when you get to billions of pages so several

19

machines must be used Logically how can they do this Assume Pages A andC are on one machine and Pages B C and D are on another If we ignore thelinks between pages across machines (ie A-gtD and D-gtC) then for Pages Aand C it takes

1 calculation to converge at A=1 C=1

Then for Pages B D and E it takes

3 calculations to converge at B=015 D=03954375000 E=03954375000

If we then do calculations for all pages using these as starting values it takes

A huge 146 iterations to converge [ Excel Example 4]

The conclusion is that splitting PageRank calculations up into chunks for differentmachines is not a viable proposition Therefore they must also track PageRanktransfers across links between machines This means that each machine mustnotify the other machines as PageRank is given to a page not on that machineIndeed each machine must also run the iterations in sync with each other Evenif they donrsquot do it exactly this way the calculation of PageRank for so manypages is very very complicated indeed

PageRank Feedback how linking sometimes helps

When the new term PageRank Feedback was presented in the original version ofthis paper it was largely taken up and used ndash although not always correctlyPageRank Feedback is difficult to understand because it is a concept that impliescreation of PageRank However it is really an alteration of distribution ofPageRank on a large scale PageRank Feedback is the principle that explainshow sometimes it is good to link to outside pages Consider we have a Page -gtPage A Its PageRank by definition will be 015 Now assume it links out toPage B which in return links back to it Page Arsquos PageRank now becomes 1Now assume it links out to Page C and Page C links back to it Page ArsquosPageRank now becomes 14594594595

The very fact that Page A links out to a page that links back to it causes its ownPageRank to rise (This could also occur if Page A links to a page that links to apage that links to Page A etchellip) This has occurred not because we aregenerating PageRank but because it is taking it from the system as a whole Butif we view this very small system ldquoPage Ardquo (which is a subsystem of all the pagesin the index) then we can firmly say that linking out from Page A has generatedPageRank in this system ndash that is the very act of linking out has fed PageRankback in

20

Letrsquos put this diagrammatically using a slightly more complicated system

Pages A to E are all of the pages in Googlersquos index Pages A and B are your siteWhen we do the calculations for this set up the final PageRanks for each pageare all 1 [ Excel Example 5] As far as your site is concerned

Page Arsquos PageRank is 1

Page Brsquos PageRank is 1

The sum of the PageRank of all the pages in your site is2

Changing the structure to link A to C and C to A gives the following finalPageRanks [ Excel Example 6]

21

Page Arsquos PageRank is 13599321536

Page Brsquos PageRank is 07279711653

The sum of the PageRank of all the pages in your site is20879033189

This increase is small the amount will of course vary depending oncircumstances But the point was easily proved ndash linking out can sometimesimprove your PageRank through a mechanism known as PageRank Feedback

If we added an extra page into the loop between C -gt D and E then the totalincrease in PageRank in your site would be 21462030505 [ Excel Example 7]This suggests that it is wiser to reciprocal link with larger sites that employ a linkstructure where a higher amount of focus is given to the page on which your linkappears (or will appear) The number of pages and the structure you are linkingto has significant influence on the amount of PageRank feedback that occurs

22

Influencing the results

The question that arises from all this is how and when can or will Googleinfluence the results of the PageRank calculation In this respect the situationhas changed significantly since the original PageRank paper Google has shownthat they can and will modify the data on which PageRank is based The primaryexample of this is what has become known as PageRank Zero (PR0) Basicallyspeculation says that when Google wants to penalize a page it is assigned aPageRank of zero As PageRank is a multiplier this will obviously always listPR0 pages as the very last entry in the SERPS

Your score from other factors multiplied by your PageRank score of zero willalways be zero (the lowest score possible)

How they do this is important Letrsquos say they set the PageRank of the page tozero before doing the calculations ndash would this work From what we know so farwe would think not [ Excel Example 8] And true enough it doesnt Why is thissignificant Because the PageRank must be set to zero at the end of theprocess Having a PageRank of zero in itself does not disqualify a page fromvoting exactly as it would have done in the first place

To stop its voting power the second penalty must also be applied This is thesame penalty that is applied to link farms Google has shown that they arecapable of ignoring links they believe have been artificially created This is simpleenough they merely remove the link entry from the matrix This second penaltywould also need to be applied to a PR0 penalty to stop the links from that pagefrom counting in the PageRank process However there doesnrsquot appear to beany way to test this I suspect that both penalties would be applied

Controlling PageRank

As Webmasters contrary to the suggestions of some we have a fair amount ofcontrol over PageRank It is however the most difficult of ranking factors tocontrol and you might well be able to achieve the desired effects by using othertechniques However a well-worked PageRank added to your optimization mixcan make it difficult for competitors to catch up

There are three fundamental areas to look at when trying to optimize thePageRank for your site

1 The links you choose to have link to you ie which ones you chooseand how much effort you put in to getting them

2 Who you choose to link out to from your site and from which page ofyour site you place their link (Maximising PageRank Feedback andminimising PageRank leakage)

23

3 The internal navigational structure and linkage of your pages in order tobest distribute PageRank within your site

Links to Your Site

When looking for links to your site from a purely PageRank point of view onemight think you should simply look for pages that have the highest ToolbarPageRank (Whilst keeping in mind that every page of a site has its ownPageRank so you must consider the PageRank of the links page or whateverpage the actual link will exist on) However this way of thinking is incorrect Ifyoursquove not just jumped to this section then yoursquoll probably have worked out whythat is The PageRank given by a link is far more complicated than thissimplification There may have been a time when that was an okayapproximationhellipbut no more As more and more people try to get links from onlyhigh PageRank sites it becomes less and less of a winning proposition

The actual PageRank from an individual page is shared out amongst the links onthat page (remember the PageRank calculations) So links from pages thathave the same PageRank arent always created equal It depends on how manyother links your link is sharing the links page with For instance a link from apage with a PageRank of 4 might be better than a link from a page with aPageRank of 6 if there are less total links on the PR 4 page Right now there justisnrsquot enough information available to allow us to know to what extent thisstretches However itrsquos significant enough to make it pointless to just choosehigh PageRanked sites as your main linking strategy Theres also another morematter-of-fact reason why that type of linking strategy might not be the best siteswith high PageRanks are often fussy about which sites they will link out tomaking them harder to get linked from than lower PageRanked sites Howeversites struggling with their own PageRank numbers should be more receptive toexchanging reciprocal links with other like-minded sites

Now lets factor in feedback Letrsquos say for example that there are two separatepages on other peoplersquos sites which both have a PageRank of 4 Both of thesehave ten links to other pages But the page on your site that you want them tolink to already has a link to the page on the second site By getting a link fromthe second site yoursquore generating feedback and getting more PageRank than ifyou had gotten a link from the first site Thatrsquos an over simplification in factfeedback loops can get even more complicated Remember the number of linkson the page linking to you will alter the amount of feedback etc

Can you work it all out for a given pagersquos situation No ndash neither can I Myadvice therefore is this ndash get links from sites that seem appropriate and havegood quality regardless of their current PageRank If they are relevant to yoursite and are high quality sites they will either help your PageRank now or willdo so in the future To really get your PageRank humming get yourself a listing

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 14: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

13

thatrsquos because it nearly does We have to do a whole lot of little calculationsmany times over before we actually get the answer What this formula tells us isthat however you slice it and however the formula may have changed since itwas written

The PageRank given to Page A by a Page B pointing to it is decreased witheach link to anywhere that exists on Page B This means that a pagersquos

PageRank is essentially a measure of its vote it can split that vote between onelink or two or many more but its overall voting power will always remain the

same

This really is important So to be totally clear letrsquos put some numbers to it (thesenumbers are made up for demonstration purposes only and have no relevancyto any particular page) Say Page B has a PageRank value of 5 and has a singlelink on it pointing to Page A Page Arsquos PageRank is improved by a proportion ofPage Brsquos value of 5 (Page B doesnrsquot lose anything but Page A gains) If Page Bhas two links that PageRank improvement would be split and Page A wouldonly gain half the PageRank that it did before

Now put the formula out of your mind for a moment as its easier to understandhow it works using a diagram Letrsquos say we have a hypothetical set of pagesimaginatively titled Page A Page B Page C and Page D They link to each otheras shown below

To begin with in our example at least we donrsquot know what the pagersquos startingPageRanks are Therersquos nothing special about the number that we pick to startwith (in fact if you read forward to the section on convergence ndash you can see that

14

we can start at any number we want) Since in the last version of this paper weperformed the calculations by setting these values to 1 ndash wersquore going to set themto zero this time around in order to prove that it doesnrsquot matter what the startingvalues are

Next we perform the necessary calculation to obtain the PageRank for eachpage The rules are

1 We take a 085 a pagersquos PageRank and divide it by the number oflinks on the page

2 We add that amount on to a new total for each page itrsquos being passedto

3 We add 015 to each of those totals

The first calculation is easy Because wersquove started at zero ndash 0 085 is always 0So each page gets just 015 + 0 Meaning each page now has a PageRank of015 Clearly wersquore not done ndash we want to show the importance of each pagebased upon links and theyrsquore all the same so we need to run the calculationagain

Page A links to pages B C and D Page Arsquos PageRank is 015 so it will add 085 015 = 01275 to the new PageRank scores of the pages it links to There arethree of them so they each get 00425

Page B links to page C Page Brsquos PageRank is 015 so it will add 085 015 =01275 to the new PageRank score of the pages it links to Since it only links topage C page C will get it all

Page C links to Page A all 01275 passes to page A

Page D links to Page C Again all 01275 passes to page C

The new totals for each page them become

Page A 015 (base) + 01275 (from Page C) = 02775Page B 015 (base) + 00425 (from Page A) = 01925Page C 015 (base) + 00425 (from Page A) + 01275 (from Page B) + 01275(from Page D) = 04475Page D 015 (base) + 00425 (from Page A) = 01925

So wersquove got

15

Pretty neat huh Already wersquore begining to see that Page C is probably the mostimportant page in the system (but we canrsquot be sure yet ndash it could well change)We carry on doing these calculations until the value for each page no longerchanges (this is called convergence ndash therersquos more about this in the nextsection) In practice Google probably doesnt wait for this convergence butinstead run a number of iterations of the calculation which is likely to give themfairly accurate values (more on this later as well) If we carried out all thecalculations for the example given it would take us 143 calculations [ ExcelExample 1] and wersquod reach final values of

16

As suspected Page C is the most important If we take a quick look at these rawvalues we can see something about the number of links pointing out from a pageLook at Page A which has a link from a high PageRank page (Page C) whichhas only one outbound link Then look at Page B and D both share links from ahigh PageRank page (Page A) with three outbound links The number of linkssignificantly alters the way PageRank is distributed

A word on convergence

Convergence is an important mathematical aspect of PageRank which allowsGoogle to provided unprecedented search quality at comparably low costs Thisis a semi-complex topic but it is important to your understanding of how and whyPageRank works Wersquove tried to make it as simple as possible but unless yoursquoreSergey Brin or Larry Page yoursquoll still need to concentrate

Whilst it will take some concentration it isnrsquot that hard to understand

Wersquove shown that the outlet values (final values) of one stage of the calculationbecome the inlet values (starting values) of the next stage and that we keep ondoing this (itrsquos known as a recursive procedure) But the really big question ishow and when does the recursive procedure stop

The answer is ldquoconvergencerdquo Provided the dampening factor (d in our equation)is less than one then convergence will occur Nominally we set it to 085(because thatrsquos the value mentioned in the Stanford papers)

This convergence basically means that whatever values we start at after runningthe calculation a number of times we will end up with the same final values andthat these values will no longer change if we do further iterations of thecalculation These final values are known as limiting values

Once the limiting values have been reached Google no longer needs to expendprocessing power on calculating the PageRank They can finish there

This is easier to understand with an example Letrsquos take a look at the followingstructure

17

PageRank for pages A B C D at various stages of iteration

A B C DIteration 1 1 1 1

1 10000000000 05750000000 22750000000 015000000002 20837500000 05750000000 11912500000 015000000003 11625625000 10355937500 16518437500 01500000000

19 14900124031 07833688246 15766187723 0150000000020 14901259564 07832552713 15766187723 0150000000021 14901259564 07833035315 15765705121 0150000000046 14901074052 07832956473 15765969475 0150000000047 14901074054 07832956472 15765969474 0150000000048 14901074053 07832956473 15765969474 0150000000049 14901074053 07832956473 15765969474 0150000000050 14901074053 07832956473 15765969474 0150000000051 14901074053 07832956473 15765969474 0150000000052 14901074053 07832956473 15765969474 0150000000053 14901074053 07832956473 15765969474 0150000000054 14901074053 07832956473 15765969474 01500000000

No matter how much we try to calculate the value past the 48th iteration thevalues do not change ndash they have converged at the limiting values

In practice theres no need to wait until they do not change at all before we stopcalculations We simply wait until they do not change very much In our exampleabove we have done that to 10 decimal places Ie we have calculated until theydo not change by 00000000001 or more (or mathematically | PR 48 (A) - PR 50(A) | lt 10 ndash10)

18

Derived specifics of how Google calculates PageRank

We started our values from zero and now see that the calculations are fairlycomplicated and resource intensive So two questions arise When Googlecalculates PageRank themselves do they start each month with an arbitraryvalue And secondly do they calculate PageRank across several machines Wecan answer these questions by a couple of simple experiments Assume thathaving calculated the PageRanks in the structure shown earlier we would like toadd a page and change a link to make the structure look like this

If we start each page from an arbitrary starting value of one then how manyiterations of the calculation does it take to achieve convergence [ ExcelExample 2]

75

Now if we start each page from the values we ended up with when we calculatedthe PageRank of the original structure (and set Page Ersquos starting value to 1) thenhow many iterations does it take to achieve convergence [ Excel Example 3]

78

These numbers are really very close Therefore it would appear logical thatGoogle does not reset the values with each calculation as they may simply carryon from where they left off

So do they calculate across several machines or one Obviously there is toomuch work for one machine when you get to billions of pages so several

19

machines must be used Logically how can they do this Assume Pages A andC are on one machine and Pages B C and D are on another If we ignore thelinks between pages across machines (ie A-gtD and D-gtC) then for Pages Aand C it takes

1 calculation to converge at A=1 C=1

Then for Pages B D and E it takes

3 calculations to converge at B=015 D=03954375000 E=03954375000

If we then do calculations for all pages using these as starting values it takes

A huge 146 iterations to converge [ Excel Example 4]

The conclusion is that splitting PageRank calculations up into chunks for differentmachines is not a viable proposition Therefore they must also track PageRanktransfers across links between machines This means that each machine mustnotify the other machines as PageRank is given to a page not on that machineIndeed each machine must also run the iterations in sync with each other Evenif they donrsquot do it exactly this way the calculation of PageRank for so manypages is very very complicated indeed

PageRank Feedback how linking sometimes helps

When the new term PageRank Feedback was presented in the original version ofthis paper it was largely taken up and used ndash although not always correctlyPageRank Feedback is difficult to understand because it is a concept that impliescreation of PageRank However it is really an alteration of distribution ofPageRank on a large scale PageRank Feedback is the principle that explainshow sometimes it is good to link to outside pages Consider we have a Page -gtPage A Its PageRank by definition will be 015 Now assume it links out toPage B which in return links back to it Page Arsquos PageRank now becomes 1Now assume it links out to Page C and Page C links back to it Page ArsquosPageRank now becomes 14594594595

The very fact that Page A links out to a page that links back to it causes its ownPageRank to rise (This could also occur if Page A links to a page that links to apage that links to Page A etchellip) This has occurred not because we aregenerating PageRank but because it is taking it from the system as a whole Butif we view this very small system ldquoPage Ardquo (which is a subsystem of all the pagesin the index) then we can firmly say that linking out from Page A has generatedPageRank in this system ndash that is the very act of linking out has fed PageRankback in

20

Letrsquos put this diagrammatically using a slightly more complicated system

Pages A to E are all of the pages in Googlersquos index Pages A and B are your siteWhen we do the calculations for this set up the final PageRanks for each pageare all 1 [ Excel Example 5] As far as your site is concerned

Page Arsquos PageRank is 1

Page Brsquos PageRank is 1

The sum of the PageRank of all the pages in your site is2

Changing the structure to link A to C and C to A gives the following finalPageRanks [ Excel Example 6]

21

Page Arsquos PageRank is 13599321536

Page Brsquos PageRank is 07279711653

The sum of the PageRank of all the pages in your site is20879033189

This increase is small the amount will of course vary depending oncircumstances But the point was easily proved ndash linking out can sometimesimprove your PageRank through a mechanism known as PageRank Feedback

If we added an extra page into the loop between C -gt D and E then the totalincrease in PageRank in your site would be 21462030505 [ Excel Example 7]This suggests that it is wiser to reciprocal link with larger sites that employ a linkstructure where a higher amount of focus is given to the page on which your linkappears (or will appear) The number of pages and the structure you are linkingto has significant influence on the amount of PageRank feedback that occurs

22

Influencing the results

The question that arises from all this is how and when can or will Googleinfluence the results of the PageRank calculation In this respect the situationhas changed significantly since the original PageRank paper Google has shownthat they can and will modify the data on which PageRank is based The primaryexample of this is what has become known as PageRank Zero (PR0) Basicallyspeculation says that when Google wants to penalize a page it is assigned aPageRank of zero As PageRank is a multiplier this will obviously always listPR0 pages as the very last entry in the SERPS

Your score from other factors multiplied by your PageRank score of zero willalways be zero (the lowest score possible)

How they do this is important Letrsquos say they set the PageRank of the page tozero before doing the calculations ndash would this work From what we know so farwe would think not [ Excel Example 8] And true enough it doesnt Why is thissignificant Because the PageRank must be set to zero at the end of theprocess Having a PageRank of zero in itself does not disqualify a page fromvoting exactly as it would have done in the first place

To stop its voting power the second penalty must also be applied This is thesame penalty that is applied to link farms Google has shown that they arecapable of ignoring links they believe have been artificially created This is simpleenough they merely remove the link entry from the matrix This second penaltywould also need to be applied to a PR0 penalty to stop the links from that pagefrom counting in the PageRank process However there doesnrsquot appear to beany way to test this I suspect that both penalties would be applied

Controlling PageRank

As Webmasters contrary to the suggestions of some we have a fair amount ofcontrol over PageRank It is however the most difficult of ranking factors tocontrol and you might well be able to achieve the desired effects by using othertechniques However a well-worked PageRank added to your optimization mixcan make it difficult for competitors to catch up

There are three fundamental areas to look at when trying to optimize thePageRank for your site

1 The links you choose to have link to you ie which ones you chooseand how much effort you put in to getting them

2 Who you choose to link out to from your site and from which page ofyour site you place their link (Maximising PageRank Feedback andminimising PageRank leakage)

23

3 The internal navigational structure and linkage of your pages in order tobest distribute PageRank within your site

Links to Your Site

When looking for links to your site from a purely PageRank point of view onemight think you should simply look for pages that have the highest ToolbarPageRank (Whilst keeping in mind that every page of a site has its ownPageRank so you must consider the PageRank of the links page or whateverpage the actual link will exist on) However this way of thinking is incorrect Ifyoursquove not just jumped to this section then yoursquoll probably have worked out whythat is The PageRank given by a link is far more complicated than thissimplification There may have been a time when that was an okayapproximationhellipbut no more As more and more people try to get links from onlyhigh PageRank sites it becomes less and less of a winning proposition

The actual PageRank from an individual page is shared out amongst the links onthat page (remember the PageRank calculations) So links from pages thathave the same PageRank arent always created equal It depends on how manyother links your link is sharing the links page with For instance a link from apage with a PageRank of 4 might be better than a link from a page with aPageRank of 6 if there are less total links on the PR 4 page Right now there justisnrsquot enough information available to allow us to know to what extent thisstretches However itrsquos significant enough to make it pointless to just choosehigh PageRanked sites as your main linking strategy Theres also another morematter-of-fact reason why that type of linking strategy might not be the best siteswith high PageRanks are often fussy about which sites they will link out tomaking them harder to get linked from than lower PageRanked sites Howeversites struggling with their own PageRank numbers should be more receptive toexchanging reciprocal links with other like-minded sites

Now lets factor in feedback Letrsquos say for example that there are two separatepages on other peoplersquos sites which both have a PageRank of 4 Both of thesehave ten links to other pages But the page on your site that you want them tolink to already has a link to the page on the second site By getting a link fromthe second site yoursquore generating feedback and getting more PageRank than ifyou had gotten a link from the first site Thatrsquos an over simplification in factfeedback loops can get even more complicated Remember the number of linkson the page linking to you will alter the amount of feedback etc

Can you work it all out for a given pagersquos situation No ndash neither can I Myadvice therefore is this ndash get links from sites that seem appropriate and havegood quality regardless of their current PageRank If they are relevant to yoursite and are high quality sites they will either help your PageRank now or willdo so in the future To really get your PageRank humming get yourself a listing

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 15: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

14

we can start at any number we want) Since in the last version of this paper weperformed the calculations by setting these values to 1 ndash wersquore going to set themto zero this time around in order to prove that it doesnrsquot matter what the startingvalues are

Next we perform the necessary calculation to obtain the PageRank for eachpage The rules are

1 We take a 085 a pagersquos PageRank and divide it by the number oflinks on the page

2 We add that amount on to a new total for each page itrsquos being passedto

3 We add 015 to each of those totals

The first calculation is easy Because wersquove started at zero ndash 0 085 is always 0So each page gets just 015 + 0 Meaning each page now has a PageRank of015 Clearly wersquore not done ndash we want to show the importance of each pagebased upon links and theyrsquore all the same so we need to run the calculationagain

Page A links to pages B C and D Page Arsquos PageRank is 015 so it will add 085 015 = 01275 to the new PageRank scores of the pages it links to There arethree of them so they each get 00425

Page B links to page C Page Brsquos PageRank is 015 so it will add 085 015 =01275 to the new PageRank score of the pages it links to Since it only links topage C page C will get it all

Page C links to Page A all 01275 passes to page A

Page D links to Page C Again all 01275 passes to page C

The new totals for each page them become

Page A 015 (base) + 01275 (from Page C) = 02775Page B 015 (base) + 00425 (from Page A) = 01925Page C 015 (base) + 00425 (from Page A) + 01275 (from Page B) + 01275(from Page D) = 04475Page D 015 (base) + 00425 (from Page A) = 01925

So wersquove got

15

Pretty neat huh Already wersquore begining to see that Page C is probably the mostimportant page in the system (but we canrsquot be sure yet ndash it could well change)We carry on doing these calculations until the value for each page no longerchanges (this is called convergence ndash therersquos more about this in the nextsection) In practice Google probably doesnt wait for this convergence butinstead run a number of iterations of the calculation which is likely to give themfairly accurate values (more on this later as well) If we carried out all thecalculations for the example given it would take us 143 calculations [ ExcelExample 1] and wersquod reach final values of

16

As suspected Page C is the most important If we take a quick look at these rawvalues we can see something about the number of links pointing out from a pageLook at Page A which has a link from a high PageRank page (Page C) whichhas only one outbound link Then look at Page B and D both share links from ahigh PageRank page (Page A) with three outbound links The number of linkssignificantly alters the way PageRank is distributed

A word on convergence

Convergence is an important mathematical aspect of PageRank which allowsGoogle to provided unprecedented search quality at comparably low costs Thisis a semi-complex topic but it is important to your understanding of how and whyPageRank works Wersquove tried to make it as simple as possible but unless yoursquoreSergey Brin or Larry Page yoursquoll still need to concentrate

Whilst it will take some concentration it isnrsquot that hard to understand

Wersquove shown that the outlet values (final values) of one stage of the calculationbecome the inlet values (starting values) of the next stage and that we keep ondoing this (itrsquos known as a recursive procedure) But the really big question ishow and when does the recursive procedure stop

The answer is ldquoconvergencerdquo Provided the dampening factor (d in our equation)is less than one then convergence will occur Nominally we set it to 085(because thatrsquos the value mentioned in the Stanford papers)

This convergence basically means that whatever values we start at after runningthe calculation a number of times we will end up with the same final values andthat these values will no longer change if we do further iterations of thecalculation These final values are known as limiting values

Once the limiting values have been reached Google no longer needs to expendprocessing power on calculating the PageRank They can finish there

This is easier to understand with an example Letrsquos take a look at the followingstructure

17

PageRank for pages A B C D at various stages of iteration

A B C DIteration 1 1 1 1

1 10000000000 05750000000 22750000000 015000000002 20837500000 05750000000 11912500000 015000000003 11625625000 10355937500 16518437500 01500000000

19 14900124031 07833688246 15766187723 0150000000020 14901259564 07832552713 15766187723 0150000000021 14901259564 07833035315 15765705121 0150000000046 14901074052 07832956473 15765969475 0150000000047 14901074054 07832956472 15765969474 0150000000048 14901074053 07832956473 15765969474 0150000000049 14901074053 07832956473 15765969474 0150000000050 14901074053 07832956473 15765969474 0150000000051 14901074053 07832956473 15765969474 0150000000052 14901074053 07832956473 15765969474 0150000000053 14901074053 07832956473 15765969474 0150000000054 14901074053 07832956473 15765969474 01500000000

No matter how much we try to calculate the value past the 48th iteration thevalues do not change ndash they have converged at the limiting values

In practice theres no need to wait until they do not change at all before we stopcalculations We simply wait until they do not change very much In our exampleabove we have done that to 10 decimal places Ie we have calculated until theydo not change by 00000000001 or more (or mathematically | PR 48 (A) - PR 50(A) | lt 10 ndash10)

18

Derived specifics of how Google calculates PageRank

We started our values from zero and now see that the calculations are fairlycomplicated and resource intensive So two questions arise When Googlecalculates PageRank themselves do they start each month with an arbitraryvalue And secondly do they calculate PageRank across several machines Wecan answer these questions by a couple of simple experiments Assume thathaving calculated the PageRanks in the structure shown earlier we would like toadd a page and change a link to make the structure look like this

If we start each page from an arbitrary starting value of one then how manyiterations of the calculation does it take to achieve convergence [ ExcelExample 2]

75

Now if we start each page from the values we ended up with when we calculatedthe PageRank of the original structure (and set Page Ersquos starting value to 1) thenhow many iterations does it take to achieve convergence [ Excel Example 3]

78

These numbers are really very close Therefore it would appear logical thatGoogle does not reset the values with each calculation as they may simply carryon from where they left off

So do they calculate across several machines or one Obviously there is toomuch work for one machine when you get to billions of pages so several

19

machines must be used Logically how can they do this Assume Pages A andC are on one machine and Pages B C and D are on another If we ignore thelinks between pages across machines (ie A-gtD and D-gtC) then for Pages Aand C it takes

1 calculation to converge at A=1 C=1

Then for Pages B D and E it takes

3 calculations to converge at B=015 D=03954375000 E=03954375000

If we then do calculations for all pages using these as starting values it takes

A huge 146 iterations to converge [ Excel Example 4]

The conclusion is that splitting PageRank calculations up into chunks for differentmachines is not a viable proposition Therefore they must also track PageRanktransfers across links between machines This means that each machine mustnotify the other machines as PageRank is given to a page not on that machineIndeed each machine must also run the iterations in sync with each other Evenif they donrsquot do it exactly this way the calculation of PageRank for so manypages is very very complicated indeed

PageRank Feedback how linking sometimes helps

When the new term PageRank Feedback was presented in the original version ofthis paper it was largely taken up and used ndash although not always correctlyPageRank Feedback is difficult to understand because it is a concept that impliescreation of PageRank However it is really an alteration of distribution ofPageRank on a large scale PageRank Feedback is the principle that explainshow sometimes it is good to link to outside pages Consider we have a Page -gtPage A Its PageRank by definition will be 015 Now assume it links out toPage B which in return links back to it Page Arsquos PageRank now becomes 1Now assume it links out to Page C and Page C links back to it Page ArsquosPageRank now becomes 14594594595

The very fact that Page A links out to a page that links back to it causes its ownPageRank to rise (This could also occur if Page A links to a page that links to apage that links to Page A etchellip) This has occurred not because we aregenerating PageRank but because it is taking it from the system as a whole Butif we view this very small system ldquoPage Ardquo (which is a subsystem of all the pagesin the index) then we can firmly say that linking out from Page A has generatedPageRank in this system ndash that is the very act of linking out has fed PageRankback in

20

Letrsquos put this diagrammatically using a slightly more complicated system

Pages A to E are all of the pages in Googlersquos index Pages A and B are your siteWhen we do the calculations for this set up the final PageRanks for each pageare all 1 [ Excel Example 5] As far as your site is concerned

Page Arsquos PageRank is 1

Page Brsquos PageRank is 1

The sum of the PageRank of all the pages in your site is2

Changing the structure to link A to C and C to A gives the following finalPageRanks [ Excel Example 6]

21

Page Arsquos PageRank is 13599321536

Page Brsquos PageRank is 07279711653

The sum of the PageRank of all the pages in your site is20879033189

This increase is small the amount will of course vary depending oncircumstances But the point was easily proved ndash linking out can sometimesimprove your PageRank through a mechanism known as PageRank Feedback

If we added an extra page into the loop between C -gt D and E then the totalincrease in PageRank in your site would be 21462030505 [ Excel Example 7]This suggests that it is wiser to reciprocal link with larger sites that employ a linkstructure where a higher amount of focus is given to the page on which your linkappears (or will appear) The number of pages and the structure you are linkingto has significant influence on the amount of PageRank feedback that occurs

22

Influencing the results

The question that arises from all this is how and when can or will Googleinfluence the results of the PageRank calculation In this respect the situationhas changed significantly since the original PageRank paper Google has shownthat they can and will modify the data on which PageRank is based The primaryexample of this is what has become known as PageRank Zero (PR0) Basicallyspeculation says that when Google wants to penalize a page it is assigned aPageRank of zero As PageRank is a multiplier this will obviously always listPR0 pages as the very last entry in the SERPS

Your score from other factors multiplied by your PageRank score of zero willalways be zero (the lowest score possible)

How they do this is important Letrsquos say they set the PageRank of the page tozero before doing the calculations ndash would this work From what we know so farwe would think not [ Excel Example 8] And true enough it doesnt Why is thissignificant Because the PageRank must be set to zero at the end of theprocess Having a PageRank of zero in itself does not disqualify a page fromvoting exactly as it would have done in the first place

To stop its voting power the second penalty must also be applied This is thesame penalty that is applied to link farms Google has shown that they arecapable of ignoring links they believe have been artificially created This is simpleenough they merely remove the link entry from the matrix This second penaltywould also need to be applied to a PR0 penalty to stop the links from that pagefrom counting in the PageRank process However there doesnrsquot appear to beany way to test this I suspect that both penalties would be applied

Controlling PageRank

As Webmasters contrary to the suggestions of some we have a fair amount ofcontrol over PageRank It is however the most difficult of ranking factors tocontrol and you might well be able to achieve the desired effects by using othertechniques However a well-worked PageRank added to your optimization mixcan make it difficult for competitors to catch up

There are three fundamental areas to look at when trying to optimize thePageRank for your site

1 The links you choose to have link to you ie which ones you chooseand how much effort you put in to getting them

2 Who you choose to link out to from your site and from which page ofyour site you place their link (Maximising PageRank Feedback andminimising PageRank leakage)

23

3 The internal navigational structure and linkage of your pages in order tobest distribute PageRank within your site

Links to Your Site

When looking for links to your site from a purely PageRank point of view onemight think you should simply look for pages that have the highest ToolbarPageRank (Whilst keeping in mind that every page of a site has its ownPageRank so you must consider the PageRank of the links page or whateverpage the actual link will exist on) However this way of thinking is incorrect Ifyoursquove not just jumped to this section then yoursquoll probably have worked out whythat is The PageRank given by a link is far more complicated than thissimplification There may have been a time when that was an okayapproximationhellipbut no more As more and more people try to get links from onlyhigh PageRank sites it becomes less and less of a winning proposition

The actual PageRank from an individual page is shared out amongst the links onthat page (remember the PageRank calculations) So links from pages thathave the same PageRank arent always created equal It depends on how manyother links your link is sharing the links page with For instance a link from apage with a PageRank of 4 might be better than a link from a page with aPageRank of 6 if there are less total links on the PR 4 page Right now there justisnrsquot enough information available to allow us to know to what extent thisstretches However itrsquos significant enough to make it pointless to just choosehigh PageRanked sites as your main linking strategy Theres also another morematter-of-fact reason why that type of linking strategy might not be the best siteswith high PageRanks are often fussy about which sites they will link out tomaking them harder to get linked from than lower PageRanked sites Howeversites struggling with their own PageRank numbers should be more receptive toexchanging reciprocal links with other like-minded sites

Now lets factor in feedback Letrsquos say for example that there are two separatepages on other peoplersquos sites which both have a PageRank of 4 Both of thesehave ten links to other pages But the page on your site that you want them tolink to already has a link to the page on the second site By getting a link fromthe second site yoursquore generating feedback and getting more PageRank than ifyou had gotten a link from the first site Thatrsquos an over simplification in factfeedback loops can get even more complicated Remember the number of linkson the page linking to you will alter the amount of feedback etc

Can you work it all out for a given pagersquos situation No ndash neither can I Myadvice therefore is this ndash get links from sites that seem appropriate and havegood quality regardless of their current PageRank If they are relevant to yoursite and are high quality sites they will either help your PageRank now or willdo so in the future To really get your PageRank humming get yourself a listing

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 16: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

15

Pretty neat huh Already wersquore begining to see that Page C is probably the mostimportant page in the system (but we canrsquot be sure yet ndash it could well change)We carry on doing these calculations until the value for each page no longerchanges (this is called convergence ndash therersquos more about this in the nextsection) In practice Google probably doesnt wait for this convergence butinstead run a number of iterations of the calculation which is likely to give themfairly accurate values (more on this later as well) If we carried out all thecalculations for the example given it would take us 143 calculations [ ExcelExample 1] and wersquod reach final values of

16

As suspected Page C is the most important If we take a quick look at these rawvalues we can see something about the number of links pointing out from a pageLook at Page A which has a link from a high PageRank page (Page C) whichhas only one outbound link Then look at Page B and D both share links from ahigh PageRank page (Page A) with three outbound links The number of linkssignificantly alters the way PageRank is distributed

A word on convergence

Convergence is an important mathematical aspect of PageRank which allowsGoogle to provided unprecedented search quality at comparably low costs Thisis a semi-complex topic but it is important to your understanding of how and whyPageRank works Wersquove tried to make it as simple as possible but unless yoursquoreSergey Brin or Larry Page yoursquoll still need to concentrate

Whilst it will take some concentration it isnrsquot that hard to understand

Wersquove shown that the outlet values (final values) of one stage of the calculationbecome the inlet values (starting values) of the next stage and that we keep ondoing this (itrsquos known as a recursive procedure) But the really big question ishow and when does the recursive procedure stop

The answer is ldquoconvergencerdquo Provided the dampening factor (d in our equation)is less than one then convergence will occur Nominally we set it to 085(because thatrsquos the value mentioned in the Stanford papers)

This convergence basically means that whatever values we start at after runningthe calculation a number of times we will end up with the same final values andthat these values will no longer change if we do further iterations of thecalculation These final values are known as limiting values

Once the limiting values have been reached Google no longer needs to expendprocessing power on calculating the PageRank They can finish there

This is easier to understand with an example Letrsquos take a look at the followingstructure

17

PageRank for pages A B C D at various stages of iteration

A B C DIteration 1 1 1 1

1 10000000000 05750000000 22750000000 015000000002 20837500000 05750000000 11912500000 015000000003 11625625000 10355937500 16518437500 01500000000

19 14900124031 07833688246 15766187723 0150000000020 14901259564 07832552713 15766187723 0150000000021 14901259564 07833035315 15765705121 0150000000046 14901074052 07832956473 15765969475 0150000000047 14901074054 07832956472 15765969474 0150000000048 14901074053 07832956473 15765969474 0150000000049 14901074053 07832956473 15765969474 0150000000050 14901074053 07832956473 15765969474 0150000000051 14901074053 07832956473 15765969474 0150000000052 14901074053 07832956473 15765969474 0150000000053 14901074053 07832956473 15765969474 0150000000054 14901074053 07832956473 15765969474 01500000000

No matter how much we try to calculate the value past the 48th iteration thevalues do not change ndash they have converged at the limiting values

In practice theres no need to wait until they do not change at all before we stopcalculations We simply wait until they do not change very much In our exampleabove we have done that to 10 decimal places Ie we have calculated until theydo not change by 00000000001 or more (or mathematically | PR 48 (A) - PR 50(A) | lt 10 ndash10)

18

Derived specifics of how Google calculates PageRank

We started our values from zero and now see that the calculations are fairlycomplicated and resource intensive So two questions arise When Googlecalculates PageRank themselves do they start each month with an arbitraryvalue And secondly do they calculate PageRank across several machines Wecan answer these questions by a couple of simple experiments Assume thathaving calculated the PageRanks in the structure shown earlier we would like toadd a page and change a link to make the structure look like this

If we start each page from an arbitrary starting value of one then how manyiterations of the calculation does it take to achieve convergence [ ExcelExample 2]

75

Now if we start each page from the values we ended up with when we calculatedthe PageRank of the original structure (and set Page Ersquos starting value to 1) thenhow many iterations does it take to achieve convergence [ Excel Example 3]

78

These numbers are really very close Therefore it would appear logical thatGoogle does not reset the values with each calculation as they may simply carryon from where they left off

So do they calculate across several machines or one Obviously there is toomuch work for one machine when you get to billions of pages so several

19

machines must be used Logically how can they do this Assume Pages A andC are on one machine and Pages B C and D are on another If we ignore thelinks between pages across machines (ie A-gtD and D-gtC) then for Pages Aand C it takes

1 calculation to converge at A=1 C=1

Then for Pages B D and E it takes

3 calculations to converge at B=015 D=03954375000 E=03954375000

If we then do calculations for all pages using these as starting values it takes

A huge 146 iterations to converge [ Excel Example 4]

The conclusion is that splitting PageRank calculations up into chunks for differentmachines is not a viable proposition Therefore they must also track PageRanktransfers across links between machines This means that each machine mustnotify the other machines as PageRank is given to a page not on that machineIndeed each machine must also run the iterations in sync with each other Evenif they donrsquot do it exactly this way the calculation of PageRank for so manypages is very very complicated indeed

PageRank Feedback how linking sometimes helps

When the new term PageRank Feedback was presented in the original version ofthis paper it was largely taken up and used ndash although not always correctlyPageRank Feedback is difficult to understand because it is a concept that impliescreation of PageRank However it is really an alteration of distribution ofPageRank on a large scale PageRank Feedback is the principle that explainshow sometimes it is good to link to outside pages Consider we have a Page -gtPage A Its PageRank by definition will be 015 Now assume it links out toPage B which in return links back to it Page Arsquos PageRank now becomes 1Now assume it links out to Page C and Page C links back to it Page ArsquosPageRank now becomes 14594594595

The very fact that Page A links out to a page that links back to it causes its ownPageRank to rise (This could also occur if Page A links to a page that links to apage that links to Page A etchellip) This has occurred not because we aregenerating PageRank but because it is taking it from the system as a whole Butif we view this very small system ldquoPage Ardquo (which is a subsystem of all the pagesin the index) then we can firmly say that linking out from Page A has generatedPageRank in this system ndash that is the very act of linking out has fed PageRankback in

20

Letrsquos put this diagrammatically using a slightly more complicated system

Pages A to E are all of the pages in Googlersquos index Pages A and B are your siteWhen we do the calculations for this set up the final PageRanks for each pageare all 1 [ Excel Example 5] As far as your site is concerned

Page Arsquos PageRank is 1

Page Brsquos PageRank is 1

The sum of the PageRank of all the pages in your site is2

Changing the structure to link A to C and C to A gives the following finalPageRanks [ Excel Example 6]

21

Page Arsquos PageRank is 13599321536

Page Brsquos PageRank is 07279711653

The sum of the PageRank of all the pages in your site is20879033189

This increase is small the amount will of course vary depending oncircumstances But the point was easily proved ndash linking out can sometimesimprove your PageRank through a mechanism known as PageRank Feedback

If we added an extra page into the loop between C -gt D and E then the totalincrease in PageRank in your site would be 21462030505 [ Excel Example 7]This suggests that it is wiser to reciprocal link with larger sites that employ a linkstructure where a higher amount of focus is given to the page on which your linkappears (or will appear) The number of pages and the structure you are linkingto has significant influence on the amount of PageRank feedback that occurs

22

Influencing the results

The question that arises from all this is how and when can or will Googleinfluence the results of the PageRank calculation In this respect the situationhas changed significantly since the original PageRank paper Google has shownthat they can and will modify the data on which PageRank is based The primaryexample of this is what has become known as PageRank Zero (PR0) Basicallyspeculation says that when Google wants to penalize a page it is assigned aPageRank of zero As PageRank is a multiplier this will obviously always listPR0 pages as the very last entry in the SERPS

Your score from other factors multiplied by your PageRank score of zero willalways be zero (the lowest score possible)

How they do this is important Letrsquos say they set the PageRank of the page tozero before doing the calculations ndash would this work From what we know so farwe would think not [ Excel Example 8] And true enough it doesnt Why is thissignificant Because the PageRank must be set to zero at the end of theprocess Having a PageRank of zero in itself does not disqualify a page fromvoting exactly as it would have done in the first place

To stop its voting power the second penalty must also be applied This is thesame penalty that is applied to link farms Google has shown that they arecapable of ignoring links they believe have been artificially created This is simpleenough they merely remove the link entry from the matrix This second penaltywould also need to be applied to a PR0 penalty to stop the links from that pagefrom counting in the PageRank process However there doesnrsquot appear to beany way to test this I suspect that both penalties would be applied

Controlling PageRank

As Webmasters contrary to the suggestions of some we have a fair amount ofcontrol over PageRank It is however the most difficult of ranking factors tocontrol and you might well be able to achieve the desired effects by using othertechniques However a well-worked PageRank added to your optimization mixcan make it difficult for competitors to catch up

There are three fundamental areas to look at when trying to optimize thePageRank for your site

1 The links you choose to have link to you ie which ones you chooseand how much effort you put in to getting them

2 Who you choose to link out to from your site and from which page ofyour site you place their link (Maximising PageRank Feedback andminimising PageRank leakage)

23

3 The internal navigational structure and linkage of your pages in order tobest distribute PageRank within your site

Links to Your Site

When looking for links to your site from a purely PageRank point of view onemight think you should simply look for pages that have the highest ToolbarPageRank (Whilst keeping in mind that every page of a site has its ownPageRank so you must consider the PageRank of the links page or whateverpage the actual link will exist on) However this way of thinking is incorrect Ifyoursquove not just jumped to this section then yoursquoll probably have worked out whythat is The PageRank given by a link is far more complicated than thissimplification There may have been a time when that was an okayapproximationhellipbut no more As more and more people try to get links from onlyhigh PageRank sites it becomes less and less of a winning proposition

The actual PageRank from an individual page is shared out amongst the links onthat page (remember the PageRank calculations) So links from pages thathave the same PageRank arent always created equal It depends on how manyother links your link is sharing the links page with For instance a link from apage with a PageRank of 4 might be better than a link from a page with aPageRank of 6 if there are less total links on the PR 4 page Right now there justisnrsquot enough information available to allow us to know to what extent thisstretches However itrsquos significant enough to make it pointless to just choosehigh PageRanked sites as your main linking strategy Theres also another morematter-of-fact reason why that type of linking strategy might not be the best siteswith high PageRanks are often fussy about which sites they will link out tomaking them harder to get linked from than lower PageRanked sites Howeversites struggling with their own PageRank numbers should be more receptive toexchanging reciprocal links with other like-minded sites

Now lets factor in feedback Letrsquos say for example that there are two separatepages on other peoplersquos sites which both have a PageRank of 4 Both of thesehave ten links to other pages But the page on your site that you want them tolink to already has a link to the page on the second site By getting a link fromthe second site yoursquore generating feedback and getting more PageRank than ifyou had gotten a link from the first site Thatrsquos an over simplification in factfeedback loops can get even more complicated Remember the number of linkson the page linking to you will alter the amount of feedback etc

Can you work it all out for a given pagersquos situation No ndash neither can I Myadvice therefore is this ndash get links from sites that seem appropriate and havegood quality regardless of their current PageRank If they are relevant to yoursite and are high quality sites they will either help your PageRank now or willdo so in the future To really get your PageRank humming get yourself a listing

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 17: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

16

As suspected Page C is the most important If we take a quick look at these rawvalues we can see something about the number of links pointing out from a pageLook at Page A which has a link from a high PageRank page (Page C) whichhas only one outbound link Then look at Page B and D both share links from ahigh PageRank page (Page A) with three outbound links The number of linkssignificantly alters the way PageRank is distributed

A word on convergence

Convergence is an important mathematical aspect of PageRank which allowsGoogle to provided unprecedented search quality at comparably low costs Thisis a semi-complex topic but it is important to your understanding of how and whyPageRank works Wersquove tried to make it as simple as possible but unless yoursquoreSergey Brin or Larry Page yoursquoll still need to concentrate

Whilst it will take some concentration it isnrsquot that hard to understand

Wersquove shown that the outlet values (final values) of one stage of the calculationbecome the inlet values (starting values) of the next stage and that we keep ondoing this (itrsquos known as a recursive procedure) But the really big question ishow and when does the recursive procedure stop

The answer is ldquoconvergencerdquo Provided the dampening factor (d in our equation)is less than one then convergence will occur Nominally we set it to 085(because thatrsquos the value mentioned in the Stanford papers)

This convergence basically means that whatever values we start at after runningthe calculation a number of times we will end up with the same final values andthat these values will no longer change if we do further iterations of thecalculation These final values are known as limiting values

Once the limiting values have been reached Google no longer needs to expendprocessing power on calculating the PageRank They can finish there

This is easier to understand with an example Letrsquos take a look at the followingstructure

17

PageRank for pages A B C D at various stages of iteration

A B C DIteration 1 1 1 1

1 10000000000 05750000000 22750000000 015000000002 20837500000 05750000000 11912500000 015000000003 11625625000 10355937500 16518437500 01500000000

19 14900124031 07833688246 15766187723 0150000000020 14901259564 07832552713 15766187723 0150000000021 14901259564 07833035315 15765705121 0150000000046 14901074052 07832956473 15765969475 0150000000047 14901074054 07832956472 15765969474 0150000000048 14901074053 07832956473 15765969474 0150000000049 14901074053 07832956473 15765969474 0150000000050 14901074053 07832956473 15765969474 0150000000051 14901074053 07832956473 15765969474 0150000000052 14901074053 07832956473 15765969474 0150000000053 14901074053 07832956473 15765969474 0150000000054 14901074053 07832956473 15765969474 01500000000

No matter how much we try to calculate the value past the 48th iteration thevalues do not change ndash they have converged at the limiting values

In practice theres no need to wait until they do not change at all before we stopcalculations We simply wait until they do not change very much In our exampleabove we have done that to 10 decimal places Ie we have calculated until theydo not change by 00000000001 or more (or mathematically | PR 48 (A) - PR 50(A) | lt 10 ndash10)

18

Derived specifics of how Google calculates PageRank

We started our values from zero and now see that the calculations are fairlycomplicated and resource intensive So two questions arise When Googlecalculates PageRank themselves do they start each month with an arbitraryvalue And secondly do they calculate PageRank across several machines Wecan answer these questions by a couple of simple experiments Assume thathaving calculated the PageRanks in the structure shown earlier we would like toadd a page and change a link to make the structure look like this

If we start each page from an arbitrary starting value of one then how manyiterations of the calculation does it take to achieve convergence [ ExcelExample 2]

75

Now if we start each page from the values we ended up with when we calculatedthe PageRank of the original structure (and set Page Ersquos starting value to 1) thenhow many iterations does it take to achieve convergence [ Excel Example 3]

78

These numbers are really very close Therefore it would appear logical thatGoogle does not reset the values with each calculation as they may simply carryon from where they left off

So do they calculate across several machines or one Obviously there is toomuch work for one machine when you get to billions of pages so several

19

machines must be used Logically how can they do this Assume Pages A andC are on one machine and Pages B C and D are on another If we ignore thelinks between pages across machines (ie A-gtD and D-gtC) then for Pages Aand C it takes

1 calculation to converge at A=1 C=1

Then for Pages B D and E it takes

3 calculations to converge at B=015 D=03954375000 E=03954375000

If we then do calculations for all pages using these as starting values it takes

A huge 146 iterations to converge [ Excel Example 4]

The conclusion is that splitting PageRank calculations up into chunks for differentmachines is not a viable proposition Therefore they must also track PageRanktransfers across links between machines This means that each machine mustnotify the other machines as PageRank is given to a page not on that machineIndeed each machine must also run the iterations in sync with each other Evenif they donrsquot do it exactly this way the calculation of PageRank for so manypages is very very complicated indeed

PageRank Feedback how linking sometimes helps

When the new term PageRank Feedback was presented in the original version ofthis paper it was largely taken up and used ndash although not always correctlyPageRank Feedback is difficult to understand because it is a concept that impliescreation of PageRank However it is really an alteration of distribution ofPageRank on a large scale PageRank Feedback is the principle that explainshow sometimes it is good to link to outside pages Consider we have a Page -gtPage A Its PageRank by definition will be 015 Now assume it links out toPage B which in return links back to it Page Arsquos PageRank now becomes 1Now assume it links out to Page C and Page C links back to it Page ArsquosPageRank now becomes 14594594595

The very fact that Page A links out to a page that links back to it causes its ownPageRank to rise (This could also occur if Page A links to a page that links to apage that links to Page A etchellip) This has occurred not because we aregenerating PageRank but because it is taking it from the system as a whole Butif we view this very small system ldquoPage Ardquo (which is a subsystem of all the pagesin the index) then we can firmly say that linking out from Page A has generatedPageRank in this system ndash that is the very act of linking out has fed PageRankback in

20

Letrsquos put this diagrammatically using a slightly more complicated system

Pages A to E are all of the pages in Googlersquos index Pages A and B are your siteWhen we do the calculations for this set up the final PageRanks for each pageare all 1 [ Excel Example 5] As far as your site is concerned

Page Arsquos PageRank is 1

Page Brsquos PageRank is 1

The sum of the PageRank of all the pages in your site is2

Changing the structure to link A to C and C to A gives the following finalPageRanks [ Excel Example 6]

21

Page Arsquos PageRank is 13599321536

Page Brsquos PageRank is 07279711653

The sum of the PageRank of all the pages in your site is20879033189

This increase is small the amount will of course vary depending oncircumstances But the point was easily proved ndash linking out can sometimesimprove your PageRank through a mechanism known as PageRank Feedback

If we added an extra page into the loop between C -gt D and E then the totalincrease in PageRank in your site would be 21462030505 [ Excel Example 7]This suggests that it is wiser to reciprocal link with larger sites that employ a linkstructure where a higher amount of focus is given to the page on which your linkappears (or will appear) The number of pages and the structure you are linkingto has significant influence on the amount of PageRank feedback that occurs

22

Influencing the results

The question that arises from all this is how and when can or will Googleinfluence the results of the PageRank calculation In this respect the situationhas changed significantly since the original PageRank paper Google has shownthat they can and will modify the data on which PageRank is based The primaryexample of this is what has become known as PageRank Zero (PR0) Basicallyspeculation says that when Google wants to penalize a page it is assigned aPageRank of zero As PageRank is a multiplier this will obviously always listPR0 pages as the very last entry in the SERPS

Your score from other factors multiplied by your PageRank score of zero willalways be zero (the lowest score possible)

How they do this is important Letrsquos say they set the PageRank of the page tozero before doing the calculations ndash would this work From what we know so farwe would think not [ Excel Example 8] And true enough it doesnt Why is thissignificant Because the PageRank must be set to zero at the end of theprocess Having a PageRank of zero in itself does not disqualify a page fromvoting exactly as it would have done in the first place

To stop its voting power the second penalty must also be applied This is thesame penalty that is applied to link farms Google has shown that they arecapable of ignoring links they believe have been artificially created This is simpleenough they merely remove the link entry from the matrix This second penaltywould also need to be applied to a PR0 penalty to stop the links from that pagefrom counting in the PageRank process However there doesnrsquot appear to beany way to test this I suspect that both penalties would be applied

Controlling PageRank

As Webmasters contrary to the suggestions of some we have a fair amount ofcontrol over PageRank It is however the most difficult of ranking factors tocontrol and you might well be able to achieve the desired effects by using othertechniques However a well-worked PageRank added to your optimization mixcan make it difficult for competitors to catch up

There are three fundamental areas to look at when trying to optimize thePageRank for your site

1 The links you choose to have link to you ie which ones you chooseand how much effort you put in to getting them

2 Who you choose to link out to from your site and from which page ofyour site you place their link (Maximising PageRank Feedback andminimising PageRank leakage)

23

3 The internal navigational structure and linkage of your pages in order tobest distribute PageRank within your site

Links to Your Site

When looking for links to your site from a purely PageRank point of view onemight think you should simply look for pages that have the highest ToolbarPageRank (Whilst keeping in mind that every page of a site has its ownPageRank so you must consider the PageRank of the links page or whateverpage the actual link will exist on) However this way of thinking is incorrect Ifyoursquove not just jumped to this section then yoursquoll probably have worked out whythat is The PageRank given by a link is far more complicated than thissimplification There may have been a time when that was an okayapproximationhellipbut no more As more and more people try to get links from onlyhigh PageRank sites it becomes less and less of a winning proposition

The actual PageRank from an individual page is shared out amongst the links onthat page (remember the PageRank calculations) So links from pages thathave the same PageRank arent always created equal It depends on how manyother links your link is sharing the links page with For instance a link from apage with a PageRank of 4 might be better than a link from a page with aPageRank of 6 if there are less total links on the PR 4 page Right now there justisnrsquot enough information available to allow us to know to what extent thisstretches However itrsquos significant enough to make it pointless to just choosehigh PageRanked sites as your main linking strategy Theres also another morematter-of-fact reason why that type of linking strategy might not be the best siteswith high PageRanks are often fussy about which sites they will link out tomaking them harder to get linked from than lower PageRanked sites Howeversites struggling with their own PageRank numbers should be more receptive toexchanging reciprocal links with other like-minded sites

Now lets factor in feedback Letrsquos say for example that there are two separatepages on other peoplersquos sites which both have a PageRank of 4 Both of thesehave ten links to other pages But the page on your site that you want them tolink to already has a link to the page on the second site By getting a link fromthe second site yoursquore generating feedback and getting more PageRank than ifyou had gotten a link from the first site Thatrsquos an over simplification in factfeedback loops can get even more complicated Remember the number of linkson the page linking to you will alter the amount of feedback etc

Can you work it all out for a given pagersquos situation No ndash neither can I Myadvice therefore is this ndash get links from sites that seem appropriate and havegood quality regardless of their current PageRank If they are relevant to yoursite and are high quality sites they will either help your PageRank now or willdo so in the future To really get your PageRank humming get yourself a listing

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 18: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

17

PageRank for pages A B C D at various stages of iteration

A B C DIteration 1 1 1 1

1 10000000000 05750000000 22750000000 015000000002 20837500000 05750000000 11912500000 015000000003 11625625000 10355937500 16518437500 01500000000

19 14900124031 07833688246 15766187723 0150000000020 14901259564 07832552713 15766187723 0150000000021 14901259564 07833035315 15765705121 0150000000046 14901074052 07832956473 15765969475 0150000000047 14901074054 07832956472 15765969474 0150000000048 14901074053 07832956473 15765969474 0150000000049 14901074053 07832956473 15765969474 0150000000050 14901074053 07832956473 15765969474 0150000000051 14901074053 07832956473 15765969474 0150000000052 14901074053 07832956473 15765969474 0150000000053 14901074053 07832956473 15765969474 0150000000054 14901074053 07832956473 15765969474 01500000000

No matter how much we try to calculate the value past the 48th iteration thevalues do not change ndash they have converged at the limiting values

In practice theres no need to wait until they do not change at all before we stopcalculations We simply wait until they do not change very much In our exampleabove we have done that to 10 decimal places Ie we have calculated until theydo not change by 00000000001 or more (or mathematically | PR 48 (A) - PR 50(A) | lt 10 ndash10)

18

Derived specifics of how Google calculates PageRank

We started our values from zero and now see that the calculations are fairlycomplicated and resource intensive So two questions arise When Googlecalculates PageRank themselves do they start each month with an arbitraryvalue And secondly do they calculate PageRank across several machines Wecan answer these questions by a couple of simple experiments Assume thathaving calculated the PageRanks in the structure shown earlier we would like toadd a page and change a link to make the structure look like this

If we start each page from an arbitrary starting value of one then how manyiterations of the calculation does it take to achieve convergence [ ExcelExample 2]

75

Now if we start each page from the values we ended up with when we calculatedthe PageRank of the original structure (and set Page Ersquos starting value to 1) thenhow many iterations does it take to achieve convergence [ Excel Example 3]

78

These numbers are really very close Therefore it would appear logical thatGoogle does not reset the values with each calculation as they may simply carryon from where they left off

So do they calculate across several machines or one Obviously there is toomuch work for one machine when you get to billions of pages so several

19

machines must be used Logically how can they do this Assume Pages A andC are on one machine and Pages B C and D are on another If we ignore thelinks between pages across machines (ie A-gtD and D-gtC) then for Pages Aand C it takes

1 calculation to converge at A=1 C=1

Then for Pages B D and E it takes

3 calculations to converge at B=015 D=03954375000 E=03954375000

If we then do calculations for all pages using these as starting values it takes

A huge 146 iterations to converge [ Excel Example 4]

The conclusion is that splitting PageRank calculations up into chunks for differentmachines is not a viable proposition Therefore they must also track PageRanktransfers across links between machines This means that each machine mustnotify the other machines as PageRank is given to a page not on that machineIndeed each machine must also run the iterations in sync with each other Evenif they donrsquot do it exactly this way the calculation of PageRank for so manypages is very very complicated indeed

PageRank Feedback how linking sometimes helps

When the new term PageRank Feedback was presented in the original version ofthis paper it was largely taken up and used ndash although not always correctlyPageRank Feedback is difficult to understand because it is a concept that impliescreation of PageRank However it is really an alteration of distribution ofPageRank on a large scale PageRank Feedback is the principle that explainshow sometimes it is good to link to outside pages Consider we have a Page -gtPage A Its PageRank by definition will be 015 Now assume it links out toPage B which in return links back to it Page Arsquos PageRank now becomes 1Now assume it links out to Page C and Page C links back to it Page ArsquosPageRank now becomes 14594594595

The very fact that Page A links out to a page that links back to it causes its ownPageRank to rise (This could also occur if Page A links to a page that links to apage that links to Page A etchellip) This has occurred not because we aregenerating PageRank but because it is taking it from the system as a whole Butif we view this very small system ldquoPage Ardquo (which is a subsystem of all the pagesin the index) then we can firmly say that linking out from Page A has generatedPageRank in this system ndash that is the very act of linking out has fed PageRankback in

20

Letrsquos put this diagrammatically using a slightly more complicated system

Pages A to E are all of the pages in Googlersquos index Pages A and B are your siteWhen we do the calculations for this set up the final PageRanks for each pageare all 1 [ Excel Example 5] As far as your site is concerned

Page Arsquos PageRank is 1

Page Brsquos PageRank is 1

The sum of the PageRank of all the pages in your site is2

Changing the structure to link A to C and C to A gives the following finalPageRanks [ Excel Example 6]

21

Page Arsquos PageRank is 13599321536

Page Brsquos PageRank is 07279711653

The sum of the PageRank of all the pages in your site is20879033189

This increase is small the amount will of course vary depending oncircumstances But the point was easily proved ndash linking out can sometimesimprove your PageRank through a mechanism known as PageRank Feedback

If we added an extra page into the loop between C -gt D and E then the totalincrease in PageRank in your site would be 21462030505 [ Excel Example 7]This suggests that it is wiser to reciprocal link with larger sites that employ a linkstructure where a higher amount of focus is given to the page on which your linkappears (or will appear) The number of pages and the structure you are linkingto has significant influence on the amount of PageRank feedback that occurs

22

Influencing the results

The question that arises from all this is how and when can or will Googleinfluence the results of the PageRank calculation In this respect the situationhas changed significantly since the original PageRank paper Google has shownthat they can and will modify the data on which PageRank is based The primaryexample of this is what has become known as PageRank Zero (PR0) Basicallyspeculation says that when Google wants to penalize a page it is assigned aPageRank of zero As PageRank is a multiplier this will obviously always listPR0 pages as the very last entry in the SERPS

Your score from other factors multiplied by your PageRank score of zero willalways be zero (the lowest score possible)

How they do this is important Letrsquos say they set the PageRank of the page tozero before doing the calculations ndash would this work From what we know so farwe would think not [ Excel Example 8] And true enough it doesnt Why is thissignificant Because the PageRank must be set to zero at the end of theprocess Having a PageRank of zero in itself does not disqualify a page fromvoting exactly as it would have done in the first place

To stop its voting power the second penalty must also be applied This is thesame penalty that is applied to link farms Google has shown that they arecapable of ignoring links they believe have been artificially created This is simpleenough they merely remove the link entry from the matrix This second penaltywould also need to be applied to a PR0 penalty to stop the links from that pagefrom counting in the PageRank process However there doesnrsquot appear to beany way to test this I suspect that both penalties would be applied

Controlling PageRank

As Webmasters contrary to the suggestions of some we have a fair amount ofcontrol over PageRank It is however the most difficult of ranking factors tocontrol and you might well be able to achieve the desired effects by using othertechniques However a well-worked PageRank added to your optimization mixcan make it difficult for competitors to catch up

There are three fundamental areas to look at when trying to optimize thePageRank for your site

1 The links you choose to have link to you ie which ones you chooseand how much effort you put in to getting them

2 Who you choose to link out to from your site and from which page ofyour site you place their link (Maximising PageRank Feedback andminimising PageRank leakage)

23

3 The internal navigational structure and linkage of your pages in order tobest distribute PageRank within your site

Links to Your Site

When looking for links to your site from a purely PageRank point of view onemight think you should simply look for pages that have the highest ToolbarPageRank (Whilst keeping in mind that every page of a site has its ownPageRank so you must consider the PageRank of the links page or whateverpage the actual link will exist on) However this way of thinking is incorrect Ifyoursquove not just jumped to this section then yoursquoll probably have worked out whythat is The PageRank given by a link is far more complicated than thissimplification There may have been a time when that was an okayapproximationhellipbut no more As more and more people try to get links from onlyhigh PageRank sites it becomes less and less of a winning proposition

The actual PageRank from an individual page is shared out amongst the links onthat page (remember the PageRank calculations) So links from pages thathave the same PageRank arent always created equal It depends on how manyother links your link is sharing the links page with For instance a link from apage with a PageRank of 4 might be better than a link from a page with aPageRank of 6 if there are less total links on the PR 4 page Right now there justisnrsquot enough information available to allow us to know to what extent thisstretches However itrsquos significant enough to make it pointless to just choosehigh PageRanked sites as your main linking strategy Theres also another morematter-of-fact reason why that type of linking strategy might not be the best siteswith high PageRanks are often fussy about which sites they will link out tomaking them harder to get linked from than lower PageRanked sites Howeversites struggling with their own PageRank numbers should be more receptive toexchanging reciprocal links with other like-minded sites

Now lets factor in feedback Letrsquos say for example that there are two separatepages on other peoplersquos sites which both have a PageRank of 4 Both of thesehave ten links to other pages But the page on your site that you want them tolink to already has a link to the page on the second site By getting a link fromthe second site yoursquore generating feedback and getting more PageRank than ifyou had gotten a link from the first site Thatrsquos an over simplification in factfeedback loops can get even more complicated Remember the number of linkson the page linking to you will alter the amount of feedback etc

Can you work it all out for a given pagersquos situation No ndash neither can I Myadvice therefore is this ndash get links from sites that seem appropriate and havegood quality regardless of their current PageRank If they are relevant to yoursite and are high quality sites they will either help your PageRank now or willdo so in the future To really get your PageRank humming get yourself a listing

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 19: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

18

Derived specifics of how Google calculates PageRank

We started our values from zero and now see that the calculations are fairlycomplicated and resource intensive So two questions arise When Googlecalculates PageRank themselves do they start each month with an arbitraryvalue And secondly do they calculate PageRank across several machines Wecan answer these questions by a couple of simple experiments Assume thathaving calculated the PageRanks in the structure shown earlier we would like toadd a page and change a link to make the structure look like this

If we start each page from an arbitrary starting value of one then how manyiterations of the calculation does it take to achieve convergence [ ExcelExample 2]

75

Now if we start each page from the values we ended up with when we calculatedthe PageRank of the original structure (and set Page Ersquos starting value to 1) thenhow many iterations does it take to achieve convergence [ Excel Example 3]

78

These numbers are really very close Therefore it would appear logical thatGoogle does not reset the values with each calculation as they may simply carryon from where they left off

So do they calculate across several machines or one Obviously there is toomuch work for one machine when you get to billions of pages so several

19

machines must be used Logically how can they do this Assume Pages A andC are on one machine and Pages B C and D are on another If we ignore thelinks between pages across machines (ie A-gtD and D-gtC) then for Pages Aand C it takes

1 calculation to converge at A=1 C=1

Then for Pages B D and E it takes

3 calculations to converge at B=015 D=03954375000 E=03954375000

If we then do calculations for all pages using these as starting values it takes

A huge 146 iterations to converge [ Excel Example 4]

The conclusion is that splitting PageRank calculations up into chunks for differentmachines is not a viable proposition Therefore they must also track PageRanktransfers across links between machines This means that each machine mustnotify the other machines as PageRank is given to a page not on that machineIndeed each machine must also run the iterations in sync with each other Evenif they donrsquot do it exactly this way the calculation of PageRank for so manypages is very very complicated indeed

PageRank Feedback how linking sometimes helps

When the new term PageRank Feedback was presented in the original version ofthis paper it was largely taken up and used ndash although not always correctlyPageRank Feedback is difficult to understand because it is a concept that impliescreation of PageRank However it is really an alteration of distribution ofPageRank on a large scale PageRank Feedback is the principle that explainshow sometimes it is good to link to outside pages Consider we have a Page -gtPage A Its PageRank by definition will be 015 Now assume it links out toPage B which in return links back to it Page Arsquos PageRank now becomes 1Now assume it links out to Page C and Page C links back to it Page ArsquosPageRank now becomes 14594594595

The very fact that Page A links out to a page that links back to it causes its ownPageRank to rise (This could also occur if Page A links to a page that links to apage that links to Page A etchellip) This has occurred not because we aregenerating PageRank but because it is taking it from the system as a whole Butif we view this very small system ldquoPage Ardquo (which is a subsystem of all the pagesin the index) then we can firmly say that linking out from Page A has generatedPageRank in this system ndash that is the very act of linking out has fed PageRankback in

20

Letrsquos put this diagrammatically using a slightly more complicated system

Pages A to E are all of the pages in Googlersquos index Pages A and B are your siteWhen we do the calculations for this set up the final PageRanks for each pageare all 1 [ Excel Example 5] As far as your site is concerned

Page Arsquos PageRank is 1

Page Brsquos PageRank is 1

The sum of the PageRank of all the pages in your site is2

Changing the structure to link A to C and C to A gives the following finalPageRanks [ Excel Example 6]

21

Page Arsquos PageRank is 13599321536

Page Brsquos PageRank is 07279711653

The sum of the PageRank of all the pages in your site is20879033189

This increase is small the amount will of course vary depending oncircumstances But the point was easily proved ndash linking out can sometimesimprove your PageRank through a mechanism known as PageRank Feedback

If we added an extra page into the loop between C -gt D and E then the totalincrease in PageRank in your site would be 21462030505 [ Excel Example 7]This suggests that it is wiser to reciprocal link with larger sites that employ a linkstructure where a higher amount of focus is given to the page on which your linkappears (or will appear) The number of pages and the structure you are linkingto has significant influence on the amount of PageRank feedback that occurs

22

Influencing the results

The question that arises from all this is how and when can or will Googleinfluence the results of the PageRank calculation In this respect the situationhas changed significantly since the original PageRank paper Google has shownthat they can and will modify the data on which PageRank is based The primaryexample of this is what has become known as PageRank Zero (PR0) Basicallyspeculation says that when Google wants to penalize a page it is assigned aPageRank of zero As PageRank is a multiplier this will obviously always listPR0 pages as the very last entry in the SERPS

Your score from other factors multiplied by your PageRank score of zero willalways be zero (the lowest score possible)

How they do this is important Letrsquos say they set the PageRank of the page tozero before doing the calculations ndash would this work From what we know so farwe would think not [ Excel Example 8] And true enough it doesnt Why is thissignificant Because the PageRank must be set to zero at the end of theprocess Having a PageRank of zero in itself does not disqualify a page fromvoting exactly as it would have done in the first place

To stop its voting power the second penalty must also be applied This is thesame penalty that is applied to link farms Google has shown that they arecapable of ignoring links they believe have been artificially created This is simpleenough they merely remove the link entry from the matrix This second penaltywould also need to be applied to a PR0 penalty to stop the links from that pagefrom counting in the PageRank process However there doesnrsquot appear to beany way to test this I suspect that both penalties would be applied

Controlling PageRank

As Webmasters contrary to the suggestions of some we have a fair amount ofcontrol over PageRank It is however the most difficult of ranking factors tocontrol and you might well be able to achieve the desired effects by using othertechniques However a well-worked PageRank added to your optimization mixcan make it difficult for competitors to catch up

There are three fundamental areas to look at when trying to optimize thePageRank for your site

1 The links you choose to have link to you ie which ones you chooseand how much effort you put in to getting them

2 Who you choose to link out to from your site and from which page ofyour site you place their link (Maximising PageRank Feedback andminimising PageRank leakage)

23

3 The internal navigational structure and linkage of your pages in order tobest distribute PageRank within your site

Links to Your Site

When looking for links to your site from a purely PageRank point of view onemight think you should simply look for pages that have the highest ToolbarPageRank (Whilst keeping in mind that every page of a site has its ownPageRank so you must consider the PageRank of the links page or whateverpage the actual link will exist on) However this way of thinking is incorrect Ifyoursquove not just jumped to this section then yoursquoll probably have worked out whythat is The PageRank given by a link is far more complicated than thissimplification There may have been a time when that was an okayapproximationhellipbut no more As more and more people try to get links from onlyhigh PageRank sites it becomes less and less of a winning proposition

The actual PageRank from an individual page is shared out amongst the links onthat page (remember the PageRank calculations) So links from pages thathave the same PageRank arent always created equal It depends on how manyother links your link is sharing the links page with For instance a link from apage with a PageRank of 4 might be better than a link from a page with aPageRank of 6 if there are less total links on the PR 4 page Right now there justisnrsquot enough information available to allow us to know to what extent thisstretches However itrsquos significant enough to make it pointless to just choosehigh PageRanked sites as your main linking strategy Theres also another morematter-of-fact reason why that type of linking strategy might not be the best siteswith high PageRanks are often fussy about which sites they will link out tomaking them harder to get linked from than lower PageRanked sites Howeversites struggling with their own PageRank numbers should be more receptive toexchanging reciprocal links with other like-minded sites

Now lets factor in feedback Letrsquos say for example that there are two separatepages on other peoplersquos sites which both have a PageRank of 4 Both of thesehave ten links to other pages But the page on your site that you want them tolink to already has a link to the page on the second site By getting a link fromthe second site yoursquore generating feedback and getting more PageRank than ifyou had gotten a link from the first site Thatrsquos an over simplification in factfeedback loops can get even more complicated Remember the number of linkson the page linking to you will alter the amount of feedback etc

Can you work it all out for a given pagersquos situation No ndash neither can I Myadvice therefore is this ndash get links from sites that seem appropriate and havegood quality regardless of their current PageRank If they are relevant to yoursite and are high quality sites they will either help your PageRank now or willdo so in the future To really get your PageRank humming get yourself a listing

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 20: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

19

machines must be used Logically how can they do this Assume Pages A andC are on one machine and Pages B C and D are on another If we ignore thelinks between pages across machines (ie A-gtD and D-gtC) then for Pages Aand C it takes

1 calculation to converge at A=1 C=1

Then for Pages B D and E it takes

3 calculations to converge at B=015 D=03954375000 E=03954375000

If we then do calculations for all pages using these as starting values it takes

A huge 146 iterations to converge [ Excel Example 4]

The conclusion is that splitting PageRank calculations up into chunks for differentmachines is not a viable proposition Therefore they must also track PageRanktransfers across links between machines This means that each machine mustnotify the other machines as PageRank is given to a page not on that machineIndeed each machine must also run the iterations in sync with each other Evenif they donrsquot do it exactly this way the calculation of PageRank for so manypages is very very complicated indeed

PageRank Feedback how linking sometimes helps

When the new term PageRank Feedback was presented in the original version ofthis paper it was largely taken up and used ndash although not always correctlyPageRank Feedback is difficult to understand because it is a concept that impliescreation of PageRank However it is really an alteration of distribution ofPageRank on a large scale PageRank Feedback is the principle that explainshow sometimes it is good to link to outside pages Consider we have a Page -gtPage A Its PageRank by definition will be 015 Now assume it links out toPage B which in return links back to it Page Arsquos PageRank now becomes 1Now assume it links out to Page C and Page C links back to it Page ArsquosPageRank now becomes 14594594595

The very fact that Page A links out to a page that links back to it causes its ownPageRank to rise (This could also occur if Page A links to a page that links to apage that links to Page A etchellip) This has occurred not because we aregenerating PageRank but because it is taking it from the system as a whole Butif we view this very small system ldquoPage Ardquo (which is a subsystem of all the pagesin the index) then we can firmly say that linking out from Page A has generatedPageRank in this system ndash that is the very act of linking out has fed PageRankback in

20

Letrsquos put this diagrammatically using a slightly more complicated system

Pages A to E are all of the pages in Googlersquos index Pages A and B are your siteWhen we do the calculations for this set up the final PageRanks for each pageare all 1 [ Excel Example 5] As far as your site is concerned

Page Arsquos PageRank is 1

Page Brsquos PageRank is 1

The sum of the PageRank of all the pages in your site is2

Changing the structure to link A to C and C to A gives the following finalPageRanks [ Excel Example 6]

21

Page Arsquos PageRank is 13599321536

Page Brsquos PageRank is 07279711653

The sum of the PageRank of all the pages in your site is20879033189

This increase is small the amount will of course vary depending oncircumstances But the point was easily proved ndash linking out can sometimesimprove your PageRank through a mechanism known as PageRank Feedback

If we added an extra page into the loop between C -gt D and E then the totalincrease in PageRank in your site would be 21462030505 [ Excel Example 7]This suggests that it is wiser to reciprocal link with larger sites that employ a linkstructure where a higher amount of focus is given to the page on which your linkappears (or will appear) The number of pages and the structure you are linkingto has significant influence on the amount of PageRank feedback that occurs

22

Influencing the results

The question that arises from all this is how and when can or will Googleinfluence the results of the PageRank calculation In this respect the situationhas changed significantly since the original PageRank paper Google has shownthat they can and will modify the data on which PageRank is based The primaryexample of this is what has become known as PageRank Zero (PR0) Basicallyspeculation says that when Google wants to penalize a page it is assigned aPageRank of zero As PageRank is a multiplier this will obviously always listPR0 pages as the very last entry in the SERPS

Your score from other factors multiplied by your PageRank score of zero willalways be zero (the lowest score possible)

How they do this is important Letrsquos say they set the PageRank of the page tozero before doing the calculations ndash would this work From what we know so farwe would think not [ Excel Example 8] And true enough it doesnt Why is thissignificant Because the PageRank must be set to zero at the end of theprocess Having a PageRank of zero in itself does not disqualify a page fromvoting exactly as it would have done in the first place

To stop its voting power the second penalty must also be applied This is thesame penalty that is applied to link farms Google has shown that they arecapable of ignoring links they believe have been artificially created This is simpleenough they merely remove the link entry from the matrix This second penaltywould also need to be applied to a PR0 penalty to stop the links from that pagefrom counting in the PageRank process However there doesnrsquot appear to beany way to test this I suspect that both penalties would be applied

Controlling PageRank

As Webmasters contrary to the suggestions of some we have a fair amount ofcontrol over PageRank It is however the most difficult of ranking factors tocontrol and you might well be able to achieve the desired effects by using othertechniques However a well-worked PageRank added to your optimization mixcan make it difficult for competitors to catch up

There are three fundamental areas to look at when trying to optimize thePageRank for your site

1 The links you choose to have link to you ie which ones you chooseand how much effort you put in to getting them

2 Who you choose to link out to from your site and from which page ofyour site you place their link (Maximising PageRank Feedback andminimising PageRank leakage)

23

3 The internal navigational structure and linkage of your pages in order tobest distribute PageRank within your site

Links to Your Site

When looking for links to your site from a purely PageRank point of view onemight think you should simply look for pages that have the highest ToolbarPageRank (Whilst keeping in mind that every page of a site has its ownPageRank so you must consider the PageRank of the links page or whateverpage the actual link will exist on) However this way of thinking is incorrect Ifyoursquove not just jumped to this section then yoursquoll probably have worked out whythat is The PageRank given by a link is far more complicated than thissimplification There may have been a time when that was an okayapproximationhellipbut no more As more and more people try to get links from onlyhigh PageRank sites it becomes less and less of a winning proposition

The actual PageRank from an individual page is shared out amongst the links onthat page (remember the PageRank calculations) So links from pages thathave the same PageRank arent always created equal It depends on how manyother links your link is sharing the links page with For instance a link from apage with a PageRank of 4 might be better than a link from a page with aPageRank of 6 if there are less total links on the PR 4 page Right now there justisnrsquot enough information available to allow us to know to what extent thisstretches However itrsquos significant enough to make it pointless to just choosehigh PageRanked sites as your main linking strategy Theres also another morematter-of-fact reason why that type of linking strategy might not be the best siteswith high PageRanks are often fussy about which sites they will link out tomaking them harder to get linked from than lower PageRanked sites Howeversites struggling with their own PageRank numbers should be more receptive toexchanging reciprocal links with other like-minded sites

Now lets factor in feedback Letrsquos say for example that there are two separatepages on other peoplersquos sites which both have a PageRank of 4 Both of thesehave ten links to other pages But the page on your site that you want them tolink to already has a link to the page on the second site By getting a link fromthe second site yoursquore generating feedback and getting more PageRank than ifyou had gotten a link from the first site Thatrsquos an over simplification in factfeedback loops can get even more complicated Remember the number of linkson the page linking to you will alter the amount of feedback etc

Can you work it all out for a given pagersquos situation No ndash neither can I Myadvice therefore is this ndash get links from sites that seem appropriate and havegood quality regardless of their current PageRank If they are relevant to yoursite and are high quality sites they will either help your PageRank now or willdo so in the future To really get your PageRank humming get yourself a listing

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 21: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

20

Letrsquos put this diagrammatically using a slightly more complicated system

Pages A to E are all of the pages in Googlersquos index Pages A and B are your siteWhen we do the calculations for this set up the final PageRanks for each pageare all 1 [ Excel Example 5] As far as your site is concerned

Page Arsquos PageRank is 1

Page Brsquos PageRank is 1

The sum of the PageRank of all the pages in your site is2

Changing the structure to link A to C and C to A gives the following finalPageRanks [ Excel Example 6]

21

Page Arsquos PageRank is 13599321536

Page Brsquos PageRank is 07279711653

The sum of the PageRank of all the pages in your site is20879033189

This increase is small the amount will of course vary depending oncircumstances But the point was easily proved ndash linking out can sometimesimprove your PageRank through a mechanism known as PageRank Feedback

If we added an extra page into the loop between C -gt D and E then the totalincrease in PageRank in your site would be 21462030505 [ Excel Example 7]This suggests that it is wiser to reciprocal link with larger sites that employ a linkstructure where a higher amount of focus is given to the page on which your linkappears (or will appear) The number of pages and the structure you are linkingto has significant influence on the amount of PageRank feedback that occurs

22

Influencing the results

The question that arises from all this is how and when can or will Googleinfluence the results of the PageRank calculation In this respect the situationhas changed significantly since the original PageRank paper Google has shownthat they can and will modify the data on which PageRank is based The primaryexample of this is what has become known as PageRank Zero (PR0) Basicallyspeculation says that when Google wants to penalize a page it is assigned aPageRank of zero As PageRank is a multiplier this will obviously always listPR0 pages as the very last entry in the SERPS

Your score from other factors multiplied by your PageRank score of zero willalways be zero (the lowest score possible)

How they do this is important Letrsquos say they set the PageRank of the page tozero before doing the calculations ndash would this work From what we know so farwe would think not [ Excel Example 8] And true enough it doesnt Why is thissignificant Because the PageRank must be set to zero at the end of theprocess Having a PageRank of zero in itself does not disqualify a page fromvoting exactly as it would have done in the first place

To stop its voting power the second penalty must also be applied This is thesame penalty that is applied to link farms Google has shown that they arecapable of ignoring links they believe have been artificially created This is simpleenough they merely remove the link entry from the matrix This second penaltywould also need to be applied to a PR0 penalty to stop the links from that pagefrom counting in the PageRank process However there doesnrsquot appear to beany way to test this I suspect that both penalties would be applied

Controlling PageRank

As Webmasters contrary to the suggestions of some we have a fair amount ofcontrol over PageRank It is however the most difficult of ranking factors tocontrol and you might well be able to achieve the desired effects by using othertechniques However a well-worked PageRank added to your optimization mixcan make it difficult for competitors to catch up

There are three fundamental areas to look at when trying to optimize thePageRank for your site

1 The links you choose to have link to you ie which ones you chooseand how much effort you put in to getting them

2 Who you choose to link out to from your site and from which page ofyour site you place their link (Maximising PageRank Feedback andminimising PageRank leakage)

23

3 The internal navigational structure and linkage of your pages in order tobest distribute PageRank within your site

Links to Your Site

When looking for links to your site from a purely PageRank point of view onemight think you should simply look for pages that have the highest ToolbarPageRank (Whilst keeping in mind that every page of a site has its ownPageRank so you must consider the PageRank of the links page or whateverpage the actual link will exist on) However this way of thinking is incorrect Ifyoursquove not just jumped to this section then yoursquoll probably have worked out whythat is The PageRank given by a link is far more complicated than thissimplification There may have been a time when that was an okayapproximationhellipbut no more As more and more people try to get links from onlyhigh PageRank sites it becomes less and less of a winning proposition

The actual PageRank from an individual page is shared out amongst the links onthat page (remember the PageRank calculations) So links from pages thathave the same PageRank arent always created equal It depends on how manyother links your link is sharing the links page with For instance a link from apage with a PageRank of 4 might be better than a link from a page with aPageRank of 6 if there are less total links on the PR 4 page Right now there justisnrsquot enough information available to allow us to know to what extent thisstretches However itrsquos significant enough to make it pointless to just choosehigh PageRanked sites as your main linking strategy Theres also another morematter-of-fact reason why that type of linking strategy might not be the best siteswith high PageRanks are often fussy about which sites they will link out tomaking them harder to get linked from than lower PageRanked sites Howeversites struggling with their own PageRank numbers should be more receptive toexchanging reciprocal links with other like-minded sites

Now lets factor in feedback Letrsquos say for example that there are two separatepages on other peoplersquos sites which both have a PageRank of 4 Both of thesehave ten links to other pages But the page on your site that you want them tolink to already has a link to the page on the second site By getting a link fromthe second site yoursquore generating feedback and getting more PageRank than ifyou had gotten a link from the first site Thatrsquos an over simplification in factfeedback loops can get even more complicated Remember the number of linkson the page linking to you will alter the amount of feedback etc

Can you work it all out for a given pagersquos situation No ndash neither can I Myadvice therefore is this ndash get links from sites that seem appropriate and havegood quality regardless of their current PageRank If they are relevant to yoursite and are high quality sites they will either help your PageRank now or willdo so in the future To really get your PageRank humming get yourself a listing

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 22: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

21

Page Arsquos PageRank is 13599321536

Page Brsquos PageRank is 07279711653

The sum of the PageRank of all the pages in your site is20879033189

This increase is small the amount will of course vary depending oncircumstances But the point was easily proved ndash linking out can sometimesimprove your PageRank through a mechanism known as PageRank Feedback

If we added an extra page into the loop between C -gt D and E then the totalincrease in PageRank in your site would be 21462030505 [ Excel Example 7]This suggests that it is wiser to reciprocal link with larger sites that employ a linkstructure where a higher amount of focus is given to the page on which your linkappears (or will appear) The number of pages and the structure you are linkingto has significant influence on the amount of PageRank feedback that occurs

22

Influencing the results

The question that arises from all this is how and when can or will Googleinfluence the results of the PageRank calculation In this respect the situationhas changed significantly since the original PageRank paper Google has shownthat they can and will modify the data on which PageRank is based The primaryexample of this is what has become known as PageRank Zero (PR0) Basicallyspeculation says that when Google wants to penalize a page it is assigned aPageRank of zero As PageRank is a multiplier this will obviously always listPR0 pages as the very last entry in the SERPS

Your score from other factors multiplied by your PageRank score of zero willalways be zero (the lowest score possible)

How they do this is important Letrsquos say they set the PageRank of the page tozero before doing the calculations ndash would this work From what we know so farwe would think not [ Excel Example 8] And true enough it doesnt Why is thissignificant Because the PageRank must be set to zero at the end of theprocess Having a PageRank of zero in itself does not disqualify a page fromvoting exactly as it would have done in the first place

To stop its voting power the second penalty must also be applied This is thesame penalty that is applied to link farms Google has shown that they arecapable of ignoring links they believe have been artificially created This is simpleenough they merely remove the link entry from the matrix This second penaltywould also need to be applied to a PR0 penalty to stop the links from that pagefrom counting in the PageRank process However there doesnrsquot appear to beany way to test this I suspect that both penalties would be applied

Controlling PageRank

As Webmasters contrary to the suggestions of some we have a fair amount ofcontrol over PageRank It is however the most difficult of ranking factors tocontrol and you might well be able to achieve the desired effects by using othertechniques However a well-worked PageRank added to your optimization mixcan make it difficult for competitors to catch up

There are three fundamental areas to look at when trying to optimize thePageRank for your site

1 The links you choose to have link to you ie which ones you chooseand how much effort you put in to getting them

2 Who you choose to link out to from your site and from which page ofyour site you place their link (Maximising PageRank Feedback andminimising PageRank leakage)

23

3 The internal navigational structure and linkage of your pages in order tobest distribute PageRank within your site

Links to Your Site

When looking for links to your site from a purely PageRank point of view onemight think you should simply look for pages that have the highest ToolbarPageRank (Whilst keeping in mind that every page of a site has its ownPageRank so you must consider the PageRank of the links page or whateverpage the actual link will exist on) However this way of thinking is incorrect Ifyoursquove not just jumped to this section then yoursquoll probably have worked out whythat is The PageRank given by a link is far more complicated than thissimplification There may have been a time when that was an okayapproximationhellipbut no more As more and more people try to get links from onlyhigh PageRank sites it becomes less and less of a winning proposition

The actual PageRank from an individual page is shared out amongst the links onthat page (remember the PageRank calculations) So links from pages thathave the same PageRank arent always created equal It depends on how manyother links your link is sharing the links page with For instance a link from apage with a PageRank of 4 might be better than a link from a page with aPageRank of 6 if there are less total links on the PR 4 page Right now there justisnrsquot enough information available to allow us to know to what extent thisstretches However itrsquos significant enough to make it pointless to just choosehigh PageRanked sites as your main linking strategy Theres also another morematter-of-fact reason why that type of linking strategy might not be the best siteswith high PageRanks are often fussy about which sites they will link out tomaking them harder to get linked from than lower PageRanked sites Howeversites struggling with their own PageRank numbers should be more receptive toexchanging reciprocal links with other like-minded sites

Now lets factor in feedback Letrsquos say for example that there are two separatepages on other peoplersquos sites which both have a PageRank of 4 Both of thesehave ten links to other pages But the page on your site that you want them tolink to already has a link to the page on the second site By getting a link fromthe second site yoursquore generating feedback and getting more PageRank than ifyou had gotten a link from the first site Thatrsquos an over simplification in factfeedback loops can get even more complicated Remember the number of linkson the page linking to you will alter the amount of feedback etc

Can you work it all out for a given pagersquos situation No ndash neither can I Myadvice therefore is this ndash get links from sites that seem appropriate and havegood quality regardless of their current PageRank If they are relevant to yoursite and are high quality sites they will either help your PageRank now or willdo so in the future To really get your PageRank humming get yourself a listing

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 23: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

22

Influencing the results

The question that arises from all this is how and when can or will Googleinfluence the results of the PageRank calculation In this respect the situationhas changed significantly since the original PageRank paper Google has shownthat they can and will modify the data on which PageRank is based The primaryexample of this is what has become known as PageRank Zero (PR0) Basicallyspeculation says that when Google wants to penalize a page it is assigned aPageRank of zero As PageRank is a multiplier this will obviously always listPR0 pages as the very last entry in the SERPS

Your score from other factors multiplied by your PageRank score of zero willalways be zero (the lowest score possible)

How they do this is important Letrsquos say they set the PageRank of the page tozero before doing the calculations ndash would this work From what we know so farwe would think not [ Excel Example 8] And true enough it doesnt Why is thissignificant Because the PageRank must be set to zero at the end of theprocess Having a PageRank of zero in itself does not disqualify a page fromvoting exactly as it would have done in the first place

To stop its voting power the second penalty must also be applied This is thesame penalty that is applied to link farms Google has shown that they arecapable of ignoring links they believe have been artificially created This is simpleenough they merely remove the link entry from the matrix This second penaltywould also need to be applied to a PR0 penalty to stop the links from that pagefrom counting in the PageRank process However there doesnrsquot appear to beany way to test this I suspect that both penalties would be applied

Controlling PageRank

As Webmasters contrary to the suggestions of some we have a fair amount ofcontrol over PageRank It is however the most difficult of ranking factors tocontrol and you might well be able to achieve the desired effects by using othertechniques However a well-worked PageRank added to your optimization mixcan make it difficult for competitors to catch up

There are three fundamental areas to look at when trying to optimize thePageRank for your site

1 The links you choose to have link to you ie which ones you chooseand how much effort you put in to getting them

2 Who you choose to link out to from your site and from which page ofyour site you place their link (Maximising PageRank Feedback andminimising PageRank leakage)

23

3 The internal navigational structure and linkage of your pages in order tobest distribute PageRank within your site

Links to Your Site

When looking for links to your site from a purely PageRank point of view onemight think you should simply look for pages that have the highest ToolbarPageRank (Whilst keeping in mind that every page of a site has its ownPageRank so you must consider the PageRank of the links page or whateverpage the actual link will exist on) However this way of thinking is incorrect Ifyoursquove not just jumped to this section then yoursquoll probably have worked out whythat is The PageRank given by a link is far more complicated than thissimplification There may have been a time when that was an okayapproximationhellipbut no more As more and more people try to get links from onlyhigh PageRank sites it becomes less and less of a winning proposition

The actual PageRank from an individual page is shared out amongst the links onthat page (remember the PageRank calculations) So links from pages thathave the same PageRank arent always created equal It depends on how manyother links your link is sharing the links page with For instance a link from apage with a PageRank of 4 might be better than a link from a page with aPageRank of 6 if there are less total links on the PR 4 page Right now there justisnrsquot enough information available to allow us to know to what extent thisstretches However itrsquos significant enough to make it pointless to just choosehigh PageRanked sites as your main linking strategy Theres also another morematter-of-fact reason why that type of linking strategy might not be the best siteswith high PageRanks are often fussy about which sites they will link out tomaking them harder to get linked from than lower PageRanked sites Howeversites struggling with their own PageRank numbers should be more receptive toexchanging reciprocal links with other like-minded sites

Now lets factor in feedback Letrsquos say for example that there are two separatepages on other peoplersquos sites which both have a PageRank of 4 Both of thesehave ten links to other pages But the page on your site that you want them tolink to already has a link to the page on the second site By getting a link fromthe second site yoursquore generating feedback and getting more PageRank than ifyou had gotten a link from the first site Thatrsquos an over simplification in factfeedback loops can get even more complicated Remember the number of linkson the page linking to you will alter the amount of feedback etc

Can you work it all out for a given pagersquos situation No ndash neither can I Myadvice therefore is this ndash get links from sites that seem appropriate and havegood quality regardless of their current PageRank If they are relevant to yoursite and are high quality sites they will either help your PageRank now or willdo so in the future To really get your PageRank humming get yourself a listing

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 24: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

23

3 The internal navigational structure and linkage of your pages in order tobest distribute PageRank within your site

Links to Your Site

When looking for links to your site from a purely PageRank point of view onemight think you should simply look for pages that have the highest ToolbarPageRank (Whilst keeping in mind that every page of a site has its ownPageRank so you must consider the PageRank of the links page or whateverpage the actual link will exist on) However this way of thinking is incorrect Ifyoursquove not just jumped to this section then yoursquoll probably have worked out whythat is The PageRank given by a link is far more complicated than thissimplification There may have been a time when that was an okayapproximationhellipbut no more As more and more people try to get links from onlyhigh PageRank sites it becomes less and less of a winning proposition

The actual PageRank from an individual page is shared out amongst the links onthat page (remember the PageRank calculations) So links from pages thathave the same PageRank arent always created equal It depends on how manyother links your link is sharing the links page with For instance a link from apage with a PageRank of 4 might be better than a link from a page with aPageRank of 6 if there are less total links on the PR 4 page Right now there justisnrsquot enough information available to allow us to know to what extent thisstretches However itrsquos significant enough to make it pointless to just choosehigh PageRanked sites as your main linking strategy Theres also another morematter-of-fact reason why that type of linking strategy might not be the best siteswith high PageRanks are often fussy about which sites they will link out tomaking them harder to get linked from than lower PageRanked sites Howeversites struggling with their own PageRank numbers should be more receptive toexchanging reciprocal links with other like-minded sites

Now lets factor in feedback Letrsquos say for example that there are two separatepages on other peoplersquos sites which both have a PageRank of 4 Both of thesehave ten links to other pages But the page on your site that you want them tolink to already has a link to the page on the second site By getting a link fromthe second site yoursquore generating feedback and getting more PageRank than ifyou had gotten a link from the first site Thatrsquos an over simplification in factfeedback loops can get even more complicated Remember the number of linkson the page linking to you will alter the amount of feedback etc

Can you work it all out for a given pagersquos situation No ndash neither can I Myadvice therefore is this ndash get links from sites that seem appropriate and havegood quality regardless of their current PageRank If they are relevant to yoursite and are high quality sites they will either help your PageRank now or willdo so in the future To really get your PageRank humming get yourself a listing

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 25: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

24

in DMOZ and Yahoo and enjoy the artificially enhanced PageRank that theyprovide

Links out of your site

When considering links out of your site there is one golden rule

Generally you will want to keep PageRank within your own site

This does not mean that you will lose PageRank from your site by linking out butthat the total PageRank within it may be lower than it could have been had younot linked out From this we can derive another rule which gives away as littlePageRank as possible (meaning that we are giving our own pages as muchPageRank as possible)

The best outbound link PageRank scenario occurs whenthe outbound link comes from a page that has both

a) A low PageRank

and

b) A lot of links to pages on your site

How can this best be achieved One way would be by writing reviews of thesites we link out to on a separate page of our site and by providing a link tothose reviews along with each hyperlink to the external site Optionally it wouldbe okay if these pages open in another window but DO NOT open these pagesusing javascript because the spiders cannot yet follow this

For example we can do something like this with each link to an external sitelta href=httpwwwsupportforumsorggtThe best search engine resource andforum site in the worldltagt lta href=rdquosfreviewhtmrdquo target=rdquo_newrdquogtRead myflattering review of them hereltagt

Make sure that the review page also links back to a page in your own site that ishigh up its structure (Itrsquos best if this is your home page but any important pagewill do) By doing this yoursquove significantly reduced the amount of PageRankyoursquove let out of your site Wersquove targeted the distribution of PageRank to thehome page to ensure that less is passed back through your links page (whichwould be a wasted opportunity) and more is put elsewhere in your site Yourlinks page also needs to link to your home page and the other major pages ofyour site However place no other links on the review page (besides the homepage link)

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 26: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

25

Itrsquos very good if someone links to your review page so in addition you may letthe site youve linked to know that you have reviewed them ndash it is quite possibleyou will get two links from their site (one to your site and one to the review oftheir site) This is all very complicated in text form so heres a simplified exampleto show the principle and its effect

Doing all the calculations for PageRanks we converge at the following values [Excel Example 9]

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 27: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

26

If we perform the same calculations but include the review pages herersquos what weget

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 28: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

27

Just to be very clear here (and because in earlier versions of this paper somequestioned the validity of this technique) if we look at only pages A B C and Dthen

Without the Review Pages With the Review PagesThe HomePage PR is09536152797

BCD PR is 04201909959

Total is 22141882674

The HomePage PR is2439718935

BCD PR is 08412536982

Total is 49634800296

Some of this is the power of extra pages some of it is feedback and some of it isdrawing PageRank away from your outbound links pages Fundamentallyhowever adding extra internal links to your outbound pages is one of the most (ifnot the most) important internal factors to improve your PageRank This boostmay not be as much as from getting a new outside link to your site but itrsquos alsomuch easier and more beneficial for your sitersquos readers

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 29: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

28

Internal Structures and Linkages

Having spoken about ldquoLinking outrdquo to external sites the next logical progressionis to discuss internal links within your site If we recognise that PageRank isreally about pages having a vote then straight away we can derive one importantthing about internal link structure and PageRank

A page that is in the Google index has a vote however small Thus the morepages you have in the index ndash the more overall vote you are likely to have Orsimply put bigger sites tend to hold a greater total amount of PageRank withintheir site (as they have more pages to work with)

We need to clarify this a bit more To get a high PageRank it is not enough tohave tens of thousands of pages Those pages must also be in Googlersquos indexTo achieve this they must contain enough content for Googlersquos algorithms toconsider them worthy of being added to the index As you develop content foryour site you are also creating more PageRank for your site Itrsquos hard work andyoursquore creating it slowly ndash but if yoursquore also creating pages that people will want tolink to then yoursquore doing yourself two favours at once PageRank from bothdirections Or to put it in very basic terms

The best ldquointernalrdquo thing you can do to build PageRank is to write lots of goodcontent Ensure that pages arenrsquot overly short or overly long and break thecontent into several pages where necessary

There are three different ways in which pages can be interlinked within a site Inpractice sites might use a combination of these Using a combination isfine and normal as long as you understand the different sections and howthey are affecting your PageRank

For the purposes of this document wersquoll view the different linkage structures asseparate entities We have

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 30: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

29

Hierarchical

Looping

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 31: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

30

Extensive Interlinking

By performing the PageRank calculations for these structures we can see howinternal link structure can distribute PageRank around your site This is criticalbecause whilst at this stage of the document these are closed systems wheninbound and outbound links come into play you will want to be able to controlyour PageRank so that it is in the right places to maximise PageRank feedbackSince we have already extensively shown you the PageRank calculations wersquolljust dive right in with the final value for each structure

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 32: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

31

Hierarchical[Excel Example 11]

Total PageRank in this site is 4

Looping[Excel Example 12]

Total PageRank in this site is 4

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 33: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

32

Extensive Interlinking[Excel Example 13]

Total PageRank in this site is 4

Notice something that is critically important Because there are no inbound oroutbound links we are restricted to having the same total PageRank across allpages in our example site (4) However introducing a hierarchical-like structurecan alter the distribution of the PageRank (Note it does not have to be a stricthierarchy but must contain more of the properties of the hierarchical structurethan the looping or extensive interlinking) This ability to shift PageRank to thepages that we want it to is the Webmasterrsquos easiest way to manipulatePageRank

In general we can state this as the following rule

It is rare that a site will want to have an entirely linking or an entirely extensivelyinterlinked structure By using elements of a hierarchical structure a Webmasteris able to shift PageRank to the pages of his or her choice If PageRank wasspread out evenly across all pages of the site then the Webmaster is very closeto obtaining maximum benefit

The maximum benefit from PageRank is derived when this methodology isapplied towards pages that want to rank high for highly competitive keywordphrases or towards pages that must compete on a large number of keywordphrases To do this the webmaster must move it away from pages that they donot need to rank as highly or from pages that can rank well on the strength ofkeyword phrases alone

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 34: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

33

Closed systems (with no inbound and outbound links) are one thing but letrsquos takea look what happens when we add an inbound and outbound link

Hierarchical[Excel Example 14]

Total PageRank in this site 32220942408

Looping[Excel Example 15]

Total PageRank in this site 26641737744

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 35: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

34

Extensive Interlinking[Excel Example 16]

Total PageRank in this site 35683429515

When a system is non-closed as most are then we can say that the followingholds true

The Extensive Interlinking strategy retains the most PageRank within the siteFollowing this is the Hierarchical strategy and lastly the Looping strategy

Please keep in mind that this is just a principle In practice you will be unlikely toextensively interlink 10000 pages You must also choose the appropriatestructure for sub-sections of your site As we add more pages into thehierarchical structure it becomes more successful This is purely because morelinks are being added to the page that contains the outbound link (it is dividedamongst more pages most of which are yours) However as we add morepages we also dilute the PageRank of the home page (which could be importantto you)

What Google says

The best information about PageRank is of course straight from the horsersquosmouth We therefore felt it would be helpful to ask Google themselves aboutPageRank Obviously we realize Google wonrsquot provide us with the exactspecifics about PageRank or comment on the validity of what weve writtenHowever we did want to ask them some general questions to provide us with an

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 36: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

35

overview of how they see PageRank - now and in the future We hope this thisshort interview will help debunk a few myths and answer some questions

Chris Can you answer the age old question for us Is PageRank namedafter the fact it is page based or is it named after one of the creatorsGoogle PageRank is named for Larry Page Google co-founder andpresident Products

Chris Does Google view PageRank as being a significant distinguishingfeature from the other search enginesGoogle PageRank is a technology which contributes to the speed andrelevance that differentiate Google from other search engines In additionGoogle employs 100 other algorithms to its ranking formula

Chris Does Google believe that PageRank significantly benefits thequality of its results pages and does it expect this to continue in thefutureGoogle Because of the scalable approach PageRank uses to analyzelinks it will continue to be a significant factor in Googlersquos search results

Chris Several people have been known to try to data mine the PageRankinformation given by the toolbar how does Google feel about thisGoogle Mining PageRank data from the Google Toolbar is againstGooglersquos terms of service

Chris How useful do you think the PageRank information is on thetoolbar for a) normal users b) webmasters c) search engine optimizationprofessionalsGoogle The PageRank information on the Google Toolbar is an estimateand is intended only for informational purposes Many users have found itinteresting and webmasters have been known to use it as a measure oftheir performance However because it is a rough estimate the value tosearch engine optimization professionals is limited

Chris One method that people have been known to try to boostPageRank is by using link farms and guestbooks How is Google likely toreact to a site doing thatGoogle Googlersquos engineers are constantly working to update Googlersquosranking algorithm to prevent manipulation of Googlersquos rankings

Final word to non mathematicians

It feels a little strange having a final word when it isnrsquot the end of the documentBut if you read the table of contents yoursquoll see that wersquove added an advancedsection for the techies mathematicians and Phd students If your goal was just tofind out about PageRank and what you can do about it then youre done

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 37: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

36

In summary PageRank is probably the most misunderstood search engineconcepts in existence today There are those who profess to know everythingabout PageRank and there are those who claim to not need to know anythingThere are those who claim PageRank is important and there are those whoclaim it is unimportant There are those who claim Google oversells PageRankand there are those who do not believe this to be the case

The truth of the matter is that nobody except Google can tell you the exactspecifics The authors of this document do not in any way profess to tell youeverything What is presented here are theories based on the evidence availableAll we can do is to derive the principles not exact figures

To quantify the worth of PageRank is an impossible task Under differingcircumstances and differing Webmaster styles it will have a different value WhatPageRank is worth to you is probably an individual thing and you must decidehow it fits into your philosophy and situation You can do well by concentratingheavily on PageRank or by not However you should always keep PageRankand the principles discussed here in the back of your mind when designing sitesJust like you remember to create good Titles and anchor text you shouldremember to consider PageRank

The process of improving your PageRank should always be in symbiosis withyour visitor experience The highest PageRank in the world isnrsquot worth much ifyour visitors canrsquot navigate your site Gaining PageRank should only be donewhilst adding value to your site ndash whether that is through better targeting ofvisitors or improving content

Advanced PageRank topics

As we delve into the advanced topics we move more into the realm ofspeculation Whilst we can be relatively sure that the principles derived earlierare sound for the advanced PageRank discussion we need to place a greateremphasis on assumption

It should be said however that Google has surely thought of everything werediscussing here It is therefore perfectly possible that they have done some ofwhat we suggest below If they have not already then theres a good possibilitythey will in the future

Speeding Up PageRank

It is apparent that Google does not need a 100 accurate value for PageRankso the question arises ndash ldquoCan Google calculate an approximation of the finalvalues without using as many iterationsrdquo The answer appears to be yes Let usexplain In previous calculations we have set the normalisation co-efficient at a

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 38: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

37

fixed amount We can vary this co-efficient for each page depending upon thefinal value of the previous computation using the formula

(1-d) PR i (A)

That is we are replacing the (1-d) of the previous formula with the above PR i(A) simply means with the PR deduced by the previous iteration

Letrsquos do this by example Assume we have the hierarchical structure mentionedearlier

With this very simple structure we arrive at the final values like this

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 12550000000 09150000000 09150000000 09150000000 400000000003 24832500000 05055833333 05055833333 05055833333 400000000004 14392375000 08535875000 08535875000 08535875000 400000000005 23266481250 05577839583 05577839583 05577839583 400000000006 15723490938 08092169688 08092169688 08092169688 400000000007 22135032703 05954989099 05954989099 05954989099 400000000008 16685222202 07771592599 07771592599 07771592599 400000000009 21317561128 06227479624 06227479624 06227479624 40000000000

10 17380073041 07539975653 07539975653 07539975653 4000000000011 20726937915 06424354028 06424354028 06424354028 4000000000012 17882102772 07372632409 07372632409 07372632409 4000000000013 20300212644 06566595785 06566595785 06566595785 4000000000014 18244819253 07251726916 07251726916 07251726916 40000000000

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 39: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

38

15 19991903635 06669365455 06669365455 06669365455 4000000000016 18506881910 07164372697 07164372697 07164372697 4000000000017 19769150376 06743616541 06743616541 06743616541 4000000000018 18696222180 07101259273 07101259273 07101259273 4000000000019 19608211147 06797262951 06797262951 06797262951 4000000000020 18833020525 07055659825 07055659825 07055659825 4000000000021 19491932554 06836022482 06836022482 06836022482 4000000000022 18931857329 07022714224 07022714224 07022714224 4000000000023 19407921270 06864026243 06864026243 06864026243 4000000000024 19003266921 06998911026 06998911026 06998911026 4000000000025 19347223118 06884258961 06884258961 06884258961 4000000000026 19054860350 06981713217 06981713217 06981713217 4000000000027 19303368702 06898877099 06898877099 06898877099 4000000000028 19092136603 06969287799 06969287799 06969287799 4000000000029 19271683888 06909438704 06909438704 06909438704 4000000000030 19119068696 06960310435 06960310435 06960310435 4000000000031 19248791609 06917069464 06917069464 06917069464 4000000000032 19138527133 06953824289 06953824289 06953824289 4000000000033 19232251937 06922582688 06922582688 06922582688 4000000000034 19152585853 06949138049 06949138049 06949138049 4000000000035 19220302025 06926565992 06926565992 06926565992 4000000000036 19162743279 06945752240 06945752240 06945752240 4000000000037 19211668213 06929443929 06929443929 06929443929 4000000000038 19170082019 06943305994 06943305994 06943305994 4000000000039 19205430284 06931523239 06931523239 06931523239 4000000000040 19175384259 06941538580 06941538580 06941538580 4000000000041 19200923380 06933025540 06933025540 06933025540 4000000000042 19179215127 06940261624 06940261624 06940261624 4000000000043 19197667142 06934110953 06934110953 06934110953 4000000000044 19181982929 06939339024 06939339024 06939339024 4000000000045 19195314510 06934895163 06934895163 06934895163 4000000000046 19183982666 06938672445 06938672445 06938672445 4000000000047 19193614734 06935461755 06935461755 06935461755 4000000000048 19185427476 06938190841 06938190841 06938190841 4000000000049 19192386645 06935871118 06935871118 06935871118 4000000000050 19186471352 06937842883 06937842883 06937842883 4000000000051 19191499351 06936166883 06936166883 06936166883 4000000000052 19187225552 06937591483 06937591483 06937591483 4000000000053 19190858281 06936380573 06936380573 06936380573 4000000000054 19187770461 06937409846 06937409846 06937409846 4000000000055 19190395108 06936534964 06936534964 06936534964 4000000000056 19188164158 06937278614 06937278614 06937278614 4000000000057 19190060466 06936646511 06936646511 06936646511 4000000000058 19188448604 06937183799 06937183799 06937183799 4000000000059 19189818686 06936727105 06936727105 06936727105 4000000000060 19188654117 06937115294 06937115294 06937115294 4000000000061 19189644001 06936785333 06936785333 06936785333 4000000000062 19188802599 06937065800 06937065800 06937065800 4000000000063 19189517791 06936827403 06936827403 06936827403 4000000000064 19188909878 06937030041 06937030041 06937030041 4000000000065 19189426604 06936857799 06936857799 06936857799 40000000000

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 40: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

39

66 19188987387 06937004204 06937004204 06937004204 4000000000067 19189360721 06936879760 06936879760 06936879760 4000000000068 19189043387 06936985538 06936985538 06936985538 4000000000069 19189313121 06936895626 06936895626 06936895626 4000000000070 19189083847 06936972051 06936972051 06936972051 4000000000071 19189278730 06936907090 06936907090 06936907090 4000000000072 19189113080 06936962307 06936962307 06936962307 4000000000073 19189253882 06936915373 06936915373 06936915373 4000000000074 19189134200 06936955267 06936955267 06936955267 4000000000075 19189235930 06936921357 06936921357 06936921357 4000000000076 19189149459 06936950180 06936950180 06936950180 4000000000077 19189222959 06936925680 06936925680 06936925680 4000000000078 19189160484 06936946505 06936946505 06936946505 4000000000079 19189213588 06936928804 06936928804 06936928804 4000000000080 19189168450 06936943850 06936943850 06936943850 4000000000081 19189206817 06936931061 06936931061 06936931061 4000000000082 19189174205 06936941932 06936941932 06936941932 4000000000083 19189201926 06936932691 06936932691 06936932691 4000000000084 19189178363 06936940546 06936940546 06936940546 4000000000085 19189198391 06936933870 06936933870 06936933870 4000000000086 19189181367 06936939544 06936939544 06936939544 4000000000087 19189195838 06936934721 06936934721 06936934721 4000000000088 19189183538 06936938821 06936938821 06936938821 4000000000089 19189193993 06936935336 06936935336 06936935336 4000000000090 19189185106 06936938298 06936938298 06936938298 4000000000091 19189192660 06936935780 06936935780 06936935780 4000000000092 19189186239 06936937920 06936937920 06936937920 4000000000093 19189191697 06936936101 06936936101 06936936101 4000000000094 19189187058 06936937647 06936937647 06936937647 4000000000095 19189191001 06936936333 06936936333 06936936333 4000000000096 19189187649 06936937450 06936937450 06936937450 4000000000097 19189190498 06936936501 06936936501 06936936501 4000000000098 19189188077 06936937308 06936937308 06936937308 4000000000099 19189190135 06936936622 06936936622 06936936622 40000000000

100 19189188385 06936937205 06936937205 06936937205 40000000000101 19189189872 06936936709 06936936709 06936936709 40000000000102 19189188608 06936937131 06936937131 06936937131 40000000000103 19189189683 06936936772 06936936772 06936936772 40000000000104 19189188770 06936937077 06936937077 06936937077 40000000000105 19189189546 06936936818 06936936818 06936936818 40000000000106 19189188886 06936937038 06936937038 06936937038 40000000000107 19189189447 06936936851 06936936851 06936936851 40000000000108 19189188970 06936937010 06936937010 06936937010 40000000000109 19189189375 06936936875 06936936875 06936936875 40000000000110 19189189031 06936936990 06936936990 06936936990 40000000000111 19189189324 06936936892 06936936892 06936936892 40000000000112 19189189075 06936936975 06936936975 06936936975 40000000000113 19189189286 06936936905 06936936905 06936936905 40000000000114 19189189107 06936936964 06936936964 06936936964 40000000000115 19189189259 06936936914 06936936914 06936936914 40000000000116 19189189130 06936936957 06936936957 06936936957 40000000000

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 41: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

40

117 19189189240 06936936920 06936936920 06936936920 40000000000118 19189189146 06936936951 06936936951 06936936951 40000000000119 19189189226 06936936925 06936936925 06936936925 40000000000120 19189189158 06936936947 06936936947 06936936947 40000000000121 19189189216 06936936928 06936936928 06936936928 40000000000122 19189189167 06936936944 06936936944 06936936944 40000000000123 19189189208 06936936931 06936936931 06936936931 40000000000124 19189189173 06936936942 06936936942 06936936942 40000000000125 19189189203 06936936932 06936936932 06936936932 40000000000126 19189189177 06936936941 06936936941 06936936941 40000000000127 19189189199 06936936934 06936936934 06936936934 40000000000128 19189189181 06936936940 06936936940 06936936940 40000000000129 19189189196 06936936935 06936936935 06936936935 40000000000130 19189189183 06936936939 06936936939 06936936939 40000000000131 19189189194 06936936935 06936936935 06936936935 40000000000132 19189189185 06936936938 06936936938 06936936938 40000000000133 19189189193 06936936936 06936936936 06936936936 40000000000134 19189189186 06936936938 06936936938 06936936938 40000000000135 19189189192 06936936936 06936936936 06936936936 40000000000136 19189189187 06936936938 06936936938 06936936938 40000000000137 19189189191 06936936936 06936936936 06936936936 40000000000138 19189189188 06936936937 06936936937 06936936937 40000000000139 19189189191 06936936936 06936936936 06936936936 40000000000140 19189189188 06936936937 06936936937 06936936937 40000000000141 19189189190 06936936937 06936936937 06936936937 40000000000142 19189189188 06936936937 06936936937 06936936937 40000000000143 19189189190 06936936937 06936936937 06936936937 40000000000144 19189189189 06936936937 06936936937 06936936937 40000000000145 19189189190 06936936937 06936936937 06936936937 40000000000146 19189189189 06936936937 06936936937 06936936937 40000000000147 19189189190 06936936937 06936936937 06936936937 40000000000148 19189189189 06936936937 06936936937 06936936937 40000000000

[Excel Example 17]

As you can see it takes 148 iterations Now if we recalculate the normalizationco-efficient at each iteration we get the following

A B C D SumStart 10000000000 10000000000 10000000000 10000000000 40000000000

1 27000000000 04333333333 04333333333 04333333333 400000000002 15100000000 08300000000 08300000000 08300000000 400000000003 23430000000 05523333333 05523333333 05523333333 400000000004 17599000000 07467000000 07467000000 07467000000 400000000005 21680700000 06106433333 06106433333 06106433333 400000000006 18823510000 07058830000 07058830000 07058830000 400000000007 20823543000 06392152333 06392152333 06392152333 400000000008 19423519900 06858826700 06858826700 06858826700 400000000009 20403536070 06532154643 06532154643 06532154643 40000000000

10 19717524751 06760825083 06760825083 06760825083 4000000000011 20197732674 06600755775 06600755775 06600755775 4000000000012 19861587128 06712804291 06712804291 06712804291 40000000000

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 42: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

41

13 20096889010 06634370330 06634370330 06634370330 4000000000014 19932177693 06689274102 06689274102 06689274102 4000000000015 20047475615 06650841462 06650841462 06650841462 4000000000016 19966767069 06677744310 06677744310 06677744310 4000000000017 20023263051 06658912316 06658912316 06658912316 4000000000018 19983715864 06672094712 06672094712 06672094712 4000000000019 20011398895 06662867035 06662867035 06662867035 4000000000020 19992020773 06669326409 06669326409 06669326409 4000000000021 20005585459 06664804847 06664804847 06664804847 4000000000022 19996090179 06667969940 06667969940 06667969940 4000000000023 20002736875 06665754375 06665754375 06665754375 4000000000024 19998084188 06667305271 06667305271 06667305271 4000000000025 20001341069 06666219644 06666219644 06666219644 4000000000026 19999061252 06666979583 06666979583 06666979583 4000000000027 20000657124 06666447625 06666447625 06666447625 4000000000028 19999540013 06666819996 06666819996 06666819996 4000000000029 20000321991 06666559336 06666559336 06666559336 4000000000030 19999774607 06666741798 06666741798 06666741798 4000000000031 20000157775 06666614075 06666614075 06666614075 4000000000032 19999889557 06666703481 06666703481 06666703481 4000000000033 20000077310 06666640897 06666640897 06666640897 4000000000034 19999945883 06666684706 06666684706 06666684706 4000000000035 20000037882 06666654039 06666654039 06666654039 4000000000036 19999973483 06666675506 06666675506 06666675506 4000000000037 20000018562 06666660479 06666660479 06666660479 4000000000038 19999987007 06666670998 06666670998 06666670998 4000000000039 20000009095 06666663635 06666663635 06666663635 4000000000040 19999993633 06666668789 06666668789 06666668789 4000000000041 20000004457 06666665181 06666665181 06666665181 4000000000042 19999996880 06666667707 06666667707 06666667707 4000000000043 20000002184 06666665939 06666665939 06666665939 4000000000044 19999998471 06666667176 06666667176 06666667176 4000000000045 20000001070 06666666310 06666666310 06666666310 4000000000046 19999999251 06666666916 06666666916 06666666916 4000000000047 20000000524 06666666492 06666666492 06666666492 4000000000048 19999999633 06666666789 06666666789 06666666789 4000000000049 20000000257 06666666581 06666666581 06666666581 4000000000050 19999999820 06666666727 06666666727 06666666727 4000000000051 20000000126 06666666625 06666666625 06666666625 4000000000052 19999999912 06666666696 06666666696 06666666696 4000000000053 20000000062 06666666646 06666666646 06666666646 4000000000054 19999999957 06666666681 06666666681 06666666681 4000000000055 20000000030 06666666657 06666666657 06666666657 4000000000056 19999999979 06666666674 06666666674 06666666674 4000000000057 20000000015 06666666662 06666666662 06666666662 4000000000058 19999999990 06666666670 06666666670 06666666670 4000000000059 20000000007 06666666664 06666666664 06666666664 4000000000060 19999999995 06666666668 06666666668 06666666668 4000000000061 20000000004 06666666665 06666666665 06666666665 4000000000062 19999999998 06666666667 06666666667 06666666667 4000000000063 20000000002 06666666666 06666666666 06666666666 40000000000

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 43: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

42

64 19999999999 06666666667 06666666667 06666666667 4000000000065 20000000001 06666666666 06666666666 06666666666 4000000000066 19999999999 06666666667 06666666667 06666666667 4000000000067 20000000000 06666666667 06666666667 06666666667 40000000000

[Excel Example 18]

A very good approximation of the values with only 67 calculations This is knownas the ldquoaccelerated technology of the computation of the Page Rank estimatingvaluesrdquo So by being a little bit clever Google can considerably reduce thecomputation time needed to calculate PageRank

How PageRank could be more than it seems

Therersquos a possibility just a possibility that PageRank could well be more than itseems If we think about the factors that could affect a sitersquos ranking in Googlewe come up with the obvious things such as PageRank keywords anchor textand so on But where do the less obvious factors like ldquoquantity of textrdquo fit Whatif for example some pages were preferred because they had the correctquantity of text This doesnrsquot seem to fit into the traditional factors but it can beworked into PageRank Letrsquos explain

How we distribute the normalization co-efficients is pretty much up to us as longas the total value stays the same So with our previous example we may set theldquoAbout Usrdquo page (Page B) higher and lower the rest of the pages because pageB meets some criteria we define (eg it has the correct amount of text) So wecould set page B to

(04n + 06)015

as long as we set the other pages to

06015

ie [01n + 09+09 (n-1)=n where n is the number of Pages in the Site]

This strongly favours Page B Letrsquos give an example Normally the final valuesfor the previous example site structure would be

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 44: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

43

Using our new values for the co-efficients in an attempt to show favour to Page B(that meets our arbitrary criteria) we get

[Excel Example 19]

As you can see wersquove boosted the PageRank of the About Us page at theexpense of all other pages ndash simply because wersquove introduced a new arbitraryfactor called amount of text

PageRank not only considers links but it may also consider certain other factors

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 45: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

44

There is of course no reason why we should stop at boosting a single pagersquosPageRank ndash it is just as easy to boost an entire sitersquos PageRank based on anyfactor we like Examples of possible factors are ldquoitrsquos popularrdquo or ldquowe like itrdquo

Favouring pages in directories

It stands to reason that if we can cause an increase in the PageRank of a pageor pages at the expense of those it links to and from then we can also cause adecrease in its PageRank to the benefit of those it links to and from Why wouldwe want to do this Because the page represents a list of links that points tohigh-quality pages Letrsquos pretend that Page B is such a list say a category pagewith links in ODP

Wersquoll set Page Brsquos normalization co-efficient to 006 and the rest to 018 We endup with

[Excel Example 20]

And sure enough it does exactly what we want it to do

This has several advantages The page itself which is actually just a list of linkswill be pushed down in the rankings Whilst the pages it links to will get a boostThus the searcher is less likely to see pages that are just links and more likely tosee good quality pages In addition those with the Google Toolbar installed wonrsquotbe racing each other to get links on such pages because their PageRank willappear deceptively low

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 46: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

45

Examples

Throughout this document yoursquove seen the phrase [Excel Example x] This refersto the actual excel spreadsheet that contains the calculations used to producethis example (so you can look at it in more depth if you want to)

You can get the examples from httpwwwsupportforumsorgpagerank

Mathematical aspects of PageRank for advanced readers

The theoretical concept of PageRank interprets Web sites as directedmultigraphs with pages on their tops Googlebot (Googlersquos spider) explores asite traces links between pages and forms the matrix of links which contains allinformation about the topology of that site

Letrsquos write down the elements of the matrix of links Dkm N n) (m=1n)

What does that mean Let us explain the designations

T1 T2hellipTn correspond to Web site pages (n is the number of pages)

Components akm which are on the intersection of the k-line and m-columnin the matrix of links determine the number of links off Tk to Tm

Cr (Tk) is the total number of links off Tk

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 47: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

46

DIAGRAM OF THE MATRIX OF LINKS (1)

T1 T2 Tn Num

ber o

f lin

ks o

ffth

e pa

ge

T1 a 11 a 12 hellip hellip a 1n Cr(T1)

T2 a 21 a 22 hellip hellip a 2n Cr(T2)

Tn a n1 a n2 hellip hellip a nn Cr(Tn)

Letrsquos write down the equation for the computation scheme

n

PR I+1 (Tm) = (1-d) + d Σ akmPR I (Tk) Cr(Tk) k=1Where

PRI+1 (Tm) is Page Rank of the Page Tm on (I+1) iteration (the estimating value ofthe page Tm (m=1hellip n) )

PR I (Tk) is the Page Rank of the Page Tk on the I iteration (the estimating valueof the Page Tk (k=1hellip n) )

Cr(Tk) is the total number of links off the page Tk

Cm(Tk) = akm is the number of the individual links off the page Tk to the page Tm(k=1n) (m=1 n)

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 48: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

47

d is a dampening factor (usually 085)

(1-d) is a normalization coefficient

n is the number of the Pages on the Site

This equation of a recursive computational scheme allows taking intoconsideration and then computing the estimating values of a Web site payingattention to the probable complexity of links between pages

Take a look at the following example Generally it is a multigraph of a small Website Letrsquos apply PageRank methods to this site

MULTIGRAPH OF A SIMPLE WEB SITE

You may have already noticed that every page of the second subgraph links toevery page of the first subgraph

Letrsquos construct a matrix of links of this site The values A B C D E F G H Irefer to the corresponding pages

A

D

CB

H

IGFE

1 2

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 49: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

48

DIAGRAM OF THE MATRIX OF LINKS (2)

A B C D E F G H I Num

ber o

f lin

ks o

ff th

e pa

ge

A 0 1 0 0 0 0 0 0 0 Cr(A) 1

Homeindex B 1 0 1 1 0 0 0 0 0 Cr(B) 3

C 0 1 0 0 0 0 0 0 0 Cr(C) 1

D 0 1 0 0 0 0 0 0 0 Cr(D) 1

E 1 1 1 1 0 1 0 0 0 Cr(E) 5

F 1 1 1 1 0 0 1 0 0 Cr(F 5

G 1 1 1 1 0 1 0 1 1 Cr(G) 7

H 1 1 1 1 1 0 1 0 0 Cr(H) 6

I 1 1 1 1 0 0 1 0 0 Cr(I) 5

Note the wonderful ability of PageRank to distinguish the most important pagesand the relationship of external pages to them Later we will use the term ldquobasepagesrdquo to describe those important pages The main feature of the base pages isthat they are traversed by a directed cycle

Letrsquos apply the so-called accelerated technology of computation of the estimatingvalues of the site pages

Suppose PR i (T) is the estimating value for the page T (T=A B C D E F GH I) page on i-iterationThe normalization coefficient for the PR i+1(T) computation (on (i + 1)-iteration)can be calculated by formula (1-d) PR i (T)

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 50: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

49

Applying the accelerated technology the computation of the estimating values ofthe site pages the limiting values of the estimating values of external pages inrelation to the base sites are equal to zero

This approach permits us to distinguish the external pages in relation to the basepages Thus we can introduce the following definition

The external pages in relation to the base pages are ones that have zero limitingvalues on application of the accelerated technology of the computation of theestimating values of the site pages

ACCELERATED COMPUTATION OF PAGERANK for recognition of externallinks

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1450932 3883281 1450932 1450932 0082202 0192500 0254388 0117417 0117417 9000

3 1432087 4396552 1432087 1432087 0028964 0073739 0107478 0048502 0048502 9000

4 1506130 4356932 1506130 1506130 0011216 0029036 0043774 0020326 0020326 9000

5 1478877 4512665 1478877 1478877 0004562 0011577 0017837 0008364 0008364 9000

6 1507936 4455552 1507936 1507936 0001869 0004678 0007251 0003421 0003421 9000

7 1491656 4516630 1491656 1491656 0000765 0001900 0002949 0001394 0001394 9000

8 1504706 4482464 1504706 1504706 0000312 0000773 0001200 0000567 0000567 9000

9 1496244 4509876 1496244 1496244 0000127 0000315 0000488 0000231 0000231 9000

10 1502441 4492110 1502441 1502441 0000052 0000128 0000199 0000094 0000094 9000

11 1498215 4505125 1498215 1498215 0000021 0000052 0000081 0000038 0000038 9000

12 1501219 4496251 1501219 1501219 0000009 0000021 0000033 0000016 0000016 9000

13 1499134 4502559 1499134 1499134 0000003 0000009 0000013 0000006 0000006 9000

14 1500601 4498182 1500601 1500601 0000001 0000004 0000005 0000003 0000003 9000

15 1499577 4501262 1499577 1499577 0000001 0000001 0000002 0000001 0000001 9000

16 1500295 4499112 1500295 1500295 0000000 0000001 0000001 0000000 0000000 9000

17 1499793 4500620 1499793 1499793 0000000 0000000 0000000 0000000 0000000 9000

18 1500145 4499566 1500145 1500145 0000000 0000000 0000000 0000000 0000000 9000

19 1499899 4500304 1499899 1499899 0000000 0000000 0000000 0000000 0000000 9000

20 1500071 4499787 1500071 1500071 0000000 0000000 0000000 0000000 0000000 9000

21 1499950 4500149 1499950 1499950 0000000 0000000 0000000 0000000 0000000 9000

22 1500035 4499896 1500035 1500035 0000000 0000000 0000000 0000000 0000000 9000

23 1499976 4500073 1499976 1499976 0000000 0000000 0000000 0000000 0000000 9000

24 1500017 4499949 1500017 1500017 0000000 0000000 0000000 0000000 0000000 9000

25 1499988 4500036 1499988 1499988 0000000 0000000 0000000 0000000 0000000 9000

26 1500008 4499975 1500008 1500008 0000000 0000000 0000000 0000000 0000000 9000

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 51: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

50

27 1499994 4500018 1499994 1499994 0000000 0000000 0000000 0000000 0000000 9000

28 1500004 4499988 1500004 1500004 0000000 0000000 0000000 0000000 0000000 9000

29 1499997 4500009 1499997 1499997 0000000 0000000 0000000 0000000 0000000 9000

30 1500002 4499994 1500002 1500002 0000000 0000000 0000000 0000000 0000000 9000

31 1499999 4500004 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

32 1500001 4499997 1500001 1500001 0000000 0000000 0000000 0000000 0000000 9000

33 1499999 4500002 1499999 1499999 0000000 0000000 0000000 0000000 0000000 9000

34 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

35 1500000 4500001 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

36 1500000 4499999 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

37 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

38 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

39 1500000 4500000 1500000 1500000 0000000 0000000 0000000 0000000 0000000 9000

For the Web site that we have just examined the pages E F G I H are externalin relation to the base pages A B C D

Letrsquos calculate the estimating values for the pages of the Web site examined withthe constant normalization coefficient (1-d) =015 d = 085

CLASSIC COMPUTATION OF PAGERANK

A B C D E F G H I Amount

Itera

tion

1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 1000000 9000

1 1206429 3473095 1206429 1206429 0291667 0441429 0631667 0271429 0271429 9000

2 1419967 3512317 1419967 1419967 0188452 0276286 0309638 0226702 0226702 9000

3 1332416 3958177 1332416 1332416 0182116 0219636 0267624 0187599 0187599 9000

4 1430747 3706925 1430747 1430747 0176577 0213457 0245806 0182497 0182497 9000

5 1353327 3951436 1353327 1353327 0175854 0209866 0243166 0179848 0179848 9000

6 1420726 3752137 1420726 1420726 0175478 0209422 0241730 0179527 0179527 9000

7 1363844 3923590 1363844 1363844 0175433 0209184 0241554 0179353 0179353 9000

8 1412299 3778418 1412299 1412299 0175408 0209155 0241460 0179332 0179332 9000

9 1371139 3901949 1371139 1371139 0175405 0209140 0241448 0179320 0179320 9000

10 1406132 3796985 1406132 1406132 0175404 0209138 0241442 0179319 0179319 9000

11 1376390 3886213 1376390 1376390 0175403 0209137 0241441 0179318 0179318 9000

12 1401671 3810371 1401671 1401671 0175403 0209136 0241441 0179318 0179318 9000

13 1380182 3874838 1380182 1380182 0175403 0209136 0241441 0179318 0179318 9000

14 1398448 3820041 1398448 1398448 0175403 0209136 0241441 0179318 0179318 9000

15 1382922 3866618 1382922 1382922 0175403 0209136 0241441 0179318 0179318 9000

16 1396119 3827028 1396119 1396119 0175403 0209136 0241441 0179318 0179318 9000

17 1384901 3860680 1384901 1384901 0175403 0209136 0241441 0179318 0179318 9000

18 1394436 3832076 1394436 1394436 0175403 0209136 0241441 0179318 0179318 9000

19 1386332 3856389 1386332 1386332 0175403 0209136 0241441 0179318 0179318 9000

20 1393220 3835723 1393220 1393220 0175403 0209136 0241441 0179318 0179318 9000

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 52: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

51

21 1387365 3853289 1387365 1387365 0175403 0209136 0241441 0179318 0179318 9000

22 1392342 3838358 1392342 1392342 0175403 0209136 0241441 0179318 0179318 9000

23 1388112 3851049 1388112 1388112 0175403 0209136 0241441 0179318 0179318 9000

24 1391708 3840261 1391708 1391708 0175403 0209136 0241441 0179318 0179318 9000

25 1388651 3849431 1388651 1388651 0175403 0209136 0241441 0179318 0179318 9000

26 1391249 3841637 1391249 1391249 0175403 0209136 0241441 0179318 0179318 9000

27 1389041 3848262 1389041 1389041 0175403 0209136 0241441 0179318 0179318 9000

28 1390918 3842631 1390918 1390918 0175403 0209136 0241441 0179318 0179318 9000

29 1389322 3847417 1389322 1389322 0175403 0209136 0241441 0179318 0179318 9000

30 1390678 3843349 1390678 1390678 0175403 0209136 0241441 0179318 0179318 9000

31 1389526 3846807 1389526 1389526 0175403 0209136 0241441 0179318 0179318 9000

32 1390506 3843867 1390506 1390506 0175403 0209136 0241441 0179318 0179318 9000

33 1389673 3846366 1389673 1389673 0175403 0209136 0241441 0179318 0179318 9000

34 1390381 3844242 1390381 1390381 0175403 0209136 0241441 0179318 0179318 9000

35 1389779 3846048 1389779 1389779 0175403 0209136 0241441 0179318 0179318 9000

36 1390290 3844513 1390290 1390290 0175403 0209136 0241441 0179318 0179318 9000

37 1389856 3845817 1389856 1389856 0175403 0209136 0241441 0179318 0179318 9000

38 1390225 3844709 1390225 1390225 0175403 0209136 0241441 0179318 0179318 9000

39 1389911 3845651 1389911 1389911 0175403 0209136 0241441 0179318 0179318 9000

40 1390178 3844850 1390178 1390178 0175403 0209136 0241441 0179318 0179318 9000

41 1389951 3845531 1389951 1389951 0175403 0209136 0241441 0179318 0179318 9000

42 1390144 3844952 1390144 1390144 0175403 0209136 0241441 0179318 0179318 9000

43 1389980 3845444 1389980 1389980 0175403 0209136 0241441 0179318 0179318 9000

44 1390119 3845026 1390119 1390119 0175403 0209136 0241441 0179318 0179318 9000

45 1390001 3845381 1390001 1390001 0175403 0209136 0241441 0179318 0179318 9000

46 1390102 3845079 1390102 1390102 0175403 0209136 0241441 0179318 0179318 9000

47 1390016 3845336 1390016 1390016 0175403 0209136 0241441 0179318 0179318 9000

48 1390089 3845118 1390089 1390089 0175403 0209136 0241441 0179318 0179318 9000

49 1390027 3845303 1390027 1390027 0175403 0209136 0241441 0179318 0179318 9000

50 1390080 3845146 1390080 1390080 0175403 0209136 0241441 0179318 0179318 9000

51 1390035 3845280 1390035 1390035 0175403 0209136 0241441 0179318 0179318 9000

52 1390073 3845166 1390073 1390073 0175403 0209136 0241441 0179318 0179318 9000

53 1390041 3845263 1390041 1390041 0175403 0209136 0241441 0179318 0179318 9000

54 1390068 3845180 1390068 1390068 0175403 0209136 0241441 0179318 0179318 9000

55 1390045 3845250 1390045 1390045 0175403 0209136 0241441 0179318 0179318 9000

56 1390064 3845191 1390064 1390064 0175403 0209136 0241441 0179318 0179318 9000

57 1390048 3845241 1390048 1390048 0175403 0209136 0241441 0179318 0179318 9000

58 1390062 3845198 1390062 1390062 0175403 0209136 0241441 0179318 0179318 9000

59 1390050 3845235 1390050 1390050 0175403 0209136 0241441 0179318 0179318 9000

60 1390060 3845204 1390060 1390060 0175403 0209136 0241441 0179318 0179318 9000

61 1390051 3845230 1390051 1390051 0175403 0209136 0241441 0179318 0179318 9000

62 1390059 3845208 1390059 1390059 0175403 0209136 0241441 0179318 0179318 9000

63 1390052 3845227 1390052 1390052 0175403 0209136 0241441 0179318 0179318 9000

64 1390058 3845211 1390058 1390058 0175403 0209136 0241441 0179318 0179318 9000

65 1390053 3845224 1390053 1390053 0175403 0209136 0241441 0179318 0179318 9000

66 1390057 3845213 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

67 1390054 3845223 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

68 1390057 3845214 1390057 1390057 0175403 0209136 0241441 0179318 0179318 9000

69 1390054 3845221 1390054 1390054 0175403 0209136 0241441 0179318 0179318 9000

70 1390056 3845215 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 53: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

52

71 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

72 1390056 3845216 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

73 1390055 3845220 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

74 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

75 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

76 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

77 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

78 1390056 3845217 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

79 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

80 1390056 3845218 1390056 1390056 0175403 0209136 0241441 0179318 0179318 9000

81 1390055 3845219 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

82 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

83 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

84 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

85 1390055 3845218 1390055 1390055 0175403 0209136 0241441 0179318 0179318 9000

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

Amount

P 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of Page Rank for the pages of the site examined (Logarithmsare to be calculated on base 2)

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 54: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

53

Amount

PR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

TheToolbarShows

1 1 1 1 1 1 1 1 1 1

The limiting estimating values for the site pages are darkened

Having divided the estimating values to their total amount which isunchangeable upon execution of the iteration process wersquoll receive the set ofprobability values (the amount of all probability values is equal to one)

AmountP 0154451 0427246 0154451 0154451 0019489 0023237 0026827 0019924 00199241000

Running a parallel demonstration wersquod like to use the famous classicalShannonrsquos formula which was normally used for calculation of the volume of theinformation which an experiment contains and which has n outcomes with theset probabilities

Experiment outcome A1 A2 hellip An

Probabilities p1 p2 hellip pn

H(p1 p2hellip pn) = - p(A1)log p(A1) - p(A2)log p(A2) -hellip- p(An)log p(An)

Letrsquos calculate the values for a set of probability values which correspond to thesite pages and wersquoll receive the following set of the values

This set of values of PageRank for the pages of the site examined (Logarithmsare to be calculated on base 2)

AmountPR 0416211 0524171 0416211 0416211 0110722 0126119 0140040 0112558 0112558 2375

The ToolbarShows 1 1 1 1 1 1 1 1 1 1

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 55: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

54

PageRank is a successful representation of the directed multigraph It is widelyknown that multigraphs are used to describe and build self-learning systemswhich model the behavior of animals

Letrsquos assume that every time an incentive is engaged an animal must make achoice eg to turn to left or right So every time an animal behaves as we want itto we can encourage it by applying operator Z+

P n+1 = Z+ (P n) = d P n + (1 ndash d) 0 lt d lt 1 (P ndash the estimating value)

The theory for this subject can be found at the classic work of Bush and Mosteller-- Stochastic Models for Learning Wiley New York 1955

We suppose that it was a variation of Shannonrsquos formula that led to birth ofPageRank

Further Reading and Resources

For those interested in doing further PageRank research we would recommendvisiting the following resources

PageRank Question and Answer Areahttpwwwsupportforumsorgpagerankqampa

Search Engine Marketing ndash the essential best practice guide (an in depth view ofthe other aspects of how search engine work)httpwwwsupportforumsorgpageranksem

PageRank Bringing order to the webhttphcistanfordedu~pagepaperspagerankppframehtm

The anatomy of a Large-Scale HyperTextual Web Search Enginehttpwww-dbstanfordedu~backrubgooglehtml

The first PageRank calculator (with the exception of Googlersquos) by Mark Horrellhttpwwwmarkhorrellcomseopagerankasp

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide

Page 56: “PageRank Uncovered” › PageRank.pdf · PageRank is linear, Google has chosen to use a non-linear graph to portray it. So on the toolbar, to move from a PageRank of 2 to a PageRank

55

Technical Proof Readers credits

The authors of this document welcome comments from experts in the fields ofsearch engines mathematics and search engine optimization We feel thisfeedback is important and provide the following comments and links as a thanksfor the feedback of the people mentioned If yoursquod like to make a comment onthis document then please email chrissupportforumsorg

In the field of information retrieval on the web PageRank has emerged as the primary(and most widely discussed) hyperlink analysis algorithm But how it works is stilllargely a mystery to many in the SEO online community PageRank Uncovered by ChrisRidings and Mike Shishigin is an illuminating document which unveils the mystery andprovides the most detailed in-depth yet easy to follow explanation of the power behindGoogle No matter what level yoursquore at in the field of SEO this is not something youshould read itrsquos something you must read

Mike GrehanAuthorSearch Engine Marketing The essential best practice guide