The PageRank Citation Ranking: Bringing Order to the Web

Larry Page et al., Stanford University, Technical Report, 1998. Presented by: Ratiya Komalarachun

Page 1: The PageRank Citation Ranking: Bringing Order to the Web

The PageRank Citation Ranking: Bringing Order to the Web

Larry Page et al., Stanford University, Technical Report, 1998

Presented by: Ratiya Komalarachun

Page 2: Contents

Motivation

Related Work

Background Knowledge

PageRank & Random Surfer Model

Implementation

Application

Conclusion

Page 3: Motivation

The web is heterogeneous and unstructured

There is no quality control on the web

There is commercial interest in manipulating rankings

Page 4: Related Work

Academic citation analysis

Link-based analysis

Clustering methods of link structure

Hubs & Authorities Model based on an eigenvector calculation

Page 5: Hubs & Authorities Model

[Figure: hubs and authorities]

Page 6: Hubs & Authorities Model

Mutually reinforcing relationship

“A good hub is a page that points to many good authorities”

“A good authority is a page that is pointed to by many good hubs”
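
The mutually reinforcing relationship can be sketched as an iterative update. This is a minimal illustration of a hubs-and-authorities style iteration, not code from the report; the function name `hits`, the `links` dictionary format, the fixed iteration count, and the L2 normalization are all assumptions made for the example.

```python
import math

def hits(links, iterations=50):
    """links: dict mapping a page to the list of pages it points to (hypothetical format)."""
    pages = sorted(set(links) | {u for outs in links.values() for u in outs})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages pointing to it.
        auth = {p: 0.0 for p in pages}
        for v, outs in links.items():
            for u in outs:
                auth[u] += hub[v]
        # Hub score: sum of the authority scores of the pages it points to.
        hub = {p: sum(auth[u] for u in links.get(p, [])) for p in pages}
        # Normalize both vectors so the scores stay bounded (assumed L2 norm).
        for scores in (auth, hub):
            norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

# Hypothetical example: h1 points to both authorities, h2 to only one.
example_links = {"h1": ["a1", "a2"], "h2": ["a1"]}
hub_scores, auth_scores = hits(example_links)
```

In this toy graph, `a1` should end up as the strongest authority (two hubs point to it) and `h1` as the strongest hub (it points to both authorities).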

Page 7: Link Structure of the Web

Forward links (outedges)

Backlinks (inedges)

Approximation of importance / quality

Page 8: PageRank

A page has high rank if the sum of the ranks of its backlinks is high

Backlinks coming from important pages convey more importance to a page

Problems: dangling links, rank sinks

Page 9: Dangling Links

Page 10: PageRank Calculation

$$R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v}$$

Given:

R(u) = rank of u, R(v) = rank of v

c < 1 (used for normalization)

$N_v$ = number of links from v

$B_u$ = the set of pages that point to u
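
The definition above can be evaluated directly for one page. The sketch below is only an illustration: the function name `simplified_rank`, the `links` and `ranks` dictionaries, the three-page graph, and the choice c = 1 are assumptions made for the example, not values from the slides.

```python
def simplified_rank(u, ranks, links, c=1.0):
    """Evaluate R(u) = c * sum over v in B_u of R(v) / N_v for a single page u."""
    total = 0.0
    for v, outs in links.items():
        if u in outs:                       # v belongs to B_u (v points to u)
            total += ranks[v] / len(outs)   # N_v = number of links from v
    return c * total

# Hypothetical three-page graph and rank assignment.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {"a": 0.4, "b": 0.2, "c": 0.4}

# Page c receives half of a's rank plus all of b's rank.
rank_c = simplified_rank("c", ranks, links)
```

With these assumed values, every page's recomputed rank equals its current rank, so this rank assignment is a fixed point of the formula for this small graph.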

Page 11: PageRank Calculation

[Figure: simplified PageRank calculation; page and link labels 100, 50, 50, 9, 3, 3, 3, 53, 50]
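
The figure's numbers are consistent with the following reading, which is an interpretation and not stated on the slide: a page of rank 100 with two forward links passes 50 along each, a page of rank 9 with three forward links passes 3 along each, and a page receiving one link from each has rank 53. Taking c = 1 for simplicity (an assumption), the arithmetic is:

```latex
R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v}
     = \frac{100}{2} + \frac{9}{3}
     = 50 + 3
     = 53
```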

Page 12: Rank Sink

A cycle of pages that is pointed to by some incoming link, with no links leaving the cycle

Problem: the cycle accumulates rank but never distributes any rank to pages outside it

[Figure: rank sink loop; the label .6 appears four times]
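
The rank sink effect can be reproduced on a toy graph. This is a minimal sketch under assumed names and data: `simple_step`, the four-page graph, and the uniform starting ranks are hypothetical, and the update is the simplified calculation with c = 1 and no escape term.

```python
def simple_step(links, ranks):
    """One pass of the simplified calculation: each page splits its rank over its forward links."""
    new = {p: 0.0 for p in ranks}
    for v, outs in links.items():
        for u in outs:
            new[u] += ranks[v] / len(outs)
    return new

# Hypothetical graph: x and y link to each other, and x also links into the loop a <-> b.
# Nothing links out of the loop, so it acts as a rank sink.
links = {"x": ["a", "y"], "y": ["x"], "a": ["b"], "b": ["a"]}
ranks = {p: 0.25 for p in links}

for _ in range(100):
    ranks = simple_step(links, ranks)

loop_rank = ranks["a"] + ranks["b"]      # rank trapped inside the loop
outside_rank = ranks["x"] + ranks["y"]   # rank left outside the loop
```

After enough passes almost all of the rank sits in the loop and almost none remains on `x` and `y`, which is the behavior the slide describes.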

Page 13: Escape Term

Solution: Rank Source

E(u) is some vector over the web pages (uniform, favorite page, etc.)

$$R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v} + c E(u)$$

Page 14: Matrix Notation

$$R = c (A^T + E e^T) R$$

R is the dominant eigenvector and c is the dominant eigenvalue of $(A^T + E e^T)$, because c is maximized

Page 15: Computing PageRank

$R_0 \leftarrow S$ (initialize vector over web pages)

Loop:

- $R_{i+1} \leftarrow A^T R_i$ (new ranks: sum of normalized backlink ranks)
- $d \leftarrow \|R_i\|_1 - \|R_{i+1}\|_1$ (compute normalizing factor)
- $R_{i+1} \leftarrow R_{i+1} + dE$ (add escape term)
- $\delta \leftarrow \|R_{i+1} - R_i\|_1$ (control parameter)

While $\delta > \epsilon$ (stop when converged)
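
The loop on this slide can be sketched in plain Python. This is an illustrative version under stated assumptions, not the report's implementation: the names `pagerank`, `links`, `eps`, and `max_iter` are hypothetical, E is assumed uniform with entries summing to 1, and the initial vector S is assumed equal to E.

```python
def pagerank(links, eps=1e-8, max_iter=1000):
    """links: dict mapping a page to the list of pages it points to (hypothetical format)."""
    pages = sorted(set(links) | {u for outs in links.values() for u in outs})
    n = len(pages)
    E = {p: 1.0 / n for p in pages}   # assumed uniform rank source
    R = dict(E)                       # R_0 <- S, with S = E as an assumption
    for _ in range(max_iter):
        # R_{i+1} <- A^T R_i: each page receives R(v) / N_v from every backlink v.
        new = {p: 0.0 for p in pages}
        for v, outs in links.items():
            for u in outs:
                new[u] += R[v] / len(outs)
        # d <- ||R_i||_1 - ||R_{i+1}||_1: rank lost in this step (e.g. at pages without forward links).
        d = sum(R.values()) - sum(new.values())
        # R_{i+1} <- R_{i+1} + d E: add the escape term.
        for p in pages:
            new[p] += d * E[p]
        # delta <- ||R_{i+1} - R_i||_1: control parameter for the stopping test.
        delta = sum(abs(new[p] - R[p]) for p in pages)
        R = new
        if delta <= eps:
            break
    return R

# Hypothetical example graph.
example = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
result = pagerank(example)
```

In this sketch `d` is nonzero only when some rank is lost during the multiplication, so the ranks always sum to the same total as the starting vector.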

Page 16: Random Surfer Model

PageRank vs. Random Surfer Model

E(u) = “the random surfer periodically gets bored and jumps to a different page, and so is not kept in a loop forever”
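
The random surfer reading can be illustrated with a small simulation. This is a sketch under assumptions that are not on the slide: the function name `random_surfer`, the jump probability of 0.15, the uniform choice of jump target, and the example graph are all hypothetical.

```python
import random

def random_surfer(links, steps=20000, jump_prob=0.15, seed=0):
    """Estimate how often a surfer visits each page; links maps a page to its forward links."""
    rng = random.Random(seed)
    pages = sorted(set(links) | {u for outs in links.values() for u in outs})
    visits = {p: 0 for p in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        outs = links.get(current, [])
        if not outs or rng.random() < jump_prob:
            current = rng.choice(pages)   # bored (or stuck): jump to a random page
        else:
            current = rng.choice(outs)    # otherwise follow a random forward link
    return {p: visits[p] / steps for p in pages}

# Hypothetical graph in which page c has the most backlinks.
frequencies = random_surfer({"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]})
```

The visit frequencies play the role of ranks: pages with many backlinks, such as `c` here, are visited far more often than pages that are reached only by random jumps.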

Page 17: Implementation

Computing resources

- 24 million pages
- 75 million URLs
- Process 550 pages/sec

Memory and disk storage

- Weight vector (4-byte float)
- Matrix A (linear access)

Page 18: Implementation

Assign a unique integer ID to each URL

Sort and remove dangling links

Assign initial ranks

Iterate until convergence

Add back dangling links and recompute
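
The dangling-link removal step can be sketched as follows. This is an illustration only: the function name `remove_dangling`, the dictionary representation, and the choice to repeat the removal until no dangling pages remain are assumptions, since the slide does not say how many passes are made.

```python
def remove_dangling(links):
    """Repeatedly drop pages with no forward links, together with the links pointing to them."""
    links = {p: list(outs) for p, outs in links.items()}
    # Give every page that is only mentioned as a link target an (empty) entry.
    for outs in list(links.values()):
        for u in outs:
            links.setdefault(u, [])
    removed = []
    while True:
        dangling = [p for p, outs in links.items() if not outs]
        if not dangling:
            break
        for p in dangling:
            del links[p]
            removed.append(p)
        # Removing a page can leave other pages without forward links, so filter and repeat.
        links = {p: [u for u in outs if u in links] for p, outs in links.items()}
    return links, removed

# Hypothetical example: d has no forward links, and removing it makes c dangling too.
core, removed = remove_dangling({"a": ["b", "c"], "b": ["a"], "c": ["d"]})
```

The `removed` list records the pages in removal order, so they can be added back after the ranks of the remaining pages have converged.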

Page 19: Convergence Properties

Analyzed using the theory of random walks on graphs

Convergence takes roughly O(log |V|) iterations, because the web graph G is rapidly mixing

Page 20: Convergence Properties

Page 21: Searching with PageRank

Using title search

Comparing with AltaVista

Page 22: Sample Results

Page 23: Some Applications

Estimate web traffic

Backlink predictor

User navigation

Page 24: Conclusion

PageRank is a global ranking based on the web's graph structure

PageRank uses backlink information to bring order to the web

PageRank can separate out representative pages as cluster centers

A great variety of applications