• Date posted: 14-Jan-2017

• Category: Technology


### Transcript of BigData - PageRank Algorithm with Scala and Spark

• PageRank - Spark/Scala

Yubraj Pokharel

• PageRank Algorithm Implementation in Spark

• What is PageRank?

The PageRank of a web page is a number assigned to the page that represents its importance relative to all other web pages.

A web page contains inbound and outbound links.

A page with more inbound links is considered more important.

• How to calculate it?

PR(A) = (1 - d) + d * (PR(T1) / C(T1) + ... + PR(Tn) / C(Tn))

PR(A) => PageRank of web page A

d => damping factor (0.85 in the iterations that follow)

PR(Tn) => PageRank of a page Tn that links to page A, the page whose PageRank we are calculating

C(Tn) => number of outbound links on page Tn
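The formula can be written as a small function. This is a minimal sketch in plain Scala; the object name `PageRankFormula`, the function name `pageRank`, and the sample input are hypothetical and chosen only for illustration.

```scala
object PageRankFormula {
  // d is the damping factor.
  // inbound holds one (PR(Tn), C(Tn)) pair for each page Tn that links to A.
  def pageRank(d: Double, inbound: Seq[(Double, Int)]): Double =
    (1 - d) + d * inbound.map { case (rank, outCount) => rank / outCount }.sum

  def main(args: Array[String]): Unit = {
    // Hypothetical input: two inbound pages, each with rank 1.0,
    // having 3 and 2 outbound links respectively.
    println(pageRank(0.85, Seq((1.0, 3), (1.0, 2))))
  }
}
```

Each inbound page passes on its own rank divided evenly among its outbound links, so a link from a page with few outbound links counts for more.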

• How is it calculated?

Links: (A, B), (B, C), (B, E), (C, A), (C, D), (C, E), (D, A), (D, C), (E, B)

Initial Page Ranks

PR(A) = 1.0, PR(B) = 1.0, PR(C) = 1.0, PR(D) = 1.0, PR(E) = 1.0

• 1st iteration

Links: (A, B), (B, C), (B, E), (C, A), (C, D), (C, E), (D, A), (D, C), (E, B)

PR(A) = 0.15 + 0.85 * (1/3 + 1/2) = 0.8583333333333333

PR(B) = 0.15 + 0.85 * (1/1 + 1/1) = 1.85

PR(C) = 0.15 + 0.85 * (1/2 + 1/2) = 1.0

PR(D) = 0.433333333333

PR(E) = 0.858333333333

• 2nd iteration

Links: (A, B), (B, C), (B, E), (C, A), (C, D), (C, E), (D, A), (D, C), (E, B)

PR(A) = 0.15 + 0.85 * (1/3 + 0.433333333333/2) = 0.6175

PR(B) = 1.60916666666666

PR(C) = 1.12041666666666

PR(D) = 0.43333333333333

PR(E) = 1.21958333333333
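The hand calculation for the first two iterations can be checked with a short program. This is a minimal sketch in plain Scala (no Spark), assuming every rank in an iteration is computed from the previous iteration's ranks only; the names `PageRankStep`, `links`, and `step` are hypothetical.

```scala
object PageRankStep {
  // Link list from the slides, as (source, target) pairs.
  val links: Seq[(String, String)] = Seq(
    "A" -> "B", "B" -> "C", "B" -> "E", "C" -> "A", "C" -> "D",
    "C" -> "E", "D" -> "A", "D" -> "C", "E" -> "B")

  val d = 0.85 // damping factor used in the slides

  // One update: every new rank is derived from the previous ranks only.
  def step(ranks: Map[String, Double]): Map[String, Double] = {
    // C(T): number of outbound links per source page.
    val outCount: Map[String, Int] =
      links.groupBy(_._1).map { case (src, ls) => src -> ls.size }
    ranks.map { case (page, _) =>
      // PR(T) / C(T) for every page T that links to this page.
      val contribs = links.collect {
        case (src, dst) if dst == page => ranks(src) / outCount(src)
      }
      page -> ((1 - d) + d * contribs.sum)
    }
  }

  def main(args: Array[String]): Unit = {
    val initial = Seq("A", "B", "C", "D", "E").map(_ -> 1.0).toMap
    val first = step(initial)
    val second = step(first)
    println(first.toSeq.sortBy(_._1))
    println(second.toSeq.sortBy(_._1))
  }
}
```

Calling `step` repeatedly on its own output carries the calculation forward to later iterations.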

• 30th iteration

Links: (A, B), (B, C), (B, E), (C, A), (C, D), (C, E), (D, A), (D, C), (E, B)

(B, 1.685860900896)

(E, 1.1661421381814026)

(C, 1.0575926664315842)

(A, 0.6407530390774026)

(D, 0.4496512554136104)

B is the highest-ranked page.

• Spark/Scala Code
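The code from this slide is not reproduced in the transcript. The following is a minimal sketch, not the presenter's implementation, of how the iteration could be expressed with Spark's RDD API. It assumes a local Spark installation, the link list from the earlier slides, a damping factor of 0.85, and 30 iterations; the names `SparkPageRankSketch`, `edgeList`, and `run` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object SparkPageRankSketch {
  // Link list from the slides, as (source, target) pairs.
  val edgeList: Seq[(String, String)] = Seq(
    "A" -> "B", "B" -> "C", "B" -> "E", "C" -> "A", "C" -> "D",
    "C" -> "E", "D" -> "A", "D" -> "C", "E" -> "B")

  def run(iterations: Int): Array[(String, Double)] = {
    val spark = SparkSession.builder()
      .appName("PageRankSketch")
      .master("local[*]") // assumption: run locally on all cores
      .getOrCreate()
    val sc = spark.sparkContext

    // page -> pages it links to; cached because it is reused every iteration.
    val links = sc.parallelize(edgeList).groupByKey().cache()

    // Every page starts with rank 1.0.
    var ranks = links.mapValues(_ => 1.0)

    for (_ <- 1 to iterations) {
      // Each page sends rank / (number of outbound links) to every target.
      val contribs = links.join(ranks).values.flatMap { case (targets, rank) =>
        targets.map(target => (target, rank / targets.size))
      }
      // PR = (1 - d) + d * (sum of received contributions), with d = 0.85.
      ranks = contribs.reduceByKey(_ + _).mapValues(sum => 0.15 + 0.85 * sum)
    }

    // Collect to the driver and sort by rank, highest first.
    val result = ranks.collect().sortBy { case (_, rank) => -rank }
    spark.stop()
    result
  }

  def main(args: Array[String]): Unit =
    run(30).foreach(println)
}
```

One caveat of this pattern: a page with no inbound links receives no contributions and drops out of `ranks` after the first iteration. That does not occur with this link list, since every page has at least one inbound link.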

• References

1. http://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm
2. http://www.math.cornell.edu/~mec/Winter2009/RalucaRemus/Lecture3/lecture3.html
3. http://www.ams.org/samplings/feature-column/fcarc-pagerank
4. http://www.umiacs.umd.edu/~jbg/teaching/INFM_718_2011/lecture_3.pdf
5. http://www.cse.cuhk.edu.hk/~cslui/CMSC5702/mapreduce_hadoop2.pdf


• Questions??

• Thank you :) -happy coding