BigData - PageRank Algorithm with Scala and Spark
PageRank - Spark/Scala
Yubraj Pokharel
PageRank Algorithm Implementation in Spark
What is PageRank?
The PageRank of a web page is a number that represents the importance of that page relative to all other web pages.
A web page contains inbound and outbound links.
A page which has more inbound links is considered more important.
How to calculate it?
PR(A) = (1-d) + d * (PR(T1) / C(T1) + ... + PR(Tn) / C(Tn))
PR(A) => PageRank of web page A
d => damping factor
PR(Ti) => PageRank of page Ti, where T1 ... Tn are the pages with links inbound to A
C(Ti) => number of outbound links on page Ti
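The formula above can be sketched as a small Scala function. This is an illustrative sketch only: the object name `PageRankFormula` and the helper `pageRankOf` are hypothetical, and the default d = 0.85 is taken from the worked iterations in this deck.

```scala
object PageRankFormula {
  // Hypothetical helper: applies PR(A) = (1-d) + d * sum(PR(Ti) / C(Ti)).
  // `inbound` holds one (PR(Ti), C(Ti)) pair for every page Ti that links to A.
  def pageRankOf(inbound: Seq[(Double, Int)], d: Double = 0.85): Double =
    (1 - d) + d * inbound.map { case (rank, outLinks) => rank / outLinks }.sum

  def main(args: Array[String]): Unit = {
    // A page with two inbound pages, both ranked 1.0,
    // having 3 and 2 outbound links respectively.
    println(pageRankOf(Seq((1.0, 3), (1.0, 2)))) // roughly 0.8583
  }
}
```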
How is it calculated?
Links: (A, B), (B, C), (B, E), (C, A), (C, D), (C, E), (D, A), (D, C), (E, B)
Each pair (X, Y) means that page X links to page Y.
Initial Page Ranks
PR(A) = 1.0
PR(B) = 1.0
PR(C) = 1.0
PR(D) = 1.0
PR(E) = 1.0
1st iteration
Links: (A, B), (B, C), (B, E), (C, A), (C, D), (C, E), (D, A), (D, C), (E, B)
PR(A) = 0.15 + 0.85 * (1/3 + 1/2) = 0.8583333333333333
PR(B) = 0.15 + 0.85 * (1/1 + 1/1) = 1.85
PR(C) = 0.15 + 0.85 * (1/2 + 1/2) = 1.0
PR(D) = 0.433333333333
PR(E) = 0.858333333333
2nd iteration
Links: (A, B), (B, C), (B, E), (C, A), (C, D), (C, E), (D, A), (D, C), (E, B)
PR(A) = 0.15 + 0.85 * (1/3 + 0.433333333333/2) = 0.6175
PR(B) = 1.60916666666666
PR(C) = 1.12041666666666
PR(D) = 0.43333333333333
PR(E) = 1.21958333333333
30th iteration
Links: (A, B), (B, C), (B, E), (C, A), (C, D), (C, E), (D, A), (D, C), (E, B)
(B, 1.685860900896)
(E, 1.1661421381814026)
(C, 1.0575926664315842)
(A, 0.6407530390774026)
(D, 0.4496512554136104)
B is the highest-ranked page.
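The iterations above can be reproduced with plain Scala collections, without Spark. The following is a minimal sketch, assuming every rank in an iteration is computed from the previous iteration's ranks (which is what the 2nd iteration above does). The names `PageRankExample`, `step` and `run` are hypothetical.

```scala
object PageRankExample {
  // The link list from the example: (X, Y) means page X links to page Y.
  val links: Seq[(String, String)] = Seq(
    "A" -> "B", "B" -> "C", "B" -> "E", "C" -> "A", "C" -> "D",
    "C" -> "E", "D" -> "A", "D" -> "C", "E" -> "B")

  // C(T): number of outbound links of each page.
  val outDegree: Map[String, Int] =
    links.groupBy(_._1).map { case (src, ls) => src -> ls.size }

  // One update of all pages, using only the ranks of the previous iteration.
  def step(ranks: Map[String, Double], d: Double = 0.85): Map[String, Double] = {
    // Each link (src, dst) contributes PR(src) / C(src) to dst.
    val contribs: Map[String, Double] = links
      .map { case (src, dst) => dst -> ranks(src) / outDegree(src) }
      .groupBy(_._1)
      .map { case (dst, cs) => dst -> cs.map(_._2).sum }
    ranks.map { case (page, _) =>
      page -> ((1 - d) + d * contribs.getOrElse(page, 0.0))
    }
  }

  // Start every page at rank 1.0 and apply `step` the given number of times.
  def run(iterations: Int): Map[String, Double] = {
    val pages = links.flatMap { case (src, dst) => Seq(src, dst) }.distinct
    val initial = pages.map(_ -> 1.0).toMap
    (1 to iterations).foldLeft(initial)((ranks, _) => step(ranks))
  }

  def main(args: Array[String]): Unit =
    // Print the pages from highest to lowest rank after 30 iterations.
    run(30).toSeq.sortBy(-_._2).foreach(println)
}
```

Calling `PageRankExample.run(1)` and `PageRankExample.run(2)` gives the 1st and 2nd iteration values listed above, up to floating-point rounding.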
Spark/Scala Code
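A minimal sketch of how the same computation might look with Spark's RDD API, using the example link list and d = 0.85 from the slides above. This is not necessarily the code from the original slide. The object name `SparkPageRank`, the function `pageRank` and the `local[*]` master are assumptions made for illustration. The sketch also assumes that every page has at least one inbound and one outbound link, which holds for the example graph; a page without inbound links would drop out of the result.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SparkPageRank {
  // The example link list: (X, Y) means page X links to page Y.
  val exampleEdges: Seq[(String, String)] = Seq(
    "A" -> "B", "B" -> "C", "B" -> "E", "C" -> "A", "C" -> "D",
    "C" -> "E", "D" -> "A", "D" -> "C", "E" -> "B")

  def pageRank(edges: RDD[(String, String)],
               iterations: Int,
               d: Double = 0.85): RDD[(String, Double)] = {
    // page -> all pages it links to; cached because it is reused every iteration.
    val links = edges.groupByKey().cache()
    // Initial rank of 1.0 for every page that has outbound links.
    var ranks: RDD[(String, Double)] = links.mapValues(_ => 1.0)

    for (_ <- 1 to iterations) {
      // Each page sends PR(page) / C(page) to every page it links to.
      val contribs = links.join(ranks).values.flatMap { case (targets, rank) =>
        targets.map(target => (target, rank / targets.size))
      }
      // Sum the contributions per page and apply the damping factor.
      ranks = contribs.reduceByKey(_ + _).mapValues(sum => (1 - d) + d * sum)
    }
    ranks
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PageRank").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val ranks = pageRank(sc.parallelize(exampleEdges), iterations = 30)
      // Print the pages from highest to lowest rank.
      ranks.sortBy(_._2, ascending = false).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Keeping `links` cached avoids regrouping the link list on every pass, since only `ranks` changes between iterations.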
Questions??
Thank you :) -happy coding