BigData - PageRank Algorithm with Scala and Spark

  • date post

    14-Jan-2017
  • Category

    Technology

Transcript of BigData - PageRank Algorithm with Scala and Spark

  • PageRank - Spark/Scala

    Yubraj Pokharel

  • PageRank Algorithm Implementation in Spark

  • What is PageRank?

    The PageRank of a web page is a number that represents the importance of that page relative to all other web pages.

    A web page has inbound links (from other pages) and outbound links (to other pages).

    A page with more inbound links, especially from highly ranked pages, is considered more important.

  • How to calculate it?

    PR(A) = (1 - d) + d * (PR(T1) / C(T1) + ... + PR(Tn) / C(Tn))

    PR(A) => PageRank of web page A

    d => damping factor (0.85 in the iterations that follow)

    PR(T1) ... PR(Tn) => PageRanks of the pages T1 ... Tn that link to A

    C(T1) ... C(Tn) => number of outbound links on each of the pages T1 ... Tn
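
    The formula can be written as a small Scala function. This is a minimal sketch: the names `PageRankFormula`, `Inbound` and `pageRank` are hypothetical, and the sample page with two inbound pages is made up for illustration.

```scala
object PageRankFormula {
  /** One inbound page T: its current rank PR(T) and its outbound-link count C(T). */
  final case class Inbound(rank: Double, outLinks: Int)

  /** PR(A) = (1 - d) + d * (PR(T1) / C(T1) + ... + PR(Tn) / C(Tn)). */
  def pageRank(inbound: Seq[Inbound], d: Double = 0.85): Double =
    (1 - d) + d * inbound.map(t => t.rank / t.outLinks).sum

  def main(args: Array[String]): Unit = {
    // Hypothetical page with two inbound pages, both ranked 1.0,
    // one with 3 outbound links and one with 2.
    val rank = pageRank(Seq(Inbound(1.0, 3), Inbound(1.0, 2)))
    println(s"PR = $rank")
  }
}
```

    A page with no inbound links gets an empty sum, so its rank is simply 1 - d.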

  • How is it calculated?

    Links (source, target): (A, B), (B, C), (B, E), (C, A), (C, D), (C, E), (D, A), (D, C), (E, B)

    Initial PageRanks

    PR(A) = 1.0, PR(B) = 1.0, PR(C) = 1.0, PR(D) = 1.0, PR(E) = 1.0
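
    Before iterating, it helps to derive from the link list the two things the formula needs: the outbound-link count C(T) of every page and the set of pages linking to each page. A minimal sketch using plain Scala collections; the object and value names are hypothetical.

```scala
object ExampleGraph {
  // Link list from the slide: (source, target).
  val links: Seq[(String, String)] = Seq(
    ("A", "B"), ("B", "C"), ("B", "E"), ("C", "A"), ("C", "D"),
    ("C", "E"), ("D", "A"), ("D", "C"), ("E", "B"))

  // C(T): number of outbound links per source page.
  val outDegree: Map[String, Int] =
    links.groupBy(_._1).map { case (src, out) => (src, out.size) }

  // Pages that link to each target page.
  val inbound: Map[String, Seq[String]] =
    links.groupBy(_._2).map { case (dst, in) => (dst, in.map(_._1)) }

  // Every page starts with rank 1.0.
  val initialRanks: Map[String, Double] =
    links.flatMap { case (src, dst) => Seq(src, dst) }.distinct.map(p => (p, 1.0)).toMap

  def main(args: Array[String]): Unit = {
    outDegree.toSeq.sortBy(_._1).foreach { case (p, c) => println(s"C($p) = $c") }
    inbound.toSeq.sortBy(_._1).foreach { case (p, in) => println(s"$p <- ${in.mkString(", ")}") }
  }
}
```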

  • 1st iteration

    Links: (A, B), (B, C), (B, E), (C, A), (C, D), (C, E), (D, A), (D, C), (E, B)

    PR(A) = 0.15 + 0.85 * (1/3 + 1/2) = 0.8583333333333333

    PR(B) = 0.15 + 0.85 * (1/1 + 1/1) = 1.85

    PR(C) = 0.15 + 0.85 * (1/2 + 1/2) = 1.0

    PR(D) = 0.433333333333

    PR(E) = 0.858333333333
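
    One iteration can be sketched as a function that computes every new rank from the previous ranks only (a synchronous update, which is what the numbers on these slides correspond to). The names `PageRankStep` and `step` are hypothetical; d = 0.85 is taken from the arithmetic on the slides.

```scala
object PageRankStep {
  type Ranks = Map[String, Double]

  // Link list from the slides: (source, target).
  val links: Seq[(String, String)] = Seq(
    ("A", "B"), ("B", "C"), ("B", "E"), ("C", "A"), ("C", "D"),
    ("C", "E"), ("D", "A"), ("D", "C"), ("E", "B"))

  /** One synchronous update: all new ranks are derived from the old ranks. */
  def step(ranks: Ranks, d: Double = 0.85): Ranks = {
    // C(T): outbound-link count per source page.
    val outDegree: Map[String, Int] =
      links.groupBy(_._1).map { case (src, out) => (src, out.size) }
    // Each link (src, dst) contributes PR(src) / C(src) to dst.
    val contributions: Seq[(String, Double)] =
      links.map { case (src, dst) => (dst, ranks(src) / outDegree(src)) }
    // Sum the contributions received by each page.
    val sums: Map[String, Double] =
      contributions.groupBy(_._1).map { case (page, cs) => (page, cs.map(_._2).sum) }
    ranks.map { case (page, _) => (page, (1 - d) + d * sums.getOrElse(page, 0.0)) }
  }

  def main(args: Array[String]): Unit = {
    val initial: Ranks = Seq("A", "B", "C", "D", "E").map(p => (p, 1.0)).toMap
    step(initial).toSeq.sortBy(_._1).foreach { case (p, r) => println(s"PR($p) = $r") }
  }
}
```

    Feeding the result back into `step` gives the next iteration.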

  • 2nd iteration

    Links: (A, B), (B, C), (B, E), (C, A), (C, D), (C, E), (D, A), (D, C), (E, B)

    PR(A) = 0.15 + 0.85 * (1 / 3 + 0.433333333333 / 2) = 0.6175

    PR(B) = 1.60916666666666

    PR(C) = 1.12041666666666

    PR(D) = 0.43333333333333

    PR(E) = 1.21958333333333

  • 30th iteration

    Links: (A, B), (B, C), (B, E), (C, A), (C, D), (C, E), (D, A), (D, C), (E, B)

    (B, 1.685860900896)
    (E, 1.1661421381814026)
    (C, 1.0575926664315842)
    (A, 0.6407530390774026)
    (D, 0.4496512554136104)

    B is the highest-ranked page.
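
    Repeating the update a fixed number of times and sorting by rank produces a list like the one above. A minimal sketch with plain Scala collections, assuming the same synchronous update and d = 0.85; `PageRankIterate` and `rankAfter` are hypothetical names.

```scala
object PageRankIterate {
  // Link list from the slides: (source, target).
  val links: Seq[(String, String)] = Seq(
    ("A", "B"), ("B", "C"), ("B", "E"), ("C", "A"), ("C", "D"),
    ("C", "E"), ("D", "A"), ("D", "C"), ("E", "B"))

  private val outDegree: Map[String, Int] =
    links.groupBy(_._1).map { case (src, out) => (src, out.size) }

  private val pages: Seq[String] =
    links.flatMap { case (src, dst) => Seq(src, dst) }.distinct

  /** One synchronous PageRank update over all pages. */
  def step(ranks: Map[String, Double], d: Double): Map[String, Double] = {
    val sums: Map[String, Double] = links
      .map { case (src, dst) => (dst, ranks(src) / outDegree(src)) }
      .groupBy(_._1)
      .map { case (page, cs) => (page, cs.map(_._2).sum) }
    pages.map(p => (p, (1 - d) + d * sums.getOrElse(p, 0.0))).toMap
  }

  /** Start every page at 1.0, apply `iterations` updates, sort by rank descending. */
  def rankAfter(iterations: Int, d: Double = 0.85): Seq[(String, Double)] = {
    val initial: Map[String, Double] = pages.map(p => (p, 1.0)).toMap
    val finalRanks = (1 to iterations).foldLeft(initial)((ranks, _) => step(ranks, d))
    finalRanks.toSeq.sortBy { case (_, rank) => -rank }
  }

  def main(args: Array[String]): Unit =
    rankAfter(30).foreach(println)
}
```

    Because every page in this graph has at least one outbound link, the ranks always add up to the number of pages (5), which is a handy sanity check.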

  • Spark/Scala Code
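
    The code from this slide is not included in the transcript. As a stand-in, the sketch below shows how the same iteration is commonly expressed with Spark's RDD API; it is not the presenter's code. It assumes spark-core is on the classpath and uses a local master; `SparkPageRankSketch` and `pageRank` are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SparkPageRankSketch {
  /** Runs `iterations` PageRank updates over `edges` and returns ranks, highest first. */
  def pageRank(sc: SparkContext, edges: Seq[(String, String)],
               iterations: Int, d: Double = 0.85): Array[(String, Double)] = {
    // (source, all targets of that source); cached because it is reused every iteration.
    val links: RDD[(String, Iterable[String])] =
      sc.parallelize(edges).distinct().groupByKey().cache()
    // Every source page starts with rank 1.0.
    var ranks: RDD[(String, Double)] = links.mapValues(_ => 1.0)

    for (_ <- 1 to iterations) {
      // Each page sends PR(page) / C(page) to every page it links to.
      val contributions: RDD[(String, Double)] =
        links.join(ranks).values.flatMap { case (targets, rank) =>
          val outCount = targets.size
          targets.map(target => (target, rank / outCount))
        }
      // PR = (1 - d) + d * (sum of received contributions).
      ranks = contributions.reduceByKey(_ + _).mapValues(sum => (1 - d) + d * sum)
    }
    ranks.collect().sortBy { case (_, rank) => -rank }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PageRankSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val edges = Seq(
      ("A", "B"), ("B", "C"), ("B", "E"), ("C", "A"), ("C", "D"),
      ("C", "E"), ("D", "A"), ("D", "C"), ("E", "B"))
    try pageRank(sc, edges, 30).foreach(println)
    finally sc.stop()
  }
}
```

    One caveat of this formulation: a page that receives no contributions disappears from `ranks` after the first iteration, and a page with no outbound links never gets an initial rank. Neither case occurs in the example graph, where every page has both inbound and outbound links.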

  • Questions??

  • Thank you :) -happy coding