Introduction into scalable graph analysis with Apache Giraph and Spark GraphX

download Introduction into scalable graph analysis with Apache Giraph and Spark GraphX

of 48

  • date post

    18-Aug-2015
  • Category

    Software

  • view

    202
  • download

    4

Embed Size (px)

Transcript of Introduction into scalable graph analysis with Apache Giraph and Spark GraphX

  1. 1. Scalable graph analysis with Apache Giraph and Spark GraphX Roman Shaposhnik rvs@apache.org @rhatr Director of Open Source, Pivotal Inc.
  2. 2. Introduction into scalable graph analysis with Apache Giraph and Spark GraphX Roman Shaposhnik rvs@apache.org @rhatr Director of Open Source, Pivotal Inc.
  3. 3. Shameless plug #1
  4. 4. Shameless plug #1
  5. 5. Agenda:
  6. 6. Lets define some terms Graph is a G = (V, E), where E VxV Directed multigraphs with properties attached to each vertex and edgefoo bar fee
  7. 7. Lets define some terms Graph is a G = (V, E), where E VxV Directed multigraphs with properties attached to each vertex and edgefoo bar fee
  8. 8. Lets define some terms Graph is a G = (V, E), where E VxV Directed multigraphs with properties attached to each vertex and edgefoo bar fee 2 1
  9. 9. Lets define some terms Graph is a G = (V, E), where E VxV Directed multigraphs with properties attached to each vertex and edgefoo bar fee 2 1 foo bar fee 42 fum
  10. 10. What kind of graphs are we talking about? Page ranking on Facebook social graph (mid 2013) 10^9 (billions) vertices 10^12 (trillion) edges 10^15 (petabtybe) cold storage data scale 200 servers all in under 4 minutes!
  11. 11. On day one Doug created HDFS and MapReduce
  12. 12. Google papers that started it all GFS (le system) distributed replicated non-POSIX"MapReduce (computational framework) distributed batch-oriented (long jobs; nal results) data-gravity aware designed for embarrassingly parallel algorithms
  13. 13. HDFS pools and abstracts direct-attached storage HDFS MR MR
  14. 14. A Unix analogy It is as though instead of: $grepfoobar.txt|tr,|sort-u We are doing: $grepfoo/tmp/1.txt $tr,/tmp/2.txt $sortu