Spark SQL and DataFrames, Spark GraphX, Spark MLlib, Spark Streaming


  • date post

    27-May-2020

  • Spark SQL and DataFrames, Spark GraphX, Spark MLlib, Spark Streaming

    Lightning-fast cluster computing

  • Chaining transformations

  • SQL context

  • Creating a SQL context

  • DataFrames

  • Creating DataFrames

  • Creating a DataFrame from Hive

    Place your hive-site.xml, core-site.xml (for security configuration), and hdfs-site.xml (for HDFS configuration) files in your Spark conf/ directory.
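    With those files in place, reading a Hive table might look like the sketch below. It is not taken from the slide itself: it assumes the Spark 1.6-style `HiveContext` API (the version this deck links to elsewhere), and the helper name `load_hive_table` and its `app_name` default are hypothetical.

```python
def load_hive_table(table_name, app_name="hive-example"):
    """Return a DataFrame for an existing Hive table (hypothetical helper)."""
    # Imported here so the file can still be loaded on a machine without Spark.
    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName=app_name)
    # HiveContext picks up hive-site.xml from Spark's conf/ directory.
    hive_context = HiveContext(sc)
    # table() returns the named Hive table as a DataFrame.
    return hive_context.table(table_name)
```

    On Spark 2.x and later, the usual entry point is a `SparkSession` built with `enableHiveSupport()` rather than `HiveContext`.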

  • Creating a DataFrame from MySQL

  • Transforming and querying DataFrames

    https://spark.apache.org/docs/1.6.2/api/python/pyspark.sql.html#

  • Working with data in a DataFrame

  • DataFrame queries

  • Query DataFrame using columns

  • SQL queries

  • Saving DataFrames

  • DataFrames and RDDs

  • Working with Row objects

  • Extracting data from rows

  • Convert RDD to DataFrame

  • ML and GraphX in Spark

  • Common Spark use case

  • Spark examples

  • Iterative algorithms in Spark: PageRank

  • PageRank algorithm

  • Neighbor contribution function

  • Input data

  • Pairs of page links

  • Page links grouped by source page

  • Persisting the link pair RDD

  • Set initial ranks

  • First iteration

  • Second iteration

  • Checkpointing
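The PageRank slides above name the steps (link pairs, grouping by source page, initial ranks, a neighbor contribution function, repeated iterations), but their code is not part of this transcript. The sketch below walks through the same steps in plain Python rather than on Spark RDDs, so it runs without a cluster. The function names, the damping factor of 0.85, and the initial rank of 1.0 are assumptions borrowed from the commonly used formulation, not values confirmed by the slides.

```python
from collections import defaultdict


def compute_contribs(neighbors, rank):
    """Yield (neighbor, share) pairs: a page splits its rank evenly among its links."""
    for neighbor in neighbors:
        yield neighbor, rank / len(neighbors)


def pagerank(link_pairs, iterations=10, damping=0.85):
    """Plain-Python PageRank sketch over (source, target) link pairs."""
    # Group link pairs by source page (the role groupByKey would play on an RDD).
    links = defaultdict(list)
    for source, target in link_pairs:
        links[source].append(target)

    # Set initial ranks: every source page starts at 1.0 (assumed value).
    ranks = {page: 1.0 for page in links}

    for _ in range(iterations):
        # Sum the contributions each page receives from the pages linking to it.
        received = defaultdict(float)
        for page, neighbors in links.items():
            # A page that currently has no rank contributes nothing.
            for neighbor, share in compute_contribs(neighbors, ranks.get(page, 0.0)):
                received[neighbor] += share
        # New rank = (1 - damping) + damping * sum of received contributions.
        ranks = {page: (1 - damping) + damping * total
                 for page, total in received.items()}
    return ranks


# Hypothetical input: a three-page cycle a -> b -> c -> a.
example_pairs = [("a", "b"), ("b", "c"), ("c", "a")]
print(pagerank(example_pairs, iterations=2))
```

In Spark the same loop keeps extending the RDD lineage on every iteration, which is why the link RDD is typically persisted and long-running jobs checkpoint periodically.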

  • GraphX in Spark

  • Examples in GraphX

  • MLlib in Spark

    https://spark.apache.org/docs/2.0.2/ml-guide.html

  • What is MLlib?

  • Why MLlib?

    https://docs.databricks.com/spark/latest/mllib/decision-trees.html

  • Spark Streaming

    http://spark.apache.org/docs/latest/streaming-programming-guide.html