Profiling & Testing with Spark


  1. Profiling & Testing with Spark: Apache Spark 2.0 Improvements, Flame Graphs & Testing
  2. Outline: Overview, Spark 2.0 Improvements, Profiling with Flame Graphs, How-to Flame Graphs, Testing in Spark
  3. Overview. Apache Spark is a fast and general engine for large-scale data processing. Speed: in-memory computation, up to 100x faster than MapReduce. Ease of use: bindings for Java, Scala, Python and R. Generality: supports SQL, streaming and complex analytics (ML). Portability: runs on YARN, Mesos, standalone or in the cloud.
  4. Overview (Big Picture)
  5. Overview (Architecture)
  6. Overview (code sample): Monte Carlo calculation. This code estimates π by "throwing darts" at a circle: we pick random points in the unit square ((0, 0) to (1, 1)) and count how many fall inside the unit circle. That fraction should be π / 4, so we multiply it by 4 to get our estimate.
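The slide's actual Spark code is not reproduced in this transcript. As a stand-in, here is a minimal plain-Python sketch of the same Monte Carlo logic (the function name `estimate_pi` and the fixed seed are assumptions for illustration; in Spark the sampling loop would instead be distributed, e.g. with `parallelize` and `filter`/`count` over the points):

```python
import random

def estimate_pi(num_samples, seed=42):
    """Estimate pi by sampling random points in the unit square
    and counting the fraction that land inside the unit circle."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            inside += 1
    # the in-circle fraction approximates pi/4, so scale by 4
    return 4.0 * inside / num_samples

print(estimate_pi(100_000))
```

With 100,000 samples the estimate typically lands within a few hundredths of π; the Spark version parallelizes exactly this counting step across the cluster.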
  7. Main Takeaway. Spark SQL provides parallelism, affordable at scale: scale out SQL on storage for Big Data volumes; scale out on CPU for memory-intensive queries; offloading reports from an RDBMS becomes attractive. Spark 2.0 improvements: considerable speedup of CPU-intensive queries.
  8. Spark 2.0 Improvements
  9. SQL Queries: sqlContext.sql(" SELECT a.bucket, sum(a.val2) tot FROM t1 a, t1 b WHERE a.bucket=b.bucket and a.val1+b.val1