Apache Spark

Transcript of Apache Spark

  1. Apache Spark: Beyond Hadoop MapReduce (www.edureka.co/r-for-analytics, www.edureka.co/apache-spark-scala-training)
  2. Agenda. By the end of this webinar you will know about: the strengths of MapReduce; things beyond MapReduce; how MapReduce limitations can be overcome; how Spark fits the bill; other exciting features in Spark.
  3. Strength of MapReduce
  4. Strength of MapReduce. Simple: independence of language of choice, such as Java, C++ or Python. Scalability: can process petabytes of data, stored in HDFS on one cluster. Fault tolerance: MapReduce takes care of failures using the replicated copies. Minimal data motion: processing moves towards the data to minimize disk I/O.
  5. Limitations Of MapReduce (MR)
  6. Limitations Of MR: real time; complex algorithms; re-reading and parsing data; minimal data motion; graph processing; iterative tasks; random access.
  7. Feature Comparison with Spark. Hadoop MapReduce: fast; batch processing; stores data on disk; written in Java. Spark: up to 100x faster than MapReduce; batch and real-time processing; stores data in memory; written in Scala. Source: Databricks
  8. How MR limitations can be overcome
  9. Overcoming MR limitations. Real time: cutting down on the number of reads and writes to disk.
  10. Overcoming MR limitations. Graph processing and complex algorithms: libraries for machine learning and streaming.
  11. Overcoming MR limitations. Random access: cyclic data flows.
  12. How Spark Implements Features To Make Its Architecture Better Than MR
  13. Spark Cuts Down Read/Write I/O To Disk. Spark tries to keep data in the memory of its distributed workers, allowing for significantly faster, lower-latency computations, whereas MapReduce keeps shuffling data in and out of disk.
  14. Libraries For ML, Graph Programming: a machine learning library (MLlib); graph programming (GraphX); a Spark interface for RDBMS lovers (Spark SQL); a utility for continuous ingestion of data (Spark Streaming).
  15. Cyclic Data Flows. Every job in Spark comprises a series of operators and runs on a set of data. All the operators in a job are used to construct a DAG (Directed Acyclic Graph). The DAG is optimized by rearranging and combining operators where possible.
  16. Other Spark Features In Demand
  17. Spark Features/Modules In Demand. Source: Typesafe
  18. New Features In 2015. DataFrames: similar API to data frames in R and Pandas; automatically optimised via Spark SQL; released in Spark 1.3. SparkR: released in Spark 1.4; exposes DataFrames, RDDs and the ML library in R. Machine Learning Pipelines: high-level API; featurization; evaluation; model tuning. External Data Sources: platform API to plug data sources into Spark; pushes logic into sources. Source: Databricks
  19. Questions
  20. Survey. Your feedback is vital for us, be it a compliment, a suggestion or a complaint. It helps us make your experience better! Please spare a few minutes to take the survey after the webinar.
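
The point of slide 13 (keeping data in memory instead of going back to disk at every step) can be illustrated with a small toy model. This is a plain-Python sketch, not Spark or MapReduce code: the `Source` class, the `RAW` sample data and both function names are hypothetical, and "reading from disk" is only simulated by counting how often the raw text is parsed.

```python
RAW = "1,2\n3,4\n5,6\n"  # hypothetical stand-in for a file stored on disk


class Source:
    """Simulated input that counts how often it is read and parsed."""

    def __init__(self, raw):
        self.raw = raw
        self.reads = 0

    def load(self):
        self.reads += 1
        return [tuple(int(x) for x in line.split(","))
                for line in self.raw.splitlines()]


def iterate_disk_style(source, steps):
    """Every step re-reads and re-parses the input, like a chain of separate jobs."""
    total = 0
    for _ in range(steps):
        rows = source.load()
        total += sum(a * b for a, b in rows)
    return total


def iterate_memory_style(source, steps):
    """Parse once, keep the rows in memory, and reuse them in every step."""
    rows = source.load()
    total = 0
    for _ in range(steps):
        total += sum(a * b for a, b in rows)
    return total


disk, mem = Source(RAW), Source(RAW)
print(iterate_disk_style(disk, 5), disk.reads)   # 220 5
print(iterate_memory_style(mem, 5), mem.reads)   # 220 1
```

Both variants compute the same result; the difference is only in how many times the input is read and parsed, which is the cost that iterative workloads pay repeatedly when intermediate data is not kept in memory.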
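
Slide 15 describes jobs as a series of operators that form a DAG, which is then optimized by combining operators where possible. The sketch below is a toy model of that idea, assuming a simple linear chain of operators. The `Pipeline` class and its methods are hypothetical and are not Spark's API or scheduler; the only optimization shown is fusing adjacent `map` operators into one.

```python
class Pipeline:
    """Records operators lazily; nothing runs until collect() is called."""

    def __init__(self, data, ops=()):
        self.data = data
        self.ops = tuple(ops)  # the recorded operator chain (a linear DAG)

    def map(self, fn):
        return Pipeline(self.data, self.ops + (("map", fn),))

    def filter(self, pred):
        return Pipeline(self.data, self.ops + (("filter", pred),))

    def optimized_ops(self):
        """Combine each run of adjacent map operators into a single map."""
        fused = []
        for kind, fn in self.ops:
            if kind == "map" and fused and fused[-1][0] == "map":
                prev = fused[-1][1]
                # Default arguments bind the current functions to this lambda.
                fused[-1] = ("map", lambda x, f=prev, g=fn: g(f(x)))
            else:
                fused.append((kind, fn))
        return fused

    def collect(self):
        """Run the optimized chain over every input item."""
        ops = self.optimized_ops()
        out = []
        for item in self.data:
            keep = True
            for kind, fn in ops:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):  # filter rejected the item
                    keep = False
                    break
            if keep:
                out.append(item)
        return out


p = (Pipeline(range(6))
     .map(lambda x: x + 1)
     .map(lambda x: x * 2)
     .filter(lambda x: x % 4 == 0))
print(len(p.ops), len(p.optimized_ops()))  # 3 2
print(p.collect())                         # [4, 8, 12]
```

Because the operators are only recorded until `collect()` is called, the whole chain is visible before anything executes, and that is what makes it possible to rearrange or combine operators ahead of time.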