Spark for big data analytics
Big Data Analytics with Spark
What will you learn today?
What is Apache Spark?
How does Spark fit into the Hadoop ecosystem?
Why Spark for big data analytics?
Spark's popularity
Hands-on: analyzing data with Spark
Apache Spark is a general-purpose data processing engine with in-memory computing. Spark provides APIs for Scala, Java, Python and R, which has helped make it widely adopted for data processing.
How does Spark fit into the Hadoop ecosystem?
Spark is intended to enhance, not replace, the Hadoop stack.
Spark is designed to read and write data in HDFS as well as in other storage systems such as Amazon S3 and NoSQL databases, and it handles common file formats such as CSV.
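As a rough illustration of reading from one storage system and writing to another, the sketch below uses PySpark's DataFrame reader and writer to copy a CSV file to Parquet. The function name `copy_csv_to_parquet`, the host name, the bucket name and all paths are hypothetical, and reading `s3a://` paths assumes the cluster has the Hadoop S3 connector configured.

```python
def copy_csv_to_parquet(spark, src_uri, dst_uri):
    """Read a CSV file from src_uri and write it to dst_uri as Parquet.

    `spark` is an existing SparkSession. The URI scheme selects the storage
    system, so the same code can target HDFS, S3 or the local file system.
    """
    # header=True treats the first row as column names;
    # inferSchema=True asks Spark to guess column types from the data.
    df = spark.read.csv(src_uri, header=True, inferSchema=True)
    # mode("overwrite") replaces any existing output at dst_uri.
    df.write.mode("overwrite").parquet(dst_uri)
    return df


# Hypothetical locations, shown only to illustrate the URI schemes.
EXAMPLE_URIS = {
    "hdfs": "hdfs://namenode:8020/data/events.csv",
    "s3": "s3a://example-bucket/data/events.csv",
    "local": "file:///tmp/events.csv",
}
```

With PySpark installed, a session created via `SparkSession.builder.master("local[*]").getOrCreate()` could be passed in as `spark`, for example `copy_csv_to_parquet(spark, EXAMPLE_URIS["local"], "file:///tmp/events_parquet")`.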
Why Spark for Big Data Analytics?
What makes Spark suitable for big data analytics?
The following features make Spark a strong fit for big data analytics:
Spark simplifies data analysis
Spark provides built-in libraries for advanced analytics
Spark speaks more than one language
Spark provides faster results
Spark lets you use different Hadoop vendors
Word Count Problem - MapReduce
MapReduce Code for a Simple Word Count Problem
Word Count Problem - Spark
Spark Scala Code for Word Count Problem
Spark Python Code for Word Count Problem
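For illustration, a minimal PySpark sketch of a word count might look like the following. It is one common way to write it with the RDD API and is not necessarily the code shown in the webinar. The helper names `tokenize`, `word_count` and `run_local_demo` are hypothetical, and the demo assumes PySpark is installed and runs in local mode.

```python
import re
from operator import add


def tokenize(line):
    """Split a line of text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", line.lower())


def word_count(lines_rdd):
    """Return an RDD of (word, count) pairs for an RDD of text lines."""
    return (
        lines_rdd
        .flatMap(tokenize)            # one record per word
        .map(lambda word: (word, 1))  # pair each word with a count of 1
        .reduceByKey(add)             # sum the counts per word
    )


def run_local_demo():
    """Run the word count on two in-memory lines using a local Spark session."""
    # Imported here so the helpers above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").appName("WordCount").getOrCreate()
    try:
        lines = spark.sparkContext.parallelize(
            ["to be or not to be", "that is the question"]
        )
        return sorted(word_count(lines).collect())
    finally:
        spark.stop()
```

To count words in a real file, the in-memory lines could be replaced with `spark.sparkContext.textFile(path)`, where `path` points at HDFS, S3 or a local file.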
Clearly, processing data with Spark is much easier than with MapReduce, and Spark gives you the flexibility to choose your preferred language: Scala, Java, Python, etc.
Spark is blazingly Fast
Spark Libraries
Spark SQL: Spark's module for working with structured data
MLlib: Spark's machine learning library
GraphX: Spark's API for graph computation
Spark Streaming: Spark's API for processing streaming data
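To give a feel for one of these libraries, here is a small Spark SQL sketch in PySpark that registers in-memory rows as a temporary view and aggregates them with a SQL query. The view name `sales`, the column names and the function name `total_by_category` are hypothetical and chosen only for this example.

```python
# Hypothetical table and column names used throughout this sketch.
QUERY = """
    SELECT category, SUM(amount) AS total
    FROM sales
    GROUP BY category
    ORDER BY total DESC
"""


def total_by_category(spark, rows):
    """Register (category, amount) rows as a temp view and aggregate them with SQL.

    `spark` is an existing SparkSession; `rows` is a list of 2-tuples.
    """
    df = spark.createDataFrame(rows, ["category", "amount"])
    # The view exists only for the lifetime of this SparkSession.
    df.createOrReplaceTempView("sales")
    return spark.sql(QUERY)
```

With a live session, `total_by_category(spark, [("books", 12.5), ("games", 30.0), ("books", 7.5)]).show()` would print the per-category totals as a table.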
Spark Multiple Language Support
Spark in one Snapshot
Spark Use Cases
Companies use Spark to solve a variety of problems, e.g. recommendation systems, business intelligence and fraud detection.
Who is using Spark?
A complete list of companies using Spark can be found at: https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark
Spark is here to stay
Spark is not one of those "here today, gone tomorrow" technologies. It is here to stay for the foreseeable future, and it is well worth getting your teeth into it to get value out of your data.
Hands-on
Analyzing data with Spark
References
IBM backs Apache Spark for Big Data Analytics: http://www.forbes.com/sites/paulmiller/2015/06/15/ibm-backs-apache-spark-for-big-data-analytics/
How eBay uses Spark to ignite Data Analytics: http://www.ebaytechblog.com/2014/05/28/using-spark-to-ignite-data-analytics/
Why Cloudera is saying 'Goodbye, MapReduce' and 'Hello, Spark': http://fortune.com/2015/09/09/cloudera-spark-mapreduce/
5 reasons to turn to Spark for Big Data Analytics: http://www.infoworld.com/article/2897287/big-data/5-reasons-to-turn-to-spark-for-big-data-analytics.html