Apache Spark

Transcript of Apache Spark

1. Apache Spark. May 15, 2015. Copyright BrainPad Inc. All Rights Reserved.

2. Section 1: Spark

3. Spark: Apache Spark is a fast and general-purpose cluster computing system.
    - 10 to 100 times faster than MapReduce
    - 2009: development begins at UC Berkeley
    - 2010: released as open source
    - 2013: donated to the ASF
    - 2014: becomes an Apache Top-Level Project
    - Latest version as of 2015/5/1: 1.3.1
    - Written in Scala; APIs for Scala, Java, and Python

4. Compared with MapReduce, Spark reduces I/O when operations such as map and filter are chained, because the intermediate data is held in RDDs. In MapReduce, the output of each map/reduce job goes to HDFS; in Spark, the output of a map is an RDD.

5. Spark consists of Spark Core and the libraries built on top of it:
    - Spark SQL: SQL queries
    - Spark Streaming: stream processing
    - MLlib: machine learning
    - GraphX: graph processing

6. Spark is part of BDAS (Berkeley Data Analytics Stack). BDAS is developed by AMPLab and also includes Mesos.

7. Section 2: RDD

8. RDD
    - Spark represents data as RDDs (Resilient Distributed Datasets).
    - Evaluation is lazy: an RDD is not computed until an action such as count or saveAsTextFile is called.

9. RDD: there are two ways to create an RDD.
    1. Parallelize an existing collection:
        val data = Array(1, 2, 3, 4, 5)
        val distData = sc.parallelize(data)
    2. Load an external dataset:
        val distFile = sc.textFile("data.txt")

10. RDD: there are two kinds of operations.
    1. Transformation: builds a new RDD from an existing RDD and is not executed until an Action is called. Examples: map, filter, groupByKey.
    2. Action: runs the computation and returns the result to the Driver or writes it out. Examples: count, take, saveAsTextFile.

11. RDD: an RDD is created either from data in storage such as HDFS or by applying a Transformation to another RDD.
    Diagram: RDD -> transformation -> RDD -> transformation -> RDD

12. RDD: nothing is executed until the Action in step 4 is called.
        // 1. Read a file from HDFS
        val lines = sc.textFile("hdfs://...")
        // 2. Keep only the lines that contain "target"
        val targetLines = lines.filter(_.contains("target"))
        // 3. Take the first word of each line
        val firstWords = targetLines.map(_.split(" ")(0))
        // 4. Write the result to HDFS
        firstWords.saveAsTextFile("hdfs://...")

13. RDD: in the code below firstWords is computed twice, because Spark recomputes an RDD for every Action.
        firstWords.saveAsTextFile("hdfs://...")
        println(firstWords.count())
    Calling cache() first lets the second Action reuse firstWords:
        firstWords.cache()
        firstWords.saveAsTextFile("hdfs://...")
        println(firstWords.count())

14. RDD
    Without cache: HDFS -> filter -> map -> saveAsTextFile -> HDFS, then HDFS -> filter -> map -> count. HDFS is read twice.
    With cache: HDFS -> filter -> map -> saveAsTextFile -> HDFS; count reads the cached output of map.
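The lazy evaluation and caching behavior described on slides 12 to 14 can be tried locally with a minimal sketch like the one below. It is not taken from the slides: the object name CacheSketch, the local[2] master, the application name, and the in-memory sample lines (standing in for the HDFS file) are all assumptions made so the example runs without a cluster. It assumes a Spark core dependency is on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CacheSketch {
  // Returns (number of matching lines, first word of each matching line).
  def run(sc: SparkContext): (Long, Seq[String]) = {
    // Assumption: a small in-memory collection stands in for sc.textFile("hdfs://...").
    val lines = sc.parallelize(Seq("target alpha", "other beta", "target gamma"))

    // Transformations only describe the computation; nothing runs yet.
    val firstWords = lines
      .filter(_.contains("target"))
      .map(_.split(" ")(0))

    // Mark the RDD for caching; the data is stored when the first Action computes it.
    firstWords.cache()

    val n = firstWords.count()        // first Action: computes firstWords and fills the cache
    val words = firstWords.collect()  // second Action: can reuse the cached partitions
    (n, words.toSeq)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two threads, so no cluster manager is needed.
    val conf = new SparkConf().setMaster("local[2]").setAppName("cache-sketch")
    val sc = new SparkContext(conf)
    try {
      val (n, words) = run(sc)
      println(s"count = $n, first words = ${words.mkString(", ")}")
    } finally {
      sc.stop()
    }
  }
}
```

Removing the cache() call should give the same result, but the filter and map steps would then run once for each Action. Swapping collect() for saveAsTextFile with a real output path would follow the sequence shown on slide 13.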
15. Section 3

16. Spark
    - A Spark application is made up of a Driver and Executors.
    - Driver: the JVM process that runs the program's main. It turns the program into tasks and, when an Action is called, sends those tasks to the executors.
    - Executor: JVM processes that run the tasks sent by the driver and hold RDD data. They are started through a cluster manager such as YARN or Mesos.

17. Driver
    1. The program defines RDDs through transformations.
    2. An action is called.
    3. The Driver turns the RDDs into a DAG of Stages. Stages are split at Shuffle boundaries, and each Stage is made up of Tasks that are sent to the executors.
    4. The Executors run the Tasks of each Stage.

18. Shuffle: operations such as reduceByKey move data from executor to executor.

19. Stage
    - Stages are split where a Shuffle occurs; for example, join causes a Shuffle.
    - In Stage1, the two operations map and filter are combined and run as one Stage (pipelining).
    - The tasks of a Stage run on the executors.
    - union: combines two RDDs.
    - Diagram: map, filter, map, and join grouped into Stage1, Stage2, and Stage3.

20. Section 4: Spark

21. Spark Streaming
    - Incoming data is divided into small batches, and each batch is handled as an RDD.
    - The sequence of RDDs is called a DStream.
    - Spark Streaming hands these RDDs to Spark for processing.
    - Diagram: a DStream as a sequence of RDDs at intervals of 5.

22. Spark SQL and DataFrame
    - Spark SQL: attaches a schema to an RDD (SchemaRDD) so that it can be queried with SQL. It reads JSON, Parquet, and Hive.
    - DataFrame: an API similar to data frames in R and Pandas in Python, available from Spark 1.3.
    - Catalyst: the optimizer behind Spark SQL and DataFrame. See the Databricks blog: https://databricks.com/blog/2015/04/13/deep-dive-into-spark-sqls-catalyst-optimizer.html

23. MLlib
    - Spark's machine learning library, with algorithms such as SVM, K-means, and ALS.
    - Combined with Spark Streaming: Streaming K-means.

24. Section 5

25.
    - Spark, Spark Core
    - Scala, Java, Python; spark shell
    - Spark and RDD
    - Spark Streaming, MLlib
    - Learning Spark; Advanced Analytics with Spark
    - Strata, Spark

26.
    - Spark documentation: https://spark.apache.org/docs/latest/index.html
    - Learning Spark: http://shop.oreilly.com/product/0636920028512.do
    - Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing: https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
27. BrainPad Inc. 108-0071, 3-2-10, 3F. TEL: 03-6721-7001. FAX: 03-6721-7010. info@brainpad.co.jp. www.brainpad.co.jp