1 Cloudera, Inc. All rights reserved.
MapReduce Spark Hadoop Spark The One Platform Initiative
Doug Cutting | Cloudera | @cutting
Apache Spark
Spark
MapReduce Spark
One Platform Initiative
Hadoop
MapReduce ...
/
MapReduce
Hive Pig Mahout Solr Crunch
...
Giraph/GraphLab, Impala (SQL)
MapReduce
Hama, Dryad (arbitrary DAG)
Apache Spark
MapReduce
(Full Directed Graph expressions)
:
Apache Spark Hadoop
API
Scala,Java,Python API
API
API Scala, Java, Python
2~5
Python:
lines = sc.textFile(...)
lines.filter(lambda s: "ERROR" in s).count()

Scala:
val lines = sc.textFile(...)
lines.filter(s => s.contains("ERROR")).count()

Java:
JavaRDD<String> lines = sc.textFile(...);
lines.filter(new Function<String, Boolean>() {
  public Boolean call(String s) { return s.contains("error"); }
}).count();
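The filter-then-count pattern in the three snippets above can be tried without a cluster. The following is a minimal sketch in plain Python, not Spark code: `sample_lines` and `count_matching` are hypothetical names introduced for illustration, and the in-memory list stands in for whatever `sc.textFile(...)` would read.

```python
# Plain-Python stand-in for: lines.filter(lambda s: "ERROR" in s).count()
# sample_lines is hypothetical data; a Spark job would obtain its lines
# from sc.textFile(...) instead of an in-memory list.
sample_lines = [
    "INFO starting job",
    "ERROR disk full",
    "WARN slow task",
    "ERROR lost executor",
]


def count_matching(lines, predicate):
    """Count the lines for which predicate(line) is true."""
    return sum(1 for line in lines if predicate(line))


error_count = count_matching(sample_lines, lambda s: "ERROR" in s)
print(error_count)
```

The predicate passed here is the same lambda used in the Python snippet on the slide; only the data source and the execution engine differ.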
percolateur:spark srowen$ ./bin/spark-shell --master local[*]
...
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.0-SNAPSHOT
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_51)
Type in expressions to have them evaluated.
Type :help for more information.
...
scala> val words = sc.textFile("file:/usr/share/dict/words")
...
words: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile at <console>:21

scala> words.count
...
res0: Long = 235886

scala>
Spark
RDD (Resilient Distributed Dataset)
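As a rough illustration of the RDD idea, the sketch below uses a toy, single-machine class in plain Python. It is not Spark's API or implementation: `ToyRDD` and its fields are hypothetical. It shows only two properties RDDs are known for: transformations are recorded lazily as a lineage, and results can be recomputed from that lineage rather than stored.

```python
class ToyRDD:
    """Hypothetical single-machine stand-in for an RDD (not Spark's API)."""

    def __init__(self, source, lineage=()):
        self._source = source            # zero-argument function that re-reads the base data
        self._lineage = tuple(lineage)   # recorded transformations, applied in order

    def map(self, fn):
        # Lazy: only records the step and returns a new ToyRDD.
        return ToyRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, fn):
        # Lazy: only records the step and returns a new ToyRDD.
        return ToyRDD(self._source, self._lineage + (("filter", fn),))

    def collect(self):
        # Action: re-read the source and replay the whole lineage.
        data = list(self._source())
        for kind, fn in self._lineage:
            if kind == "map":
                data = [fn(x) for x in data]
            else:
                data = [x for x in data if fn(x)]
        return data

    def count(self):
        return len(self.collect())


reads = []  # records each time the base data is actually read


def read_words():
    reads.append(1)
    return ["spark", "hadoop", "hive", "solr"]


words = ToyRDD(read_words)
short_upper = words.filter(lambda w: len(w) <= 4).map(str.upper)
reads_before_action = len(reads)   # still 0: nothing has been computed yet
first = short_upper.collect()      # replays the lineage from the source
second = short_upper.collect()     # recomputed from lineage, same result
print(first, second, len(reads))
```

Recomputing from lineage is what the "resilient" in the name refers to: a lost result can be rebuilt from the base data plus the recorded steps. A real RDD is additionally partitioned across machines, which this toy omits.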
Spark Hadoop
Spark Streaming, MLlib, Spark SQL, GraphX, DataFrames, SparkR
HDFS, HBase
YARN
Spark Impala MR Others Search
Cloudera Spark
2013 2014 2015 2016
Spark
CDH 4.4 Spark
YARN Spark
Spark
Spark
Cloudera O'Reilly Spark
Cloudera Spark Cloudera Spark Hadoop Spark Cloudera
Cloudera Spark Hadoop
Cloudera 25
Spark
Cloudera Spark
Cloudera 67%
Intel 17%
Hortonworks 17%
Hadoop Spark *
IBM MapR
Hadoop
Cloudera, 370
Hortonworks, 4
IBM, 12
MapR, 1
Intel, 400
Cloudera
Spark 150 800 Spark
Cloudera Core Spark Spark Streaming
ETL 20
Jaccard
ERP
(LDA)
1010
Spark MapReduce Hadoop
Spark MapReduce
1
Crunch on Spark
Search on Spark
2
Hive on Spark (beta)
Spark on HBase (beta)
3
Pig on Spark (alpha)
Sqoop on Spark
Cloudera Spark
Spark Hadoop One Platform Initiative
Hadoop
Hadoop
1
80%
Hadoop Spark
Spark
Impala
Low-Latency
Solr
MapReduce I/O
:
Cloudera
Hadoop 1
Cloudera
Spark Spark
O'Reilly Advanced Analytics with Spark eBook (Cloudera)
Cloudera Developer Blog
cloudera.com/spark
Cloudera Spark Training
Cloudera Live Spark Tutorial
@cutting