SD CADD meeting 2016-08-30: Intro to Spark
Uploaded by: yana-valasatava
Category: Data & Analytics
Apache Spark is a fast and general engine for large-scale data processing
• In-memory processing
Successor of Hadoop (MapReduce)
• File-based processing
http://spark.apache.org/
Apache Spark works in parallel on
• Multicore laptop, desktop
• Single server
• Cluster (needs a cluster manager)
Map-Reduce Example
[Figure: data flow RDD<String> → RDD<String> → PairRDD<String,Integer> → PairRDD<String,Integer>; transformation labels: one to many, one to one]
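The data flow in the Map-Reduce example can be illustrated with a minimal sketch in plain Python. It does not use Spark. The word-count task and the input lines are assumptions, since the slide shows only the types. The helper names flat_map, map_to_pair and reduce_by_key are hypothetical stand-ins, loosely modeled on the flatMap, mapToPair and reduceByKey operations of Spark's Java API. Treating the "one to many" label as the flat-map step and the "one to one" label as the pairing step is also an assumption.

```python
from itertools import chain


def flat_map(func, items):
    """One to many: each input item yields zero or more output items."""
    return list(chain.from_iterable(func(item) for item in items))


def map_to_pair(func, items):
    """One to one: each input item yields exactly one (key, value) pair."""
    return [func(item) for item in items]


def reduce_by_key(func, pairs):
    """Merge all values that share a key using a binary function."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


# Hypothetical input; each list stands in for the type named in the comment.
lines = ["spark is fast", "spark is general"]        # RDD<String>
words = flat_map(str.split, lines)                   # RDD<String>
pairs = map_to_pair(lambda w: (w, 1), words)         # PairRDD<String,Integer>
counts = reduce_by_key(lambda a, b: a + b, pairs)    # PairRDD<String,Integer>

print(counts)  # {'spark': 2, 'is': 2, 'fast': 1, 'general': 1}
```

In this sketch every step runs eagerly on one in-memory list. Spark would instead split the data into partitions and evaluate the same chain of transformations in parallel.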