Lightning Talks & Integrations Track - Running Apache Spark Libraries on Apache Apex @ ABDW17, Pune
Uploaded by: datatorrent
Category: Technology
2
• Motivation
• Apex Processing Model
• Spark Processing Model
• Translation from Spark to Apex
• Parallelism in Apex
• I/O Performance Enhancement
• Roadmap
7
val parsed = sc.textFile(path, minPartitions)
  .map(_.trim)
  .filter(line => !(line.isEmpty || line.startsWith("#")))
  .map(training_record)
val d = parsed.reduce(math.max) + 1
parsed.map(_ + d).collect()
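The chain above can be tried without a cluster. What follows is a minimal sketch, assuming each record parses to a single Int: it runs the same steps on a plain Scala Seq instead of an RDD. The names PipelineSketch and trainingRecord, and the sample input in the usage note, are hypothetical stand-ins and are not taken from the talk.

```scala
object PipelineSketch {
  // Hypothetical stand-in for the slide's training_record (assumption):
  // take the first whitespace-separated token of a line as an Int.
  def trainingRecord(line: String): Int = line.split("\\s+")(0).toInt

  // Same chain as the slide, applied to a local Seq instead of sc.textFile(...).
  def run(lines: Seq[String]): Seq[Int] = {
    val parsed = lines
      .map(_.trim)
      .filter(line => !(line.isEmpty || line.startsWith("#")))
      .map(trainingRecord)
    // Largest record plus one; reduce throws if no records survive the filter.
    val d = parsed.reduce((a, b) => math.max(a, b)) + 1
    // A local Seq is already materialized, so no collect() is needed.
    parsed.map(_ + d)
  }
}
```

For example, PipelineSketch.run(Seq("# header", " 3 a", "", "7 b")) drops the comment and the blank line, parses 3 and 7, computes d = 8, and returns Seq(11, 15).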
8
val parsed = sc.textFile(path, minPartitions)
  .map(_.trim)
  .filter(line => !(line.isEmpty || line.startsWith("#")))
  .map(training_record)
Apex RDD
parsed
9
val d = parsed.reduce(math.max) + 1
val d = nParsed
Apex RDD
10
parsed.map(_ + d).collect()
Parsed (ApexRDD)
11
[Figure: operator diagram with four Map operators and two Reduce operators]
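The general idea of several Map instances feeding a smaller number of Reduce instances can be sketched in plain Scala. This is only an illustration under stated assumptions: it runs sequentially, it says nothing about how Apex actually partitions or deploys operators, and PartitionedReduceSketch and mapReduce are hypothetical names. The merge is only correct when op is associative.

```scala
object PartitionedReduceSketch {
  // Apply f within each chunk, reduce each chunk locally,
  // then merge the partial results with the same operation.
  def mapReduce(data: Seq[Int], partitions: Int)(f: Int => Int)(op: (Int, Int) => Int): Int = {
    require(partitions > 0 && data.nonEmpty, "need at least one partition and one element")
    val chunkSize = math.ceil(data.size.toDouble / partitions).toInt
    data.grouped(chunkSize)   // split into at most `partitions` chunks
      .map(_.map(f))          // "Map" stage, once per chunk
      .map(_.reduce(op))      // partial "Reduce" per chunk
      .reduce(op)             // final merge of the partial results
  }
}
```

With eight inputs and four partitions, PartitionedReduceSketch.mapReduce(Seq(1, 2, 3, 4, 5, 6, 7, 8), 4)(_ * 2)(_ + _) maps each pair of values, sums each pair, and merges the four partial sums.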