Apache Spark - Yandex

Date posted: 02-Nov-2021

Transcript of Apache Spark - Yandex

• Apache Solr-based search engine, e-commerce
• map-reduce
Whole program comparison
Transformations
def filter(f: T => Boolean): RDD[T]
def map[U: ClassTag](f: T => U): RDD[U]
def foreachPartition(f: Iterator[T] => Unit)
def zipWithIndex(): RDD[(T, Long)]
def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]
def reduceByKey(func: (V, V) => V, numPartitions: Int): RDD[(K, V)]
In one partition (narrow)
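The signatures above can be tried on a small in-memory dataset. The following is a minimal sketch, not part of the original slides: the object name `TransformationsSketch`, the helper names, the sample data and the `local[2]` master are all assumptions made for illustration. It contrasts operations that stay within one partition (`filter`, `map`) with operations that move data between partitions (`reduceByKey`, `join`).

```scala
import org.apache.spark.{SparkConf, SparkContext}
// On Spark versions before 1.3 the pair-RDD operations also need:
// import org.apache.spark.SparkContext._

// Hypothetical names and sample data, chosen only for illustration.
object TransformationsSketch {
  // filter and map are narrow: each output partition is computed
  // from a single input partition, so no shuffle is needed.
  def narrowOnly(sc: SparkContext): Array[Int] =
    sc.parallelize(1 to 6, 2).filter(_ % 2 == 0).map(n => n * n).collect()

  // zipWithIndex pairs every element with its position in the RDD.
  def indexed(sc: SparkContext): Array[(String, Long)] =
    sc.parallelize(Seq("x", "y", "z"), 2).zipWithIndex().collect()

  // reduceByKey groups values by key, which moves data between partitions.
  def sumByKey(sc: SparkContext): Map[String, Int] =
    sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)), 2)
      .reduceByKey(_ + _, 2)
      .collect()
      .toMap

  // join matches the keys of two pair RDDs.
  def joined(sc: SparkContext): Map[String, (Int, String)] = {
    val left = sc.parallelize(Seq(("a", 1), ("b", 2)))
    val right = sc.parallelize(Seq(("a", "x"), ("b", "y")))
    left.join(right).collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("transformations-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(narrowOnly(sc).mkString(", "))
      println(indexed(sc).mkString(", "))
      println(sumByKey(sc))
      println(joined(sc))
    } finally sc.stop()
  }
}
```

Transformations are lazy, so nothing is computed until `collect()` is called in each helper.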
Hadoop
Spark
Scala
val file = spark.textFile("hdfs://...")
val counts = file.flatMap(line => line.split(" "))
                 .map(word => (word, 1))
                 .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")
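To experiment with the word count without an HDFS cluster, the same pipeline can be run against an in-memory collection. This is a sketch under assumptions: `WordCountSketch`, `countWords`, the sample lines and the `local[2]` master are illustrative and do not come from the slides, and the slide's `spark` value is taken here to be a `SparkContext`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
// On Spark versions before 1.3, reduceByKey also needs:
// import org.apache.spark.SparkContext._

// Hypothetical wrapper around the slide's pipeline.
object WordCountSketch {
  // Same three steps as on the slide: split lines into words,
  // pair each word with 1, then sum the ones per word.
  def countWords(lines: RDD[String]): RDD[(String, Int)] =
    lines.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("wordcount-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // In-memory sample input instead of a text file on HDFS.
      val lines = sc.parallelize(Seq("a b a", "b c"))
      countWords(lines).collect().foreach(println)
    } finally sc.stop()
  }
}
```

Replacing `sc.parallelize(...)` with `sc.textFile(path)` and `collect()` with `saveAsTextFile(path)` gives back the shape of the slide's version.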
SQL
• Classification and regression
  – linear support vector machine (SVM)
  – logistic regression
  – linear least squares, Lasso, and ridge regression
  – decision tree
  – naive Bayes
• Collaborative filtering
  – alternating least squares (ALS)
• Optimization
  – stochastic gradient descent
  – limited-memory BFGS (L-BFGS)
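As an illustration of the collaborative filtering entry, the sketch below trains an ALS model with the RDD-based MLlib API (`org.apache.spark.mllib.recommendation`). The ratings, the rank, the iteration count, the regularization value and the names `AlsSketch` and `predictOne` are all made up for this example and are not taken from the slides.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Hypothetical example; the ratings below are invented sample data.
object AlsSketch {
  def predictOne(sc: SparkContext): Double = {
    // Each Rating is (user id, product id, rating value).
    val ratings = sc.parallelize(Seq(
      Rating(1, 10, 5.0), Rating(1, 20, 1.0),
      Rating(2, 10, 4.0), Rating(2, 30, 2.0),
      Rating(3, 20, 5.0), Rating(3, 30, 4.0)))

    // Assumed hyperparameters: rank 2, 5 iterations, lambda 0.01.
    val model = ALS.train(ratings, 2, 5, 0.01)

    // Predicted rating of user 1 for product 30, a pair not seen in training.
    model.predict(1, 30)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("als-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(predictOne(sc))
    finally sc.stop()
  }
}
```

With so few ratings the predicted value is not meaningful; the point is only the shape of the train and predict calls.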
Batch   Interactive   Streaming


API