Spark Scala project
-
Upload
utkarsh-jadhav -
Category
Engineering
-
view
111 -
download
0
Transcript of Spark Scala project
CS6240: Final Project
Predict sightings of the Red-winged Blackbird in Birding Checklists
Utkarsh JadhavSriharsha Srinivasa Karthik Kaipa
1
Table of Contents
● Overview and Approach
● Performance comparison ● Scope For Improvement
2
Overview and Approach
3
● Technologies used -○ Spark
■ MLLib - Machine Learning Library■ Scala - Functional Programming Approach
○ AWS EMR
● Approach for Classification○ Random Forest Classification (Ensemble Method)○ Why Ensemble?
● Advantages of Spark ○ Easy to write , Scala○ Concept - Partitioning , Repartitioning
Table of Contents
● Overview and Approach
● Performance comparison
● Scope For Improvement
4
Performance Comparision
5
● Total Execution Timeline
Performance Comparision
6
● Time per task
Performance Comparision
7
● Preprocessing performance scale-up
Performance Comparision
8
● Model training + testing performance scale-up
Performance Comparision
9
● Total performance scale-up
Table of Contents
● Overview and Approach
● Performance comparison
● Scope For Improvement
10
Scope For Improvement
11
● Emphasis on Data Mining Techniques○ Attribute Ranking○ Removal of bias, etc
● MLLib is black-box! Generalization is harmful!
Thank you!Questions?
12