Spark Scala project

Post on 13-Apr-2017

CS6240: Final Project

Predict sightings of the Red-winged Blackbird in Birding Checklists

Utkarsh Jadhav, Sriharsha Srinivasa Karthik Kaipa

Table of Contents

● Overview and Approach

● Performance comparison

● Scope For Improvement

Overview and Approach

● Technologies used
○ Spark
■ MLlib - Machine Learning Library
■ Scala - Functional Programming Approach
○ AWS EMR

● Approach for Classification
○ Random Forest Classification (Ensemble Method)
○ Why Ensemble?

● Advantages of Spark
○ Easy to write in Scala
○ Concept - Partitioning, Repartitioning
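The approach above (repartition the input, then train a random forest with MLlib) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the input layout (`label,f1,f2,...` per line), the command-line arguments, and all hyperparameter values are hypothetical, and every feature is treated as continuous.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest

object BlackbirdForestSketch {
  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit (for example on EMR).
    val sc = new SparkContext(new SparkConf().setAppName("BlackbirdForestSketch"))

    // ASSUMPTION: args(0) is a path to lines of the form "label,f1,f2,...",
    // with label 1.0 for a sighting and 0.0 otherwise; args(1) is a partition count.
    val inputPath = args(0)
    val numPartitions = args(1).toInt

    // Repartition so that parsing and training are spread across the executors.
    val points = sc.textFile(inputPath)
      .repartition(numPartitions)
      .map { line =>
        val cols = line.split(',').map(_.toDouble)
        LabeledPoint(cols.head, Vectors.dense(cols.tail))
      }
      .cache()

    val Array(training, test) = points.randomSplit(Array(0.8, 0.2), seed = 42L)

    // Hyperparameters are illustrative placeholders, not tuned values.
    val model = RandomForest.trainClassifier(
      training,
      2,                // numClasses: sighted / not sighted
      Map[Int, Int](),  // categoricalFeaturesInfo: empty, all features continuous
      50,               // numTrees
      "auto",           // featureSubsetStrategy
      "gini",           // impurity
      10,               // maxDepth
      32,               // maxBins
      42)               // seed

    // Fraction of held-out checklists whose label the forest predicts correctly.
    val correct = test.filter(p => model.predict(p.features) == p.label).count()
    println(s"Test accuracy: ${correct.toDouble / test.count()}")

    sc.stop()
  }
}
```

The partition count is left as a parameter because a good value depends on the cluster size and input volume, neither of which is stated in the slides.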

Performance Comparison

● Total Execution Timeline

● Time per task

● Preprocessing performance scale-up

● Model training + testing performance scale-up

● Total performance scale-up
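The charts for these scale-up slides are not part of the transcript. As a rough sketch of how such a figure is commonly computed, assuming "scale-up" means the ratio of a baseline run time to the run time on a larger cluster, the helper below derives speed-up and parallel efficiency. The object name, function names, and the timings in `main` are hypothetical placeholders, not measurements from the project.

```scala
object ScaleUp {
  /** Speed-up of a run relative to a baseline: baselineSeconds / runSeconds. */
  def speedup(baselineSeconds: Double, runSeconds: Double): Double = {
    require(baselineSeconds > 0 && runSeconds > 0, "timings must be positive")
    baselineSeconds / runSeconds
  }

  /** Parallel efficiency: speed-up divided by the factor by which the node count grew. */
  def efficiency(baselineSeconds: Double, runSeconds: Double,
                 baselineNodes: Int, runNodes: Int): Double = {
    require(baselineNodes > 0 && runNodes > 0, "node counts must be positive")
    speedup(baselineSeconds, runSeconds) / (runNodes.toDouble / baselineNodes)
  }

  def main(args: Array[String]): Unit = {
    // Placeholder numbers for illustration only.
    val s = speedup(600.0, 200.0)
    val e = efficiency(600.0, 200.0, 5, 20)
    println(f"speed-up = $s%.2f, efficiency = $e%.2f")
  }
}
```

An efficiency well below 1.0 would suggest that adding machines yields diminishing returns for that phase, which is why preprocessing and model training are worth measuring separately.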

Scope For Improvement

● Emphasis on Data Mining Techniques
○ Attribute Ranking
○ Removal of bias, etc.

● MLlib is a black box! Generalization is harmful!
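The slides do not say which attribute-ranking method was in mind. One simple possibility, shown here purely as an illustration, is to rank each attribute by the absolute Pearson correlation between its values and the label. The object and function names are hypothetical, and correlation is only one of several ranking criteria (information gain is another common choice).

```scala
object AttributeRanking {
  /** Pearson correlation of two equally long sequences; 0.0 if either one is constant. */
  def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
    require(xs.length == ys.length && xs.nonEmpty,
      "need two non-empty sequences of equal length")
    val n = xs.length
    val mx = xs.sum / n
    val my = ys.sum / n
    val cov = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum
    val vx = xs.map(x => (x - mx) * (x - mx)).sum
    val vy = ys.map(y => (y - my) * (y - my)).sum
    if (vx == 0.0 || vy == 0.0) 0.0 else cov / math.sqrt(vx * vy)
  }

  /** Attribute names with |correlation to the label|, strongest first. */
  def rank(columns: Map[String, Seq[Double]], label: Seq[Double]): Seq[(String, Double)] =
    columns.toSeq
      .map { case (name, values) => name -> math.abs(pearson(values, label)) }
      .sortBy { case (_, score) => -score }
}
```

For example, `AttributeRanking.rank(Map("a" -> Seq(0.0, 1.0, 0.0, 1.0), "b" -> Seq(1.0, 1.0, 1.0, 1.0)), Seq(0.0, 1.0, 0.0, 1.0))` places `a` first, since it tracks the label exactly while `b` is constant.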

Thank you! Questions?
