Spark Scala project

12
CS6240: Final Project Predict sightings of the Red-winged Blackbird in Birding Checklists Utkarsh Jadhav Sriharsha Srinivasa Karthik Kaipa 1

Transcript of Spark Scala project

Page 1: Spark Scala project

CS6240: Final Project

Predict sightings of the Red-winged Blackbird in Birding Checklists

Utkarsh JadhavSriharsha Srinivasa Karthik Kaipa

1

Page 2: Spark Scala project

Table of Contents

● Overview and Approach

● Performance comparison ● Scope For Improvement

2

Page 3: Spark Scala project

Overview and Approach

3

● Technologies used -○ Spark

■ MLLib - Machine Learning Library■ Scala - Functional Programming Approach

○ AWS EMR

● Approach for Classification○ Random Forest Classification (Ensemble Method)○ Why Ensemble?

● Advantages of Spark ○ Easy to write , Scala○ Concept - Partitioning , Repartitioning

Page 4: Spark Scala project

Table of Contents

● Overview and Approach

● Performance comparison

● Scope For Improvement

4

Page 5: Spark Scala project

Performance Comparision

5

● Total Execution Timeline

Page 6: Spark Scala project

Performance Comparision

6

● Time per task

Page 7: Spark Scala project

Performance Comparision

7

● Preprocessing performance scale-up

Page 8: Spark Scala project

Performance Comparision

8

● Model training + testing performance scale-up

Page 9: Spark Scala project

Performance Comparision

9

● Total performance scale-up

Page 10: Spark Scala project

Table of Contents

● Overview and Approach

● Performance comparison

● Scope For Improvement

10

Page 11: Spark Scala project

Scope For Improvement

11

● Emphasis on Data Mining Techniques○ Attribute Ranking○ Removal of bias, etc

● MLLib is black-box! Generalization is harmful!

Page 12: Spark Scala project

Thank you!Questions?

12