Spark Scala project

download Spark Scala project

of 12

Embed Size (px)

Transcript of Spark Scala project

  • CS6240: Final Project

    Predict sightings of the Red-winged Blackbird in Birding Checklists

    Utkarsh JadhavSriharsha Srinivasa Karthik Kaipa

    1

  • Table of Contents

    Overview and Approach

    Performance comparison Scope For Improvement

    2

  • Overview and Approach

    3

    Technologies used - Spark

    MLLib - Machine Learning Library Scala - Functional Programming Approach

    AWS EMR

    Approach for Classification Random Forest Classification (Ensemble Method) Why Ensemble?

    Advantages of Spark Easy to write , Scala Concept - Partitioning , Repartitioning

  • Table of Contents

    Overview and Approach

    Performance comparison

    Scope For Improvement

    4

  • Performance Comparision

    5

    Total Execution Timeline

  • Performance Comparision

    6

    Time per task

  • Performance Comparision

    7

    Preprocessing performance scale-up

  • Performance Comparision

    8

    Model training + testing performance scale-up

  • Performance Comparision

    9

    Total performance scale-up

  • Table of Contents

    Overview and Approach

    Performance comparison

    Scope For Improvement

    10

  • Scope For Improvement

    11

    Emphasis on Data Mining Techniques Attribute Ranking Removal of bias, etc

    MLLib is black-box! Generalization is harmful!

  • Thank you!Questions?

    12