Discovering Data Science Design Patterns with Examples from R and Python Software Ecosystem
Discovering Data Science Design Patterns
with Examples from R and Python
Dmitrij Petrov
Autumn 2017
30/11/2017 Dmitrij Petrov - Master Thesis Presentation - Autumn 2017
Outlining Master Thesis
Motivation
• Design patterns capture best solutions to recurring issues in
  • Architecture
    • Started the Pattern Language Movement
  • Object-Oriented Programming
    • Seminal work for software analysis, design and implementation
  • Cloud Computing, Database Modelling, etc.
  • Data Science
Research Questions
• RQ1: What exactly do the terms “software ecosystem”, “data science” and “design pattern” mean?
• RQ2: Which data science-oriented design patterns can be recognized?
• RQ3: What are the specific FOSS R and Python tools that can be used for solving common data mining problems?
Methodology – 3D2P framework
Pattern prospecting
- Literature sources
Pattern mining
- General Inductive Approach & Open/Axial Coding
- Discovery of patterns (i.e. best practices and their relationships)
Pattern writing
- Follow PW guidelines for their documentation
Relevant works of: Thomas (‘06), Inventado & Scupelli (‘15), Meszaros & Doble (‘96)
A Pattern Example – “Build Me Dataset”
1. Pattern Name & Sketch
2. Context: you want to process data from multiple data sources/formats
3. Problem: extracting/storing data in a common data structure
4. Solution: “table” / “data frame”
5. Consequences: can be very simple but also slow
6. Known uses: modelling, visualization…
7. Examples: from R & Python ecosystem
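The pattern above can be illustrated with a minimal Python sketch. It is not taken from the thesis: it assumes pandas as the data frame implementation, and the in-memory CSV/JSON sources, the column names, and the `build_dataset` helper are all hypothetical stand-ins for real files in different formats.

```python
import io
import json

import pandas as pd

# Hypothetical in-memory sources standing in for files in two formats.
CSV_SOURCE = "id,city,temp_c\n1,Oslo,4.5\n2,Rome,15.0\n"
JSON_SOURCE = json.dumps([
    {"id": 3, "city": "Cairo", "temp_c": 24.0},
    {"id": 4, "city": "Lima", "temp_c": 19.5},
])


def build_dataset(csv_text: str, json_text: str) -> pd.DataFrame:
    """Load records from a CSV and a JSON source into one data frame."""
    # Each source is parsed with a reader suited to its format.
    csv_frame = pd.read_csv(io.StringIO(csv_text))
    json_frame = pd.DataFrame(json.loads(json_text))
    # Stack the rows of both sources and renumber the index.
    return pd.concat([csv_frame, json_frame], ignore_index=True)


dataset = build_dataset(CSV_SOURCE, JSON_SOURCE)
print(dataset)
```

Once the data sits in one common structure, the known uses (modelling, visualization) can work on `dataset` without caring where each row came from. The sketch also hints at the stated consequence: loading everything into one in-memory frame is simple, but it can become slow as the sources grow.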
Expected Outcomes
1. Aim to formulate Data Science design patterns
2. Data Science R and Python Toolkit Matrix
• A holistic map of tools can simplify the knowledge discovery process