Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing...

64
Data Science @ PMI Tools of The Trade Best Practices to Start, Develop and Ship a Data Science Product Maciej Marek AI Ukraine, 21 st September 2019 Kyiv

Transcript of Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing...

Page 1: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

Data Science @ PMITools of The TradeBest Practices to Start, Develop and Ship a Data Science Product

Maciej Marek

AI Ukraine, 21st September 2019 Kyiv

Page 2: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

About me

• Now: Enterprise Data Scientist at PMI in Krakow

• Education: Computer Science

• Every day: Data Science best practices ambassador

• Likes: Motorbike trips, aviation

Page 3: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

PMI & Digital

Page 4: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

Machine Learning & AI @Enterprise level

4

Challenge: fast delivery of reproducible models to productions!

Page 5: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

What is a Data Product?

• A software system with a ML/AI component, part of a large business system

• Formally defined, it is a system that:

• takes raw data as input,

• applies a machine-learned model to it,

• produces data as output

• Additionally, a data product must be

• dynamic and maintainable, allowing periodic updates

• Responsive, performant and scalable

5

Page 6: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

Time to start shipping

6

Page 7: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

Culture Conflict

• Companies structure DS divisions intoData Scientists – build models,Data Engineers – put models into productions,DevOps - maintain the platform

• Tendency to go into silos and do their owe thing.

• You hear things like ‘But it work on my machine’ ‘Here’s my Jupyter Notebook, please industrialize it’

• Everyone get caught in a vicious cycle of frustration

Page 8: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

8

SCRUM certified

Page 9: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

9

• We are part of PMI's Enterprise Analytics and Data (EAD) group

• 40+ Data Scientists across 4 hubs

• Offices in Amsterdam (NL), Kraków (PL), Lausanne (CH) and Tokyo (JP)

• Profiles

• Education: 30% PhD, 70% MSc/BSc

• Data Science Experience: 7.4 yrs on average

• Experience in PMI: 88% under 2yrs

• Expertise in Machine Learning, Big Data Engineering, Insights Communication

• SCRUM certified

Data Science @ PMI

Page 10: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

10

2012

OCT

Page 11: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

11

Thomas H. Davenport Dhanurjay Patiland

2012

OCT

Page 12: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

12

https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century

2012

OCT

Page 13: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

13

https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century

2012

OCT

Page 14: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

14

THE TRUTH

https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century

2012

OCT

Page 15: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

15

THE TRUTH?

https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century

2012

OCT

Page 16: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

16

THE TRUTH?

https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century

2012

OCT

Page 17: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

17

http://reportajede.news/wp-content/uploads/2016/09/Graduados.jpg

Page 18: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

18

http://reportajede.news/wp-content/uploads/2016/09/Graduados.jpg

Page 19: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

19

https://developers.google.com/machine-learning/crash-course/production-ml-systems

ML Code

Perception

Page 20: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

20

https://developers.google.com/machine-learning/crash-course/production-ml-systems

Reality

Page 21: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

21

https://developers.google.com/machine-learning/crash-course/production-ml-systems

Reality

Page 22: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

22

https://developers.google.com/machine-learning/crash-course/production-ml-systems

Reality

Page 23: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

Machine Learning as a Service

Page 24: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

24

CONTROL EASE OF USE

Red

uce

d A

dm

inis

trat

ion

IaaS Clusters Insights as-a-service

Any Hadoop technology,Any distribution Machine Learning as a Service

Infrastructure @ Cloud

Page 25: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

25

CONTROL EASE OF USE

Red

uce

d A

dm

inis

trat

ion

IaaS Clusters PMI Data Ocean Insights as-a-service

Machine Learning as a Service

Infrastructure @ CloudAny Hadoop technology,Any distribution

Page 26: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

26

IoT/streaming data

Cloud storage

Data warehouses

Hadoop storage

Data Products

BI tools

Data exports

Data warehouses

Page 27: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

27

IoT/streaming data

Cloud storage

Data warehouses

Hadoop storage

Data Products

BI tools

Data exports

Data warehouses

Page 28: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

28

Flexible Inspection ready

Easy to use

Reproducible

To create a workflow that is …

Our Vision

Page 29: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

29

DS Prod Lab

Reproducible Containers

Data Product Architecture @PMI• The dots, connected.

F

F

F – Flexible

Data Read/Write

F

Page 30: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

30

DS Prod Lab

Scanned by BlackDuck

Reproducible Containers

Version Control

Data Product Architecture @PMI• The dots, connected.

F

F

I

I

F – FlexibleI – Inspection ready

Monitoring

Data Read/Write

F

I

Page 31: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

31

DS Prod Lab

Reproducible Containers

Data Product Architecture @PMI• The dots, connected.

F, R

F

F – FlexibleI – Inspection readyR – Reproducible

Scanned by BlackDuck

Version Control

I

I

Monitoring

Data Read/Write

F

I

Page 32: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

32

DS Prod Lab

AutomationOn-demand

infrastructure

Data Read/Write

Data ProductReproducible Containers

Data Product Architecture @PMI• The dots, connected.

F, R

F

F

E

EEF – FlexibleI – Inspection readyR – ReproducibleE – Easy to use

E

E

Scheduler

Scanned by BlackDuck

Version Control

I

I

Monitoring

I

Page 33: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

33

Data Science Best Practices @ PMI

Python

Style

Guides

Testing

Code

Reviews

Notebooks

to Modules

Docker CICDProject

Templates

Version

Control

Page 34: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

34

Data Science Best Practices @ PMI

Python

Style

Guides

Testing

Code

Reviews

Notebooks

to Modules

Docker CICDProject

Templates

Version

Control

Page 35: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

35

For Reproducibility

Docker Containers

Page 36: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

36

Docker for Containerized Data ScienceAll your dependencies in one place. Code guaranteed to run anywhere.

Freedom

Page 37: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

37

Docker for Containerized Data ScienceAll your dependencies in one place. Code guaranteed to run anywhere.

Freedom

Ease of installation

Page 38: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

38

Docker for Containerized Data ScienceAll your dependencies in one place. Code guaranteed to run anywhere.

Freedom

Ease of installation

Reproducibility

Page 39: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

39

Docker for Containerized Data ScienceAll your dependencies in one place. Code guaranteed to run anywhere.

Freedom

Ease of installation

Reproducibility

Isolation

Page 40: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

40

Docker for Containerized Data ScienceAll your dependencies in one place. Code guaranteed to run anywhere.

Freedom

Ease of installation

Reproducibility

Isolation

Speed

Page 41: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

41

Docker for Containerized Data ScienceAll your dependencies in one place. Code guaranteed to run anywhere.

>100GB

Freedom

Ease of installation

Reproducibility

Isolation

Speed

Page 42: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

42

For organization and predictability

Project Templates

Page 43: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

43

CookieCutterEverything has a place and a purpose

https://wallpapershome.com/images/wallpapers/holiday-cookies-3840x2160-baking-food-holiday-cookie-shape-409.jpg

Page 44: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

44

CookieCutterEverything has a place and a purpose

https://wallpapershome.com/images/wallpapers/holiday-cookies-3840x2160-baking-food-holiday-cookie-shape-409.jpg

Page 45: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

45

For integration and deployment

CICD for Data Science

Page 46: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

46

“It is not the strongest of the species that survive, nor the most intelligent, but the one most responsive to change.”

Page 47: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

47

“It is not the strongest of the species that survive, nor the most intelligent, but the one most responsive to change.”

Charles Darwin

Page 48: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

48

Continuous Integration (CI), Delivery (CD) and Deployment

Development practices for overcoming integration challenges and moving faster to delivery

The CI/CD Cycle

Continuous Integration requires multiple developers to

integrate code into a shared repository frequently. Requested

merges are automatically tested and reviewed.

Enabled by git-flow, code standards and

automated testing

Page 49: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

49

Continuous Integration (CI), Delivery (CD) and Deployment

Development practices for overcoming integration challenges and moving faster to delivery

The CI/CD Cycle

Continuous Delivery makes sure that the code that we

integrate is always in a deploy-ready state.

Enabled by agile (iterative) methods,

testing and build automation

Page 50: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

50

Continuous Integration (CI), Delivery (CD) and Deployment

Development practices for overcoming integration challenges and moving faster to delivery

The CI/CD Cycle

Continuous Deployment is the actual act of pushing

updates out to the user – think of your iPhone apps or

Desktop browser that prompt for updates to be installed

periodically.

Page 51: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

51

Alpha Beta IndustrializationProject Phase

Continuous Integration (CI), Delivery (CD) and DeploymentFrom business needs to value

Generating valueUnderstanding business needs

Exploration phase Automation phase

Web-based application

Page 52: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

52

For monitoring

Kibana and Airflow

Page 53: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

53

Page 54: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

54https://docs.qubole.com/en/latest/user-guide/data-engineering/airflow/monitor-airflow-cluster.html

Page 55: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

Continuous Technology Scouting

55

Page 56: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

Best Practice Ambassador Network @ PMI

• Supporting technical assessment of tools, methods, etc.

• Preparing & conducting internal trainings

• Providing on-site support to fellow data scientists

• Facilitating retrospectives of other teams

• Actively participating in local meetups & conferences

• Designing & implementing data science specific solutions for improving the data science work effectiveness

Page 57: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

In Conclusion

Page 58: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

Engineering smart systems around a machine-learned

core is difficult

It requires teams of exceptionally talented individuals to

work together.

What makes data scientists special is their ability to work

with both business leaders and technology experts.

We must acknowledge that we are a part of something

much bigger and learn to play well with each other and

with all parties involved.

Our hope is that these systems, principles and best

practices will help you take the first steps in that direction

Page 60: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

Airflow

60

Architecture

Page 61: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

Kubernetes

61

Infrastructure

Page 62: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

Ocean platform

62

Infrastructure

Page 63: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

Ocean platform

63

Capability view

Page 64: Data Science @ PMI Tools of The Trade · •Preparing & conducting internal trainings •Providing on-site support to fellow data scientists •Facilitating retrospectives of other

Ocean platform

64

Instantiation view