AI Development Lifecycle and Team Data Science Process (TDSP)


Page 1:

Vinnie Saini

Data & AI Solution Architect

Microsoft, Canada

[email protected]

AI Development Lifecycle and Team Data Science Process (TDSP)

Objectives, Components and Adoption

Page 2:

2017 Annual Report

Page 3:

What is AI (Artificial Intelligence)?

Page 4:

What is Machine Learning?

Page 5:

Supervised Learning

• examples of correct input-output pairs

• human intervention is needed to label the examples (such as images) in the training set

• 2 subgroups: regression and classification

Unsupervised Learning

• no label or output used to train the machine

• machine is trained to identify hidden patterns or segments.

• Clustering

• Generative modeling: imitates the process that generates the training data

Reinforcement Learning

• constantly learning system which incentivizes an algorithm for meeting the final goals under the given constraints.

• We do not provide the machine with examples of correct input-output pairs

• We do provide a method for the machine to quantify its performance in the form of a reward signal.
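The three learning styles above can be contrasted in a few lines of plain Python. This is only an illustrative sketch: the function names (fit_slope, two_means, run_bandit), the toy data and the reward function are all hypothetical and do not come from the slides. The supervised part fits a slope from labeled input-output pairs, the unsupervised part groups unlabeled points into two clusters, and the reinforcement part never sees correct answers, only a reward signal.

```python
import random


def fit_slope(pairs):
    """Supervised: learn y = w * x from labeled (input, output) pairs (least squares)."""
    numerator = sum(x * y for x, y in pairs)
    denominator = sum(x * x for x, _ in pairs)
    return numerator / denominator


def two_means(points, iterations=10):
    """Unsupervised: no labels; split 1-D points into two clusters (tiny k-means)."""
    low, high = min(points), max(points)
    for _ in range(iterations):
        near_low = [p for p in points if abs(p - low) <= abs(p - high)]
        near_high = [p for p in points if abs(p - low) > abs(p - high)]
        if not near_high:  # all points identical: nothing to split
            break
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return low, high


def run_bandit(reward_fn, n_actions, steps=500, epsilon=0.1, seed=0):
    """Reinforcement: no correct answers are given, only a reward per action tried."""
    rng = random.Random(seed)
    totals = [0.0] * n_actions
    counts = [0] * n_actions

    def mean_reward(action):
        return totals[action] / counts[action] if counts[action] else float("-inf")

    for _ in range(steps):
        # Explore at random until every action was tried, then mostly exploit.
        if 0 in counts or rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=mean_reward)
        totals[action] += reward_fn(action)
        counts[action] += 1
    return max(range(n_actions), key=mean_reward)


slope = fit_slope([(1, 2), (2, 4), (3, 6)])
centers = two_means([1.0, 1.2, 0.8, 9.0, 9.5, 8.5])
best_action = run_bandit(lambda a: [0.2, 1.0, 0.5][a], n_actions=3)
```

Note that the classification case of supervised learning is not shown here; the regression case was picked only because it fits in a few lines.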

Page 6:

What is Deep Learning?

Page 7:

Train a text sentiment classification engine

Page 8:

Predictive ML Experiment: Twitter sentiment analysis
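As a rough idea of what a sentiment experiment does under the hood, here is a minimal sketch of a bag-of-words Naive Bayes classifier in plain Python. It is an assumption-laden stand-in, not the experiment shown on the slide: the training tweets, the labels and the function names (train, predict) are all hypothetical, and a real experiment would use far more data and a proper ML toolkit.

```python
import math
from collections import Counter


def train(examples):
    """Count words per label from (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts = model
    vocabulary = set().union(*word_counts.values())
    total_examples = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_examples)
        denominator = sum(counts.values()) + len(vocabulary)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled tweets, for illustration only.
model = train([
    ("love this phone", "positive"),
    ("great service today", "positive"),
    ("hate the delay", "negative"),
    ("terrible service again", "negative"),
])
```

Calling predict(model, "love the great phone") picks the label whose training tweets share the most words with the input.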

Page 9:

[Architecture diagram]

Data sources: Social, LOB, Graph, IoT, Image, CRM

Stages: INGEST, STORE, PREP & TRAIN, MODEL & SERVE

Output: Apps + insights

Components: Data orchestration and monitoring; Data lake and storage; Hadoop/Spark/SQL and ML; IoT; Azure Machine Learning

Page 10:

Opportunity and challenge of data science in enterprises

• Opportunity: 17% had a well-developed Predictive/Prescriptive Analytics program in place, while 80% planned on implementing such a program within five years – Dataversity 2015 Survey

• Challenge: Only 27% of big data projects are regarded as successful – Capgemini 2014

Tools and data platforms have matured, but there is still a major gap in executing on the potential

Page 11:

Process challenge in Data Science

• “Intelligent” application (ML/AI) development has unique complexity not always encountered in other software development scenarios

Organization

Collaboration

Quality

Knowledge Accumulation

Agility

Global Teams

• Geographic Locations

Team Growth

• Onboard New Members Rapidly

Varied Use Cases

• Industries and Use Cases

Diverse DS Backgrounds

• Data scientists have diverse backgrounds and varied experience with tools and languages

Page 12:

Why is a process useful?

A process is a detailed sequence of activities necessary to perform specific business tasks

It is used to standardize procedures and establish best practices

Technology and tools are changing rapidly. A standardized process can provide continuity and stability of workflow. (Based on discussions with Luis Morinigo, Dir. IoT, NewSignature)

Page 13:

Team Data Science Process

Recommended lifecycle that you can use to structure your data-science projects

Page 14:

TDSP components for data science teams

Data Science Challenges

• Organization

• Collaboration

• Quality

• Knowledge Accumulation

• Agility

TDSP Components

• Standardized Data Science Lifecycle

• Project Structure, Templates & Roles

• Infrastructure

• Re-usable Data Science Utilities

Page 15:

TDSP Project Structure, and Documents and Artifact Templates

A general project directory structure for Team Data Science Process developed by Microsoft
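A standard directory structure is easy to script so that every new project starts from the same layout. The sketch below is a minimal example under stated assumptions: the folder names in TDSP_LAYOUT are modeled loosely on the publicly available TDSP project template and may not match it exactly, and create_project_skeleton is a hypothetical helper, not part of any Microsoft tool.

```python
from pathlib import Path

# Assumed layout; check the actual TDSP project template before relying on it.
TDSP_LAYOUT = [
    "Code/Data_Acquisition_and_Understanding",
    "Code/Modeling",
    "Code/Deployment",
    "Docs/Project",
    "Docs/Data_Report",
    "Docs/Model",
    "Sample_Data",
]


def create_project_skeleton(root):
    """Create the assumed folder layout under root and return the created paths."""
    root = Path(root)
    created = []
    for relative in TDSP_LAYOUT:
        folder = root / relative
        folder.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(folder)
    readme = root / "README.md"
    if not readme.exists():
        readme.write_text("# Project\n", encoding="utf-8")
    return created
```

For example, create_project_skeleton("my-project") builds the folders in the current directory. The function is idempotent, so re-running it on an existing project only fills in what is missing.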

Page 16:

TDSP lifecycle stages can be integrated with specific deliverables & checkpoints

Business Understanding

• Project Objective

• Data, Target & Feature Definition

• Data Dictionary

Data acquisition and understanding

Modeling

Deployment

Page 17:

Project roles & tasks

• Governance and Project Management

• Data Science and Engineering

Page 18:

Tracking progress with Power BI dashboards

Power BI content pack for VSTS: tool for PM

Page 19:

Execution of data science projects using TDSP

Setting up a TDSP team environment using Visual Studio Team Services

Page 20:

Re-usable data science utilities: Interactive data exploration, analysis and reporting (IDEAR) for Python, R, MRS

o Data quality assessment

o Getting business insights from the data

o Association between variables

o Generating standardized data quality reports automatically

Clustering

Distribution assessment

https://github.com/Azure/Azure-TDSP-Utilities
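To make "data quality assessment" and "standardized data quality reports" concrete, here is a minimal sketch of an automated per-column report in plain Python. It is not the IDEAR API: data_quality_report and the sample rows are hypothetical, and the real utilities in the repository above cover much more (associations between variables, clustering, distribution plots).

```python
def data_quality_report(rows):
    """Summarize each column of a list of dict records.

    None and empty strings count as missing; numeric columns also get
    min, max and mean.
    """
    columns = sorted({key for row in rows for key in row})
    report = {}
    for column in columns:
        values = [row.get(column) for row in rows]
        present = [v for v in values if v is not None and v != ""]
        numeric = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        entry = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        # Only report statistics when every present value is numeric.
        if numeric and len(numeric) == len(present):
            entry.update(
                min=min(numeric),
                max=max(numeric),
                mean=sum(numeric) / len(numeric),
            )
        report[column] = entry
    return report


# Hypothetical sample records, for illustration only.
sample = [
    {"age": 30, "city": "Toronto"},
    {"age": None, "city": "Ottawa"},
    {"age": 50, "city": ""},
]
quality = data_quality_report(sample)
```

Running the same function on every new data set is what makes the report "standardized": the team always sees the same fields in the same shape.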

Page 21:

Adoption: How to stage (as needed)

Data science teams may stage adoption as follows

Level 1

- One git repository per project

- Standard directory structure

- Standardized templates like charter, exit reports

- Planning and tracking of work items

Level 2

- Customize templates to fit team needs

- Create shared team utility repo (like IDEAR, AMAR)

Level 3

- Develop process to graduate code from projects to the shared team utility repo

- Develop E2E worked-out templates

- Use mature work planning and tracking system (e.g. Agile)

Level 4

- Link git branch with work items

- Code review

- Manage and version model and data assets

- Develop automated testing framework
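The staged adoption list above can be kept as a simple checklist that a team reviews periodically. The sketch below is hypothetical: the practice strings paraphrase the four levels from the slide, and it assumes the levels are cumulative (a team is at level N only if every practice up to level N is in place), which the slide suggests but does not state.

```python
# Practices per level, paraphrased from the slide.
ADOPTION_LEVELS = {
    1: [
        "one git repository per project",
        "standard directory structure",
        "standardized templates",
        "planning and tracking of work items",
    ],
    2: [
        "customized templates",
        "shared team utility repo",
    ],
    3: [
        "process to graduate code to the shared repo",
        "end-to-end worked-out templates",
        "mature work planning and tracking system",
    ],
    4: [
        "git branches linked to work items",
        "code review",
        "versioned model and data assets",
        "automated testing framework",
    ],
}


def adoption_level(practices):
    """Return the highest level whose practices (and all lower ones) are in place."""
    done = set(practices)
    level = 0
    for candidate in sorted(ADOPTION_LEVELS):
        if all(p in done for p in ADOPTION_LEVELS[candidate]):
            level = candidate
        else:
            break
    return level


def next_steps(practices):
    """Return the practices still missing for the next level (empty at level 4)."""
    done = set(practices)
    upcoming = adoption_level(practices) + 1
    return [p for p in ADOPTION_LEVELS.get(upcoming, []) if p not in done]
```

A team that has all Level 1 practices plus "customized templates" is at level 1, and next_steps returns "shared team utility repo" as the remaining item for Level 2.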

Page 22:

Services

Infrastructure

Tools

Microsoft AI Platform

Page 23:
Page 24:

Azure Machine Learning Studio

Platform for emerging data scientists to graphically build and deploy experiments

• Rapid experiment composition

• More than 100 easily configured modules for data prep, training, evaluation

• Extensibility through R & Python

• Serverless training and deployment

Some numbers:

• Hundreds of thousands of deployed models serving billions of requests

Page 25:

Data science & AI

KEY TRENDS

Accelerating adoption of AI by developers (consuming models)

Rise of hybrid training and scoring scenarios

Push scoring/inference to the event (edge, cloud, on-prem)

Some developers moving into deep learning as non-traditional path to DS / AI dev

Growth of diverse hardware arms race across all form factors (CPU / GPU / FPGA / ASIC / device)

CHALLENGES

Data prep

Model deployment & management

Model lineage & auditing

Explainability

Page 26:

What have we learned?

• Customers have told us they love the convenience

• Customers have told us they need:

• Greater control over compute & data

• More options for model deployment

• Which frameworks? ALL OF THEM!

Page 27:

Key Goals for Preview Features

Begin building now with the tools and platforms you know

Build, deploy, and manage models at scale

Boost productivity with agile development

Page 28:

Machine Learning Services

Bring AI to everyone with an end-to-end, scalable, trusted platform

Page 29:

Local machine

Scale up to DSVM

Scale out with Spark on HDInsight

Azure Batch AI (Coming Soon)

ML Server

Experiment Everywhere

AZURE ML EXPERIMENTATION

Command line tools

IDEs

Notebooks in Workbench

VS Code Tools for AI

Page 30:

DOCKER

Single node deployment (cloud/on-prem)

Azure Container Service

Azure IoT Edge

Microsoft ML Server

Spark clusters

SQL Server

Deploy Everywhere

AZURE ML MODEL MANAGEMENT

Page 31:

Deployment and management of models as HTTP services

Container-based hosting of real time and batch processing made really simple

Management and monitoring through Azure Application Insights

First class support for SparkML, Python, Cognitive Toolkit, TF, R, extensible to support others (Caffe, MXNet)

Service authoring in Python

Manage models
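To illustrate the idea of exposing a model as an HTTP service authored in Python, here is a minimal sketch that uses only the Python standard library. It is explicitly not the Azure ML Model Management API: the score function is a hypothetical stand-in for a trained model, and the weights, feature names and port are made up for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def score(features):
    """Hypothetical stand-in for a trained model: a fixed linear scorer."""
    weights = {"length": 0.1, "exclamations": 0.5}
    return sum(
        weights.get(name, 0.0) * float(value)
        for name, value in features.items()
    )


class ScoringHandler(BaseHTTPRequestHandler):
    """Accepts a JSON object of features via POST and returns a JSON score."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            features = json.loads(self.rfile.read(length) or b"{}")
            body = json.dumps({"score": score(features)}).encode("utf-8")
            status = 200
        except (ValueError, TypeError, AttributeError):
            # Malformed JSON, non-numeric values, or a non-object payload.
            body = json.dumps({"error": "invalid input"}).encode("utf-8")
            status = 400
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the scoring service on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), ScoringHandler).serve_forever()
```

Calling serve() starts the service locally; a client would then POST a JSON body such as {"length": 10, "exclamations": 2} and read the score from the response. In a container-based setup, this script would be the process the container runs.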

Page 32:

Experimentation and Model Management services in conjunction

• Governance and Lineage of deployed models

• Visibility into any decision and tracing it back if required

• Debugging and Diagnostics story across the end to end lifecycle of a model.

Page 33:

Instantiating TDSP Structure and Templates From the Azure Machine Learning Template Gallery

Page 34:

blogs.msdn.microsoft.com/buckwoody/tag/team-data-science-process/

blogs.msdn.microsoft.com/buckwoody/category/devops-for-data-science/

Page 35:

The future of AI is here