Vinnie Saini
Data & AI Solution Architect
Microsoft, Canada
AI Development Lifecycle and Team Data Science Process (TDSP)
Objectives, Components and Adoption
2017 Annual Report
What is AI (Artificial Intelligence)?
What is Machine Learning?
Supervised Learning
• examples of correct input-output pairs
• human intervention to classify the images in the training set.
• 2 subgroups: regression and classification
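As a rough illustration of the two subgroups, here is a minimal sketch in plain Python. The data, the function names (`fit_line`, `nearest_neighbor`) and the choice of least squares and 1-nearest-neighbor are assumptions made for this example, not something the slides prescribe. Both models learn only from labeled input-output pairs.

```python
# Supervised learning sketch: both models learn from labeled (input, output) pairs.
# fit_line and nearest_neighbor are illustrative names, not from the slides.

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def nearest_neighbor(train, query):
    """Return the label of the closest training example (classification)."""
    return min(train, key=lambda pair: abs(pair[0] - query))[1]


# Regression: the target is a continuous number.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
prediction = slope * 5.0 + intercept

# Classification: the target is a discrete label.
labeled = [(1.0, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]
label = nearest_neighbor(labeled, 7.5)

print(prediction, label)
```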
Unsupervised Learning
• no label or output used to train the machine
• machine is trained to identify hidden patterns or segments.
• Clustering
• Generative modeling: imitate the process that generates the training data
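Clustering can be sketched with a tiny one-dimensional k-means in plain Python. The points, the starting centers and the `kmeans_1d` helper are assumptions for illustration; note that no labels are supplied anywhere.

```python
# Unsupervised learning sketch: 1-D k-means finds segments without any labels.
# kmeans_1d and the sample points are illustrative assumptions.

def kmeans_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-centering."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [
            sum(members) / len(members) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers, clusters


points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers, clusters)
```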
Reinforcement Learning
• a continuously learning system that rewards an algorithm for meeting its final goals under the given constraints.
• we do not provide the machine with examples of correct input-output pairs
• we do provide a method for the machine to quantify its performance in the form of a reward signal.
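The reward-signal idea can be sketched with an epsilon-greedy multi-armed bandit. This is a simplified stand-in chosen for brevity, with made-up fixed rewards; the `run_bandit` function and its parameters are assumptions for this example. The agent is never told which arm is correct. It only sees the reward for the arm it pulled.

```python
import random

# Reinforcement learning sketch: an epsilon-greedy bandit learns from rewards only.
# run_bandit, the reward values and epsilon are illustrative assumptions.

def run_bandit(rewards, steps=200, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n_arms = len(rewards)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for step in range(steps):
        if step < n_arms:
            arm = step                       # try every action once
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)      # explore a random action
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = rewards[arm]                # reward signal from the environment
        counts[arm] += 1
        # Incremental mean of the rewards observed for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


estimates, counts = run_bandit([0.1, 0.9, 0.5])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_arm, counts)
```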
What is Deep Learning?
Train a text sentiment classification engine
Predictive ML experiment: Twitter sentiment analysis
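To make the sentiment experiment concrete, here is a minimal sketch of a bag-of-words Naive Bayes classifier in plain Python. The four example tweets, the function names and the choice of Naive Bayes are assumptions for illustration; the slides do not say which algorithm the experiment used.

```python
import math
from collections import Counter, defaultdict

# Text sentiment sketch: multinomial Naive Bayes with Laplace smoothing.
# The training tweets and function names are illustrative assumptions.

def train_naive_bayes(examples):
    word_counts = defaultdict(Counter)   # label -> token frequencies
    label_counts = Counter()             # label -> number of documents
    vocabulary = set()
    for text, label in examples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocabulary.update(tokens)
    return word_counts, label_counts, vocabulary


def predict(model, text):
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)   # log prior
        for token in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


tweets = [
    ("love this phone great battery", "positive"),
    ("great service very happy", "positive"),
    ("terrible battery hate it", "negative"),
    ("very bad service awful", "negative"),
]
model = train_naive_bayes(tweets)
print(predict(model, "great phone"), predict(model, "awful battery"))
```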
[Architecture diagram]
Data sources: Social, LOB, Graph, IoT, Image, CRM
Stages: Ingest, Store, Prep & Train, Model & Serve
Output: Apps + insights
• Data orchestration and monitoring
• Data lake and storage
• Hadoop/Spark/SQL and ML
• IoT
• Azure Machine Learning
Opportunity and challenge of data science in enterprises
• Opportunity: 17% had a well-developed Predictive/Prescriptive Analytics program in place, while 80% planned on implementing such a program within five years – Dataversity 2015 Survey
• Challenge: Only 27% of big data projects are regarded as successful – Capgemini 2014
Tools & data platforms have matured, but there is still a major gap in executing on the potential
Process challenge in Data Science
• “Intelligent” application (ML/AI) development has unique complexity not always encountered in other software development scenarios
Organization
Collaboration
Quality
Knowledge Accumulation
Agility
Global Teams
• Geographic locations
Team Growth
• Onboard new members rapidly
Varied Use Cases
• Industries and use cases
Diverse DS Backgrounds
• Data scientists have diverse backgrounds and varying experience with tools and languages
Why is a process useful?
A process is a detailed sequence of activities necessary to perform specific business tasks
It is used to standardize procedures and establish best practices
Technology and tools are changing rapidly. A standardized process can provide continuity and stability of workflow.
(Based on discussions with Luis Morinigo, Dir. IoT, NewSignature)
Team Data Science Process
Recommended lifecycle that you can use to structure your data-science projects
TDSP components for data science teams
Data science challenges:
• Organization
• Collaboration
• Quality
• Knowledge Accumulation
• Agility
TDSP components:
• Standardized Data Science Lifecycle
• Project Structure, Templates & Roles
• Infrastructure
• Re-usable Data Science Utilities
TDSP Project Structure, and Documents and Artifact Templates
A general project directory structure for Team Data Science Process developed by Microsoft
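A standard directory structure can be scaffolded with a few lines of Python. The folder names below are assumptions loosely modeled on the lifecycle stages named in this deck, and should not be read as the official template; `LAYOUT` and `scaffold` are hypothetical names for this sketch.

```python
from pathlib import Path
import tempfile

# Hypothetical layout: folder names are assumptions for illustration,
# not a verbatim copy of the official TDSP project template.
LAYOUT = [
    "Code/Data_Acquisition_and_Understanding",
    "Code/Modeling",
    "Code/Deployment",
    "Docs/Project",
    "Docs/Data_Dictionaries",
    "Docs/Data_Report",
    "Docs/Model",
    "Sample_Data",
]


def scaffold(root):
    """Create the standard folders under root and return the created paths."""
    root = Path(root)
    created = []
    for relative in LAYOUT:
        folder = root / relative
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Demonstrate in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(Path(tmp) / "my_project")
    all_exist = all(folder.is_dir() for folder in created)
    print(all_exist, len(created))
```

Keeping the layout in one shared list means every new project starts from the same structure, which is the point of standardizing it.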
TDSP lifecycle stages can be integrated with specific deliverables & checkpoints
Business Understanding
• Project Objective
• Data, Target & Feature Definition
• Data Dictionary
Data acquisition and understanding
Modeling
Deployment
Project roles & tasks
• Governance and Project Management
• Data Science and Engineering
Tracking progress with Power BI dashboards
Power BI content pack for VSTS: tool for PM
Execution of data science projects using TDSP
Setting up a TDSP team environment using Visual Studio Team Services
Re-usable data science utilities: Analytics
Interactive data exploration and reporting – IDEAR (Python, R, MRS)
o Data quality assessment
o Getting business insights from the data
o Association between variables
o Generating standardized data quality reports automatically
[Example views: Clustering, Distribution assessment]
https://github.com/Azure/Azure-TDSP-Utilities
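The idea of an automatically generated data quality report can be sketched in plain Python. This is not the IDEAR implementation; `quality_report`, the sample rows and the chosen metrics are assumptions meant only to show the kind of standardized summary such a utility produces.

```python
# Data quality sketch: per-column missing counts and distinct values.
# quality_report and the sample rows are illustrative, not IDEAR code.

def quality_report(rows):
    columns = sorted({key for row in rows for key in row})
    report = {}
    for column in columns:
        values = [row.get(column) for row in rows]
        present = [v for v in values if v is not None and v != ""]
        missing = len(values) - len(present)
        report[column] = {
            "missing": missing,
            "missing_pct": round(100.0 * missing / len(values), 1),
            "distinct": len(set(present)),
        }
    return report


rows = [
    {"age": 34, "city": "Toronto"},
    {"age": None, "city": "Toronto"},
    {"age": 51, "city": ""},
    {"age": 34, "city": "Ottawa"},
]
report = quality_report(rows)
for column, stats in report.items():
    print(column, stats)
```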
Adoption: How to stage (as needed)
Data science teams may stage adoption as follows
Level 1
- One git repository per project
- Standard directory structure
- Standardized templates like charter, exit reports
- Planning and tracking of work items
Level 2
- Customize templates to fit team needs
- Create shared team utility repo (like IDEAR, AMAR)
Level 3
- Develop process to graduate code from projects to the shared team utility repo
- Develop E2E worked-out templates
- Use mature work planning and tracking system (e.g. Agile)
Level 4
- Link git branch with work items
- Code review
- Manage and version model and data assets
- Develop automated testing framework
Microsoft AI Platform
• Services
• Infrastructure
• Tools
Azure Machine Learning Studio
Platform for emerging data scientists to graphically build and deploy experiments
• Rapid experiment composition
• > 100 easily configured modules for data prep, training, evaluation
• Extensibility through R & Python
• Serverless training and deployment
Some numbers:
• Hundreds of thousands of deployed models serving billions of requests
Data science & AI: key trends
• Accelerating adoption of AI by developers (consuming models)
• Rise of hybrid training and scoring scenarios
• Push scoring/inference to the event (edge, cloud, on-prem)
• Some developers moving into deep learning as a non-traditional path to DS / AI dev
• Growth of diverse hardware arms race across all form factors (CPU / GPU / FPGA / ASIC / device)
Data science & AI: challenges
• Data prep
• Model deployment & management
• Model lineage & auditing
• Explainability
What have we learned?
• Customers have told us they love the convenience
• Customers have told us they need:
• Greater control over compute & data
• More options for model deployment
• Which frameworks? ALL OF THEM!
Key Goals for Preview Features
• Begin building now with the tools and platforms you know
• Build, deploy, and manage models at scale
• Boost productivity with agile development
Machine Learning Services
Bring AI to everyone with an end-to-end, scalable, trusted platform
Experiment Everywhere (Azure ML Experimentation)
• Local machine
• Scale up to DSVM
• Scale out with Spark on HDInsight
• Azure Batch AI (Coming Soon)
• ML Server
• Command line tools
• IDEs
• Notebooks in Workbench
• VS Code Tools for AI
Deploy Everywhere (Azure ML Model Management)
• Docker
• Single node deployment (cloud/on-prem)
• Azure Container Service
• Azure IoT Edge
• Microsoft ML Server
• Spark clusters
• SQL Server
• Deployment and management of models as HTTP services
• Container-based hosting of real-time and batch processing made really simple
• Management and monitoring through Azure Application Insights
• First-class support for SparkML, Python, Cognitive Toolkit, TF, R; extensible to support others (Caffe, MXNet)
• Service authoring in Python
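Service authoring in Python often takes a two-function shape: load the model once at startup, then score each request. The sketch below assumes that shape. The linear "model", the `features` request field and the response format are hypothetical stand-ins, and no HTTP server or Azure SDK call is involved.

```python
import json

# Scoring-script sketch: load once, score per request.
# The model dictionary and the request/response schema are hypothetical.

model = None


def init():
    """Load the model once when the service starts."""
    global model
    model = {"weights": [0.5, -0.25], "bias": 1.0}


def run(raw_request):
    """Score one JSON request body and return a JSON response body."""
    try:
        features = json.loads(raw_request)["features"]
        score = model["bias"] + sum(
            w * x for w, x in zip(model["weights"], features)
        )
        return json.dumps({"score": score})
    except (KeyError, TypeError, ValueError) as error:
        # Return the problem to the caller instead of crashing the service.
        return json.dumps({"error": str(error)})


init()
response = run(json.dumps({"features": [2.0, 4.0]}))
print(response)
```

A hosting layer (for example a container with a web server) would call `init()` once and pass each HTTP request body to `run()`.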
Manage models
Experimentation and Model Management services in conjunction
• Governance and Lineage of deployed models
• Visibility into any decision and tracing it back if required
• Debugging and Diagnostics story across the end to end lifecycle of a model.
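Governance and lineage come down to recording, for every deployed model version, which data and parameters produced it. The sketch below is a minimal in-memory illustration; `register`, `trace` and the record fields are assumptions for this example and not the Model Management service API.

```python
import hashlib

# Lineage sketch: an in-memory registry that ties each model version
# to a fingerprint of its training data. All names are illustrative.

def fingerprint(payload):
    """Content hash used to identify an exact model or data artifact."""
    return hashlib.sha256(payload).hexdigest()


def register(registry, name, model_bytes, data_bytes, params):
    version = 1 + sum(1 for record in registry if record["name"] == name)
    record = {
        "name": name,
        "version": version,
        "model_hash": fingerprint(model_bytes),
        "data_hash": fingerprint(data_bytes),
        "params": dict(params),
    }
    registry.append(record)
    return record


def trace(registry, name, version):
    """Look up what produced a given deployed model version."""
    for record in registry:
        if record["name"] == name and record["version"] == version:
            return record
    return None


registry = []
register(registry, "sentiment", b"model-v1", b"tweets-january", {"alpha": 1.0})
register(registry, "sentiment", b"model-v2", b"tweets-february", {"alpha": 0.5})
origin = trace(registry, "sentiment", 2)
print(origin["version"], origin["params"])
```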
Instantiating TDSP Structure and Templates From the Azure Machine Learning Template Gallery
blogs.msdn.microsoft.com/buckwoody/tag/team-data-science-process/
blogs.msdn.microsoft.com/buckwoody/category/devops-for-data-science/
The future of AI is here