8/6/2019 Model Lifecycle
Model Lifecycle
Ajit Ghanekar
Model Life Cycle
Model Development
Model Validation
Model Assessment
Model Monitoring
Model Development
Model Development - Process
Understanding of Business Pains and Available Data
Identification of Objective and Expected Outcome
Formulation of Modeling Approach and Data Requirement
Identification of Analysis Tool and I/O Requirement
Model Development - Difficulties
Voluminous Data
Missing Data Elements
Lack of Data Insight
Inter-Correlated Characteristics
& Many More
Model Development - SEMMA Methodology
Sample, Explore, Modify, Model, Assess
Sample - Rationale
Manageable Data for Model Development
Sample Is Supposed to Represent the Population
A Sample Is Enough to Develop a Model
Model Developed on Sample Is Valid for Population
Sample - Techniques
Popular Sampling Techniques
Simple Random Sampling
With Replacement (SRSWR)
Without Replacement (SRSWOR)
Stratified Sampling
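The three sampling techniques named on this slide can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names, the 900/100 "good"/"bad" population, and the 10% sampling fraction are assumptions made for the example, not part of the original deck.

```python
import random
from collections import defaultdict

def srswr(population, n, rng):
    """Simple random sampling WITH replacement: a unit may be drawn more than once."""
    return [rng.choice(population) for _ in range(n)]

def srswor(population, n, rng):
    """Simple random sampling WITHOUT replacement: each unit is drawn at most once."""
    return rng.sample(population, n)

def stratified(population, key, fraction, rng):
    """Draw the same fraction (without replacement) from every stratum given by key(unit)."""
    strata = defaultdict(list)
    for unit in population:
        strata[key(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(42)
# Hypothetical population: 900 "good" and 100 "bad" accounts.
population = [("good", i) for i in range(900)] + [("bad", i) for i in range(100)]

wr = srswr(population, 50, rng)
wor = srswor(population, 50, rng)
strat = stratified(population, key=lambda u: u[0], fraction=0.1, rng=rng)
```

Stratified sampling keeps the good/bad mix of the sample equal to that of the population, which simple random sampling only achieves on average.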
Sample - Data Partitioning
Avoid Over-fitting of Model
Validating a Model
Comparison of Models
Sample - Data Partitioning
Divide Sample Randomly into Three Parts
Suggested Division
Data Type        Purpose          Suggested
Training Data    Build Model      60%
Validation Data  Validate Model   30%
Testing Data     Compare Models   10%
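The suggested 60/30/10 division can be sketched as a shuffle followed by two cuts. The function name `partition`, the seed, and the 1,000-row sample are illustrative assumptions.

```python
import random

def partition(sample, fractions=(0.6, 0.3, 0.1), seed=0):
    """Randomly split a sample into training, validation, and testing parts."""
    shuffled = list(sample)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = round(fractions[0] * n)
    n_valid = round(fractions[1] * n)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains (about 10%)
    return train, valid, test

# Hypothetical sample of 1,000 row identifiers.
train, valid, test = partition(range(1000))
```

Shuffling once and slicing guarantees that the three parts are disjoint and together cover the whole sample.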
Explore - Rationale
Provides Preliminary Insights into Data
Preliminary Insights include
Causal Relationships
Correlated Characteristics
Central Tendency
Dispersion
& Many More
Explore - Techniques
Statistical Charts: Histogram, P-P Plot, Q-Q Plot, Box Chart
Preliminary Data Analysis: Mean, Median, Mode, Symmetry, Kurtosis, Variance
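The preliminary data analysis measures listed above can be computed with the Python standard library. This is a sketch: the `describe` helper and the sample values are assumptions, symmetry is measured here by the moment-based skewness coefficient, and population (not sample) formulas are used throughout.

```python
import statistics

def describe(values):
    """Return central tendency, dispersion, and shape measures for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values, mu=mean)  # population variance
    sd = variance ** 0.5
    # Moment-based skewness (symmetry) and kurtosis; a normal curve has kurtosis 3.
    skewness = sum((x - mean) ** 3 for x in values) / (n * sd ** 3)
    kurtosis = sum((x - mean) ** 4 for x in values) / (n * sd ** 4)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": variance,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }

# Hypothetical characteristic values.
summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
```

A positive skewness, as in this example, indicates a longer right tail, which is often a hint that a log transform may be worth trying in the Modify step.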
Modify - Techniques
Imputation: Missing Data Analysis
Standardization: Standardize Data
Normalization: Log Transform, Logit Transform, Probit Transform
Data Reduction: Principal Component Analysis, Canonical Correlation
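A few of the Modify techniques can be sketched as small functions. This example covers only mean imputation, standardization, and the log and logit transforms; probit and the data reduction techniques are left out. Mean imputation is one assumed choice among several, and the `income` values are hypothetical.

```python
import math
import statistics

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def log_transform(values):
    """Natural log; only defined for strictly positive values."""
    return [math.log(v) for v in values]

def logit(p):
    """Log-odds of a proportion p, with 0 < p < 1."""
    return math.log(p / (1 - p))

# Hypothetical characteristic with one missing element.
income = [30.0, None, 50.0, 40.0]
filled = impute_mean(income)
z = standardize(filled)
```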
Model - Rationale
Establishes causal relationship between independent characteristics and Target
Can preserve relationship in a precise and concise mathematical function
Provides unique measurement scale in the form of a weighted sum of characteristics, where weights are data dependent
Model may satisfy one of the Objectives: Classification, Prediction, Forecasting
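The "weighted sum of characteristics" can be illustrated with a linear score. In the sketch below the weights and intercept are made-up numbers standing in for values that would be estimated from data, and the logistic link used to turn the score into a probability is one assumed choice (it corresponds to logistic regression).

```python
import math

def linear_score(weights, intercept, characteristics):
    """Weighted sum of characteristics: intercept + w1*x1 + w2*x2 + ..."""
    return intercept + sum(w * x for w, x in zip(weights, characteristics))

def probability(score):
    """Map a score to a value between 0 and 1 with the logistic function."""
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical, data-dependent weights for two characteristics.
weights = [0.5, -0.25]
intercept = -0.25

score = linear_score(weights, intercept, [1.0, 1.0])
p = probability(score)
```

Every record is placed on the same one-dimensional scale, which is what makes ranking and thresholding possible.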
Model - Techniques
For Classification: Classification Trees, Logistic Regression, Neural Network
For Prediction: Regression Trees, Linear Regression, Neural Network
For Forecasting: ARIMA Models
Smoothing Techniques: Exponential Smoothing, Holt-Winters Smoothing, Moving Average Smoothing
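Two of the smoothing techniques listed above are simple enough to sketch directly. The example shows a trailing moving average and simple (single) exponential smoothing; Holt-Winters, which adds trend and seasonal terms, is not covered. The series and the smoothing constant `alpha` are hypothetical.

```python
def moving_average(series, window):
    """Trailing moving average; the first value uses the first `window` points."""
    return [
        sum(series[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(series))
    ]

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s[t] = alpha * y[t] + (1 - alpha) * s[t-1]."""
    smoothed = [series[0]]  # initialise with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical series.
series = [10, 12, 14, 16]
ma = moving_average(series, window=2)
es = exponential_smoothing(series, alpha=0.5)
```

A larger `alpha` makes the smoothed series react faster to recent observations; a smaller one gives more weight to history.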
Model Validation
Model Validation - Rationale
Check for Model Accuracy
Check for Over-fitting of Model
Check for Model Validity across Population
Check for Predictability of Model
Model Validation - Process
Training Data
Compute Predicted Outcome based on established Decision Rule
Compare Predicted Outcome with historical Outcome
Measure efficiency of Model
Check for unconsumed Information
Measure gain over random model
Validation Data
Compute Predicted Outcome based on established Decision Rule
Compare Predicted Outcome with historical Outcome
Measure efficiency of Model
Measure gain over random model
Model Validation - Techniques
Checking Accuracy of Model
Confusion Matrix
Mean Squared Error (MSE)
Checking Efficiency of Model
R² and Adjusted R²
Checking for Unconsumed Information Using Error Plots
Gain over Random Model
Lift Chart
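The accuracy and efficiency measures named on this slide follow directly from their textbook definitions. The sketch below uses hypothetical actual and predicted values; `k` is the number of characteristics in the model. The residuals are the quantities an error plot would display.

```python
def residuals(actual, predicted):
    """Errors that an error plot would show; a visible pattern suggests unconsumed information."""
    return [a - p for a, p in zip(actual, predicted)]

def mse(actual, predicted):
    """Mean squared error."""
    return sum(e ** 2 for e in residuals(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Share of the variance in the actual values explained by the model."""
    mean = sum(actual) / len(actual)
    ss_res = sum(e ** 2 for e in residuals(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    """R-squared penalised for the number of characteristics k, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical validation data.
actual = [1, 2, 3, 4, 5]
predicted = [1, 2, 3, 4, 6]

error = mse(actual, predicted)
r2 = r_squared(actual, predicted)
adj_r2 = adjusted_r_squared(r2, n=len(actual), k=1)
```

Comparing these values between training and validation data is one way to check for over-fitting: a large drop on validation data is a warning sign.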
Model Validation - Error Plots
Model Validation - Lift Chart
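The points of a lift chart can be computed as follows. This sketch assumes a binary outcome (1 = event) and a model score where higher means more likely; the scores and outcomes are made up. Lift at a given depth is the event rate among the top-scored fraction divided by the overall event rate, so a random model has a lift of about 1.

```python
def cumulative_lift(scores, outcomes, fraction):
    """Event rate in the top `fraction` of records by score, relative to the overall rate."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, round(fraction * len(ranked)))
    top_rate = sum(outcome for _, outcome in ranked[:top_n]) / top_n
    base_rate = sum(outcomes) / len(outcomes)
    return top_rate / base_rate

# Hypothetical scored validation data: 3 events among 10 records.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

lift_top_20 = cumulative_lift(scores, outcomes, 0.2)
lift_all = cumulative_lift(scores, outcomes, 1.0)
```

Evaluating `cumulative_lift` at 10%, 20%, ..., 100% gives the curve usually drawn on the chart.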
Model Validation - Confusion Matrix
True Positive
True Negative
False Positive
False Negative
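The four cells of the confusion matrix can be counted by comparing predicted and historical outcomes record by record. The sketch assumes a binary outcome coded 1 (positive) and 0 (negative); the data are hypothetical.

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for binary outcomes coded 1 and 0."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 0:
            counts["TN"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        else:  # a == 1 and p == 0
            counts["FN"] += 1
    return counts

def accuracy(counts):
    """Share of records classified correctly."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())

# Hypothetical historical and predicted outcomes.
actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]

cm = confusion_matrix(actual, predicted)
acc = accuracy(cm)
```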
Model Assessment & Deployment
Model Assessment & Deployment
Multiple Competing Models for Same Problem
Needs Common Metric for Comparison
Best Model Is Considered the Champion Model
Best Model Is Used for Scoring on Current Data
Model Is Deployed as
Web Service
PMML Code
C / SAS / R Code
ETL Job
Metric for Model Comparison
Test Data Is Used for Model Comparison
Test Data Is Scored Using the Various Models
The Following Metrics Are Compared for All Models
Lift Achieved / Net Gain
Accuracy of Models
Adjusted R²
Best Model Is Determined Based on the Above Metrics
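Champion selection can be sketched as scoring the same test data with every candidate and keeping the one with the best common metric. Everything named here is hypothetical: the two toy models, the test rows, and the choice of accuracy as the metric (lift or adjusted R² could be plugged in the same way).

```python
def accuracy(model, rows):
    """Share of test rows where the model's prediction matches the known outcome."""
    return sum(1 for x, y in rows if model(x) == y) / len(rows)

def pick_champion(models, rows, metric):
    """Score every candidate on the same test data and return the best one."""
    results = {name: metric(model, rows) for name, model in models.items()}
    best = max(results, key=results.get)
    return best, results

# Hypothetical competing models: each maps a characteristic x to a predicted class.
models = {
    "threshold": lambda x: 1 if x > 0.5 else 0,
    "always_one": lambda x: 1,
}
# Hypothetical test data as (characteristic, actual outcome) pairs.
test_rows = [(0.9, 1), (0.2, 0), (0.7, 1), (0.4, 0)]

champion, results = pick_champion(models, test_rows, accuracy)
```

Using data that played no part in building or validating any candidate keeps the comparison fair.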
Model Monitoring
Model Monitoring
Model Performance is Not Static
Model Performance is Constantly Changing
Model Performance Always Depends on
Changing Population
Changing Characteristics
Population Always Changes
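One common way to quantify a changing population is the Population Stability Index (PSI). The deck does not name a specific monitoring measure, so PSI is an assumption introduced here for illustration, as are the bin edges, the data, and the small `floor` used to avoid taking the log of zero. PSI compares the share of records per bin at development time with the share in current data.

```python
import math

def bin_shares(values, edges):
    """Share of values in each bin; edges split the range into len(edges) + 1 bins."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        index = sum(1 for edge in edges if v >= edge)
        counts[index] += 1
    return [c / len(values) for c in counts]

def psi(expected, actual, edges, floor=1e-6):
    """Population Stability Index: sum over bins of (a - e) * ln(a / e)."""
    total = 0.0
    for e, a in zip(bin_shares(expected, edges), bin_shares(actual, edges)):
        e, a = max(e, floor), max(a, floor)  # guard against empty bins
        total += (a - e) * math.log(a / e)
    return total

# Hypothetical characteristic at development time and in current data.
development = [1, 2, 3, 4, 5, 6, 7, 8]
current = [4, 5, 6, 7, 8, 9, 10, 11]
edges = [3, 6]

stable = psi(development, development, edges)
shifted = psi(development, current, edges)
```

A PSI of zero means the two distributions match bin for bin; larger values indicate a bigger shift and a stronger case for re-validating or rebuilding the model.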