Business Intelligence Using SAS Final Presentation
Bank Marketing Project
Group 7: Zhaodi Liu
Preete Dixit
Nandini Naik
Rashmi Nadubeedi Ramesh
Pravin Kumar Prem Kumar
Agenda
• Project Motivation
• Data Description
• Our BI Models
• Experimental Results
• Association Rule Mining
• Managerial Implications
• Challenges
• Conclusion
Project Motivation
• Direct marketing targets customers directly with a personalized message, as opposed to mass marketing
• The primary benefits to businesses:
– Increased lead generation
– Increased sales volume
– Increased customer base
– Minimized losses
• Focus on generating more "qualified" leads
Impact of Data Mining
• Can be very effective for direct marketing
• Sophisticated algorithms generate rules, determine the most useful attributes, and predict future outcomes
• Our goal is to predict the probability of a client subscribing to the term deposit
• In the interest of:
– Boosting sales to existing customers
– Increasing customer loyalty
– Recapturing old customers and generating new business
Data Description
• The data relates to direct marketing campaigns (phone calls) of a Portuguese banking institution
Data Set Characteristics: Multivariate
Attribute Characteristics: Real
Number of Instances: 45211
Number of Attributes: 17
Area: Business
Date Donated: 2012-02-14
BANK CLIENT DATA
Serial No  Name       Description                      Data Type
1.         age        Client's age                     numeric
2.         job        type of job                      categorical
3.         marital    marital status                   categorical
4.         education  level of education               categorical
5.         default    has credit in default?           binary
6.         balance    average yearly balance in euros  numeric
7.         housing    has housing loan?                binary
8.         loan       has personal loan?               binary
DATA RELATED TO THE LAST CONTACT OF THE CURRENT CAMPAIGN
Serial No  Name       Description                                                                                 Data Type
9.         contact    contact communication type                                                                  categorical
10.        day        last contact day of the month                                                               numeric
11.        month      last contact month of year                                                                  categorical
12.        duration   last contact duration, in seconds                                                           numeric
13.        campaign   number of contacts performed during this campaign and for this client                       numeric
14.        pdays      number of days that passed by after the client was last contacted from a previous campaign  numeric
15.        previous   number of contacts performed before this campaign and for this client                       numeric
16.        poutcome   outcome of the previous marketing campaign                                                  categorical
OUTPUT VARIABLE (DESIRED TARGET)
Serial No  Name       Description                                   Data Type
17.        y          has the client subscribed to a term deposit?  binary
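The attribute listing can be captured as a small schema to check the stated count of 17 attributes and see how the data types are distributed. This is only an illustrative sketch; the name `BANK_SCHEMA` is hypothetical and not part of the project.

```python
from collections import Counter

# Column name -> data type, transcribed from the attribute listing.
# BANK_SCHEMA is a hypothetical name used only for this sketch.
BANK_SCHEMA = {
    "age": "numeric", "job": "categorical", "marital": "categorical",
    "education": "categorical", "default": "binary", "balance": "numeric",
    "housing": "binary", "loan": "binary", "contact": "categorical",
    "day": "numeric", "month": "categorical", "duration": "numeric",
    "campaign": "numeric", "pdays": "numeric", "previous": "numeric",
    "poutcome": "categorical", "y": "binary",
}

# Count how many attributes fall under each data type.
type_counts = Counter(BANK_SCHEMA.values())

print(len(BANK_SCHEMA))   # number of attributes, including the target y
print(dict(type_counts))  # attributes per data type
```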
BI Model - With Target Profile
Profit/Loss Matrix
Decision Tree
Regression
Regression Node: converts categorical values to interval values using dummy variables, and groups logically related categories to reduce the number of independent variables in the regression equation.
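The dummy-variable idea can be sketched in a few lines of plain Python. This is not the Regression Node's own implementation; the helper `dummy_encode`, the choice of reference level, and the sample `marital` values are assumptions made for illustration.

```python
def dummy_encode(values, reference=None):
    """Encode a categorical column as 0/1 dummy columns, dropping one reference level."""
    levels = sorted(set(values))
    if reference is None:
        # Assumption: use the alphabetically last level as the baseline.
        reference = levels[-1]
    kept = [level for level in levels if level != reference]
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return kept, rows


# Illustrative values only; not taken from the slides.
marital = ["married", "single", "divorced", "married"]
columns, encoded = dummy_encode(marital)

print(columns)  # ['divorced', 'married']
print(encoded)  # [[0, 1], [0, 0], [1, 0], [0, 1]]
```

A categorical input with k levels becomes k - 1 numeric columns, which is why grouping related categories shrinks the regression equation.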
Neural Network
The model converged after 70 iterations and contains 124 weights.
Neural Network with Input Selection
• Reducing the number of modelling inputs reduces the number of modelling weights as well as computational costs and possibly improves the model performance.
• The useful inputs are selected by connecting the Neural Network Node to the Regression Node.
The model converged after 42 iterations and contains 46 weights.
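The reported weight counts can be sanity-checked against the usual parameter count of a single-hidden-layer network. The sketch below assumes one hidden layer with 3 hidden units, bias terms, and one output unit; none of these are stated on the slides, and the input counts (39 and 13) are simply the values that reproduce 124 and 46 under that assumption.

```python
def mlp_weight_count(n_inputs, n_hidden, n_outputs=1):
    """Weights (including biases) of a fully connected single-hidden-layer network."""
    hidden_layer = (n_inputs + 1) * n_hidden    # input->hidden weights plus hidden biases
    output_layer = (n_hidden + 1) * n_outputs   # hidden->output weights plus output biases
    return hidden_layer + output_layer


# Assumption: 3 hidden units; the input counts are inferred, not reported.
full_model = mlp_weight_count(n_inputs=39, n_hidden=3)
selected_inputs_model = mlp_weight_count(n_inputs=13, n_hidden=3)

print(full_model)             # 124
print(selected_inputs_model)  # 46
```

Under these assumptions, each input that is dropped removes one weight per hidden unit, which is how input selection reduces the weight count.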
BI Model - Without Target Profile
Decision Tree
Regression
Neural Network
The model converged after 70 iterations and contains 124 weights.
Neural Network with Input Selection
The model converged after 76 iterations and contains 85 weights.
Model Assessment and Scoring results with Target Profile Model
– The performance of the four models is compared based on the average profit using the Model Comparison node.
Fit Statistics
ROC Plots
Confusion Matrix
Scoring
– Scoring applies the model that the Model Comparison node deems best to predict the outcome for a new case/observation whose outcome is unknown.
Replaced Variables:
– Job: management
– Education: Secondary
– Contact: Cellular
Rejected Variables:
– Poutcome
– Target y
Actual Data Scores:
– Percentage No = 88.476%
– Percentage Yes = 11.524%
The model's predicted outcome differs only slightly from the actual outcome, by 1.725%.
Model Assessment and Scoring results without Target Profile Model
– The performance of the four models is compared based on the misclassification rate using the Model Comparison node.
Fit Statistics
ROC Plots
Confusion Matrix
Scoring
Actual Data Scores:
– Percentage No = 88.476%
– Percentage Yes = 11.524%
The model's predicted outcome differs only slightly from the actual outcome, by 1.725%.
Association Rule Mining
Data Pre-processing
• Default: D(Yes), D(No)
• Housing: H(Yes), H(No)
• Personal Loan: PL(Yes), PL(No)
• Age: 20-40, 40-60, 60-90, and 90-100
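The pre-processing above turns each client record into a set of items for association rule mining. A minimal sketch follows; the helper names `age_band` and `to_items`, the half-open treatment of the age ranges (so that 40 falls in 40-60), and the "Age ..." item label are assumptions, since the slides do not say how boundary ages were handled.

```python
def age_band(age):
    """Map an age to one of the listed ranges, treated as half-open [low, high)."""
    bands = [(20, 40), (40, 60), (60, 90), (90, 100)]
    for low, high in bands:
        if low <= age < high:
            return f"{low}-{high}"
    return None  # age falls outside the listed ranges


def to_items(record):
    """Convert one client record into a list of items such as D(No) or H(Yes)."""
    items = [
        f"D({record['default'].capitalize()})",
        f"H({record['housing'].capitalize()})",
        f"PL({record['loan'].capitalize()})",
    ]
    band = age_band(record["age"])
    if band is not None:
        items.append(f"Age {band}")  # item label format is an assumption
    return items


# Illustrative record only; not taken from the data set.
client = {"default": "no", "housing": "yes", "loan": "no", "age": 35}
print(to_items(client))  # ['D(No)', 'H(Yes)', 'PL(No)', 'Age 20-40']
```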
Results & Interpretation
Managerial Implications
if pdays < 19.5 or MISSING
AND month IS ONE OF: MAY, JUN, JUL, AUG, NOV, JAN or MISSING
AND duration < 348.5 or MISSING
AND age < 60.5 or MISSING
then Predicted: y=YES = 0.02, Predicted: y=NO = 0.98
If 100 customers match this rule and the cost of calling a customer is $12, the bank saves $1,200 just by not contacting this set of customers.
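The low-response rule can be read as a simple predicate over a customer record, and the saving is the number of matching customers times the call cost. The sketch below is one way to express that; the function name `matches_low_response_rule`, the use of `None` for MISSING, and the lowercase month codes are assumptions.

```python
def lt_or_missing(value, threshold):
    """True when the value is missing (None) or below the threshold."""
    return value is None or value < threshold


def matches_low_response_rule(customer):
    """Decision-tree rule with predicted y=YES of 0.02 (hypothetical helper name)."""
    months = {"may", "jun", "jul", "aug", "nov", "jan", None}
    return (
        lt_or_missing(customer.get("pdays"), 19.5)
        and customer.get("month") in months
        and lt_or_missing(customer.get("duration"), 348.5)
        and lt_or_missing(customer.get("age"), 60.5)
    )


# Figures from the slide: 100 matching customers at $12 per call.
matching_customers = 100
call_cost = 12
savings = matching_customers * call_cost

# Illustrative record only.
sample = {"pdays": None, "month": "may", "duration": 120, "age": 35}
print(matches_low_response_rule(sample))  # True
print(savings)                            # 1200
```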
if pdays < 19.5 or MISSING
AND month IS ONE OF: FEB
AND housing IS ONE OF: NO
AND duration < 466.5 or MISSING
AND day < 20.5 AND day >= 9.5
AND age < 60.5 or MISSING
then Predicted: y=YES = 0.75, Predicted: y=NO = 0.25
If 100 customers match this rule, the cost of calling a customer is $12, and the profit per subscription is $100, the bank could generate revenue of $10,000.
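The $10,000 figure corresponds to all 100 customers subscribing. As a complementary reading, the sketch below weights the revenue by the rule's predicted probability of 0.75 and subtracts the call cost; this expected-value view is an interpretation added here, not a calculation shown on the slides.

```python
# Figures from the slide.
customers = 100
call_cost = 12
profit_per_subscription = 100
p_yes = 0.75  # predicted probability of y=YES under this rule

# Slide's figure: every contacted customer subscribes.
best_case_revenue = customers * profit_per_subscription

# Expected-value reading (an assumption, not from the slides).
expected_subscribers = p_yes * customers
expected_revenue = expected_subscribers * profit_per_subscription
total_call_cost = customers * call_cost
expected_net = expected_revenue - total_call_cost

print(best_case_revenue)  # 10000
print(expected_revenue)   # 7500.0
print(expected_net)       # 6300.0
```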
Decision Tree Model – Our best-fit model for maximizing profits
Decision Tree: confusion matrix (counts)
                   Predicted Positive   Predicted Negative
Actual Positive    1314                 1330
Actual Negative    863                  19098
Decision Tree: profit matrix
                   Predicted Positive   Predicted Negative
Actual Positive    $15,768              $0
Actual Negative    -$10,356             $0
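The two matrices can be tied together with a short calculation. The dollar amounts are consistent with a value of $12 per customer predicted positive (15,768 / 1,314 = 12 and 10,356 / 863 = 12); that unit value is inferred from the arithmetic rather than stated with the matrix, so it is treated as an assumption below.

```python
# Decision Tree confusion-matrix counts from the slide.
tp, fn = 1314, 1330   # actual positive: predicted positive / predicted negative
fp, tn = 863, 19098   # actual negative: predicted positive / predicted negative

total = tp + fn + fp + tn
misclassification_rate = (fn + fp) / total
sensitivity = tp / (tp + fn)   # share of actual subscribers the tree identifies
precision = tp / (tp + fp)     # share of predicted subscribers who actually subscribe

# Assumption: $12 gained per true positive and $12 lost per false positive.
unit_value = 12
profit_true_positive = tp * unit_value
loss_false_positive = fp * unit_value
net_profit = profit_true_positive - loss_false_positive

print(total, round(misclassification_rate, 4))
print(round(sensitivity, 4), round(precision, 4))
print(profit_true_positive, loss_false_positive, net_profit)
```

Cells in the Predicted Negative column contribute $0 because no call is made to those customers.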
Regression Model
• A 1-unit increase in pdays has practically no impact on the odds of not subscribing to the term deposit, because the corresponding odds ratio is approximately 1:
$e^{\hat{\beta}_{\text{pdays}}} \approx 1$
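In logistic regression, a 1-unit increase in an input multiplies the odds by the exponential of its coefficient, so a coefficient close to zero gives an odds ratio close to 1. The sketch below illustrates this; the coefficient value is hypothetical, since the exact estimate is not legible in the slide text.

```python
import math


def odds_ratio(beta, delta=1.0):
    """Multiplicative change in the odds for a change of `delta` units in the input."""
    return math.exp(beta * delta)


# Hypothetical coefficient, chosen only to illustrate a near-zero estimate.
beta_pdays = 0.0005

one_day = odds_ratio(beta_pdays)             # effect of 1 extra day
thirty_days = odds_ratio(beta_pdays, 30.0)   # effect of 30 extra days

print(round(one_day, 4))
print(round(thirty_days, 4))
```

Even when the per-unit odds ratio is close to 1, the effect compounds over larger changes in the input, which is worth keeping in mind when interpreting "no impact".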
Challenges
• Implementing the Profit/Loss matrix
• Absence of ROC plot in the result of Model Assessment
• Non-convergence of Neural Network
• Scoring
Conclusion
• Successfully implemented 2 predictive analysis models to predict the outcome of term deposit subscription
• Decision Tree is the best-fit model based on Profit/Loss
• Decision Tree is the best-fit model based on Misclassification Rate
• Using the decision rules:
– Results in a saving of $1,200
– Generates a revenue of $10,000
• Using the Profit/Loss matrix:
– Profit of $15,768
– Savings of $10,356
Q&A