Customer Segmentation and Predictive Modeling
Angie Wang
Customer Segmentation & Predictive Modeling Project
The Dataset
A rich dataset of over 226,000 records, reflecting over 137,000 orders from 100,000 randomly selected U.S. customers (representative of all customers) who made purchases between 12/15/2004 and 09/17/2012. Each record is a single order line, and the database captures a wide range of historical sales information, including customer ID number, zip code, order date, cancel date, shipping date, price, cost, channel, and payment method.
PURPOSES OF THE PROJECT
01 IDENTIFY MEANINGFUL CUSTOMER SEGMENTS
02 PROVIDE MANAGERIAL IMPLICATIONS
03 DEVELOP PREDICTIVE MODELS OF TOTAL PROFIT
PART ONE
1 IDENTIFY MEANINGFUL SEGMENTS
TO UNDERSTAND CUSTOMER BEHAVIOR AND HELP BARNEYS GENERATE MORE PROFIT
ASSUMPTION IN CUSTOMER SEGMENTATION
The RFM model identifies meaningful customer segments:
• Recency: how recently a customer makes a purchase
• Frequency: how often a customer makes a purchase
• Monetary: how much a customer spends
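The three RFM dimensions are per-customer aggregates of the order-line records described under The Dataset. Below is a minimal Python sketch of that aggregation, assuming hypothetical field names (`customer_id`, `order_id`, `order_date`, `price`, `cost`), line profit computed as price minus cost, and recency measured from 09/17/2012, the last date in the data. The project's own variable definitions may differ. The sketch also computes the time between first and last order, which the methodology uses alongside profit and number of orders.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class OrderLine:
    # Hypothetical schema for one order line; the real column names may differ.
    customer_id: int
    order_id: int
    order_date: date
    price: float
    cost: float


def rfm_features(lines, as_of=date(2012, 9, 17)):
    """Aggregate order lines into one RFM-style record per customer."""
    by_customer = defaultdict(list)
    for line in lines:
        by_customer[line.customer_id].append(line)

    features = {}
    for cid, rows in by_customer.items():
        first = min(r.order_date for r in rows)
        last = max(r.order_date for r in rows)
        features[cid] = {
            # Monetary: total profit, assumed to be price minus cost summed over lines.
            "profit": round(sum(r.price - r.cost for r in rows), 2),
            # Frequency: distinct orders, not order lines.
            "orders": len({r.order_id for r in rows}),
            # Days between the first and the last order.
            "span_days": (last - first).days,
            # Recency: days from the most recent order to as_of.
            "recency_days": (as_of - last).days,
        }
    return features


# Hypothetical example: customer 1 has two orders (three lines), customer 2 has one.
example_lines = [
    OrderLine(1, 100, date(2012, 1, 10), 120.0, 70.0),
    OrderLine(1, 100, date(2012, 1, 10), 40.0, 25.0),
    OrderLine(1, 101, date(2012, 6, 1), 200.0, 150.0),
    OrderLine(2, 102, date(2012, 9, 1), 80.0, 60.0),
]
features = rfm_features(example_lines)
```

Counting distinct `order_id` values rather than rows keeps a multi-item order from inflating Frequency; in the example, customer 1 ends up with 2 orders, not 3.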
METHODOLOGY
Segment a large sample of customers into distinct groups of homogeneous customers:
1. Aggregate transactional data to customer-level data.
2. Identify critical variables related to the RFM model (profit, the time between the first and last orders, and number of orders).
3. Use SPSS hierarchical and K-means cluster analysis to identify meaningful customer segments.
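The clustering step can be sketched in plain Python. This is not SPSS's implementation. It shows only the K-means stage (Lloyd's algorithm) on z-scored features, and it assumes the caller supplies the initial centers, for example the centroids from a preceding hierarchical run, which is one common way to combine the two procedures. The `standardize` and `kmeans` helpers and the feature rows are hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Convert each column to z-scores so dollar profit does not swamp order counts."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm: assign points to the nearest center, then move centers to cluster means."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [mean(col) for col in zip(*members)]
    return labels, centers


# Hypothetical customers: [total profit, days between first and last order, number of orders].
customers = [[60, 0, 1], [70, 0, 1], [65, 10, 1], [4300, 1300, 12], [4400, 1400, 13]]
z = standardize(customers)
labels, centers = kmeans(z, centers=[z[0], z[3]])
```

The number of clusters is set by how many initial centers are passed in; the project's six-cluster solution would correspond to six starting centers.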
SPSS Customer Segmentation Results
The 100K customers are segmented into six clusters. Customers in Cluster 3 (middle-class shoppers) and Cluster 6 (upper-class shoppers) are identified as Barneys' most valuable customers based on the Frequency and Monetary dimensions of the RFM model.
Payment method codes: 1 = Amex, 2 = Discover, 3 = MasterCard, 4 = Visa
Channel codes: 1 = Phone, 2 = In-Store, 3 = Website
Highlights of the SPSS Customer Segmentation Results
TOTAL PROFIT DISTRIBUTION (total profit per cluster and share of all profit)
Cluster 1: $5,128,492.86 (50%)
Cluster 2: $1,094,995.41 (11%)
Cluster 3: $1,118,466.09 (11%)
Cluster 4: $1,884,696.28 (18%)
Cluster 5: $791,138.06 (8%)
Cluster 6: $176,439.06 (2%)
AVERAGE PROFIT PER CUSTOMER
Cluster 1: $65.09 (1%)
Cluster 2: $217.43 (4%)
Cluster 3: $621.03 (10%)
Cluster 4: $143.62 (2%)
Cluster 5: $716.61 (12%)
Cluster 6: $4,303.39 (71%)
(Percentages are each cluster's share of the sum of the six averages, as plotted in the original pie chart.)
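Because average profit per customer is total profit divided by the number of customers in a cluster, the two charts together imply each cluster's size. A short Python check that uses only the figures from these charts:

```python
# Figures copied from the TOTAL PROFIT DISTRIBUTION and AVERAGE PROFIT PER CUSTOMER charts.
total_profit = {
    1: 5128492.86, 2: 1094995.41, 3: 1118466.09,
    4: 1884696.28, 5: 791138.06, 6: 176439.06,
}
avg_profit = {
    1: 65.0890047212926, 2: 217.433560365369, 3: 621.02503609106,
    4: 143.617791663492, 5: 716.610561594204, 6: 4303.39170731707,
}

# average = total / customers, so customers = total / average (rounded to a whole count).
sizes = {c: round(total_profit[c] / avg_profit[c]) for c in total_profit}

# Each cluster's share of overall profit, as in the pie chart.
grand_total = sum(total_profit.values())
shares = {c: round(100 * total_profit[c] / grand_total) for c in total_profit}

print(sizes, sum(sizes.values()))
print(shares)
```

The implied sizes are 78,792; 5,036; 1,801; 13,123; 1,104; and 41 customers for Clusters 1 to 6, summing to 99,897, in line with the 100K figure. The Cluster 1 count matches the 78,792 cited in Part Three.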
[Chart: Most Popular Shopping Channel by cluster, Clusters 1 to 6 (channel codes 1 to 3)]
[Chart: Most Popular Payment Method by cluster, Clusters 1 to 6 (payment method codes 1 to 4)]
ANNUAL PURCHASE FREQUENCY (average purchases per year)
Cluster 1: 0.0
Cluster 2: 0.9
Cluster 3: 3.7
Cluster 4: 3.0
Cluster 5: 1.7
Cluster 6: 3.2
PART TWO
2 MANAGERIAL IMPLICATIONS
The Most Valuable Customer Segments: Clusters 3 & 6
Managerial Implications for Cluster 3
Characteristics of an average customer
• ≈ 4 purchases per year
• $621 in average profit
• Visa payment
• In-store shopper
• 8 months in service
Recommendations
• Referral program
• Cash-back rewards
• Customer knowledge: feedback
Managerial Implications for Cluster 6
Characteristics of an average customer
• ≈ 3 purchases per year
• $4,303 in average profit
• American Express payment
• In-store shopper
• 45 months in service
Recommendations
• Increase customer lifetime value: VIP in-store services, birthday gifts
• Increase customer influencer value: customer satisfaction, word of mouth
• Partner with American Express
PART THREE
3 PREDICTIVE MODELING OF TOTAL PROFIT
The two selected customer segments with little similarity: Clusters 1 & 2
METHODOLOGY
Develop predictive models to forecast total profit for the two selected customer segments:
1. Select variables that are relevant to total profit.
2. Run multiple linear regression in SPSS on a calibration sample (a random 60% of the customers in a cluster), validate the model on a validation sample (the remaining 40%), and identify outliers.
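The calibration/validation procedure can be sketched without SPSS. The following Python is a minimal illustration under stated assumptions: a seeded random 60/40 split, ordinary least squares solved through the normal equations, and R² on the held-out 40% as the validation measure. The helper names and the noise-free example data are hypothetical, and SPSS's significance tests and outlier diagnostics are not reproduced.

```python
import random


def split_60_40(rows, seed=0):
    """Randomly split rows into a 60% calibration sample and a 40% validation sample."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(0.6 * len(rows))
    return rows[:cut], rows[cut:]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(xs, ys):
    """Least-squares coefficients [intercept, b1, ..., bk] from the normal equations X'X b = X'y."""
    X = [[1.0] + list(x) for x in xs]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, x):
    return coef[0] + sum(b * v for b, v in zip(coef[1:], x))


def r_squared(coef, xs, ys):
    """Share of the variation in ys explained by the model: 1 - SSE/SST."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - predict(coef, x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


# Hypothetical noise-free data: profit = 10 + 2*orders + 3*items.
data = [((i % 4 + 1, (3 * i) % 7), 10 + 2 * (i % 4 + 1) + 3 * ((3 * i) % 7)) for i in range(20)]
calibration, validation = split_60_40(data)
coef = fit_ols([x for x, _ in calibration], [y for _, y in calibration])
validation_r2 = r_squared(coef, [x for x, _ in validation], [y for _, y in validation])
```

Because the example data has no noise, the fit recovers the generating coefficients (10, 2, 3) and the validation R² is 1. On real customer data, the validation R² would be compared with the calibration R² to check that the model generalizes.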
ASSUMPTIONS IN PREDICTIVE MODELING
Dependent variable: TOTAL PROFIT
Independent variables: NUMBER OF ORDERS, NUMBER OF ITEMS, ONLINE PURCHASE, RETURN QUANTITY, VISA PAYMENT
Significance Criteria
If a variable's significance level (p-value) in the Coefficients table is less than 0.05, that variable has a significant effect on total profit.
Multicollinearity Criteria
If tolerance is greater than 0.1 (or the stricter 0.25) and VIF is less than 10 (or the stricter 4) in the Coefficients table, multicollinearity is not a concern.
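The paired thresholds in the multicollinearity criterion describe the same rule twice. Under the standard definitions (not stated on the slides), where R_j² is the R² from regressing predictor X_j on the other predictors, tolerance and VIF are reciprocals:

```latex
\[
\mathrm{Tolerance}_j = 1 - R_j^2, \qquad
\mathrm{VIF}_j = \frac{1}{\mathrm{Tolerance}_j} = \frac{1}{1 - R_j^2}
\]
\[
\mathrm{Tolerance}_j > 0.1 \iff \mathrm{VIF}_j < 10, \qquad
\mathrm{Tolerance}_j > 0.25 \iff \mathrm{VIF}_j < 4
\]
```

For example, R_j² = 0.8 gives a tolerance of 0.2 and a VIF of 5: acceptable under the 0.1/10 rule but not under the stricter 0.25/4 rule.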
PREDICTIVE MODELING OF TOTAL PROFIT – CLUSTER 1
TotalProfit = 106.424 - 76.543*OrderNumber + 6.079*ReturnQuantity + 23.202*Quantity - 1.270*WEB - 2.075*VISA
REGRESSION RESULTS FROM SPSS (Note: WEB and VISA are dummy-coded as 1 or 0.)
The five selected variables (number of orders, number of items, return quantity, online purchase, and Visa payment) explain 24.9% of the variation in total profit in Cluster 1. The remaining 75.1% is not explained by this model and is attributable to other variables.
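To show how the dummy coding works, the Cluster 1 equation can be written as a small Python function. The function name and the example customer are hypothetical; the coefficients are copied from the equation. Switching WEB from 0 to 1 with everything else held fixed changes the prediction by exactly the WEB coefficient, which is the basis for the "$1.27 less" reading in the consumer insights.

```python
def cluster1_total_profit(order_number, return_quantity, quantity, web, visa):
    """Predicted total profit from the Cluster 1 SPSS equation.

    web and visa are dummies: 1 if the customer ordered online / paid by Visa, else 0.
    """
    for dummy in (web, visa):
        if dummy not in (0, 1):
            raise ValueError("WEB and VISA must be coded 0 or 1")
    return (106.424 - 76.543 * order_number + 6.079 * return_quantity
            + 23.202 * quantity - 1.270 * web - 2.075 * visa)


# Hypothetical one-time shopper: 2 items, no returns, pays by Visa, buys in store.
in_store_visa = cluster1_total_profit(order_number=1, return_quantity=0, quantity=2, web=0, visa=1)
# The same basket ordered online differs only by the WEB coefficient.
online_visa = cluster1_total_profit(order_number=1, return_quantity=0, quantity=2, web=1, visa=1)
```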
CONSUMER INSIGHTS – CLUSTER 1
01 NUMBER OF ORDERS (-)
• Negatively related to total profit.
• The mode of OrderNumber is 1: the typical customer is a one-time shopper.
• Only 116 of the 78,792 customers in Cluster 1 shop at Barneys twice.
02 ONLINE PURCHASE (-)
• A customer who places orders online generates $1.27 less in total profit than one who orders through other channels (in store or by phone).
• Only customers in Cluster 1 prefer online shopping.
• With no human interaction in store or over the phone, online orders yield less profit.
03 VISA PAYMENT (-)
• A customer who pays by Visa generates $2.075 less in total profit than one who uses other payment methods.
• The Visa provider charges fees.
PREDICTIVE MODELING OF TOTAL PROFIT – CLUSTER 2
REGRESSION RESULTS FROM SPSS
The two variables WEB and VISA are removed from the regression model stepwise because their significance levels are greater than 0.05, indicating no significant relationship with total profit.
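Removing variables one at a time by significance can be sketched as backward elimination. This is an assumption about the procedure: SPSS offers several selection methods (Stepwise, Backward, Remove, and others), and the slides do not say which was used. Here `fit_p_values` is a hypothetical caller-supplied function that refits the regression and returns each variable's p-value. The stub uses invented p-values only to show the control flow; a real refit would produce new p-values after each removal.

```python
def backward_eliminate(variables, fit_p_values, alpha=0.05):
    """Repeatedly drop the least significant variable until every p-value is below alpha.

    fit_p_values(vars) is a hypothetical caller-supplied function that refits the
    regression on `vars` and returns {variable_name: p_value}.
    """
    kept = list(variables)
    removed = []
    while kept:
        p_values = fit_p_values(kept)
        worst = max(kept, key=lambda v: p_values[v])
        if p_values[worst] < alpha:  # everything left is significant
            break
        kept.remove(worst)  # drop one variable, then refit on the rest
        removed.append(worst)
    return kept, removed


# Stub with invented p-values, used only to show the control flow (not SPSS output).
INVENTED_P_VALUES = {"OrderNumber": 0.001, "Quantity": 0.001, "ReturnQuantity": 0.01,
                     "WEB": 0.40, "VISA": 0.20}


def stub_fit(vars_):
    return {v: INVENTED_P_VALUES[v] for v in vars_}


kept, removed = backward_eliminate(
    ["OrderNumber", "Quantity", "ReturnQuantity", "WEB", "VISA"], stub_fit
)
```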
NEW PREDICTIVE MODEL OF TOTAL PROFIT – CLUSTER 2
REGRESSION RESULTS FROM SPSS
The three remaining independent variables (number of orders, number of items, and return quantity) explain 40.9% of the variation in total profit in Cluster 2. The remaining 59.1% is not explained by this model and is attributable to other variables.
TotalProfit = 42.822 + 14.859*OrderNumber + 23.193*Quantity + 8.210*ReturnQuantity
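In a linear model, each coefficient is the change in predicted total profit for a one-unit increase in that variable, with the others held fixed. That is how the "$14.859 per additional order" and "$23.193 per additional item" insights are read. A small Python sketch of the final Cluster 2 equation, with a hypothetical function name and example customer:

```python
def cluster2_total_profit(order_number, quantity, return_quantity):
    """Predicted total profit from the final Cluster 2 equation (WEB and VISA removed)."""
    return 42.822 + 14.859 * order_number + 23.193 * quantity + 8.210 * return_quantity


# Hypothetical customer: 3 orders, 4 items, 1 returned item.
base = cluster2_total_profit(3, 4, 1)
extra_order = cluster2_total_profit(4, 4, 1) - base  # one more order, all else fixed
extra_item = cluster2_total_profit(3, 5, 1) - base   # one more item, all else fixed
```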
CONSUMER INSIGHTS – CLUSTER 2
01 NUMBER OF ORDERS (+)
• Positively related to total profit.
• An average customer in Cluster 2 shops at Barneys 3 times over 45 months in service, more than the one-time purchase typical of Cluster 1.
• For each additional order, total profit increases by $14.859.
02 NUMBER OF ITEMS (+) (same consumer insight as in Cluster 1)
• Positively related to total profit.
• The more items a customer purchases, the higher the total profit.
• For every additional item a customer purchases, total profit increases by $23.193.
03 RETURN QUANTITY (+) (same as in Cluster 1)
• Positively related to total profit.
• The free return policy and 100% money-back guarantee lead customers to buy more and return more, which results in higher total profit.
THANK YOU
Angie Wang