CustoVal: Estimating Customer Lifetime Value Using ......

Click here to load reader

  • date post

    21-Apr-2018
  • Category

    Documents

  • view

    214
  • download

    1

Embed Size (px)

Transcript of CustoVal: Estimating Customer Lifetime Value Using ......

  • CustoVal: Estimating Customer Lifetime Value UsingMachine Learning Techniques

    Dept. of CIS - Senior Design 2013-2014

    Charu [email protected]

    Univ. of PennsylvaniaPhiladelphia, PA

    Trisha [email protected]

    Univ. of PennsylvaniaPhiladelphia, PA

    Jarred [email protected]

    Univ. of PennsylvaniaPhiladelphia, PA

    Edward [email protected]

    Univ. of PennsylvaniaPhiladelphia, PA

    ABSTRACTThe estimation of Customer Lifetime Value (CLV) is one ofthe core pillars in strategy development and marketing. CLVis a measurement in dollars associated with the long term re-lationship between a customer and a company, revealing howmuch that customer is worth over a period of time. CLV israther predictively powerful when considering customer ac-quisition processes, as well as for selecting optimal servicelevels to provide different customer groups.

    Current methods for estimating CLV involve building asingle model using the entire population of customers as in-put. This not only loses the granularity in the data, butalso gives rise to poor targeting and strategic advertising forconsumers. This paper seeks to show the advantages of com-bining smaller, targeted models intelligently in order to buildseparate models for customers buying different product cat-egories. This enables an effective use of the data that firmshave about customers to make intelligent strategic decisions.

    Using a sample dataset, the proposed implementation ofMultitask learning ([2]), in which knowledge learned is sharedbetween related categories, yields a strong improvement inCLV forecast accuracy as compared to using a single, largemodel on product categories with lower numbers of transac-tions (less than 150). Additionally, this same method re-duces the standard deviation of error when compared to thesingle large model. Most significantly, the Multitask learningmodels tend to perform better than single models when cate-gories have sparse data to train on, traditionally considereda harder task. These results indicate that Multitask learningtechniques can lead to a better outcome than current industrystandards, and perhaps is a better alternative to the existingmethodology.

    1. INTRODUCTIONIn marketing, Customer Lifetime Value (CLV) is a predic-

    tion of the net profit attributed to the entire future relation-ship a company has with a customer. Accurate and timelycalculation of CLV is important because knowing which cus-tomers are likely to be very valuable to any given company,as well as those which will cause the firm to lose money,Advisor: Dr. Eric Eaton ([email protected]).

    can help greatly with tailoring the marketing, product offer-ings, and the purchasing experience to specific customer seg-ments in order to maximize revenues and profits [15]. CLVcalculations allow firms to build a picture of which type ofcustomers are valuable or under which circumstances theybecome valuable. This can drastically improve the value ofmoney spent on customer acquisition and marketing. Thispaper provides a method for using CLV to predict whichproduct categories are valuable, which is a particularly dif-ficult estimation in new categories with sparse data.

    This paper proposes a model for calculating CLV with-out losing the granularity in the data, unlike other mod-els. Whereas many methods used today take all of anygiven customers transactions into account when calculat-ing their CLV, the proposed model will subdivide each cus-tomers transaction by product category, followed by calcu-lating each customers CLV specific to any given productcategory. For example, rather than calculating the CLV of acustomer who has bought both televisions and cameras, theproposed model would calculate a separate CLV for that cus-tomer for each product category (the total CLV would thenbe the sum of those two CLVs). This, however, introduces aproblem: since each category will have less data to build amodel upon than the original model, some categorys mod-els will be weak and largely overfitted if their data is usedin isolation.

    To solve this problem without compromising the advan-tages of tuning models to specific categories, the model willincorporate Multitask learning. Multitask learning is a branchof machine learning consisting of several algorithms for shar-ing knowledge between different tasks. Product categorieswhich have fewer transactions (and therefore inherently weakermodels) will be bolstered by information gained from morepopular and related product categories. This method leavespopular categories models relatively unchanged, while greatlystrengthening less popular categories models. In short, theproposed model is designed to address the following problemstatement:

    Given sample transactions, predict Customer Lifetime Valueat the product category level, even if data per category issparse.

    The model will take as input a predefined transaction log,

  • and it will output a CLV for each customer in each productcategory they have made purchases in. Additionally, use-ful graphics and other visualizations are produced based onthe output to make the data more accessible and useful inderiving key insights.

    This document examines related efforts at solving thisproblem, then outlines the schema and software implementa-tion of the proposed system, and finally discusses the resultsof the system on sample data and looks towards potentialimprovements. First, it reviews the current research in cal-culating CLV so as to show how this approach is novel anduseful. Second, it describes the proposed model and its cor-responding step-by-step implementation of an algorithm topredict CLV using Multitask learning. Then, it expoundson the evaluation criteria to be used to determine the suc-cess of this project and how well the model compares tothese criteria. Finally, suggestions for future improvementsor expansions are put forward and explained.

    2. RELATED WORKSeveral models have been used over the years to quantify

    CLV. They have ranged from simplistic to complex, and theyhave incorporated ideas from diverse fields such as mathe-matics, econometrics, and computer science. The Multitasklearning model makes use of some of these models, and aimsto improve on all of them when data is sparse in some areasbut rich in others.

    2.1 RFM ModelOne of the first attempts to gauge customer value was a

    system called the RFM model, standing for Recency, Fre-quency, and Monetary Value. These models were originallyintended to determine how successful a direct marketingcampaign (e.g. postcards) would be to each customer, indi-vidually. They incorporated only three variables: the timebetween now and the customers last purchase, the averagetime between a customers purchases, and how much moneythe customer spends on any given purchase (Wei et al. [19]).The most typical application of this model would break cus-tomers up by quintiles on each of the three factors beinglooked at, yielding 125 groups of customers. Based on someformula, the costs of marketing to each cell and the predictedvalue from each cell would be calculated, and a breakevenline would be formed. The company would then deploy itsmarketing campaign to only target those cells which it con-siders could be profitable.

    This technique can be fairly readily applied so as to deter-mine CLV. The technique is simple, intuitive, and does notrequire a large amount of complicated data. Essentially, acompany wishing to determine CLV of its customers wouldassume a constant direct marketing campaign for the re-mainder of each customers lifetime, and then determine theprofitability of each customer (Cheng et al. [3]). However,this technique has a number of drawbacks. The conceptof a constant marketing campaign is impractical; the cam-paign itself would be incredibly expensive and the customerswould eventually grow immune to it. Looking only at trans-action histories without accounting for demographic factorscould lead to major oversights. Unless the formula is manu-ally tweaked to backtest better, there is no impetus for themodel to learn from past successes and failures.

    The proposed approach will improve on the basic RFMmodel in a variety of ways. Most prominently, the tech-

    nique incorporates learning, thereby increasing the strengthof the model over time. It also accounts for sequential data,rather than selecting a fixed time period and looking only atthat bucket of data. Finally, as has been shown elsewhere(Gupta et al. [8]), models which incorporate more than thesethree factors do a better job predicting CLV than RFM mod-els. Although RFM was not developed explicitly to calculateCLV, it is an important benchmark to be compared against.

    2.2 Econometric ModelsSome models exist that seek to take more covariates into

    account than probability models, which typically only userecency, frequency and monetary value of transactions inorder to estimate the different elements of CLV.

    An example of this is the use of proportional hazard mod-els to estimate how long customers will remain with the firm.The general form of these equations is

    (t,X) = 0(t) exp(X)

    where is the hazard function, which is the probabilitythat the customer will leave the firm at time t given thatthey have remained with the firm until time t, and X is aset of covariates which may depend on time. 0 is the basehazard rate; typical examples include the exponential model(in which case it is memoryless) or the Weibull model (tocapture time dependence). See Cox [4] for the original paperon proportional hazard models, and Knott et al. [11] for anexample of use.

    Once this hazard rate has been established, the survivorfunction is calculated as

    S(t) = P (T t) = exp

    t0

    (u)du

    Where T is the actual time that the customer leaves the

    firm. For