An Animated Guide: Incremental Response Modeling in Enterprise Miner by Russ Lavery, Bryn Mawr PA

ABSTRACT:

John Wanamaker is famous as a father of modern marketing. He is well-known for saying, “half of the money I spend on marketing is wasted; the trouble is I don’t know which half.” Net lift modeling is a technique that helps marketers to not spend money on marketing to people who would have bought the product without any marketing contact. The idea is: marketing efforts, to people who would have bought the product anyway, represent wasted effort and money.

SAS Enterprise Miner has a module that makes net lift modeling both simple and fast. This paper explores some of the background and the SAS interface associated with net lift modeling.

INTRODUCTION:

Figure 1 explains the basic idea of net lift modeling. After some marketing event, which cost the company money, some of the buyers were persuaded by the marketing effort and some of the buyers would have bought anyway. Additionally, the figure highlights the idea that some people did not buy because of the marketing contact.

Figure 1

Net lift modeling has both a goal and a technique: to identify the people in group B. The idea put forth by advocates of net lift modeling is that a business should not spend money, or time, contacting people who would have bought the product anyway. This is a very basic idea and has been known for years. There is an English idiom, “preaching to the converted”, that describes this sort of misplaced marketing effort.

This idea is not new and has been implemented, in many disciplines, where it has been possible to assign people to the four groups. A quick web search will turn up many references to politicians who are targeting undecided voters. Politicians concentrate their efforts on geographies where, with effort, they can improve their chances of winning an election. Politicians ignore geographies where their party is so well represented that, without some major mistake, the politician would win that geography. They also ignore geographies where their party is so poorly represented that, without some major mistake by the opposition, the politician would lose that geography. The basic benefits associated with net lift modeling have been known for years. What is new is the math and the software.

OVERVIEW QUESTIONS: Some questions that are rarely addressed by people discussing net lift modeling are: 1) How common, or how important, are these groups to a particular company? 2) How contextual are the effects/groups?

This paper will digress into a bit of discussion of the four groups and how they might come to exist.

Group A: brought into existence by the marketing effort

Most businessmen would agree that this group exists and is common in all industry segments. It is hard to doubt the existence of an advertising effect, even if measuring the effect size is difficult. Group A is brought into existence by good targeting and good messaging. However, what is presented in net lift modeling considers marketing efforts in terms of a step function: on or off. The author wonders if there is an important advertising dose-response effect that should be considered in the net lift modeling literature.

Group B: would have bought the product anyway

If this group exists, for a particular company, one should ask how it came to exist. Without any research to back up this statement, it would seem that this group is more likely to exist in certain situations. The customers must be information seekers because they must know that the product exists and must be able to value its worth in comparison to other products on the market. Information seeking behavior is more likely when people are involved in the product class than when they are not. Possibly, low information clutter in the marketing channel would help this group come into existence because “low clutter” reduces the amount of information that the customer must process. One might hypothesize that examples of people in this class (people who seek info) might be:

innovative physicians looking to keep on the cutting edge of their field

teenagers with their involvement in video games

a homeowner buying electric or natural gas service

Group C: did not buy despite the marketing effort

It is possible that the environmental conditions that would bring this group into being are many and varied. It might be that such customers have already had a bad experience with the product and will not repeat-buy because their personal experience outweighs any marketing effort. It could be that the targeting was poor and the models used to identify potential buyers were missing important variables. It might also be that there was a channel of distribution problem, and the customer did not buy because he was unable to find the product – though the Internet should reduce the size of this group.

Group D: did not buy because of the marketing effort

The environmental conditions that would bring this group into being are many, varied and the subject of much research. A Google search quickly brings up many articles researching the effect of negative ads in political campaigns. In a short review of web hits, it seems that the results are very mixed. While results are generally inconsistent across studies, this paper would like to mention a few results – and warn that they might not be reproducible in any particular setting.

Studies have found that the effects of rude, negative political ads differ by gender. One study found women to be less tolerant than men of the poor manners involved in delivering a negative ad.

Another study found that the effect of a negative political ad varied with the underlying truth of the statement being presented in a rude and negative manner. If the statement was false, or questionable, there was a bounce-back effect that harmed the politician paying for the ad. If the negative statement was seen to be true, it had a positive effect for the politician paying for the ad.

The author had experience with this bounce-back effect when volunteering for a political campaign against an incumbent. The author was making out-reach phone calls to undecided voters. Electioneering consultants had recently placed an ad on TV that criticized the opposing candidate for having supported a bill budgeting $70,000 to investigate the mating habits of geese. A woman answered and said she would never vote for my candidate because of that ad. She felt the ad was deceptive. Her view was that the $70,000 was a typical political payback, and a necessary payback, because the sponsor of the “goose bill” had supported the incumbent on some previous issue. She said that my candidate, if elected, would do the same thing or be ineffective – and hung up the phone.

As we consider how this group came into being (people who did not buy because of the marketing) some questions occur.

Is this effect caused by very poor marketing or must the message be offensive in some way?

Is this only a problem in low-attention products, versus high-attention products? A boring ad might reduce the probability of a person buying a particular kind of soda if the person does not really care about sodas and so the choice is unimportant. However, a boring ad might not reduce the probability of buying a desired car, unless the primary motivation for selecting a car were status/image and the boring ad somehow damaged the car's status/image.

It could be possible that this problem is more prevalent when the customer has no experience with the product. If a person has bought three different laptops from the same company, and the company comes out with a silly/ineffective marketing campaign, the person might still be likely to buy that brand of laptop.

I wonder if the size of this group will vary depending on the mix of customer motivation. If a customer is buying a product for utilitarian, and functional, reasons a marketing mistake might be expected to have a small effect. If a person is buying a product for status reasons, a marketing mistake, because it lowers the status of the product, might increase the size of this group.

I suggest that thinking in terms of commonly used marketing paradigms might complicate the implementation of net lift modeling. Many people think customers pass through stages that are represented by the AIDA model seen in figure 2. AIDA believers might think that the effect of the marketing effort is time contingent.

Figure 2

It might be that the effect size of a marketing effort (a coupon offer?) depends on where the customer is in the AIDA process. Imagine the offer being made to people spending time on a company’s website. Effect size might vary with the amount of time that the customer has spent on the website. It might also be that the mixture of groups, the percentage of people in the A, B, C and D groups, varies with the amount of time spent on the website.

It is possible that people who have spent 20 minutes on a website had some strong desire for the product for which they are searching. Possibly, people who have spent a lot of time on a website are people who would buy without any marketing manipulation.

Finally, in our analysis of the groups we see in net lift model presentation, it might be worthwhile to take a step back from implementing net lift modeling and consider long term effects. Imagine the effect of separating the buyers into two groups and only marketing to one of the groups.

Figure 3

Figure 3 hypothesizes some consumer behavior. The gold bars represent people who are likely to buy after getting a marketing offer (an offer delivered via normal targeting) and are a mix of “bought because of marketing” and “self-motivated buyers”. We will assume that the blue lines represent people who would have bought the product without any marketing - assume these people are loyal customers and they would no longer get marketing contact-offers. They are a subset of the people who bought after receiving the current marketing effort. The red line shows a hypothetical effect of not giving your loyal customers the best deal. It shows hypothetical annoyed good customers.

I remember my aunt, a long-time subscriber to some magazine, seeing a new-subscriber-coupon in the magazine that was priced lower than her existing-customer-renewal price. My aunt was very angry that she, a loyal subscriber, was being treated worse than a brand-new customer. She thought that current customers should be treated as well as new customers. She called up the magazine and angrily complained. She got her price reduced and told this story to many people.

I am sure the short-term effect of that great introductory subscription offer was positive for that magazine. Not fixing the Pinto gas tank was also positive for many years. However, I wonder if any effort that treats new customers better than old customers is a good long-term strategy. It might be that treating new customers better than loyal customers breaks down trust and has a negative long-term effect.

Figure 4 shows a different way of thinking about customers. This graphic splits customers on whether they received a marketing effort or not. It hypothesizes the existence of a group that would have bought if the marketing effort had been of a higher quality.

Figure 4

I know of no math that would allow us to separate these groups, but understanding that these groups might exist might change a company’s level of buy-in to the net lift modeling process and how the results of an analysis are interpreted. Maybe, the proper X variable should not be “marketing yes/no” but “marketing quality”.

THE POLITICAL PROCESS:

Net lift modeling requires a control group, and creating a control group can be a difficult political sell. At most clients where I have worked, the people who created a marketing effort had great faith in its effectiveness. They always believed that it should be rolled out, immediately, to every potential customer. There was an especially strong belief that every “high quality” customer should get the marketing effort because the company needed all possible sales.

VARIABLE SELECTION: OLD METHODS ARE INEFFECTIVE

The man who created and teaches the SAS course on net lift modeling, Larsen, says that the effectiveness of net lift models is very sensitive to the variables used in the model. He says that standard variable selection methods (the ones used in the common stepwise techniques) do not select variables that create high performing net lift models.

Larsen teaches his course using a modification of weight of evidence (WOE or WoE) and information value (IV) that he calls “Net WOE” and “Net IV”. WOE and IV are based on the principle of odds ratio analysis and they are measures of the ability of a grouped X variable to separate good and bad risks. WOE and IV are used in other modeling techniques besides net lift modeling and are worth reviewing. SAS Enterprise Miner implements this variable selection method in a very convenient manner.

LINKING WOE TO LOGISTIC REGRESSION

WOE and IV are often used in modeling processes that involve logistic regression. Figure 5 will allow us to do a bit of review of logistic regression.

The top chart is a plot of probability versus X. This data (imaginary data) was collected through the process of putting 100 bugs in each of 10 test tubes and dosing the test tubes with increasing levels of pesticide. This is not dissimilar from sorting rows by a continuous X variable and then dividing the data set into 10 groups, or buckets based on the sorted values of X.

Figure 5

The top chart has the typical S-shaped response of an organism to a stimulus. At X equals seven (7 ppm of pesticide) 90% (90 out of 100) of the bugs died.

Let’s see how these charts are related. Look at the point in the top chart that is marked with the red star. It is at X equals seven and the probability equals 90%. If we take the odds of a bug dying at that point, we divide .9 by .1 and get the number nine as the odds of dying relative to living. That number is what we plot on the center chart. This process can be repeated for all the numbers in the top chart. The center chart is a plot of odds of the event versus X.

Next we see that if we take the log of the odds we can move from the middle chart to the bottom chart.

While we have only shown two points on the bottom chart, I will assert that if one starts with an S-shaped probability curve, and takes the log of the odds, one ends up with a straight line vs. X. We see such a straight-line in the bottom chart.

This straight line relationship between the logit of Y (logit=log of the odds) and X can be modeled with simple regression. This transform to linearity is why the logit transform is so popular.
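The probability-to-odds-to-logit steps described above can be sketched in a few lines of Python. This is only an illustration: the starred point (X = 7, probability = 0.9) comes from the text, but the slope and intercept are hypothetical values chosen so that the curve passes through that point.

```python
import math

def odds(p):
    """Odds of the event: probability of dying over probability of living."""
    return p / (1.0 - p)

def logit(p):
    """Log of the odds."""
    return math.log(odds(p))

def logistic(x, intercept, slope):
    """S-shaped probability curve implied by a straight line on the logit scale."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

# The starred point from the text: 90% of the bugs died at X = 7.
p_star = 0.9
odds_star = odds(p_star)      # .9 divided by .1
logit_star = logit(p_star)    # natural log of the odds

# Hypothetical slope; the intercept is solved so the line passes through (7, logit(0.9)).
slope = 0.8
intercept = logit_star - slope * 7

# Top chart (probability), center chart (odds), bottom chart (log odds) for X = 1..10.
for x in range(1, 11):
    p = logistic(x, intercept, slope)
    print(x, round(p, 3), round(odds(p), 3), round(logit(p), 3))
```

In the printed rows the last column rises by the same amount (the slope) for every one-unit step in X, which is the straight line seen in the bottom chart.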

Figure 6 emphasizes the strong similarity between the formulas for a regular logistic regression (beta values from SAS PROC Logistic) and a model where WOE is the independent variable. There are some insights to be found here.

Figure 6

We will see examples of calculating weight of evidence in later figures but for now this paper will spend some more time discussing figure 6. There is an equation, in black, on the right-hand side of figure 6 that deserves some attention. It has one term on the left-hand side of the equals sign and two terms on the right.

The term that is on the left-hand side of the equals sign is simply the log of the odds. It is the log of a probability ratio. When the formula says “bucket equals I”, we can think of that as the X variable where the X horizontal scale has been divided into 10 units or buckets. The term on the left-hand side is what we would plot in the bottom chart in figure 5. With this knowledge we might think that the terms on the right-hand side of the equation in figure 6 are, in some way, equivalent to a logistic regression.

The first term to the right of the equals sign is just the “log of the odds of being a one or zero” (or the log of the odds of “buy vs not buy” or “success vs fail”). This term does not vary with the levels of X (it does not have any X conditionality and does not vary with buckets). This term is the intercept in a regression model. It measures, in terms of our example, how the analyst sampled zeros and ones. If the number of zeros equals the number of ones this term is ln(1) or 0.

The second term to the right of the equal sign is “weight of evidence” (AKA the log density ratio). This would be the equivalent to the slope of the line in a regression.
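A small numeric check may make the decomposition easier to trust. The sketch below assumes that WOE is the log density ratio written with the modeled event (the ones) in the numerator; the equation in figure 6 is not reproduced here, so that sign convention and the bucket counts are assumptions made only for illustration.

```python
import math

# Illustrative counts (not from the paper): events (1s) and non-events (0s) per bucket.
events = [10, 20, 40, 80]
non_events = [90, 80, 60, 20]

total_events = sum(events)
total_non_events = sum(non_events)

# First term on the right: log of the overall odds. It does not vary by bucket.
log_overall_odds = math.log(total_events / total_non_events)

def woe(i):
    """Second term on the right: the log density ratio for bucket i,
    ln( P(bucket i | Y=1) / P(bucket i | Y=0) )."""
    return math.log((events[i] / total_events) / (non_events[i] / total_non_events))

def log_odds(i):
    """Left-hand side: log of the odds of an event within bucket i."""
    return math.log(events[i] / non_events[i])

# The two printed columns agree for every bucket.
for i in range(len(events)):
    print(i, round(log_odds(i), 4), round(log_overall_odds + woe(i), 4))
```

Only the WOE term changes from bucket to bucket, which is why it plays the role of the X side of the regression while the first term behaves like an intercept.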

Hopefully, figure 5 helps explain why many new techniques, that are built on logistic regression logic, incorporate WOE and IV as X variables and, as we will soon see, as techniques for selecting X variables for inclusion in a model.

CALCULATING WEIGHT OF EVIDENCE (WOE) AND INFORMATION VALUE (IV):

It is difficult to explain WOE and IV in words and it is even difficult to explain some of the interim calculations in words. This paper will present examples of worked calculations that will allow the reader to check her/his understanding.

In figure 7 we see a definition of two quirky terms: “percent distribution goods (for a bin)” and “percent distribution bads (for a bin)”. As you can see, in column A we have sorted the X variable, age, from low to high and divided the data set into bins. It is not required that bins have equal “widths”. There are several logics for binning, and SAS, in Credit Score Modeling, will use O.R. to create an optimal binning strategy.

Figure 7

In column B we have the total count of observations in that bin. Column B should be the sum of columns D and column F.

In column C we have that row’s percentage of the total number of observations in the data set– not a very useful number in this example.

In column D we have the count of the number of “goods” in a bin. We will model “bads” or “events”, not goods.

In column E we have the “percent distribution goods for a bin.” It is the number of goods in a particular bin divided by the total number of goods in the data set.

In column F we have the number of “bads” in this bin.

In column G we have the “percent of distribution bads (for a bin)”. It is the number of bads in that bin divided by the total number of bads in the data set.

Column H is important for understanding WOE and IV but not useful in their calculations. Column H lets us understand, without any fancy math, whether a variable will help the model predict good/bad or not. Column H shows what percent of the observations in a bin are bad.

In figure 7 we see that every bin has the same percentage bad, or very close to the same percentage bad. This means that, as we change X, the likelihood of finding more, or fewer, bads is a constant.

In figure 7, as age changes the likelihood of a bad (or “an event”) does not change. This means that age does not help us predict Y (the good/bad percentage). On this slide we have defined two terms. We have also shown how column H can help us interpret whether an X variable should stay in the model, or be removed, because it lacks predicting power.
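The column definitions above can be sketched as a short Python function. The counts are hypothetical (figure 7 is not reproduced here); they were chosen so that every bin has the same percentage bad, which is the situation the figure describes.

```python
def bin_table(goods, bads):
    """Build the columns described for figure 7 from per-bin counts of goods and bads."""
    total_goods = sum(goods)
    total_bads = sum(bads)
    total = total_goods + total_bads
    rows = []
    for g, b in zip(goods, bads):
        n = g + b
        rows.append({
            "count": n,                     # column B: observations in the bin
            "pct_of_total": n / total,      # column C: share of all observations
            "goods": g,                     # column D: goods in the bin
            "dist_goods": g / total_goods,  # column E: percent distribution goods
            "bads": b,                      # column F: bads in the bin
            "dist_bads": b / total_bads,    # column G: percent distribution bads
            "pct_bad_in_bin": b / n,        # column H: percent of the bin that is bad
        })
    return rows

# Hypothetical counts for four age bins, each with 10% bads.
goods = [90, 180, 270, 360]
bads = [10, 20, 30, 40]
table = bin_table(goods, bads)

for row in table:
    print(row["count"], round(row["dist_goods"], 3),
          round(row["dist_bads"], 3), round(row["pct_bad_in_bin"], 3))
```

Columns E and G each sum to one across bins because they are distributions within the goods and within the bads, while column H is computed within a bin and is the column that shows, without any fancy math, that this X variable does not predict.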

COMPARE AND CONTRAST TO UNDERSTAND WOE AND IV

To help understand WOE and IV we will look at three calculations and do a “compare and contrast”. The first example will show a continuous X variable that has no predicting power. The second example will show a continuous X variable that has predicting power. The third example will show a categorical variable that has predicting power.

Since column H shows us whether the variable is predicting, or not, using logic we’ve known for many years, column H will be a bridge between our current understanding and an understanding of WOE and IV.

Much of figure 8 is also in figure 7. Look in column H and notice that the percentage of bads across bins is very constant. Look at the plot in the lower left-hand corner. The percentage of bads in a bin does not change as X changes. We understand, from looking at column H, that this variable will not help us predict percent bad.

Figure 8

Now that we know that this variable is a weak predictor let’s look at column I – WOE. All of the WOE values are close to zero. We see that the WOE values for different bins are very similar. These are characteristics of an X variable – or of an x variable binning scheme – that does not predict Y.

It will not help in this case, but one can adjust the bins in Enterprise Miner. The Credit Score Modeling tab (extra fee, naturally), in Enterprise Miner, has a special node that allows an analyst to adjust the boundaries between bins to create variables with WOEs that help the X variable predict Y. This “optimal binning” node does not seem to be used in any of the SAS manuals, lectures or papers, about net lift modeling. The net lift modeling node creates its own bins for X variables and analysts seem to have little choice in the strategy.

Column J is where we calculate the IV (information value). We calculate an IV value for each of the bins and then sum each of the IV values into one total. We pay attention to the sum of all the IV values (the total) as we try to determine if an X variable is predictive. When the sum of the IV values is small the variable does not help us predict and can be removed from the modeling process.
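As a hedged sketch of columns I and J, the Python below uses the commonly cited definitions: WOE for a bin is the log of the ratio of the two percent distributions, and the IV for a bin is the difference of the two percent distributions multiplied by that WOE. Figure 8 is not reproduced here, so the counts are hypothetical and the sign convention (bads, the modeled event, in the numerator) is an assumption. Reversing the convention flips the sign of every WOE but leaves every IV unchanged.

```python
import math

def woe_iv(goods, bads):
    """Return per-bin WOE, per-bin IV and the IV total for one binned X variable."""
    total_goods, total_bads = sum(goods), sum(bads)
    woes, ivs = [], []
    for g, b in zip(goods, bads):
        if g == 0 or b == 0:
            # The log is undefined for empty cells; such bins need to be combined first.
            raise ValueError("every bin needs at least one good and one bad")
        dist_goods = g / total_goods
        dist_bads = b / total_bads
        woe = math.log(dist_bads / dist_goods)      # column I
        woes.append(woe)
        ivs.append((dist_bads - dist_goods) * woe)  # column J, one value per bin
    return woes, ivs, sum(ivs)

# Hypothetical variable with the same percent bad in every bin (no predicting power).
flat_woe, flat_iv, flat_total = woe_iv([90, 180, 270, 360], [10, 20, 30, 40])

# Hypothetical variable where percent bad rises from bin to bin (predicting power).
steep_woe, steep_iv, steep_total = woe_iv([95, 90, 80, 60], [5, 10, 20, 40])

print(round(flat_total, 4), round(steep_total, 4))
```

Each bin's IV is a product of two terms that always share the same sign, so no bin can contribute a negative amount and the total can only grow as bins differ more.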

That is an explanation for calculating weight of evidence and information value. We will “compare and contrast” two more examples.

Figure 9 shows an X value that does help us predict. All the columns are created with the same logic as the previous figure. Column H lets us judge the worth of a variable/grouping. The percentage of bads, between bins, changes greatly. As X changes the probability of being bad changes. This variable deserves to stay in the modeling process. The chart, in the lower left corner of figure 9, also shows that the percentage of Y changes as age changes.

Figure 9

Column I shows the WOE for each of the bins and a reader can see that the weight of evidence changes from bin to bin. Many people like a monotonic change in WOE, relative to X bins, and combine bins to make that relationship.

Column J shows the individual IVs for each of the bins and the total is .78. We compare the .78 to the rules for IV and find out that this is suspiciously high for a predictor. This “suspiciously high” result is not a new phenomenon. Researchers in the social disciplines have, for years, had the opinion that a model of human behavior with a very high R squared likely contained a hidden flaw. While there are rules for evaluating IV, there is no rule for WOE.

In figure 10, please note that column A shows we are using a new X variable: University class. Column H shows us that the percentage of events changes between bins. Column H, and the chart in the lower left-hand corner, suggest that this variable should remain in the modeling process. When we look at the chart, in the lower left-hand corner, we see that certain bins seem to have a high percentage of events and certain bins seem to have a low percentage of events.

Figure 10

In column I, the WOE shows an up-down pattern that is similar to what we saw in the chart. In column J, we see that this variable is predictive (IV=.102) and can stay in the modeling process. This pattern suggests that years in school can be collapsed from six levels down to two. Combining bins with similar WOE, and feeding WOE values into a model can reduce the need for dummy variable coding.

NET IV AND NET WOE

Figure 11 shows a new graphic. Importantly, it shows that GROSS WOE is calculated from the group of people who were contacted. “Self-Selected WOE” is the WOE calculated from the group of people who were not contacted. We want to focus on variables where WOE varies between the contacted and non-contacted groups. This helps us identify any differential effects.

Figure 11

Figure 12 illustrates the calculations for Net Weight of Evidence (NWOE) and Net Information Value (NIV). Formulas for the calculations are shown in the column headings. NOTE: Numbers in this slide only illustrate the NWOE and NIV calculations and do not “match” other examples in the paper. NIV measures the differential effect of a variable between the contacted and non-contacted groups.

Figure 12
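Because the column headings of figure 12 are not reproduced here, the following Python is a sketch under stated assumptions rather than the figure's exact arithmetic. It assumes the commonly cited definitions: NWOE for a bin is the WOE in the contacted group minus the WOE in the non-contacted group, and NIV sums, over bins, a cross-product of the four percent distributions multiplied by NWOE. Some presentations multiply NIV by a constant scale factor, which is omitted. All counts are hypothetical.

```python
import math

def woe_by_bin(events, non_events):
    """WOE per bin, with the modeled event (a purchase) in the numerator."""
    total_e, total_n = sum(events), sum(non_events)
    return [math.log((e / total_e) / (n / total_n)) for e, n in zip(events, non_events)]

def net_woe_iv(t_events, t_non_events, c_events, c_non_events):
    """Return per-bin NWOE and the NIV total.
    t_* are counts for the contacted group, c_* for the non-contacted group."""
    woe_t = woe_by_bin(t_events, t_non_events)
    woe_c = woe_by_bin(c_events, c_non_events)
    nwoe = [wt - wc for wt, wc in zip(woe_t, woe_c)]

    te, tn = sum(t_events), sum(t_non_events)
    ce, cn = sum(c_events), sum(c_non_events)
    niv = 0.0
    for i, w in enumerate(nwoe):
        # Cross-product of the percent distributions from the two groups.
        weight = ((t_events[i] / te) * (c_non_events[i] / cn)
                  - (t_non_events[i] / tn) * (c_events[i] / ce))
        niv += weight * w
    return nwoe, niv

# Hypothetical variable whose pattern is identical in both groups: no differential effect.
same_nwoe, same_niv = net_woe_iv([10, 20, 30], [90, 80, 70], [10, 20, 30], [90, 80, 70])

# Hypothetical variable where contact raises buying mainly in the first bin.
diff_nwoe, diff_niv = net_woe_iv([40, 20, 30], [60, 80, 70], [10, 20, 30], [90, 80, 70])

print(round(same_niv, 4), round(diff_niv, 4))
```

A variable can have a large WOE in each group and still have an NIV near zero; what matters for net lift is how differently the variable behaves between the contacted and non-contacted groups.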

Figure 13 shows an additional complication when doing Net Lift Modeling. Randomly assigning observations to train and validate does not assure that the groups are identical. Small differences in groups can make a variable useless in Net Lift modeling. The “fix” is to calculate “Adjusted NIV”.

Figure 13

This figure illustrates that the NIV between the training and validate data sets should be compared. When variables perform differently in the training and validate data sets, an analyst is not sure which data set result is giving the more accurate indication of the ability to generalize to new data. One result is wrong, but an analyst cannot tell which one is wrong. The best practice is to not use variables in net lift models if the variables behave differently between training and validate.

Accordingly, the final step is to calculate a penalty that measures how differently the variables perform between train and validate and to only use variables that have consistent performance. A modeler should look for variables with a large NIV minus penalty.
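The idea of “NIV minus penalty” can be sketched as follows. The paper does not state the penalty formula in the text, so the penalty used here (the absolute difference between the training and validation NIV) is a hypothetical stand-in and is not claimed to be the calculation Enterprise Miner performs. The variable names and NIV values are also made up.

```python
def adjusted_niv(niv_train, niv_validate):
    """Training NIV minus a penalty for disagreement between train and validate.
    ASSUMPTION: the penalty is a simple absolute difference, used only for illustration."""
    penalty = abs(niv_train - niv_validate)
    return niv_train - penalty

# Hypothetical NIV values per variable: (training, validation).
niv_by_variable = {
    "age": (0.30, 0.28),     # consistent between the two partitions
    "tenure": (0.35, 0.10),  # looks strong in training, does not hold up in validation
    "region": (0.05, 0.05),  # consistent but weak
}

scores = {name: adjusted_niv(t, v) for name, (t, v) in niv_by_variable.items()}

# Variables sorted from largest to smallest adjusted NIV.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

With these numbers the variable with the highest training NIV drops behind a more consistent one, which is the behavior the penalty is meant to produce.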

EXAMPLE

For the rest of the paper we will explore an example of net lift modeling. Since we plan to use SAS Enterprise Miner to do net lift modeling we will concentrate on understanding the choices that SAS presents to us. While it is useful to understand the generalities of net lift modeling theory, it is crucial to understand the choices that SAS presents to us.

Figure 14 shows a small flowchart that we will use in the rest of this paper. We will not discuss all of the nodes in this paper.

Figure 14

OPTIONS IN THE DMRETAIL NODE:

There are many options that you could set in this node but the most important options involve setting characteristics of variables.

A variable can be set to a type of “cost”. This allows Enterprise Miner to optimize by minimizing cost or maximizing profit.

A variable can be set to a type of “frequency”. Values of a frequency variable will tell SAS when rows represent multiple observations. This situation happens when the input data set has been summarized to some level. The frequency type allows non-integer values and so can be used for weighting.

A variable can be set to a type of “ID”. An ID can be used to identify groups of observations. I think of this as a “subject ID”.

A variable can be set to a type of “input”. A variable of type input will be used as an X variable. These are independent or explanatory variables.

A variable can be set to a type of “rejected”. A variable of type rejected is excluded from the analysis in the process flow. You can, manually, set a variable to be a type rejected or, when Enterprise Miner does automatic variable selection, a variable can be set to rejected by Enterprise Miner itself.

At least one variable must be set as type of “target” (the Y or dependent variable).

A variable can be set to a type of “treatment”. A variable of type treatment is used by the net lift model to identify subjects that have received the marketing manipulation, or not.

In this first node you will also assign variables a measurement level. Optional levels are: binary, interval, nominal, ordinal or unary.

StatExplore Node:

The stat explore node does what its name suggests. It allows you to explore statistics on the data set. This is especially useful for finding missing values in your data and for finding variables that are highly skewed. Often we will take steps to correct problems caused by missing data and skewed distributions.

Data Partition Node:

The data partition node allows you to split your input data into two, or three, parts. The parts are typically called training, validation and test. While the data partitions do not show up as nodes in the flowchart, Enterprise Miner knows that the data has been partitioned and will perform different analyses if the data is partitioned.

As an example, if you have a partitioned data set and ask for techniques that produce an ROC curve, Enterprise Miner will produce multiple ROC curves – one for each data set partition. In this project we will split the data set into two parts: training and validation.
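For readers who want to see what a two-way partition amounts to outside of Enterprise Miner, here is a minimal Python sketch. It is a plain random split and makes no claim about the sampling method the data partition node uses; the function name, the 60/40 split and the seed are all hypothetical.

```python
import random

def partition(rows, train_fraction=0.6, seed=12345):
    """Randomly split rows into a training part and a validation part."""
    rng = random.Random(seed)   # fixed seed so the split can be reproduced
    shuffled = list(rows)       # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical data set of 100 row identifiers.
rows = list(range(100))
train, validate = partition(rows, train_fraction=0.6)
print(len(train), len(validate))
```

Every row lands in exactly one partition, and rerunning with the same seed gives the same split, so statistics reported separately for training and validation can be compared from run to run.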

StatExplore (2) Node:

I included this node in the flow chart to illustrate the effect of data partitioning and how Enterprise Miner keeps track of whether a data set has been partitioned or not. If you run this stat explore node, Enterprise Miner will produce two reports: one report for the training data set and one report for the validation data set. This is a relatively trivial example of how helpful Enterprise Miner is and how it does things for you automatically.

More complicated examples would involve things like creating multiple ROC curves, creating multiple confusion matrices, and building multiple models. The fact that you can partition a data set, and that SAS Enterprise Miner automatically recognizes the partitioning, saves you great amounts of time in doing the work needed to demonstrate the external validity of your model.

Incremental response Node:

Figure 15 shows the settings you can change to control the internal processes of the incremental response node.

GENERAL SECTION:

IMPORTED DATA: Clicking on the ellipsis next to imported data allows you to see the types of data available to the incremental response node. In this flowchart we have two tables: train and validate.

EXPORTED DATA: Clicking on the ellipsis next to exported data allows you to see the types of data that the incremental response node makes available to later nodes. In this flowchart we have two tables: train and validate.

TRAIN SECTION:

VARIABLES: Clicking on the ellipsis next to variables allows you to see variable properties and roles. This is very similar to the functionality you had in the DMRetail node.

TREATMENT LEVEL SELECTION: It is never obvious to SAS which level of the Y variable is of interest to a modeler. Sometimes people want to model the “zero” level of the Y variable and sometimes people want to model the “one” level of the Y variable. You can use this option to change from modeling the lower sorting level of Y to the higher sorting level of Y (sorted ascending or descending).

Figure 15

PRESCREEN VARIABLES: if you set this to “yes” SAS will use a small sub-set of data to build models and try and remove variables that are obviously not predicting. This can be very helpful when your data set contains many, many variables and running the model on all of those variables would take an inconveniently long time. This is just another example of how SAS Enterprise Miner automates tedious tasks to speed up your process and reduce run times.

PRESCREENING CRITERIA: if you want Enterprise Miner to prescreen your variables you can use this option to change the criteria that Enterprise Miner uses for variable removal. This is another example of how Enterprise Miner is both smart and helpful.

If you have only one data set, an un-partitioned data set, coming into the incremental response node SAS will prescreen the variables based on NIV.

If you have partitioned the data, so you have training and either validation or test data sets coming into the incremental response node, Enterprise Miner will automatically prescreen variables based on the adjusted NIV.

RANK PERCENTAGE CUTOFF: if you have requested that Enterprise Miner prescreen variables this will provide a rough cut off setting for removing variables from the model process. If you have several hundred variables coming into the incremental response node it would likely not harm you very much to throw out the bottom 50% - based on some prescreening.
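One way to picture a rank percentage cutoff is the Python sketch below. It assumes the cutoff means “keep the top N percent of variables after ranking them by the prescreening score”; that reading, the function name and the scores are assumptions, and the node's exact rule may differ.

```python
import math

def prescreen(scores, cutoff_pct):
    """Rank variables by score (highest first) and keep the top cutoff_pct percent.
    Returns (kept, removed). At least one variable is always kept."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_keep = max(1, math.ceil(len(ranked) * cutoff_pct / 100))
    return ranked[:n_keep], ranked[n_keep:]

# Hypothetical prescreening scores (for example, adjusted NIV) for four variables.
scores = {"age": 0.28, "tenure": 0.10, "region": 0.05, "zip_digit": 0.01}

kept, removed = prescreen(scores, cutoff_pct=50)
print(kept, removed)
```

With several hundred inputs, the same call would discard the bottom half before any model is fit, which is where the run-time saving comes from.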

Response Model Selection sub-section: AND Outcome Model Selection sub-section: Under Train Section

The net lift modeling node allows you to specify two types of Y variables. It allows you to specify a binary variable (“buy” vs “not-buy”) and also a sales dollar variable (interval) as a Y variable.

The incremental response node has two sections, largely duplicates of each other, that let you specify how the model should be built for Y variables.

COMBINED MODEL: Enterprise Miner gives you a choice on how you would like to create your binary models. The variable with type set as “treatment” flags observations in the data set as either having had the marketing treatment or not (test or control).

If you select “yes”, Enterprise Miner will create a dummy variable identifying an observation as test or control and build one model. If you select “no”, Enterprise Miner will use the treatment variable to separate the two kinds of observations and build two different models.

Combined model is only an option in the response model selection subsection and only affects the response model (binary Y) modeling process.
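The difference between the two choices is mostly a difference in how the data are arranged before modeling. The Python sketch below shows only that arrangement step, under the assumption that each observation is a dictionary with a 0/1 `treatment` flag; the helper names and the sample rows are hypothetical and are not part of Enterprise Miner.

```python
def add_treatment_dummy(rows, inputs, treatment_key="treatment"):
    """Combined model = "yes": one input matrix, treatment flag as an extra column."""
    return [[r[k] for k in inputs] + [r[treatment_key]] for r in rows]


def split_by_treatment(rows, treatment_key="treatment"):
    """Combined model = "no": separate the data so two models can be fit."""
    treated = [r for r in rows if r[treatment_key] == 1]
    control = [r for r in rows if r[treatment_key] == 0]
    return treated, control


# Hypothetical observations, invented for illustration only.
rows = [
    {"age": 34, "visits": 2, "treatment": 1, "buy": 1},
    {"age": 51, "visits": 0, "treatment": 0, "buy": 0},
    {"age": 27, "visits": 5, "treatment": 1, "buy": 0},
]

combined_inputs = add_treatment_dummy(rows, inputs=["age", "visits"])
treated_rows, control_rows = split_by_treatment(rows)
print(combined_inputs)
print(len(treated_rows), len(control_rows))
```

In the first arrangement a single model sees every row and learns the treatment effect through the extra column. In the second, each group gets its own model and the increment is the difference between the two predictions.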

SELECTION METHOD: This allows you to select forward, backward or stepwise selection methods.

SELECTION CRITERIA: This allows you to select the model using one of several different statistics. Choices are: AIC, SBC, Validation Error and Cross Validation Error.
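For readers who have not met AIC and SBC, both penalize a model's log-likelihood by the number of parameters, and the model with the smaller value is preferred. The Python sketch below uses the standard textbook formulas; the two candidate models and their log-likelihoods are hypothetical numbers, and Enterprise Miner's own output may differ in details such as scaling.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def sbc(log_likelihood, n_params, n_obs):
    """Schwarz Bayesian criterion (also called BIC): k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical candidate models: (log-likelihood, number of parameters).
candidates = {"small_model": (-520.0, 3), "large_model": (-518.5, 6)}
n_obs = 1000

best_by_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_by_sbc = min(candidates, key=lambda m: sbc(*candidates[m], n_obs))
print(best_by_aic, best_by_sbc)
```

SBC penalizes extra parameters more heavily than AIC once the data set has more than a handful of observations, so it tends to choose smaller models.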

SIGNIFICANCE LEVEL FOR ENTRY: This is the significance level a variable must meet to enter the model and is used by some stepwise techniques.

STAY SIGNIFICANCE LEVEL: Some model building algorithms will both add, and remove, variables from the model. This is the significance level a variable must meet to remain in the model and is used by some stepwise techniques.

SUPPRESS INTERCEPT: This option removes the intercept, which forces the fitted line through the origin.

TWO-WAY INTERACTIONS: This option does more than its name implies. If you select “yes”, Enterprise Miner will include all two-way interaction terms for class variables and all of the second order polynomial terms for interval variables that have a status of “use”. An X variable that performs poorly can have its status set to “rejected”.
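To get a feel for how quickly this option grows the list of model terms, the expansion can be sketched in Python. This sketch assumes that “second order polynomial terms” means both the squares and the pairwise products of the interval variables; that reading, the function name `expand_terms`, and the variable names are assumptions made for illustration.

```python
from itertools import combinations, combinations_with_replacement


def expand_terms(class_vars, interval_vars):
    """List the extra terms added for variables that have a status of "use".

    Class variables contribute every two-way interaction. Interval variables
    contribute every second order term (squares and pairwise products).
    """
    class_terms = [f"{a}*{b}" for a, b in combinations(class_vars, 2)]
    poly_terms = [
        f"{a}*{b}" for a, b in combinations_with_replacement(interval_vars, 2)
    ]
    return class_terms + poly_terms


# Hypothetical variable names, invented for illustration only.
extra_terms = expand_terms(["gender", "region"], ["age", "income"])
print(extra_terms)
```

Two class variables and two interval variables already add four terms, and the count grows roughly with the square of the number of variables.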

Revenue Calculation sub-section under Train Section:

As was said before, Enterprise Miner automates many tedious jobs, and these settings are another example of a time-saving feature in Enterprise Miner.

Charts will automatically be produced showing counts by various groups and also showing revenue, costs and profits. Each of these calculations requires setting two options. If you look back at Figure 15, you can see that the analyst has requested that Enterprise Miner not use constant revenue and therefore not produce revenue charts. However, that figure also shows that the analyst has provided Enterprise Miner with an estimate of the revenue per response. The 10 in the revenue per response option will be ignored because the “use constant revenue” option has been set to “no”. In a similar manner, to have Enterprise Miner do cost calculations one must set “use constant cost” to “yes” and provide a cost value to be used.

REPORT SECTION:

In the report section an analyst can specify the number of bins to be used when Enterprise Miner creates histograms for reporting.

Reporting From the Net Lift Modeling Node:

Figure 16 shows some of the reports produced by the incremental response node. The response rates are similar for the train and validate data sets. This gives hope that a comparison between train and validate results will allow an analyst to make judgments about external validity of the model.

Figure 16

However, the predicted increments and observed increments differ greatly between the train and the validate data sets. This difference is disquieting and hints at what we will be forced to conclude at the end: the model, while an example of the process required to produce a net lift model, is not performing terribly well.

Figure 17 shows that the train and validate data sets are similar on both average incremental revenue and incremental revenue.

Figure 17

Figure 18 shows net weight of evidence for the 20 split values for two different variables.

Figure 18

Figure 19 shows some of the statistical output for this model. As can be seen to the right, the overall model is very significant. However, the adjusted R squared is only 0.04 (not shown), and that is not very encouraging. Because we selected stepwise as the model building procedure, we get a summary of the stepwise selection and an analysis of the coefficients for each of the variables.

Figure 19

Figure 20 shows the types of tables, and the names of the tables, exported by the incremental response node. If follow-up work is to be done, an analyst must know the contents of the tables. Generally, the files contain the variables used in creating the model plus some others.

Figure 20

Some of the added variables are listed, and defined, below.

From: https://communities.sas.com/t5/tkb/articleprintpage/tkb-id/library/article-id/775

Variable: Description
EM_P_CONTROL_RESPONSE: Predicted response probability from the control group
EM_P_CONTROL_NONRESPONSE: 1 - EM_P_CONTROL_RESPONSE
EM_P_ADJ_INCREMENT_RESPONSE: Adjusted to be positive incremental predicted response rate
EM_P_ADJ_INCREMENT_NONRESPONSE: 1 - EM_P_ADJ_INCREMENT_RESPONSE
EM_P_ABS_INCREMENT_RESPONSE: Absolute value of the incremental predicted response rate (available when an outcome model used)
EM_P_ABS_INCREMENT_NONRESPONSE: 1 - EM_P_ABS_INCREMENT_RESPONSE
EM_P_TREATMENT_RESPONSE: Predicted response probability from the treatment group
EM_P_TREATMENT_NONRESPONSE: 1 - EM_P_TREATMENT_RESPONSE
EM_P_INCREMENT_RESPONSE: EM_P_TREATMENT_RESPONSE - EM_P_CONTROL_RESPONSE
EM_P_INCREMENT_NONRESPONSE: EM_P_TREATMENT_NONRESPONSE - EM_P_CONTROL_NONRESPONSE
EM_P_CONTROL_OUTCOME: Predicted value of the outcome variable from the control group
EM_P_TREATMENT_OUTCOME: Predicted value of the outcome variable from the treatment group
EM_P_INCREMENT_OUTCOME: EM_P_TREATMENT_OUTCOME - EM_P_CONTROL_OUTCOME
EM_REV_TREATMENT: Estimated revenue for the treatment group. EM_P_TREATMENT_RESPONSE * EM_P_TREATMENT_OUTCOME - Cost or, if Constant Revenue is set, EM_P_CONTROL_RESPONSE * Revenue_Per_Response - Cost
EM_REV_CONTROL: Estimated revenue for the control group. EM_P_CONTROL_RESPONSE * EM_P_CONTROL_OUTCOME
EM_REV_INCREMENT: Estimated incremental revenue. EM_REV_TREATMENT - EM_REV_CONTROL
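The arithmetic behind several of these variables can be sketched in Python for a single scored observation. This is a minimal illustration of the formulas listed above, covering only the case where constant revenue is not set. The function `derive_increment_fields`, its parameter names, and the input numbers are hypothetical, and the adjusted increment variables are left out because their adjustment rule is not given here.

```python
def derive_increment_fields(p_treatment, p_control, out_treatment, out_control, cost=0.0):
    """Compute a subset of the EM_* variables for one observation.

    p_* are predicted response probabilities; out_* are predicted values of
    the outcome variable. Constant revenue is assumed to be switched off.
    """
    fields = {
        "EM_P_TREATMENT_RESPONSE": p_treatment,
        "EM_P_CONTROL_RESPONSE": p_control,
        "EM_P_INCREMENT_RESPONSE": p_treatment - p_control,
        "EM_P_ABS_INCREMENT_RESPONSE": abs(p_treatment - p_control),
        "EM_P_INCREMENT_OUTCOME": out_treatment - out_control,
        "EM_REV_TREATMENT": p_treatment * out_treatment - cost,
        "EM_REV_CONTROL": p_control * out_control,
    }
    fields["EM_REV_INCREMENT"] = fields["EM_REV_TREATMENT"] - fields["EM_REV_CONTROL"]
    return fields


# Hypothetical predictions for one customer, invented for illustration only.
scored = derive_increment_fields(
    p_treatment=0.30, p_control=0.20, out_treatment=50.0, out_control=40.0, cost=2.0
)
for name, value in scored.items():
    print(f"{name}: {value:.4f}")
```

A positive EM_REV_INCREMENT suggests the contact is expected to pay for itself for that customer; a negative value suggests it is not.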

SUMMARY: The incremental response node in SAS Enterprise Miner is a great timesaver and a great aid for people who would like to build incremental response models without having to know all the details and theory needed to code a net lift model using Base SAS and SAS/STAT. If you combine the power of this node with the utilities built into Enterprise Miner (the model comparison node, for example), the process of net lift modeling will be faster and more convenient.

ACKNOWLEDGMENTS: Thanks to all the great people at SAS Tech Support.

REFERENCES:

https://communities.sas.com/t5/tkb/articleprintpage/tkb-id/library/article-id/775

Larsen, Kim, Net Lift Models, http://www.sas.com/events/aconf/2010/pres/larsen.pdf

Larsen, Net Models, https://www.youtube.com/watch?v=JN3WE8IZNVY

Zhang, Ruiwen, Incremental Response Modeling in SAS Enterprise Miner, https://www.youtube.com/watch?v=zabWaSS_BDI

CONTACT INFORMATION: Your comments and questions are valued and encouraged. Contact the author at: Russ Lavery, Contractor Bryn Mawr, PA [email protected]

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.
