Advanced Regression in Excel S


  • Advanced Regression in Excel The Excel Statistical Master

    Copyright 2011 http://ExcelMasterSeries.com/New_Manuals.php Page 1

    Advanced Regression

    in Excel

    The Excel Statistical Master

    By Mark Harmon

    Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission

    of the author.

    [email protected] www.ExcelMasterSeries.com

    ISBN: 978-0-9833070-6-8


    Table of Contents

    Using Dummy Variable Regression in Excel To Perform Conjoint Analysis
      Step-By-Step Video Showing How To Perform Conjoint Analysis Using Dummy Variable Regression in Excel In Order To Find Out Which Product Attributes Your Customers Value The Most
      The 6 Steps of Performing Conjoint Analysis
      Step 1) List All Product Attributes For 1 Product
      Step 2) Make a List of All Possible Combinations of Those Attributes
      Step 3) Have Consumer Rate Each Attribute Combination
      Step 4) Prepare Completed Survey for Regression
      Dummy Variables to Be Removed From Input Data To Prevent Collinearity
      Step 5) Run Regression in Excel
      Step 6) Derive Attribute Utilities From Regression Output
      An Example of Using a Dummy Variable
      The Problem of Collinearity - and How To Solve It
      The Product Utilities - The Measure of Customer Liking

    How To Quickly Read the Output of Regression in Excel
      Step-By-Step Video About How To Quickly Read and Understand the Output of Excel Regression
      The 4 Most Important Parts of Regression Output
      1) Overall Regression's Accuracy
      R Square
      Adjusted R Square
      2) Probability That This Output Was Not By Chance
      Significance of F
      3) Individual Regression Coefficient Accuracy
      P-value of each coefficient and the Y-intercept
      4) Visual Analysis of Residuals
      Charting the Residuals
      The Residual Chart

    Logistic Regression Analysis in Excel
      Customer Quality Scores Are Created With Logistic Regression
      Step-By-Step Video Showing How To Predict if a Prospect Will Buy Using Logistic Regression in Excel
      What is Logistic Regression?
      An Example of Logistic Regression In Action
      Create the Predictive Equation
      The Logit
      Calculating the Logit Variables - A, B, and Constant
      Optimizing the Logit Variables in the Excel Solver
      The Final, Most Accurate Predictive Equation
      You'll Have To Tweak the Constraints in the Excel Solver

    The Four Steps of Regression in Excel (Including 2 Crucial Ones Always Skipped)
      Step-By-Step Video Showing How To Do All 4 Steps of Regression in Excel, Including the 2 Crucial Initial Steps That No One Does
      Crucial Step 1) Graphing the Data
      Crucial Step 2) Running Correlation Analysis on All Variables Simultaneously
      Remove Input Variables That Have Low Correlation With Output Variable
      Remove Input Variables Highly Correlated With Other Input Variables
      Adding New Input Variables To The Regression Analysis
      Step 3) Run the Regression in Excel
      Step 4) Analysis of Excel Output

    How To Do Nonlinear Regression Using the Excel Solver
      The Solver dialogue box has the following 4 parameters that need to be set:
      Objective
      Decision Variables
      Constraints
      Selection of Solving Method: GRG Nonlinear
      Solver Tips
      Initial Solver Settings
      Show Iteration Results
      Use Automatic Scaling
      Assume Non-Negative
      Bypass Solver Reports


    Using Dummy Variable Regression in Excel To Perform Conjoint Analysis

    Dummy Variable Regression is a great tool for business managers. It provides, for example, the means to perform very useful analyses such as Conjoint Analysis. Conjoint Analysis quantifies how desirable each product attribute choice is relative to the other available choices for a single product. In other words, the marketer learns which product choices a consumer values most and by how much. In this article and the linked video, you will learn exactly how to perform Conjoint Analysis in Excel using Dummy Variable Regression. That may sound like advanced stuff, but it's really quite a bit simpler than you might imagine. The linked video will make the entire procedure of using Dummy Variable Regression in Excel to perform Conjoint Analysis much easier to understand:


    Step-By-Step Video Showing How To Perform Conjoint Analysis Using Dummy Variable Regression in Excel In Order To Find Out Which Product Attributes Your Customers Value The Most

    Instructional Video

    Go to http://www.youtube.com/watch?v=EMbiGPGlBEM to view a video from Excel Master Series about how to use Dummy Variable Regression in Excel to perform Conjoint Analysis.

    (Is Your Internet Connection and Sound Turned On?)

    The ultimate objective of Conjoint Analysis is to quantify the consumer's degree of liking for each of the choices for one product. The Utility of an attribute is the value associated with the consumer's degree of liking for that choice.


    The 6 Steps of Performing Conjoint Analysis

    A brief explanation of how Conjoint Analysis and Dummy Variable Regression are used together to arrive at the Utility for each product attribute follows. The same steps are shown in the linked video above.

    Step 1) List All Product Attributes For 1 Product

    The marketer lists all of the available choices that a consumer has for one product. The marketer starts by listing all of the overall attribute categories, such as color and add-ons. The marketer then lists all of the available choices within each attribute category. For example, here the marketer would be listing all available colors and add-ons.

    List Of All Product Attributes


    Step 2) Make a List of All Possible Combinations of Those Attributes

    The marketer then creates a list of all possible combinations of choices available to the consumer for that one product.
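    Steps 1 and 2 can be sketched outside Excel as follows. This is only an illustration: the attribute categories and choices below are hypothetical and are not taken from the manual's survey.

```python
from itertools import product

# Hypothetical attribute categories and their choices (assumed for illustration).
attributes = {
    "color": ["red", "white", "blue"],
    "add_on": ["none", "case"],
}

def all_combinations(attrs):
    """Return every combination containing one choice per attribute category."""
    names = list(attrs)
    return [dict(zip(names, combo)) for combo in product(*(attrs[n] for n in names))]

# Each combination becomes one numbered card for the consumer to rate.
cards = all_combinations(attributes)
for number, card in enumerate(cards, start=1):
    print(number, card)
```

    With 3 colors and 2 add-ons the consumer would be asked to rate 3 x 2 = 6 cards. The number of cards grows quickly as categories are added.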


    Step 3) Have Consumer Rate Each Attribute Combination

    This list of all possible combinations is handed to the consumer. The consumer rates each combination on a scale of 1 (least desirable) to 10 (most desirable).


    Step 4) Prepare Completed Survey for Regression

    The survey results are arranged so that Dummy Variable Regression can be run on them. Each product choice is assigned its own Dummy Variable and one Dummy Variable from each overall attribute category is removed. This will be explained below and also in more detail in the linked video. Dummy Variables in a regression are variables that can only assume two values. One Dummy Variable must be created for each product choice.

    Dummy Variables to Be Removed From Input Data To Prevent Collinearity

    Step 5) Run Regression in Excel

    Dummy Variable Regression is then run on the survey results data.


    Step 6) Derive Attribute Utilities From Regression Output

    The Utility for each product attribute is derived directly from the coefficients of the resulting regression equation.

    Excel Regression Output


    How To Derive The Utilities From the Output

    An Example of Using a Dummy Variable

    For example, if the product comes only in the colors red and white, there will be one Dummy Variable for red and one for white. The Dummy Variable for the color red can take values of only 1 or 0 because the product will either be red or not. The same applies to the white Dummy Variable and to all other Dummy Variables. When the survey is returned, the survey data is converted into the proper layout for the Regression function in Excel. Each Dummy Variable assigned to a specific attribute will be given the value 1 if that attribute was part of the combination currently being rated and 0 if it was not. Watching this done in the linked video is probably the easiest way to understand how to do it.
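    The 0/1 layout described above can be sketched as follows. This is a minimal illustration, not the manual's worksheet: the red/white colors come from the example, while the "size" category, the card list, and the function name are assumptions. The first choice in each category is dropped, which is the removal that the next section explains.

```python
# Colors follow the example in the text; the "size" category is hypothetical.
attributes = {"color": ["red", "white"], "size": ["small", "large"]}

def dummy_encode(cards, attrs):
    """Build 0/1 dummy columns, dropping the first choice of every category."""
    kept = [(name, choice) for name, choices in attrs.items() for choice in choices[1:]]
    header = [f"{name}={choice}" for name, choice in kept]
    rows = [[1 if card[name] == choice else 0 for name, choice in kept] for card in cards]
    return header, rows

cards = [
    {"color": "red", "size": "small"},
    {"color": "red", "size": "large"},
    {"color": "white", "size": "small"},
    {"color": "white", "size": "large"},
]
header, rows = dummy_encode(cards, attributes)
print(header)
for row in rows:
    print(row)
```

    Each row is one rated card. A row of all zeros is the card made up entirely of the dropped (baseline) choices.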


    The Problem of Collinearity - and How To Solve It

    One problem can occur when Dummy Variables are inputs to a regression. The problem of Collinearity or Multicollinearity occurs when an independent variable can be used to predict the value of another independent variable. For example, if the product comes in only red or white, you can predict whether the product is red if you know whether or not the product is white. This is Collinearity.

    Collinearity and Multicollinearity are corrected by removing one Dummy Variable from each choice category. For example, if the color choices are red or white, the Dummy Variable for one of those colors would be removed. Collinearity is then solved: no remaining input can be used to predict whether or not the product is red, because the Dummy Variable for white has been removed.

    The data can now be run as a regular regression using Excel's regression tool. The linked video shows how to do this in detail. The regression is run and a regression equation is obtained.

    The Product Utilities - The Measure of Customer Liking

    The Utilities of each of the product choices are set equal to the coefficients of the regression equation. The Utility is the degree of liking that the consumer attached to that product choice. For example, the marketer will find out how important the color red was compared to each of the other product choices during the purchase decision. Utilities of product choices that were associated with the Dummy Variables that were removed to prevent collinearity will be assigned the value of 0.

    We now have Utilities for each attribute. The overall attractiveness of a particular combination of choices can now be calculated by adding up the individual Utilities associated with each of the choices. The sum of the Utilities for each combination is the regression's prediction of the consumer's degree of liking for that combination of product choices.

    The removal of the individual Dummy Variables does not affect the accuracy or completeness of the answer. Adding up the Utilities for each combination will produce a figure that will be very close to the consumer's actual rating for that combination. An example of this is shown in the video.
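    Adding up Utilities for one combination can be sketched as follows. The coefficient values are made up for illustration and do not come from the manual's regression output. The sketch also assumes that the regression's Y-intercept is added to the sum, as it is when any regression equation is evaluated.

```python
# Hypothetical regression output: a Y-intercept plus one coefficient per kept dummy.
intercept = 2.0
utilities = {
    ("color", "red"): 0.0,    # dummy removed to prevent collinearity -> Utility of 0
    ("color", "white"): 1.5,
    ("add_on", "none"): 0.0,  # dummy removed to prevent collinearity -> Utility of 0
    ("add_on", "case"): 3.0,
}

def predicted_rating(card, utilities, intercept):
    """Sum the Utilities of the card's choices and add the Y-intercept."""
    return intercept + sum(utilities[(name, choice)] for name, choice in card.items())

card = {"color": "white", "add_on": "case"}
print(predicted_rating(card, utilities, intercept))
```

    Comparing this predicted value with the rating the consumer actually gave the same card is the check shown in the video.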


    Showing the Regression Equation Predicts Nearly the Same Score as the Customer's Ranking of Card 13, Even Though Dummy Variables Were Removed


    How To Quickly Read the Output of Regression Analysis Done in Excel

    There is a lot more to the Excel Regression output than just the regression equation. If you know how to quickly read the output of a Regression done in Excel, you'll know right away the most important points of a regression: whether the overall regression was a good fit, whether this output could have occurred by chance, whether or not all of the independent input variables were good predictors, and whether the residuals show a pattern (which means there's a problem).

    Excel Regression Output With Color-Coding Added

    This video will illustrate exactly how to quickly and easily understand the output of Regression performed in Excel:


    Step-By-Step Video About How To Quickly Read and Understand the Output of Excel Regression

    (Is Your Sound and Internet Connection Turned On?)

    The 4 Most Important Parts of Regression Output

    1) Overall Regression Equation's Accuracy (R Square and Adjusted R Square)
    2) Probability That This Output Was Not By Chance (ANOVA Significance of F)
    3) Individual Regression Coefficient and Y-Intercept Accuracy
    4) Visual Analysis of Residuals

    Some parts of the Excel Regression output are much more important than others. The goal here is for you to be able to glance at the Excel Regression output and immediately understand it, so we will focus our attention only on the four most important parts of the Excel regression output.

    1) Overall Regression's Accuracy

    R Square

    This is the most important number of the output. R Square tells how well the regression line approximates the real data. It is the fraction of the output variable's variance that is explained by the input variables. Ideally we would like to see this at least 0.6 (60%) or 0.7 (70%).

    Adjusted R Square

    This is quoted most often when explaining the accuracy of the regression equation. Adjusted R Square is more conservative than R Square because it is never greater than R Square. Another reason that Adjusted R Square is quoted more often is that when new input variables are added to the Regression analysis, Adjusted R Square increases only when the new input variable makes the Regression equation more accurate (improves the Regression equation's ability to predict the output). R Square never goes down when a new variable is added, whether or not the new input variable improves the Regression equation's accuracy.

    2) Probability That This Output Was Not By Chance

    Significance of F

    This indicates the probability that Regression output this strong could have been obtained by chance if the input variables actually had no relationship with the output variable. A small Significance of F confirms the validity of the Regression output. For example, if Significance of F = 0.030, there would be only a 3% chance of seeing output this strong if it were merely a chance occurrence.


    3) Individual Regression Coefficient Accuracy

    P-value of each coefficient and the Y-intercept

    The P-Value of each of these indicates how likely it is that a value this far from zero would have occurred by chance alone. The lower the P-Value, the higher the likelihood that the coefficient or Y-Intercept is valid. For example, a P-Value of 0.016 for a regression coefficient indicates that there would be only a 1.6% chance of seeing a coefficient this large if the result were purely a matter of chance.


    4) Visual Analysis of Residuals

    Charting the Residuals


    The Residual Chart

    The residuals are the differences between the actual values of the output variable and the values predicted by the Regression. You can quickly plot the Residuals on a scatterplot chart. Look for patterns in the scatterplot. The more random (without patterns) and centered around zero the residuals appear to be, the more likely it is that the Regression equation is valid.

    There are many other pieces of information in the Excel regression output, but the above four items will give a quick read on the validity of your Regression.
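    The R Square, Adjusted R Square, and residual figures discussed above can be reproduced by hand. The following sketch is only an illustration: the data are made up, the function name is hypothetical, and the predicted values are assumed to come from a regression that has already been run with one input variable.

```python
def regression_summary(actual, predicted, n_inputs):
    """Return R Square, Adjusted R Square, and the residuals (actual - predicted)."""
    n = len(actual)
    mean_y = sum(actual) / n
    residuals = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(r * r for r in residuals)            # unexplained variation
    ss_tot = sum((a - mean_y) ** 2 for a in actual)   # total variation
    r2 = 1 - ss_res / ss_tot
    # The adjustment penalizes every additional input variable.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_inputs - 1)
    return r2, adj_r2, residuals

# Hypothetical data: five observations and the fitted line's predictions.
actual = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]
r2, adj_r2, residuals = regression_summary(actual, predicted, n_inputs=1)
print(round(r2, 3), round(adj_r2, 3))
print([round(r, 2) for r in residuals])
```

    Plotting the residuals list against the observation number is the equivalent of Excel's residual chart. In this example, Adjusted R Square comes out lower than R Square.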

    Hand Calculation of Regression Problems

    Go to http://excelmasterseries.com/Excel_Statistical_Master/Regression.php to view how to solve Regression problems by hand (no Excel).

    (Is Your Internet Connection Turned On?)

    You'll quickly see why you always want to use Excel to solve statistical problems!


    Logistic Regression Analysis in Excel

    Wouldn't it be great if there was a more accurate way to predict whether your prospect will buy rather than just taking an educated guess? Well, there is, if you have enough data on your previous prospects. The tool that makes this possible is called Logistic Regression, and it can be easily implemented in Excel.

    Customer Quality Scores Are Created With Logistic Regression

    Marketers use Logistic Regression to rank their prospects with a quality score which indicates each prospect's likelihood to buy. The more data you've collected from previous prospects, the more accurately you'll be able to use Logistic Regression in Excel to calculate your new prospect's probability of purchasing.


    Step-By-Step Video Showing How To Predict if a Prospect Will Buy Using Logistic Regression in Excel:

    Instructional Video

    Go to http://www.youtube.com/watch?v=NHOO7iceJrw to view a video from Excel Master Series about how to use Logistic Regression in Excel to predict if your next prospect WILL BUY! (or not !#!$%!)

    (Is Your Internet Connection and Sound Turned On?)

    What is Logistic Regression?

    Logistic Regression calculates the probability of an event occurring, such as the purchase of a product. In general, the thing being predicted in a Regression equation is represented by the dependent variable or output variable and is usually labeled as the Y variable in the Regression equation. In the case of Logistic Regression, this Y is binary. In other words, the output or dependent variable can only take the values of 1 or 0. The predicted event either occurs or it doesn't occur: your prospect either will buy or won't buy. Occasionally this type of output variable is also referred to as a Dummy Dependent Variable.

    An Example of Logistic Regression In Action

    Here is a marketing example showing how Logistic Regression works. The embedded video walks through this example in Excel as well. Suppose that you have collected three pieces of data on each of your previous prospects:

    1) The prospect's age
    2) The prospect's gender (1 = Male and 0 = Female)
    3) Whether the prospect purchased or not (Did purchase, Y = 1; Did not purchase, Y = 0)


    Create the Predictive Equation

    With the above data, you could create a predictive equation that would calculate a new prospect's probability of purchasing by inputting this new prospect's age and gender. This predictive equation will be in the form of:

    $P(X) = \dfrac{e^{L}}{1 + e^{L}}$

    P(X) represents the probability of event X occurring.

    The Logit

    Event X is a purchase. In other words, P(X) is the probability that Y = 1. P(X) depends on only one quantity, L, which is called the Logit:

    $L = \text{Constant} + A \cdot \text{Age} + B \cdot \text{Gender}$

    L, the Logit, has 3 variables: Constant, A, and B. They must be known before P(X) can be calculated. Those 3 variables can be found in Excel by using the Excel Solver. The Excel Solver will find the combination of those 3 variables that causes the resulting P(X) to most accurately predict whether Y = 1 or 0 for all previous prospects.
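    The predictive equation and the Logit can be sketched as two small functions. The function names and the values used for Constant, A, and B are placeholders chosen for illustration. In the manual those three values come out of the Excel Solver.

```python
import math

def logit(age, gender, constant, a, b):
    """L = Constant + A * Age + B * Gender."""
    return constant + a * age + b * gender

def purchase_probability(age, gender, constant, a, b):
    """P(X) = e^L / (1 + e^L), the probability that Y = 1."""
    L = logit(age, gender, constant, a, b)
    return math.exp(L) / (1 + math.exp(L))

# Hypothetical Logit variables (assumed, not Solver output).
constant, a, b = -4.0, 0.1, 0.5

# A 40-year-old female prospect (gender = 0) and a 40-year-old male prospect (gender = 1).
print(purchase_probability(40, 0, constant, a, b))
print(purchase_probability(40, 1, constant, a, b))
```

    Whatever the value of L, P(X) stays between 0 and 1, which is what makes it usable as a probability.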


    Everything To the Right of the Above Is Continued As Follows:


    Calculating the Logit Variables - A, B, and Constant

    Here's how the optimal set of Logit variables (Constant, A, and B) is found in Excel. Using Excel, each recorded prospect has the following calculation performed:

    $P(X)^{Y} \cdot [1 - P(X)]^{(1-Y)}$

    Y = 1 if the prospect bought and Y = 0 if the prospect didn't buy. P(X) is the probability of purchase calculated using the equation listed above. In Excel, the P(X) calculation is initially performed using Logit variables (Constant, A, and B) which are not optimal. The Excel Solver will then keep trying new combinations of these variables until the optimal P(X) is found.

    Optimizing the Logit Variables in the Excel Solver

    Here's how the Excel Solver knows when it has found the correct combination of these 3 variables so that the resulting P(X) equation most accurately predicts whether Y = 1 or 0. The expression $P(X)^{Y} \cdot [1 - P(X)]^{(1-Y)}$ is maximized when P(X) is most accurate. It approaches its highest value (1) when Y = 1 and P(X) approaches 1. It also approaches its highest value (1) when Y = 0 and P(X) approaches 0. When Y = 1 and P(X) = 1, that is a 100% correct prediction by P(X) that Y = 1. When Y = 0 and P(X) = 0, that is a 100% correct prediction by P(X) that Y = 0. Each prospect has a separate $P(X)^{Y} \cdot [1 - P(X)]^{(1-Y)}$ value calculated for him or her.


    The sum of the $P(X)^{Y} \cdot [1 - P(X)]^{(1-Y)}$ calculations for all prospects is taken. The only variables that exist when calculating $P(X)^{Y} \cdot [1 - P(X)]^{(1-Y)}$ are Y and the variables of P(X), which are Constant, A, and B. Using the Excel Solver, these variables are adjusted until their values maximize the sum of all $P(X)^{Y} \cdot [1 - P(X)]^{(1-Y)}$ values.


    The Final, Most Accurate Predictive Equation

    When the sum of $P(X)^{Y} \cdot [1 - P(X)]^{(1-Y)}$ is maximized, then the final resulting P(X) equation is as accurate as possible at predicting whether Y will be 1 or 0.
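    The quantity that the Solver maximizes can be sketched as follows. This only evaluates the sum for two hand-picked sets of Logit variables and does not perform the Solver's search. The prospect data, the function names, and both parameter sets are made up for illustration.

```python
import math

def p_of_x(age, gender, constant, a, b):
    """P(X) = e^L / (1 + e^L) with L = Constant + A * Age + B * Gender."""
    L = constant + a * age + b * gender
    return math.exp(L) / (1 + math.exp(L))

def prospect_score(y, p):
    """P(X)^Y * [1 - P(X)]^(1 - Y): close to 1 when the prediction matches Y."""
    return p ** y * (1 - p) ** (1 - y)

def objective(prospects, constant, a, b):
    """Sum of the per-prospect scores, the cell the Solver is asked to maximize."""
    return sum(prospect_score(y, p_of_x(age, gender, constant, a, b))
               for age, gender, y in prospects)

# Hypothetical previous prospects: (age, gender, Y).
prospects = [(25, 1, 0), (30, 0, 0), (45, 1, 1), (50, 0, 1)]

start = objective(prospects, 0.0, 0.0, 0.0)    # every P(X) is 0.5
better = objective(prospects, -7.5, 0.2, 0.0)  # older prospects get a higher P(X)
print(start, better)
```

    The second set of variables scores higher because it assigns a high P(X) to the prospects who bought and a low P(X) to those who did not.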

    The Excel Solver Dialogue Box


    Stated another way, we now have a predictive equation P(X) which uses the optimal combination of Constant, A, and B to most accurately calculate the probability that Y = 1 given a prospect's age and gender. The embedded video provides a clear picture of all of this in action in Excel.

    The use of the Excel Solver does require some hand-tweaking to ensure that the most accurate answer is obtained. The video shows an example of this. Ultimately, what the Solver is doing is adjusting the variables Constant, A, and B to maximize the sum of the column of $P(X)^{Y} \cdot [1 - P(X)]^{(1-Y)}$ values. The answer obtained by the Solver should maximize that sum and provide realistic answers for the probabilities of each prospect, including the new one.

    You'll Have To Tweak the Constraints in the Excel Solver

    You'll probably find that you have to experiment by applying constraints to the variables that Solver is adjusting in order to maximize the target sum. The variables that Solver adjusts are called Decision Variables. Solver allows you to create constraints on the value of any Decision Variable.


    Adding a Constraint to the Solver

    In the video, you will be able to watch how a Decision Variable is constrained to make the final answer more accurate. The Decision Variable called Constant was constrained to always remain above -25 during the Solver analysis. This resulted in the most accurate and realistic maximization of the sum of the $P(X)^{Y} \cdot [1 - P(X)]^{(1-Y)}$ values.

    Conclusion: Logistic Regression in Excel Is an Incredible Predictor but Not the Simplest Analysis

    Logistic Regression is not the simplest type of analysis to understand or perform. Hopefully this article and video have provided a much clearer picture for you.


    The Four Steps of Regression in Excel (Including Two Crucial Steps That Most People Skip)

    Running a Regression in Excel is fairly easy. So is running one incorrectly. There are two crucial steps that should always be performed on the data before any Regression is run. Fortunately, these two steps are very quick and easy to do in Excel. They are:

    1) Graph the Data
    2) Run Correlation Analysis On All Variables

    Following is a video of this article showing how to perform all four steps of Regression in Excel, including the above two crucial steps at the beginning:


    Step-By-Step Video Showing How To Do All 4 Steps of Regression in Excel, Including the 2 Crucial Initial Steps That No One Does, But Should

    (Is Your Sound and Internet Connection Turned On?)

    Why You Need To Run The 2 Crucial Steps Before Doing Regression

    Here's why you need to run the two crucial steps prior to regressing any data in Excel:


    Crucial Step 1) Graphing the Data

    Whether or not you are using Excel to run a Regression, you should always graph the data before doing anything else. Eyeballing the data will allow you to quickly determine whether there is any relationship between the independent (input) variables and the dependent (output) variable. You also want to evaluate whether the graph generally appears to be linear or possibly quadratic. Excel's Regression Tool works well only for reasonably linear data. Eyeballing the data upfront will tell you very quickly whether Excel's Linear Regression is the right tool for the job.

    Graphing The Data To Check If It Is Linear

    The input and output variables will be graphed together. The y-axis of the chart will provide the scale for plotting those values. The x-axis will provide a measure of whatever continuum was used to collect the values of all of the variables, e.g. time. Excel's charting function is the way to go here. The above linked video shows exactly how to chart all the data in Excel.

    Crucial Step 2) Running Correlation Analysis on All Variables Simultaneously

    There are two good reasons for doing this. First, we want to remove any input variables which are clearly not good predictors of the output variable. Second, we want to make sure that none of the input variables have a high correlation with (are good predictors of) other input variables.

    Running Correlation Analysis on the Data To Prevent Collinearity and also To Remove Input Variables That Have Low Correlation With the Output Variable

    Correlation of multiple variables is easily done in Excel using the Correlation tool in Data Analysis. The linked video shows exactly how to do that.

    Remove Input Variables That Have Low Correlation With Output Variable

    After you have run Correlation Analysis on the data, you will want to remove any input variables that have a low correlation with the output variable. A Correlation Coefficient with an absolute value of less than 0.4 (between -0.4 and +0.4) between the output variable and an input variable indicates that the input variable is not a good predictor of the output. That input variable should be removed from the Regression Analysis. The attached video provides an example of this.

    Data Columns Before Removing Input Variable With Low Correlation To Output

    Data Columns After Removing Input Variables With Low Correlation To Output

    Remove Input Variables Highly Correlated With Other Input Variables

    After looking at the Correlation Coefficients between the input and output variables, look at the Correlation Coefficients between the input variables themselves. You do not want to use pairs of input variables that are good predictors of each other in a Regression. Doing so causes a Regression error known as Collinearity or Multicollinearity. One variable from any pair of highly correlated input variables should be removed prior to running the Regression Analysis. Variables can be considered highly correlated if the absolute value of their Correlation Coefficient is greater than 0.7 (greater than +0.7 or less than -0.7).
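
    The two screening rules above (drop inputs whose correlation with the output is below 0.4 in absolute value, and flag input pairs whose correlation with each other is above 0.7 in absolute value) can be sketched outside Excel as follows. This is only an illustrative sketch: the function names and the small data set are made up for the example, and the correlation used is the ordinary Pearson correlation coefficient.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def weak_predictors(inputs, output, threshold=0.4):
    """Names of inputs whose |correlation| with the output is below threshold."""
    return [name for name, values in inputs.items()
            if abs(pearson(values, output)) < threshold]


def collinear_pairs(inputs, threshold=0.7):
    """(name_a, name_b, r) for input pairs whose |correlation| exceeds threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(inputs.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            pairs.append((name_a, name_b, r))
    return pairs


# Hypothetical data, invented only to exercise the two rules.
INPUTS = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 11],   # moves almost in step with x1
    "x3": [1, -1, 0, -1, 1],  # nearly unrelated to the output
}
OUTPUT = [3, 5, 6, 9, 10]

print("Remove (weak predictors):", weak_predictors(INPUTS, OUTPUT))
print("Keep only one of each pair:", collinear_pairs(INPUTS))
```

    In this made-up data set, x3 is flagged as a weak predictor, and x1 and x2 are flagged as a highly correlated pair, so only one of those two would be kept.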

    Adding New Input Variables To The Regression Analysis

    Here are a few hints about adding new input variables to a Regression Analysis. First, build up a Regression by starting with a small number of input variables and then adding new ones one at a time. Second, a good new input variable noticeably increases Adjusted R Square and lowers Standard Error without significantly changing the existing Regression Coefficients.
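
    To see why Adjusted R Square is the number to watch when adding inputs, it helps to look at the standard formula behind it, which penalizes every extra input variable. The sketch below uses that textbook formula; the function name and the R Square values are hypothetical and chosen only for illustration.

```python
def adjusted_r_square(r_square, n_observations, n_inputs):
    """Standard adjusted R Square: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    degrees_of_freedom = n_observations - n_inputs - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need more observations than input variables plus one")
    return 1 - (1 - r_square) * (n_observations - 1) / degrees_of_freedom


# Hypothetical case: 20 observations, and a third input that lifts
# R Square only slightly, from 0.800 to 0.805.
before = adjusted_r_square(0.800, n_observations=20, n_inputs=2)
after = adjusted_r_square(0.805, n_observations=20, n_inputs=3)
print(f"Adjusted R Square before: {before:.4f}, after: {after:.4f}")
```

    In this hypothetical case R Square goes up but Adjusted R Square goes down, which is the signal that the new input variable is not pulling its weight.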

    Step 3) Run the Regression in Excel

    When you are satisfied with the data graph and the Correlation Analysis, go ahead and run the Regression in Excel. An example of how to do this is shown in the above video.

    The Excel Regression Dialogue Box

    Final Step 4) Analysis of Excel Output

    The final step of Excel Regression is analysis of the Excel output. Please refer to the chapter of this manual that goes into detail about how to quickly read and understand the output of regression done in Excel.

    Excel Regression Output With Color Coding Added

    Conclusion - Plotting the Data and Running Correlation Can Be BIG Time Savers

    Plotting the data and running Correlation Analysis prior to running a Regression can save you lots of time that you might otherwise have to spend making adjustments to your Regression after running it.

    How To Do Nonlinear Regression Using the Excel Solver

    Excel Solver is one of the best and easiest curve-fitting devices in the world, if you know how to use it. Its curve-fitting capabilities make it an excellent tool to perform nonlinear regression. The Excel Solver will find the equation of the linear or nonlinear curve which most closely fits a set of data points.

    One very important caveat must be added: the user must first determine the general type of the curve and input that information into Solver at the start. This information is in the form of the general equation that defines the curve, such as a0 + a1*x + a2*x^2 = c or a*ln(xb) = c. Solver then calculates the values of the variables that produce the equation which most closely fits the data points.

    We will run through an example here. In this problem we are going to show how to use the Excel Solver to calculate an equation which most closely describes the relationship between sales and the number of ads being run. The purpose of this equation is to be able to predict sales based upon the number of ads that will be run.

    A marketing manager has collected the following data on the company's sales vs. the number of ads that were running at different times.

    Number of Ads Running    Sales
    50                       6700
    55                       7500
    59                       8700
    62                       8900
    75                       8800
    95                       10900
    110                      11200
    125                      11400
    140                      11500
    180                      12300

    Here is an Excel scatter plot of that data:

    We would like to create an equation from this data that allows us to predict the sales based upon the number of ads currently running. The first step is to eyeball the data and estimate what general type of curve this graph probably is. In this case it appears to be a graph in which the y value rises at a diminishing rate as the x value increases. A formula for such a curve would have the general form:

    Y = A1 + A2 * X^B1

    Sales = A1 + A2 * (Number of Ads Running)^B1

    We can use the Excel Solver to solve for A1, A2, and B1. We need to arrange the data in a form that can be input into the Excel Solver as follows:

    This table shows the arrangement of data and the calculations. Here we have created an Excel model based upon our model of:

    Sales = A1 + A2 * (Number of Ads Running)^B1

    One example of this formula in action is explained for Cell E16. We are listing the variables that we are solving for (A1, A2, and B1) in cells B3 to B5. In Solver language, the cells that we are changing are called Decision Variables. We have arbitrarily set our Decision Variables to:

    A1 = 100
    A2 = 100
    B1 = 0.05

    We now take the difference between the actual number of sales and the number of sales predicted by our model with our arbitrary settings for the Decision Variables. The square of each difference is taken and then all squares are summed up. We are trying to find the settings for the Decision Variables that will minimize the sum of the squares of the differences. In other words, we are trying to find the A1, A2, and B1 that will minimize the number in cell G13.

    Once the Solver has been installed as an add-in (To add-in Solver: File / Options / Add-Ins / Manage / Excel Add-Ins / Go / Solver Add-In), you can access the Solver in Excel 2010 by: Data / Solver. The following blank Solver dialogue box comes up:

    The Solver dialogue box has the following 4 parameters that need to be set:

    1) The Objective Cell: This is the target cell whose value we are trying to maximize, minimize, or set to a certain value.
    2) Minimize or Maximize: Whether to minimize or maximize the Objective cell, or to attempt to achieve a certain value in it.
    3) Decision Variables: The set of variables that the Excel Solver will change in order to optimize the Objective cell.
    4) Constraints: These are the limitations that the problem places on the Solver during its calculations.

    Once again, here is the data table for Solver inputs:

    Objective: We are trying to minimize Cell G13, the sum of the squares of the differences between the actual and predicted sales.

    Decision Variables: We are changing A1, A2, and B1 (cells B3 to B5) to minimize our Objective, Cell G13. The Decision Variables are therefore Cells B3 to B5.

    Constraints: There are none for this curve-fitting operation.

    Selection of Solving Method: GRG Nonlinear

    The GRG Nonlinear method is used when the equation producing the objective is not linear but is smooth (continuous). Examples of smooth nonlinear functions in Excel are: =1/C1, =Log(C1), and =C1^2. These functions have graphs that are curved (nonlinear) but have no breaks (smooth). Our sales equation appears to be smooth and nonlinear:

    Sales = A1 + A2 * (Number of Ads Running)^B1

    Here is the completed Solver dialogue box:

    Here is a close-up of the Solver Objective, Decision Variables, and Constraints:

    If we now hit the Solve button, we get the following result:

    Solver has optimized the Decision Variables to minimize the objective function as follows:

    A1 = -445,616
    A2 = 437,247
    B1 = 0.00911

    The Objective is minimized to: 2,556,343

    We can now create an Excel graph of the Actual Sales vs. the Predicted Sales as follows:

    Solver calculates that Sales can be predicted from Number of Ads Running by the following equation:

    Sales = -445616 + 437247 * (Number of Ads Running)^0.00911

    The trickiest part of this problem is the first step: eyeballing the data to determine what kind of curve the data follows. You should take time to evaluate whether you are fitting the correct curve type.
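
    The objective that Solver minimizes in this example, the sum of the squared differences between actual and predicted sales, can be reproduced outside Excel as a check. The sketch below is a plain-Python illustration, not the workbook itself; the function names are made up, and the data pairs are read with ad counts running from 50 to 180 and sales from 6700 to 12300, which is the reading consistent with the fitted equation.

```python
# Number of ads running and the matching sales figures from the example.
ADS = [50, 55, 59, 62, 75, 95, 110, 125, 140, 180]
SALES = [6700, 7500, 8700, 8900, 8800, 10900, 11200, 11400, 11500, 12300]


def predict_sales(ads, a1, a2, b1):
    """Model from the text: Sales = A1 + A2 * (Number of Ads Running)^B1."""
    return a1 + a2 * ads ** b1


def sum_squared_errors(a1, a2, b1, ads=ADS, sales=SALES):
    """The Solver objective: sum of squared (actual - predicted) sales."""
    return sum((actual - predict_sales(x, a1, a2, b1)) ** 2
               for x, actual in zip(ads, sales))


# Arbitrary starting values used before running Solver.
sse_start = sum_squared_errors(100, 100, 0.05)

# Rounded values reported after running Solver.
sse_fitted = sum_squared_errors(-445616, 437247, 0.00911)

print(f"Objective at the starting values: {sse_start:,.0f}")
print(f"Objective at the fitted values:   {sse_fitted:,.0f}")
```

    The objective at the fitted values should come out close to the Objective that Solver reported, with small differences because the coefficients shown in the text are rounded.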

    Solver Tips

    You may notice that if you run this problem through the Solver multiple times, you will get slightly different answers. Each time that you run Solver's GRG algorithm, it can calculate different values for the Decision Variables. You are trying to find the values for the Decision Variables that minimize the objective function (cell G13) the most.

    The GRG algorithm starts its search from the values currently in the Decision Variable cells. After a run, those cells hold the previous result, so each new run starts from a slightly different point. That is why different answers appear during each run. Choose the Decision Variable values from the run which produces the lowest value of the Objective. Keep running the Solver until the Objective stops decreasing. That should give you the optimal values of the Decision Variables. That was done in the example above.

    Initial Solver Settings

    Here are some Solver settings that you want to configure prior to running the Solver for most problems. These settings are found when you click the Options button:

    Show Iteration Results: Leave this unchecked. When checked, this option stops the GRG Solver after each iteration and displays the result for that iteration. Very rarely is there a reason for doing that.

    Use Automatic Scaling: Leave this box unchecked. You would only use this option if you had reason to believe that the inputs of the Solver were measured using different scales.

    Assume Non-Negative: Only check this if you are sure that none of the variables can ever be negative. In this example that is clearly not true, since A1 is negative.

    Bypass Solver Reports: Leave this box unchecked. There is no advantage to not having Solver reports for each Solver run.
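
    One way to get a feel for why repeated runs can settle on different Decision Variables is to note that, once B1 is fixed, the model is linear in A1 and A2, so the best A1 and A2 for that B1 follow from ordinary least squares. The sketch below scans a few B1 values this way. It is a different technique from Solver's GRG method, swapped in purely for illustration; the function name and the list of B1 values tried are assumptions of the example.

```python
ADS = [50, 55, 59, 62, 75, 95, 110, 125, 140, 180]
SALES = [6700, 7500, 8700, 8900, 8800, 10900, 11200, 11400, 11500, 12300]


def best_fit_for_exponent(b1, ads=ADS, sales=SALES):
    """For a fixed B1, return the least-squares (A1, A2) and the resulting
    sum of squared errors for Sales = A1 + A2 * ads^B1."""
    z = [x ** b1 for x in ads]               # transformed input
    n = len(z)
    z_mean = sum(z) / n
    y_mean = sum(sales) / n
    szz = sum((zi - z_mean) ** 2 for zi in z)
    szy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, sales))
    a2 = szy / szz                            # slope on the transformed input
    a1 = y_mean - a2 * z_mean                 # intercept
    sse = sum((yi - (a1 + a2 * zi)) ** 2 for zi, yi in zip(z, sales))
    return a1, a2, sse


# Hypothetical set of exponents to try (B1 = 0 is excluded: ads^0 is constant).
for b1 in [0.005, 0.00911, 0.05, 0.25, 0.5, 1.0]:
    a1, a2, sse = best_fit_for_exponent(b1)
    print(f"B1={b1:<8} A1={a1:>14,.0f} A2={a2:>14,.1f} objective={sse:>14,.0f}")
```

    Comparing the objective column across rows shows how much, or how little, the fit depends on the exact value of B1, which in turn indicates how far A1 and A2 can move between runs without changing the Objective very much.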

    Summary

    Excel Solver is an easy-to-use and powerful nonlinear regression tool as a result of its curve-fitting capacity. One use of this is to calculate predictive sales equations for your company. It will work as long as you have properly determined the correct general curve type in the beginning.

    Meet the Author

    Mark Harmon is a master number cruncher. Creating overloaded Excel spreadsheets loaded with complicated statistical analysis is his idea of a good time. His profession as an Internet marketing manager provides him with the opportunity and the need to perform plenty of meaningful statistical analysis at his job.

    Mark Harmon is also a natural teacher. As an adjunct professor, he spent five years teaching more than thirty semester-long courses in marketing and finance at the Anglo-American College in Prague and the International University in Vienna, Austria. During that five-year time period, he also worked as an independent marketing consultant in the Czech Republic and performed long-term assignments for more than one hundred clients. His years of teaching and consulting have honed his ability to present difficult subject matter in an easy-to-understand way.

    Harmon received a degree in electrical engineering from Villanova University and an MBA in marketing from the Wharton School.
