Session 4 LJ(1)
Session 4 Learning Journal question
The data in the Excel file AdvertSales.xlsx were collected by a small company and concern the company’s weekly sales (£) following various weekly advertising expenditures (£).
Perform the correlation and simple linear regression analysis.
INSTRUCTIONS
Create the required SPSS output and paste where prompted.
In the text below, *** marks where you have to type a numerical value read from the SPSS output.
Text in bold indicates where you have to choose the appropriate comment based upon
the supporting output, e.g. we do/do not reject H0.
DELETE THE ABOVE INSTRUCTIONS WHEN YOU HAVE
COMPLETED THE LEARNING JOURNAL QUESTION
DELETE SENTENCES IN CAPITAL LETTERS WHEN YOU HAVE
COMPLETED THE LEARNING JOURNAL QUESTION
BEFORE PERFORMING THE REGRESSION WE NEED TO PERFORM SOME EDA
THIS IS FOR DATA SCREENING PURPOSES AND TO FAMILIARISE
OURSELVES WITH THE DATA, AT THE MOMENT DO YOU EVEN KNOW:
HOW MANY DATA POINTS YOU HAVE?
WHAT IS THE MEAN ADVERTISING EXPENDITURE?
HOW VARIABLE ARE THE WEEKLY SALES?
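The three screening questions above can also be cross-checked outside SPSS. The following is a minimal sketch in Python using only the standard library. The two lists hold made-up illustrative values, not the contents of AdvertSales.xlsx, and the function name screen is a placeholder chosen for this example.

```python
import statistics

def screen(values):
    """Return the count, mean and sample standard deviation (n - 1 divisor)."""
    return len(values), statistics.mean(values), statistics.stdev(values)

# Made-up illustrative values, NOT taken from AdvertSales.xlsx.
advertising = [800, 950, 1100, 1250, 1400, 1000, 1200, 900]
sales = [10200, 11500, 12900, 14100, 15800, 12000, 13700, 11000]

n_adv, mean_adv, sd_adv = screen(advertising)
n_sales, mean_sales, sd_sales = screen(sales)

print(f"Data points: {n_sales}")
print(f"Mean advertising expenditure: {mean_adv:.2f}")
print(f"Standard deviation of weekly sales: {sd_sales:.2f}")
```

The values printed for the real data should agree with the SPSS Descriptives table, since both use the n - 1 divisor for the standard deviation.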
Methods of Enquiry
Business Statistics Activity Leader: Dr Iain Weir ([email protected]) 1
SALES IS THE VARIABLE WE ARE ULTIMATELY TRYING TO PREDICT
IT IS THE VARIABLE THAT IS RANDOM: ADVERTISING IS CONTROLLED BY THE COMPANY AND THUS PROBABLY NOT AS INTERESTING!
BEGIN WITH SOME EDA OF JUST THE SALES VARIABLE
Sales Exploratory Data Analysis
PASTE CASE PROCESSING SUMMARY TABLE HERE
From the above we see that we have *** observations of sales.
PASTE BOXPLOTS HERE
The boxplot of sales reveals there are *** outliers.
Visually the boxplot of sales is/is not consistent with normal data.
PASTE DESCRIPTIVES TABLE HERE
The mean weekly sales figure is £ ***.
The 95% confidence interval of this mean is from £*** to £***.
The median weekly sales figure is £ ***.
The best weekly sales figure the company had is £***.
The worst weekly sales figure was £***.
The standard deviation of sales is ***; this tells us that approximately 95% of the time the company’s weekly sales will lie within £*** ± 1.96 × ***, i.e. between £*** and £***.
The sales skewness is/is not consistent with normality.
The sales kurtosis is/is not consistent with normality.
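The arithmetic behind the last three statements can be sketched as follows. This assumes the range is mean ± 1.96 × standard deviation, that skewness and kurtosis are the bias-adjusted sample versions with their usual standard errors (believed to match the SPSS Descriptives table, but treat that as an assumption), and that "consistent with normality" means the statistic lies within 1.96 standard errors of zero, which is one common rule of thumb rather than necessarily the one used in this course. All function names and the sales values are made up for illustration.

```python
import math
import statistics

def normal_range(values, z=1.96):
    """Mean +/- z standard deviations: the approximate 95% range if normal."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean - z * sd, mean + z * sd

def skewness_with_se(values):
    """Bias-adjusted sample skewness and its standard error (needs n >= 3)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    cubes = sum(((v - mean) / sd) ** 3 for v in values)
    skew = n / ((n - 1) * (n - 2)) * cubes
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew, se

def kurtosis_with_se(values):
    """Bias-adjusted excess kurtosis and its standard error (needs n >= 4)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    fourths = sum(((v - mean) / sd) ** 4 for v in values)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourths
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return kurt, se

def consistent_with_normal(statistic, se, z=1.96):
    """Rule of thumb (an assumption): statistic within z standard errors of 0."""
    return abs(statistic) <= z * se

# Made-up illustrative values, NOT taken from AdvertSales.xlsx.
sales = [10200, 11500, 12900, 14100, 15800, 12000, 13700, 11000]
low, high = normal_range(sales)
skew, se_skew = skewness_with_se(sales)
kurt, se_kurt = kurtosis_with_se(sales)
print(f"Approximate 95% range: {low:.0f} to {high:.0f}")
print("Skewness consistent with normality:", consistent_with_normal(skew, se_skew))
print("Kurtosis consistent with normality:", consistent_with_normal(kurt, se_kurt))
```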
PASTE PERCENTILES TABLE HERE: IN A PRESENTATION THAT INCLUDES THE BOXPLOT YOU MIGHT WANT TO QUOTE SOME OF THE FOLLOWING
The lower quartile is £***.
The upper quartile is £***.
OR IT MAY BE OF INTEREST TO QUOTE
95% of the time sales are over £***.
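The quartiles and the "95% of the time sales are over" figure (the 5th percentile) can be reproduced with a weighted-average percentile rule. The sketch below assumes the position rule (n + 1) × p with linear interpolation, which is believed to correspond to the default "Weighted Average" row of the SPSS Percentiles table; the function name and the data are made up for illustration.

```python
def percentile(values, p):
    """Percentile by the (n + 1) * p weighted-average rule, 0 < p < 1."""
    data = sorted(values)
    n = len(data)
    position = (n + 1) * p           # 1-based position in the sorted data
    if position <= 1:
        return data[0]               # below the first point: use the minimum
    if position >= n:
        return data[-1]              # beyond the last point: use the maximum
    lower = int(position)            # index (1-based) of the point just below
    fraction = position - lower
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])

# Made-up illustrative values, NOT taken from AdvertSales.xlsx.
sales = [10200, 11500, 12900, 14100, 15800, 12000, 13700, 11000]
print("Lower quartile:", percentile(sales, 0.25))
print("Upper quartile:", percentile(sales, 0.75))
print("5th percentile (95% of weeks are above this):", percentile(sales, 0.05))
```

With very few observations the 5th percentile simply falls back to the smallest value, so it is only informative for reasonably large samples.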
PASTE HISTOGRAM HERE
IN A PRESENTATION INCLUDING THIS YOU COULD DISPLAY THIS WHILST
DISCUSSING LOCATION/DISPERSION AND SHAPE
AND/OR QUOTING STATISTICS FROM TABLES ABOVE
“Here we see a histogram of our weekly sales data. We can see that sales range from as low as roughly £*** to as high as £***. The average sale is approximately £***. The sales vary over a range of £***. The data is fairly symmetrical/slightly negatively skewed/slightly positively skewed.”
PASTE NORMAL Q-Q PLOT HERE
The above plot does/does not give us faith that the data is normal as the points are/are not nicely entwined around the straight line.
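To see what the Q-Q plot is comparing, the sketch below pairs each sorted observation with the value expected at the same rank under a normal distribution with the sample mean and standard deviation. The plotting position (i − 3/8)/(n + 1/4), known as Blom's formula, is an assumption here; SPSS may be configured to use a different one. If the data are roughly normal, the pairs lie close to the line observed = expected. The function name and data are made up for illustration.

```python
import statistics

def qq_points(values):
    """Return (observed, expected-under-normality) pairs in ascending order."""
    data = sorted(values)
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    standard_normal = statistics.NormalDist()
    points = []
    for rank, observed in enumerate(data, start=1):
        p = (rank - 0.375) / (n + 0.25)  # Blom plotting position (assumed)
        expected = mean + sd * standard_normal.inv_cdf(p)
        points.append((observed, expected))
    return points

# Made-up illustrative values, NOT taken from AdvertSales.xlsx.
sales = [10200, 11500, 12900, 14100, 15800, 12000, 13700, 11000]
for observed, expected in qq_points(sales):
    print(f"observed {observed:8.0f}   expected if normal {expected:8.0f}")
```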
EDIT THE TESTS OF NORMALITY TABLE IN SPSS
TO REMOVE THE KOLMOGOROV-SMIRNOV TESTS
PASTE SHAPIRO WILK TESTS OF NORMALITY TABLE HERE
The Shapiro-Wilk (S-W) statistic does/does not give evidence of departure from normality (S-W(***) =***, p = ***).
WHILST SALES IS THE RANDOM VARIABLE OF MAIN INTEREST
WE STILL SHOULD SCREEN ADVERTISING AND KNOW
SOMETHING ABOUT ITS DISTRIBUTION
Advertising Exploratory Data Analysis
PASTE CASE PROCESSING SUMMARY TABLE HERE
From the above we see that we have *** observations of advertising.
PASTE BOXPLOTS HERE
The boxplot of advertising reveals there are *** outliers.
PASTE DESCRIPTIVES TABLE HERE
The mean weekly advertising figure is £ ***.
The median weekly advertising figure is £ ***.
The most spent on advertising in a week was £***.
The least spent on advertising in a week was £***.
PASTE HISTOGRAM HERE
“Here we see a histogram of our weekly advertising expenditure. We can see that roughly from £*** to £*** is spent on advertising each week. The average amount of weekly advertising is approximately £***. The amount spent on advertising varies over a range of £***.”
WE SHALL NOW CONSIDER THE RELATIONSHIP
BETWEEN SALES AND ADVERTISING
PASTE SALES V ADVERTISING SCATTERPLOT HERE
THINK ABOUT WHICH OF THE TWO VARIABLES IS THE DEPENDENT VARIABLE AND THUS SHOULD BE PLOTTED ON THE Y AXIS!!!
From the above we can see the following.
There appears to be a negative/no/a positive correlation between sales and advertising.
The relationship between sales and advertising is linear/curved.
The variability of sales is/is not constant over advertising.
Thus it appears that simple linear regression is/is not appropriate for this data.
PASTE CORRELATION TABLE HERE
The Pearson correlation coefficient value of *** confirms what was apparent from the graph; there appears to be a very weak/weak/moderate/strong/very strong positive/negative correlation between the two variables.
There is/is not a significant correlation between sales and advertising (r=***, N=***, p=***).
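For reference, the Pearson coefficient r and the t statistic that underlies its significance test can be computed by hand as sketched below. The p-value itself needs the t distribution with N − 2 degrees of freedom, which the Python standard library does not provide, so this sketch stops at the t statistic and leaves the p-value to the SPSS output. Function names and data are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: no correlation, on n - 2 degrees of freedom.

    Not defined when r is exactly +1 or -1.
    """
    return r * math.sqrt((n - 2) / (1 - r * r))

# Made-up illustrative values, NOT taken from AdvertSales.xlsx.
advertising = [800, 950, 1100, 1250, 1400, 1000, 1200, 900]
sales = [10200, 11500, 12900, 14100, 15800, 12000, 13700, 11000]
r = pearson_r(advertising, sales)
print(f"r = {r:.3f}, N = {len(sales)}, t = {t_statistic(r, len(sales)):.2f}")
```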
PASTE MODEL SUMMARY OUTPUT HERE
From the above we can see that the model fits the data reasonably well; ***% of the variation in the sales values can be explained by the fitted line together with the advertising values. Conversely, roughly ***% of the variation is not explained by the linear regression.
The standard deviation of sales around their expected values is ***.
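The two Model Summary figures quoted above are, respectively, R² = 1 − SSE/SST and the standard error of the estimate, √(SSE/(n − 2)). A minimal sketch, assuming an ordinary least-squares fit of sales on advertising; the function names and data are made up for illustration.

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def model_summary(x, y):
    """Return (R squared, standard error of the estimate)."""
    intercept, slope = fit_line(x, y)
    mean_y = sum(y) / len(y)
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - mean_y) ** 2 for b in y)
    return 1 - sse / sst, math.sqrt(sse / (len(x) - 2))

# Made-up illustrative values, NOT taken from AdvertSales.xlsx.
advertising = [800, 950, 1100, 1250, 1400, 1000, 1200, 900]
sales = [10200, 11500, 12900, 14100, 15800, 12000, 13700, 11000]
r_squared, std_error = model_summary(advertising, sales)
print(f"R squared: {100 * r_squared:.1f}% explained, "
      f"{100 * (1 - r_squared):.1f}% unexplained")
print(f"Standard error of the estimate: {std_error:.2f}")
```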
PASTE MODEL COEFFICIENTS OUTPUT HERE
From the above we can see the following.
The intercept is ***. This is/is not significantly different to zero (p=***).
The gradient is ***. This is/is not significantly different to zero (p=***).
The expected sales value is given by:
Sales = *** + *** × advertising
Thus we can see that for each £1 increase in advertising, the sales value is expected to increase by £***. The 95% confidence interval for this expected increase is £*** to £***.
The intercept for this example could be interpreted as the sales value (***) when there is no advertising. However this is extrapolation and thus cannot be relied upon!
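The coefficient table entries can be reproduced as sketched below: the least-squares intercept and gradient, their standard errors, and a confidence interval of the form estimate ± t × standard error. The critical t value is passed in by hand because the Python standard library has no t distribution; 2.447 is the two-sided 95% value for 6 degrees of freedom, which fits the eight made-up data points used here and not necessarily the real data. Function names are placeholders.

```python
import math

def coefficients(x, y):
    """Least-squares intercept and slope with their standard errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    std_error = math.sqrt(sse / (n - 2))
    return {
        "intercept": intercept,
        "slope": slope,
        "se_intercept": std_error * math.sqrt(1 / n + mean_x ** 2 / sxx),
        "se_slope": std_error / math.sqrt(sxx),
    }

def confidence_interval(estimate, se, t_crit):
    """estimate +/- t_crit * se; t_crit must be looked up for n - 2 df."""
    return estimate - t_crit * se, estimate + t_crit * se

# Made-up illustrative values, NOT taken from AdvertSales.xlsx.
advertising = [800, 950, 1100, 1250, 1400, 1000, 1200, 900]
sales = [10200, 11500, 12900, 14100, 15800, 12000, 13700, 11000]
fit = coefficients(advertising, sales)
low, high = confidence_interval(fit["slope"], fit["se_slope"], t_crit=2.447)
print(f"Sales = {fit['intercept']:.2f} + {fit['slope']:.3f} x advertising")
print(f"t for gradient: {fit['slope'] / fit['se_slope']:.2f}")
print(f"95% CI for gradient: {low:.3f} to {high:.3f}")
```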
PASTE MODEL CASE DIAGNOSTICS OUTPUT HERE
From the above we can see that the ***th weekly observation does not fit the model too well. This week the model predicted sales of £*** but we experienced £*** less/more.
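A case appears in the Casewise Diagnostics table when its standardised residual (residual divided by the standard error of the estimate) is large in absolute value. The sketch below assumes the cut-off of 3 standard deviations, which is believed to be the SPSS default; the function name, the threshold parameter, and the data are illustrative.

```python
import math

def casewise_diagnostics(x, y, threshold=3.0):
    """Return (case number, observed, predicted, residual) for poor fits."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * a for a in x]
    residuals = [b - p for b, p in zip(y, predicted)]
    std_error = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    flagged = []
    for case, (obs, pred, res) in enumerate(zip(y, predicted, residuals), 1):
        if abs(res / std_error) > threshold:
            flagged.append((case, obs, pred, res))
    return flagged

# Made-up illustrative values, NOT taken from AdvertSales.xlsx.
advertising = [800, 950, 1100, 1250, 1400, 1000, 1200, 900]
sales = [10200, 11500, 12900, 14100, 15800, 12000, 13700, 11000]
# A lower threshold is used here only so the tiny example flags something.
for case, obs, pred, res in casewise_diagnostics(advertising, sales, 1.5):
    direction = "more" if res > 0 else "less"
    print(f"Week {case}: predicted {pred:.0f}, observed {abs(res):.0f} {direction}")
```

A negative residual means the company experienced less than predicted that week; a positive residual means more.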
PASTE HISTOGRAM OF STD RESIDUALS HERE
The fitted normal curve does/does not match the observed residuals well. Thus the normality assumption does/does not seem reasonable.
PASTE Q-Q PLOT HERE
The plotted points do/do not follow the straight line fairly well. Thus the normality assumption is/is not met.
PASTE STD RESIDUAL V STD PREDICTED VALUE PLOT HERE
From the above we can/cannot see a relationship between the residuals and the predicted values. Thus the fitted model is/is not consistent with the assumption of linearity.
PASTE FITTED LINE WITH 95% CI PLOT HERE
The above plot gives us a visual idea of the predicted sales for various advertising expenditure. We can see that as you approach the extreme advertising values, the 95% confidence interval gets narrower/wider, indicating that the accuracy of our expected prediction is less/more.
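The change in width of the confidence band follows from the standard error of the expected sales at an advertising value x0, which is s × √(1/n + (x0 − mean of x)² / Sxx): it is smallest at the mean advertising value and grows as x0 moves away from it. A minimal sketch of that formula, with a made-up function name and made-up data:

```python
import math

def se_of_expected_sales(x, y, x0):
    """Standard error of the fitted mean response at advertising value x0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    std_error = math.sqrt(sse / (n - 2))
    return std_error * math.sqrt(1 / n + (x0 - mean_x) ** 2 / sxx)

# Made-up illustrative values, NOT taken from AdvertSales.xlsx.
advertising = [800, 950, 1100, 1250, 1400, 1000, 1200, 900]
sales = [10200, 11500, 12900, 14100, 15800, 12000, 13700, 11000]
for x0 in (800, 1075, 1400):
    se = se_of_expected_sales(advertising, sales, x0)
    print(f"advertising {x0}: standard error of expected sales {se:.1f}")
```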
Suppose you have been asked to predict the weekly sales for advertising expenditures of £1500 and £1800. Add these values at the bottom of the data set. Rerun the regression, saving the predicted values and their 95% confidence intervals.
PASTE SCREENSHOT OF DATA VIEW WITH PREDICTION + CI HERE
State the predictions:
For an advertising expenditure of £1500, the predicted sales is £*** with a 95% confidence interval from £*** to £***.
For an advertising expenditure of £1800, the predicted sales is £*** with a 95% confidence interval from £*** to £***.
Predictions from this model should be good as the R² value is high (***%). However, predictions from extrapolation outside of the observed advertising expenditure data range (minimum = *** to maximum = ***) cannot be trusted. Thus we have reservations about the prediction for an advertising expenditure of £1500/£1800.
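The extrapolation check described above amounts to comparing each requested advertising value with the observed minimum and maximum. A minimal sketch follows; the function name is a placeholder and the data are made up, so which of £1500 and £1800 falls outside the range here says nothing about the real data set.

```python
def predict_sales(x, y, x0):
    """Return (predicted sales at x0, True if x0 is inside the observed range)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * x0, min(x) <= x0 <= max(x)

# Made-up illustrative values, NOT taken from AdvertSales.xlsx.
advertising = [800, 950, 1100, 1250, 1600, 1000, 1200, 900]
sales = [10200, 11500, 12900, 14100, 17600, 12000, 13700, 11000]
for x0 in (1500, 1800):
    prediction, in_range = predict_sales(advertising, sales, x0)
    note = "within observed range" if in_range else "extrapolation, treat with caution"
    print(f"advertising {x0}: predicted sales {prediction:.0f} ({note})")
```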
SAVE THE COMPLETED LEARNING JOURNAL QUESTION
AS THIS WILL FORM PART OF YOUR LEARNING JOURNAL SUBMISSION