Session 4 LJ(1)

Session 4 Learning Journal question

The data in the Excel file AdvertSales.xlsx were collected by a small company and concern the company’s weekly sales (£) following various weekly advertising expenditures (£).

Perform the correlation and simple linear regression analysis.

INSTRUCTIONS

Create the required SPSS output and paste where prompted.

In the text below *** prompts where you have to type a numerical value read from SPSS output.

Text in bold indicates where you have to choose the appropriate comment based upon supporting output, e.g. we do/do not reject H0.

DELETE THE ABOVE INSTRUCTIONS WHEN YOU HAVE COMPLETED THE LEARNING JOURNAL QUESTION

DELETE SENTENCES IN CAPITAL LETTERS WHEN YOU HAVE COMPLETED THE LEARNING JOURNAL QUESTION

BEFORE PERFORMING THE REGRESSION WE NEED TO PERFORM SOME EDA

THIS IS FOR DATA SCREENING PURPOSES AND TO FAMILIARISE OURSELVES WITH THE DATA. AT THE MOMENT DO YOU EVEN KNOW:

HOW MANY DATA POINTS YOU HAVE?

WHAT IS THE MEAN ADVERTISING EXPENDITURE?

HOW VARIABLE ARE THE WEEKLY SALES?

Methods of Enquiry

Business Statistics Activity Leader: Dr Iain Weir ([email protected])

SALES IS THE VARIABLE WE ARE ULTIMATELY TRYING TO PREDICT

IT IS THE VARIABLE THAT IS RANDOM: ADVERTISING IS CONTROLLED BY THE COMPANY AND THUS PROBABLY NOT AS INTERESTING!

BEGIN WITH SOME EDA OF JUST THE SALES VARIABLE

Sales Exploratory Data Analysis

PASTE CASE PROCESSING SUMMARY TABLE HERE

From the above we see that we have *** observations of sales.

PASTE BOXPLOTS HERE

The boxplot of sales reveals there are *** outliers.

Visually the boxplot of sales is/is not consistent with normal data.

PASTE DESCRIPTIVES TABLE HERE

The mean weekly sales figure is £ ***.

The 95% confidence interval of this mean is from £*** to £***.

The median weekly sales figure is £ ***.

The best weekly sales figure the company had is £***.

The worst weekly sales figure was £***.

The standard deviation of sales is ***; this tells us that approximately 95% of the time the company’s weekly sales will be within £*** ± 1.96 × ***, i.e. £*** to £***.
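As a rough illustration of the arithmetic in the sentence above, the sketch below computes the mean ± 1.96 × standard deviation range and, for comparison, a normal-approximation confidence interval for the mean. The `sales` values are invented for the example and are not the AdvertSales.xlsx data. SPSS typically bases its interval for the mean on the t distribution, so its limits may be slightly wider than the ones computed here.

```python
import math
import statistics

# Hypothetical weekly sales (£), invented for illustration only;
# these are NOT the values in AdvertSales.xlsx.
sales = [980.0, 1120.0, 1050.0, 1230.0, 1010.0, 1175.0, 1090.0, 1145.0]

n = len(sales)
mean = statistics.mean(sales)
sd = statistics.stdev(sales)   # sample standard deviation (divides by n - 1)
se = sd / math.sqrt(n)         # standard error of the mean

# Range expected to hold roughly 95% of weekly sales, assuming normality.
spread_low, spread_high = mean - 1.96 * sd, mean + 1.96 * sd

# Normal-approximation 95% confidence interval for the mean itself.
ci_low, ci_high = mean - 1.96 * se, mean + 1.96 * se

print(f"mean = £{mean:.2f}, sd = {sd:.2f}")
print(f"about 95% of weeks: £{spread_low:.2f} to £{spread_high:.2f}")
print(f"95% CI of the mean: £{ci_low:.2f} to £{ci_high:.2f}")
```

The first interval describes individual weeks and the second describes the uncertainty in the mean, which is why the second is much narrower.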

The sales skewness is/is not consistent with normality.

The sales kurtosis is/is not consistent with normality.
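One common rule of thumb for the two statements above is to divide the statistic by its standard error and check whether the result lies within about ±1.96; the course may use a different cut-off. The sketch below applies that idea to skewness. The formulas are the usual bias-adjusted ones, assumed here to match the SPSS Descriptives table, and the data and function names are hypothetical.

```python
import math
import statistics

def skewness_with_se(values):
    """Return (sample skewness, standard error of skewness).

    Uses the bias-adjusted formulas; assumed to match SPSS output.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew, se

def consistent_with_normality(statistic, se, cutoff=1.96):
    """Rule-of-thumb check: is statistic / se inside +/- cutoff?"""
    return abs(statistic / se) < cutoff

# Hypothetical, perfectly symmetric values, for illustration only.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
skew, se = skewness_with_se(symmetric)
print(f"skewness = {skew:.3f}, SE = {se:.3f}")
print("consistent with normality:", consistent_with_normality(skew, se))
```

The same ratio check can be applied to the kurtosis statistic and its standard error as read from the SPSS table.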

PASTE PERCENTILES TABLE HERE: IN A PRESENTATION THAT INCLUDES THE BOXPLOT YOU MIGHT WANT TO QUOTE SOME OF THE FOLLOWING

The lower quartile is £***.

The upper quartile is £***.

OR IT MAY BE OF INTEREST TO QUOTE

95% of the time sales are over £***.

PASTE HISTOGRAM HERE

IN A PRESENTATION INCLUDING THIS YOU COULD DISPLAY THIS WHILST DISCUSSING LOCATION/DISPERSION AND SHAPE AND/OR QUOTING STATISTICS FROM TABLES ABOVE

“Here we see a histogram of our weekly sales data. We can see that sales range from roughly £*** at the lowest to £*** at the highest. The average weekly sales figure is approximately £***. The sales vary over a range of £***. The data is fairly symmetrical/slightly negatively skewed/slightly positively skewed.”

PASTE NORMAL Q-Q PLOT HERE

The above plot does/does not give us faith that the data is normal as the points are/are not nicely entwined around the straight line.

EDIT IN SPSS THE TESTS OF NORMALITY TABLE TO REMOVE KOLMOGOROV-SMIRNOV TESTS

PASTE SHAPIRO-WILK TESTS OF NORMALITY TABLE HERE

The Shapiro-Wilk (S-W) statistic does/does not give evidence of departure from normality (S-W(***) =***, p = ***).

WHILST SALES IS THE RANDOM VARIABLE OF MAIN INTEREST WE STILL SHOULD SCREEN ADVERTISING AND KNOW SOMETHING ABOUT ITS DISTRIBUTION

Advertising Exploratory Data Analysis

PASTE CASE PROCESSING SUMMARY TABLE HERE

From the above we see that we have *** observations of advertising.

PASTE BOXPLOTS HERE

The boxplot of advertising reveals there are *** outliers.

PASTE DESCRIPTIVES TABLE HERE

The mean weekly advertising figure is £ ***.

The median weekly advertising figure is £ ***.

The most spent on advertising in a week was £***.

The least spent on advertising in a week was £***.

PASTE HISTOGRAM HERE

“Here we see a histogram of our weekly advertising expenditure. We can see that roughly £*** to £*** is spent on advertising each week. The average amount of weekly advertising is approximately £***. The amount spent on advertising varies over a range of £***.”

WE SHALL NOW CONSIDER THE RELATIONSHIP BETWEEN SALES AND ADVERTISING

PASTE SALES V ADVERTISING SCATTERPLOT HERE

THINK ABOUT WHICH OF THE TWO VARIABLES IS THE DEPENDENT VARIABLE AND THUS SHOULD BE PLOTTED ON THE Y AXIS!!!

From the above we can see the following.

There appears to be a negative/no/a positive correlation between sales and advertising.

The relationship between sales and advertising is linear/curved.

The variability of sales is/is not constant over advertising.

Thus it appears that simple linear regression is/is not appropriate for this data.

PASTE CORRELATION TABLE HERE

The Pearson correlation coefficient value of *** confirms what was apparent from the graph; there appears to be a very weak/weak/moderate/strong/very strong positive/negative correlation between the two variables.

There is/is not a significant correlation between sales and advertising (r=***, N=***, p=***).
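For reference, the Pearson coefficient reported in the correlation table can be reproduced by hand. The sketch below computes r and the t statistic (on N − 2 degrees of freedom) on which the significance test is usually based. The five data pairs are invented for illustration and are not the AdvertSales.xlsx values; the p-value itself is left to SPSS.

```python
import math
import statistics

# Hypothetical weekly figures (£), for illustration only;
# NOT the AdvertSales.xlsx data.
advertising = [400.0, 600.0, 800.0, 1000.0, 1200.0]
sales = [5200.0, 5900.0, 6300.0, 7200.0, 7400.0]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

n = len(sales)
r = pearson_r(advertising, sales)

# Test statistic for H0: no correlation; compare with t on n - 2 df.
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

print(f"r = {r:.3f}, N = {n}, t = {t_stat:.2f} on {n - 2} df")
```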

PASTE MODEL SUMMARY OUTPUT HERE

From the above we can see that the model fits the data reasonably well; ***% of the variation in the sales values can be explained by the fitted line together with the advertising values. Conversely, roughly ***% of the variation is not explained by the linear regression.

The standard deviation of sales around their expected values is ***.
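The two Model Summary quantities quoted above can be sketched as follows: R² is the share of the total variation in sales explained by the line, and the standard deviation of sales around their expected values is the standard error of the estimate. The data are the same invented illustration values as before, not the AdvertSales.xlsx data.

```python
import math
import statistics

# Hypothetical weekly figures (£), for illustration only.
advertising = [400.0, 600.0, 800.0, 1000.0, 1200.0]
sales = [5200.0, 5900.0, 6300.0, 7200.0, 7400.0]

n = len(sales)
mean_x, mean_y = statistics.mean(advertising), statistics.mean(sales)
sxx = sum((x - mean_x) ** 2 for x in advertising)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(advertising, sales))
syy = sum((y - mean_y) ** 2 for y in sales)   # total variation in sales

# Least-squares line: sales = intercept + slope * advertising
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Variation left unexplained by the fitted line.
residual_ss = sum(
    (y - (intercept + slope * x)) ** 2 for x, y in zip(advertising, sales)
)

r_squared = 1 - residual_ss / syy
std_error_estimate = math.sqrt(residual_ss / (n - 2))

print(f"R squared = {100 * r_squared:.1f}% explained, "
      f"{100 * (1 - r_squared):.1f}% unexplained")
print(f"standard error of the estimate = {std_error_estimate:.2f}")
```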

PASTE MODEL COEFFICIENTS OUTPUT HERE

From the above we can see the following.

The intercept is ***. This is/is not significantly different to zero (p=***).

The gradient is ***. This is/is not significantly different to zero (p=***).

The expected sales value is given by:

Sales = *** + *** × advertising

Thus we can see that for each £1 increase in advertising, the sales value is expected to increase by £***. The 95% confidence interval for this expected increase is £*** to £***.

The intercept for this example could be interpreted as the sales value (***) when there is no advertising. However this is extrapolation and thus cannot be relied upon!
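A minimal sketch of how the intercept, the gradient and the 95% confidence interval for the gradient arise is given below. The data are invented for illustration, and the critical value 3.182 is the two-sided 5% point of the t distribution on 3 degrees of freedom, which is only right for this five-point example; SPSS reports the interval directly when confidence intervals are requested.

```python
import math
import statistics

# Hypothetical weekly figures (£), for illustration only.
advertising = [400.0, 600.0, 800.0, 1000.0, 1200.0]
sales = [5200.0, 5900.0, 6300.0, 7200.0, 7400.0]

T_CRIT_3_DF = 3.182   # assumption: t table value for n - 2 = 3 df, 95% two-sided

n = len(sales)
mean_x, mean_y = statistics.mean(advertising), statistics.mean(sales)
sxx = sum((x - mean_x) ** 2 for x in advertising)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(advertising, sales))

slope = sxy / sxx                       # expected change in sales per extra £1
intercept = mean_y - slope * mean_x     # expected sales at zero advertising

residual_ss = sum(
    (y - (intercept + slope * x)) ** 2 for x, y in zip(advertising, sales)
)
s = math.sqrt(residual_ss / (n - 2))    # standard error of the estimate
se_slope = s / math.sqrt(sxx)           # standard error of the gradient

ci_low = slope - T_CRIT_3_DF * se_slope
ci_high = slope + T_CRIT_3_DF * se_slope

print(f"Sales = {intercept:.0f} + {slope:.2f} x advertising")
print(f"95% CI for the gradient: £{ci_low:.2f} to £{ci_high:.2f}")
```

Because zero advertising lies well below the smallest value in these illustrative data, the intercept here is an extrapolation, which is the caution raised in the text above.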

PASTE MODEL CASE DIAGNOSTICS OUTPUT HERE

From the above we can see that the ***th weekly observation does not fit the model too well. This week the model predicted sales of £*** but we experienced £*** less/more.
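The casewise diagnostics are based on standardised residuals, that is, each residual divided by the standard error of the estimate; cases beyond a cut-off chosen in the SPSS dialog are listed. The sketch below, on invented data, finds the week with the largest standardised residual. It is an illustration of the idea rather than a reproduction of the SPSS table.

```python
import math
import statistics

# Hypothetical weekly figures (£), for illustration only.
advertising = [400.0, 600.0, 800.0, 1000.0, 1200.0]
sales = [5200.0, 5900.0, 6300.0, 7200.0, 7400.0]

n = len(sales)
mean_x, mean_y = statistics.mean(advertising), statistics.mean(sales)
sxx = sum((x - mean_x) ** 2 for x in advertising)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(advertising, sales))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed sales minus sales predicted by the line.
predicted = [intercept + slope * x for x in advertising]
residuals = [y - p for y, p in zip(sales, predicted)]

s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
std_residuals = [e / s for e in residuals]

# Week (1-based) whose observation fits the model least well.
worst_index = max(range(n), key=lambda i: abs(std_residuals[i]))
worst_week = worst_index + 1

print(f"week {worst_week}: predicted £{predicted[worst_index]:.0f}, "
      f"observed £{sales[worst_index]:.0f}, "
      f"standardised residual {std_residuals[worst_index]:.2f}")
```

A positive residual means the company sold more than the model predicted that week, and a negative one means it sold less.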

PASTE HISTOGRAM OF STD RESIDUALS HERE

The fitted normal curve does/does not match the observed residuals well. Thus the normality assumption does/does not seem reasonable.

PASTE Q-Q PLOT HERE

The plotted points do/do not follow the straight line fairly well. Thus the normality assumption is/is not met.

PASTE STD RESIDUAL V STD PREDICTED VALUE PLOT HERE

From the above we can/cannot see a relationship between the residuals and the predicted values. Thus the fitted model is/is not consistent with the assumption of linearity.

PASTE FITTED LINE WITH 95% CI PLOT HERE

The above plot gives us a visual idea of the predicted sales for various advertising expenditures. We can see that as you approach the extreme advertising values, the 95% confidence interval gets narrower/wider, indicating that our prediction of expected sales is less/more accurate.

Suppose you have been asked to predict the weekly sales for advertising expenditures of £1500 and £1800. Add these values at the bottom of the data set. Rerun the regression saving predicted values and 95% confidence interval.

PASTE SCREENSHOT OF DATA VIEW WITH PREDICTION + CI HERE

State the predictions:

For an advertising expenditure of £1500, the predicted sales is £*** with a 95% confidence interval from £*** to £***.

For an advertising expenditure of £1800, the predicted sales is £*** with a 95% confidence interval from £*** to £***.

Predictions from this model should be good as the R² value is high (***%). However, predictions made by extrapolating outside the observed range of advertising expenditure (minimum = £*** to maximum = £***) cannot be trusted. Thus we have reservations about the prediction for an advertising expenditure of £1500/£1800.
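The prediction step and the extrapolation check can be sketched as below. The interval shown is the 95% confidence interval for the mean (expected) sales at a given expenditure; whether the worksheet intends this or the wider interval for an individual week depends on the option ticked in SPSS. The data, the function name `predict_mean_sales` and the hard-coded t value are all assumptions made for this illustration.

```python
import math
import statistics

# Hypothetical weekly figures (£), for illustration only.
advertising = [400.0, 600.0, 800.0, 1000.0, 1200.0]
sales = [5200.0, 5900.0, 6300.0, 7200.0, 7400.0]

T_CRIT_3_DF = 3.182   # assumption: t table value for n - 2 = 3 df, 95% two-sided

n = len(sales)
mean_x, mean_y = statistics.mean(advertising), statistics.mean(sales)
sxx = sum((x - mean_x) ** 2 for x in advertising)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(advertising, sales))
slope = sxy / sxx
intercept = mean_y - slope * mean_x
s = math.sqrt(
    sum((y - (intercept + slope * x)) ** 2 for x, y in zip(advertising, sales))
    / (n - 2)
)

def predict_mean_sales(x0):
    """Return (prediction, CI lower, CI upper, is_extrapolation) at spend x0."""
    fitted = intercept + slope * x0
    # The interval widens as x0 moves away from the mean advertising spend.
    half_width = T_CRIT_3_DF * s * math.sqrt(1 / n + (x0 - mean_x) ** 2 / sxx)
    extrapolated = not (min(advertising) <= x0 <= max(advertising))
    return fitted, fitted - half_width, fitted + half_width, extrapolated

for spend in (1000.0, 1500.0):
    fitted, low, high, extrapolated = predict_mean_sales(spend)
    note = " (extrapolation, treat with caution)" if extrapolated else ""
    print(f"£{spend:.0f}: predicted £{fitted:.0f}, "
          f"95% CI £{low:.0f} to £{high:.0f}{note}")
```

In these invented data the observed expenditures run from £400 to £1200, so £1500 is flagged; with the real data the flag depends on the minimum and maximum read from the Descriptives table.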

SAVE THE COMPLETED LEARNING JOURNAL QUESTION AS THIS WILL FORM PART OF YOUR LEARNING JOURNAL SUBMISSION
