Uploaded by dwayne-daniels
Alcohol consumption and HDI story
Country   Total   Beer   Wine   Spirits   Other   HDI     Lifetime span
Austria   13,24   6,7    4,1    1,6       0,4     0,755   80,119
Finland   12,52   4,59   2,24   2,82      0,31    0,800   79,724
Poland    13,25   4,72   3,26   1,56      0       0,715   75,976
Russia    15,76   3,65   0,1    6,88      0,34    0,644   67,260
Uganda    11,93   0,51   0      0,18      14,52   0,453   53,261
The Human Development Index (HDI) is a composite statistic of life expectancy, education, and income.
What is a CORRELATION?
Correlation: a statistical procedure to measure and describe the relationship between two variables.
Do two variables covary?
Are two variables dependent on or independent of one another?
Can one variable be predicted from another?
The world is full of covariation
The IQ and brain size
(Z = z-score, PC = pixel count, CP = cross-product of the two z-scores)
IQ     Z IQ      Pixel count   Z PC     CP
138    1,2947    991           2,122    2,74793
140    1,37264   856           0,046    0,06333
96     -0,3421   879           0,4      -0,13679
83     -0,8487   865           0,185    -0,15664
101    -0,1472   808           -0,69    0,10189
135    1,17779   791           -0,95    -1,1231
85     -0,7708   799           -0,83    0,64013
77     -1,0825   794           -0,91    0,98231
88     -0,6538   894           0,631    -0,4123
Mean   104,78    853
SD     25,66     65,0192
Sum of CP = 2,70678
n = 9   r = 0,33835
Pearson's product-moment coefficient
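The IQ and pixel-count columns are enough to recompute r by hand. The sketch below assumes the slide used sample standard deviations (n - 1 in the denominator), which is what the reported SDs of 25,66 and 65,0192 suggest; the function names are illustrative only. The result should land close to the r = 0,33835 reported on the slide.

```python
from math import sqrt

# IQ and pixel-count columns from the slide (n = 9).
iq = [138, 140, 96, 83, 101, 135, 85, 77, 88]
pixels = [991, 856, 879, 865, 808, 791, 799, 794, 894]


def z_scores(values):
    """Standardise values using the sample SD (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def pearson_r(x, y):
    """Pearson's r as the sum of z-score cross-products divided by n - 1."""
    cross_products = [zx * zy for zx, zy in zip(z_scores(x), z_scores(y))]
    return sum(cross_products) / (len(x) - 1)


r = pearson_r(iq, pixels)
print(f"r = {r:.5f}")
```

Summing the CP column and dividing by n - 1 is the same calculation: 2,70678 / 8.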
.0 to .2    No relationship to very weak association
.2 to .4    Weak association
.4 to .6    Moderate association
.6 to .8    Strong association
.8 to 1.0   Very strong to perfect association
Interpretation
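The bands above can be written as a small lookup. This is only a sketch: the slide's bands share their end points (.2, .4, ...), so the code assumes a value exactly on a boundary belongs to the higher band, and it looks at the absolute value so that negative correlations are graded by strength. The name interpret_r is made up for the example.

```python
def interpret_r(r):
    """Map a correlation coefficient to the verbal label from the slide.

    Assumption: boundary values (e.g. exactly .2) fall into the higher band.
    """
    strength = abs(r)
    if strength > 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    bands = [
        (0.2, "No relationship to very weak association"),
        (0.4, "Weak association"),
        (0.6, "Moderate association"),
        (0.8, "Strong association"),
    ]
    for upper_bound, label in bands:
        if strength < upper_bound:
            return label
    return "Very strong to perfect association"


for value in (0.33835, -0.6491, 0.84):
    print(value, "->", interpret_r(value))
```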
CAUTION!!!
Test the null
Testing H0
$t = r \sqrt{\frac{n-2}{1-r^{2}}}$
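As a worked example, the test statistic can be computed for the IQ and brain-size correlation (r = 0,33835, n = 9). The sketch compares |t| with 2.365, the two-tailed 5% critical value for n - 2 = 7 degrees of freedom taken from a standard t table; the choice of a two-tailed 5% test is an assumption, since the slides do not state a significance level.

```python
from math import sqrt


def t_from_r(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than two cases")
    if abs(r) >= 1:
        raise ValueError("t is undefined for |r| = 1")
    return r * sqrt((n - 2) / (1 - r ** 2))


t = t_from_r(0.33835, 9)

# Two-tailed 5% critical value for 7 degrees of freedom (standard t table).
T_CRITICAL = 2.365
reject_h0 = abs(t) > T_CRITICAL

print(f"t = {t:.3f}, reject H0: {reject_h0}")
```

With only nine cases, a correlation of this size is not enough to reject the null hypothesis.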
Alcohol consumption and HDI story
           total   beer    wine    spirits   other   HDI     Lifetime
total      1,00    0,76    0,60    0,61      0,18    0,55    0,34
beer       0,76    1,00    0,46    0,44      -0,13   0,63    0,46
wine       0,60    0,46    1,00    0,16      -0,12   0,51    0,46
spirits    0,61    0,44    0,16    1,00      -0,17   0,48    0,38
other      0,18    -0,13   -0,12   -0,17     1,00    -0,25   -0,37
HDI        0,55    0,63    0,51    0,48      -0,25   1,00    0,84
Lifetime   0,34    0,46    0,46    0,38      -0,37   0,84    1,00
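A correlation matrix like this one is just Pearson's r computed for every pair of columns. The sketch below does that for the five countries listed at the start of the deck. Those five rows are only an excerpt, so the numbers it prints will not reproduce the matrix above, which was evidently computed from a larger data set; the code is meant to show the mechanics only.

```python
from math import sqrt

columns = ["total", "beer", "wine", "spirits", "other", "HDI", "Lifetime"]

# The five example rows from the opening slide (decimal commas converted).
rows = {
    "Austria": [13.24, 6.7, 4.1, 1.6, 0.4, 0.755, 80.119],
    "Finland": [12.52, 4.59, 2.24, 2.82, 0.31, 0.800, 79.724],
    "Poland": [13.25, 4.72, 3.26, 1.56, 0.0, 0.715, 75.976],
    "Russia": [15.76, 3.65, 0.1, 6.88, 0.34, 0.644, 67.260],
    "Uganda": [11.93, 0.51, 0.0, 0.18, 14.52, 0.453, 53.261],
}


def pearson_r(x, y):
    """Pearson's r from sums of squares and cross-products."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Turn the rows into one list of values per variable.
data = {name: [row[i] for row in rows.values()] for i, name in enumerate(columns)}

# Every variable against every other variable.
matrix = {a: {b: pearson_r(data[a], data[b]) for b in columns} for a in columns}

for a in columns:
    print(a.ljust(9), "  ".join(f"{matrix[a][b]:6.2f}" for b in columns))
```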
Correlation and causation
B causes A (reverse causation)
The more firemen fighting a fire, the bigger the fire is observed to be. Therefore firemen cause an increase in the size of a fire.
A causes B and B causes A (bidirectional causation)
Increased pressure is associated with increased temperature. Therefore pressure causes temperature.
Third factor C (the common-causal variable) causes both A and B
Sleeping with one's shoes on is strongly correlated with waking up with a headache. Therefore, sleeping with one's shoes on causes headache.
Illogically inferring causation from correlation
Coincidence
With a decrease in the wearing of hats, there has been an increase in global warming over the same period. Therefore, global warming is caused by people abandoning the practice of wearing hats.
Church of the Flying Spaghetti Monster
Scatterplot
Scatter plot of spousal ages, r = 0.97
Scatter plot of Grip Strength and Arm Strength, r = 0.63
Farnsworth favorite game
Anscombe’s quartet
I              II             III            IV
x      y       x      y       x      y       x      y
10.0   8.04    10.0   9.14    10.0   7.46    8.0    6.58
8.0    6.95    8.0    8.14    8.0    6.77    8.0    5.76
13.0   7.58    13.0   8.74    13.0   12.74   8.0    7.71
9.0    8.81    9.0    8.77    9.0    7.11    8.0    8.84
11.0   8.33    11.0   9.26    11.0   7.81    8.0    8.47
14.0   9.96    14.0   8.10    14.0   8.84    8.0    7.04
6.0    7.24    6.0    6.13    6.0    6.08    8.0    5.25
4.0    4.26    4.0    3.10    4.0    5.39    19.0   12.50
12.0   10.84   12.0   9.13    12.0   8.15    8.0    5.56
7.0    4.82    7.0    7.26    7.0    6.42    8.0    7.91
5.0    5.68    5.0    4.74    5.0    5.73    8.0    6.89

Property                                   Value
Mean of x in each case                     9
Variance of x in each case                 11
Mean of y in each case                     7.50
Variance of y in each case                 4.122 or 4.127
Correlation between x and y in each case   0.816
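The summary properties can be checked directly from the four data sets. A minimal sketch using Python's statistics module (sample variance, n - 1 denominator); pearson_r is a small helper written for the example, not a library function.

```python
from math import sqrt
from statistics import mean, variance

# Data sets I to III share the same x values.
x_shared = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
quartet = {
    "I": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8.0] * 7 + [19.0] + [8.0] * 3,
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(x, y):
    """Pearson's r from sums of squares and cross-products."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# (mean x, variance x, mean y, variance y, r) for each data set.
summary = {
    name: (mean(x), variance(x), mean(y), variance(y), pearson_r(x, y))
    for name, (x, y) in quartet.items()
}

for name, (mx, vx, my, vy, r) in summary.items():
    print(f"{name:>3}: mean x={mx:.2f} var x={vx:.2f} "
          f"mean y={my:.2f} var y={vy:.3f} r={r:.3f}")
```

The four rows of output are nearly identical, which is the point of the quartet: the summary statistics cannot tell the data sets apart, only the scatterplots can.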
CAUTION!!!
Check scatterplot
Anscombe’s quartet
Problems
[Figure: two scatterplots of SE (vertical axis) against AGE (horizontal axis); AGE runs from 10 to 80 in the first panel and from 10 to 40 in the second]
Problems: Outliers
r=0,63 r=0,23
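How much a single outlier can move r is easy to see with data set III of Anscombe's quartet from the earlier slide, where one case (x = 13, y = 12.74) sits far off the line formed by the other ten. This is a sketch using that data set as a stand-in; it is not the data behind the r values quoted on this slide.

```python
from math import sqrt

# Anscombe's data set III.
x = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
y = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]


def pearson_r(xs, ys):
    """Pearson's r from sums of squares and cross-products."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


r_all = pearson_r(x, y)

# Drop the single outlying case and recompute.
kept = [(a, b) for a, b in zip(x, y) if (a, b) != (13.0, 12.74)]
r_without_outlier = pearson_r([a for a, _ in kept], [b for _, b in kept])

print(f"with outlier:    r = {r_all:.3f}")
print(f"without outlier: r = {r_without_outlier:.3f}")
```

Here the outlier pulls r down; in other data an outlier can just as easily inflate it, which is why the scatterplot has to be checked first.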
Problems: Range restriction
Coefficient of Determination (r²)
CoD = the proportion of variance in one variable that can be accounted for by another variable.
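Squaring a few of the correlations quoted elsewhere in this deck shows how quickly the explained share of variance shrinks. A minimal sketch; the labels are just descriptions of where each r appears in the slides.

```python
def coefficient_of_determination(r):
    """Share of variance in one variable accounted for by the other."""
    return r ** 2


examples = {
    "IQ vs. pixel count": 0.33835,
    "HDI vs. Lifetime": 0.84,
    "POP_CHNG vs. PT_POOR": -0.6491,
}

shares = {label: coefficient_of_determination(r) for label, r in examples.items()}

for label, share in shares.items():
    print(f"{label}: r = {examples[label]:+.4f}, r squared = {share:.3f} ({share:.0%})")
```

The sign of r disappears when it is squared, so the coefficient of determination says how much variance is shared but not the direction of the relationship.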
Problems: Range restriction
REGRESSION MODELS
Multiple linear regression (MLR) is a multivariate statistical technique for examining the linear correlations between two or more independent variables (IVs) and a single dependent variable (DV).
MLR
Poverty prediction
Name of region
Population change in 10 years
No. of persons employed in agriculture
Percent of families below poverty level
Residential and farm property tax rate
Percent residences with telephones
Percent rural population
Median age
Number of African Americans
MLR: Pre-analysis assumptions
Level of measurement
•IVs: two or more continuous (interval or ratio) or nominal variables (nominal variables require recoding into dummy variables)
•DV: one continuous (interval or ratio) variable
Sample size
•Total N is based on the ratio of cases to IVs
•Min. 5 cases per predictor (5:1)
•Ideally 20 cases per predictor (20:1)
Linearity
•Are the bivariate relationships linear?
•Check scatterplots and correlations between the DV (Y) and each of the IVs (Xs)
•Check for the influence of bivariate outliers
Multicollinearity
•Is there multicollinearity between the IVs (i.e., are they overly correlated, e.g., above .7)?
Homoscedasticity
•The variance of the error is constant across observations
•Check scatterplots between Y and each of the Xs and/or the scatterplot of the residuals (ZRESID) against the predicted values (ZPRED)
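Two of these checks are simple arithmetic and can be sketched in a few lines. The sample-size example assumes 30 cases and 2 predictors, which is my reading of the poverty file label "8v*30c" and the two-predictor model shown later in the deck. The multicollinearity example reuses the pairwise correlations among the alcohol variables from the correlation matrix slide. All function names are made up for the illustration.

```python
def sample_size_verdict(n_cases, n_predictors):
    """Compare the cases-per-predictor ratio with the 5:1 and 20:1 rules."""
    ratio = n_cases / n_predictors
    if ratio < 5:
        return "below the 5:1 minimum"
    if ratio < 20:
        return "meets the 5:1 minimum but not the 20:1 ideal"
    return "meets the 20:1 ideal"


def overly_correlated(pairwise_r, cutoff=0.7):
    """Return the predictor pairs whose |r| exceeds the cutoff."""
    return [pair for pair, r in pairwise_r.items() if abs(r) > cutoff]


# Assumed: 30 cases, 2 predictors (POP_CHNG and PT_RURAL).
verdict = sample_size_verdict(30, 2)

# Pairwise correlations among the alcohol variables, from the matrix slide.
pairwise_r = {
    ("total", "beer"): 0.76, ("total", "wine"): 0.60,
    ("total", "spirits"): 0.61, ("total", "other"): 0.18,
    ("beer", "wine"): 0.46, ("beer", "spirits"): 0.44,
    ("beer", "other"): -0.13, ("wine", "spirits"): 0.16,
    ("wine", "other"): -0.12, ("spirits", "other"): -0.17,
}
flagged = overly_correlated(pairwise_r)

print("Sample size:", verdict)
print("Pairs above .7:", flagged)
```

Only total and beer cross the .7 line, which makes sense because beer is one of the components of the total.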
MLR: Dummy coding for nominal data
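Dummy coding turns a nominal variable with k categories into k - 1 columns of 0s and 1s, with the omitted category acting as the reference. A minimal sketch; the variable region_type and its categories are hypothetical and do not come from the poverty data.

```python
def dummy_code(values, reference):
    """Return one 0/1 column per category, leaving out the reference category."""
    levels = sorted(set(values) - {reference})
    return {
        f"is_{level}": [1 if value == level else 0 for value in values]
        for level in levels
    }


# Hypothetical nominal predictor with three categories.
region_type = ["urban", "rural", "suburban", "rural"]

dummies = dummy_code(region_type, reference="urban")

for name, column in dummies.items():
    print(name, column)
```

A case with 0 in every dummy column belongs to the reference category, so each regression coefficient for a dummy is read as the difference from that reference group.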
[Figure: scatterplot of PT_POOR against POP_CHNG (casewise MD deletion) with fitted line and 95% confidence band]
PT_POOR = 26,186 - 0,4037 * POP_CHNG
Correlation: r = -0,6491
MLR: Main Idea
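The fitted line from the scatterplot, PT_POOR = 26,186 - 0,4037 * POP_CHNG, can be used directly for prediction. A minimal sketch; the population-change values fed in are arbitrary examples within the plotted range, and the function name is illustrative.

```python
# Coefficients and correlation as reported on the scatterplot slide.
INTERCEPT = 26.186
SLOPE = -0.4037
R = -0.6491


def predict_pt_poor(pop_chng):
    """Predicted percent of families below the poverty level."""
    return INTERCEPT + SLOPE * pop_chng


# Arbitrary example inputs within the plotted range of POP_CHNG.
for change in (-10, 0, 10, 30):
    print(f"POP_CHNG = {change:>3}: predicted PT_POOR = {predict_pt_poor(change):.2f}")

# Share of variance in PT_POOR accounted for by POP_CHNG alone.
r_squared = R ** 2
print(f"r squared = {r_squared:.3f}")
```

Each additional point of population change lowers the predicted poverty percentage by about 0,4 points, and this one predictor accounts for roughly 42% of the variance. Adding more predictors is the step from this line to MLR.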
Poverty prediction
[Figure: 3D surface plot of PT_POOR against POP_CHNG and PT_RURAL; data file POVERTY.STA 8v*30c]
PT_POOR = 16,6681 - 0,3979*x + 0,1339*y
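With two predictors the regression line becomes a plane. The sketch below evaluates the fitted equation, assuming that x stands for POP_CHNG and y for PT_RURAL (the order in which the plot title names them); the input values are arbitrary examples.

```python
def predict_pt_poor(pop_chng, pt_rural):
    """Two-predictor equation from the slide (x = POP_CHNG, y = PT_RURAL assumed)."""
    return 16.6681 - 0.3979 * pop_chng + 0.1339 * pt_rural


baseline = predict_pt_poor(pop_chng=10, pt_rural=50)
more_rural = predict_pt_poor(pop_chng=10, pt_rural=60)

print(f"POP_CHNG = 10, PT_RURAL = 50: {baseline:.2f}")
print(f"POP_CHNG = 10, PT_RURAL = 60: {more_rural:.2f}")
print(f"Effect of +10 PT_RURAL at fixed POP_CHNG: {more_rural - baseline:.3f}")
```

Each coefficient describes the change in the predicted value when its own predictor changes and the other predictor is held constant.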
Poverty prediction
MLR: Post-analysis assumptions
Multivariate outliers
•Check whether there are influential multivariate outlying cases using Mahalanobis' Distance (MD) and Cook’s D (CD)
Normality of residuals
•Residuals are more likely to be normally distributed if each of the variables is normally distributed
•Check histograms of all variables in an analysis
•Normally distributed variables will enhance the MLR solution
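Mahalanobis' Distance measures how far a case lies from the centre of the data once the correlation between variables is taken into account. The sketch below computes the squared distance by hand for two variables, using data set III of Anscombe's quartet from the earlier slide as a stand-in because it contains one obvious outlying case. Statistical packages report this per case; Cook's D, which needs a fitted regression, is not shown.

```python
from statistics import mean

# Anscombe's data set III.
x = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
y = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]


def mahalanobis_squared(xs, ys):
    """Squared Mahalanobis distance of each (x, y) case from the centroid."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    # Sample variances and covariance (n - 1 denominator).
    sxx = sum((a - mean_x) ** 2 for a in xs) / (n - 1)
    syy = sum((b - mean_y) ** 2 for b in ys) / (n - 1)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for a, b in zip(xs, ys):
        dx, dy = a - mean_x, b - mean_y
        # d' S^-1 d written out for the 2 x 2 covariance matrix S.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


d2 = mahalanobis_squared(x, y)
worst = max(range(len(d2)), key=d2.__getitem__)

print(f"Largest squared distance: {d2[worst]:.2f} at case (x={x[worst]}, y={y[worst]})")
```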
[Figure: histogram of raw residuals (number of observations) with the expected normal curve]
[Figure: scatterplot of raw residuals against PT_RURAL with 95% confidence band]
Raw residuals = -,2E-6 + 0,0000 * PT_RURAL
Correlation: r = ,58E-7
Poverty prediction
MLR: Types of MLR
Direct (or Standard)
•All IVs are entered simultaneously
Hierarchical
•IVs are entered in steps, i.e., some before others
•Interpret R² change
Forward
•The software enters IVs one by one until there are no more significant IVs to be entered
Backward
•The software removes IVs one by one until there are no more non-significant IVs to remove
Stepwise
•A combination of Forward and Backward MLR
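The Forward procedure can be sketched in plain Python. Two simplifications are assumed here and are not what statistical packages do: the stopping rule is a minimum gain in R² (0.01) rather than a significance test, and the data are synthetic, generated so that x1 and x2 matter and x3 is pure noise. The names ols_r2 and forward_select are made up for the example.

```python
import random


def ols_r2(columns, y):
    """R squared of a least-squares fit of y on an intercept plus the columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                factor = A[r][k] / A[k][k]
                A[r] = [a - factor * b for a, b in zip(A[r], A[k])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    mean_y = sum(y) / n
    ss_total = sum((v - mean_y) ** 2 for v in y)
    ss_resid = sum((y[i] - sum(b * v for b, v in zip(beta, X[i]))) ** 2
                   for i in range(n))
    return 1.0 - ss_resid / ss_total


def forward_select(predictors, y, min_gain=0.01):
    """Enter predictors one by one while the best R squared gain is large enough."""
    selected, best_r2 = [], 0.0
    remaining = dict(predictors)
    while remaining:
        gains = {
            name: ols_r2([predictors[s] for s in selected] + [col], y) - best_r2
            for name, col in remaining.items()
        }
        name = max(gains, key=gains.get)
        if gains[name] < min_gain:
            break
        selected.append(name)
        best_r2 += gains[name]
        del remaining[name]
    return selected, best_r2


# Synthetic data: y depends on x1 and x2; x3 is unrelated noise.
rng = random.Random(0)
n = 200
predictors = {name: [rng.gauss(0, 1) for _ in range(n)] for name in ("x1", "x2", "x3")}
y = [2 * a - b + rng.gauss(0, 0.5) for a, b in zip(predictors["x1"], predictors["x2"])]

selected, r2 = forward_select(predictors, y)
print("Entered:", selected, f"R squared = {r2:.3f}")
```

Backward selection runs the same loop in reverse, starting from the full model and dropping the predictor whose removal costs the least.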
MLR: TOTAL
1. Conceptualise the model
2. Recode predictors (if necessary)
3. Check assumptions
4. Choose the type of MLR
5. Interpret the statistical output and the meaning of the results
6. Depict the relationships in a path diagram or Venn diagram
7. Regression equation: if relevant and useful, interpret the Y-intercept and write a regression equation for predicting Y