Brett Grayson
Southern Methodist University
2 May 2013
Treating Interval-Scaled Data as Nominal
Interval Scale vs. Nominal Scale
Nominal scales differentiate items by their names and/or their qualitative classifications
Examples: Dichotomous: Male/Female
Non-dichotomous: Ethnicity
Interval scales rank the data and show the differences between each data point
Examples: Thermometer readings
IQ Scores
Dollars and cents
Treating Interval-Scaled Data as Nominal
This report examines how the effect size of an interval-scaled dataset changes when the data is re-scaled.
Generally, data should be collected at the highest scale possible (Thompson, 2006).
While higher-scaled (interval) data can be converted to a lower scale (ordinal or nominal), some information about the data will inevitably be lost.
When does re-scaling occur?
Test Scores
Name        Interval   Ordinal   Nominal
Ashley      99         1st       Group 1
Elizabeth   89.5       3rd       Group 1
Evelyn      91         2nd       Group 1
Miriam      87         4th       Group 2
Sarah       79.4       6th       Group 2
Thom        79.5       5th       Group 2
While re-scaling does not change the data,
it does affect our interpretation of the data.
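The re-scaling in the test-score table can be sketched in R as follows. The data frame and column names here (`scores`, `rank`, `group`) are illustrative choices, not part of the original analysis; the values are the six scores from the table.

```r
# Test scores from the table above (interval scale)
scores <- data.frame(
  name  = c("Ashley", "Elizabeth", "Evelyn", "Miriam", "Sarah", "Thom"),
  score = c(99, 89.5, 91, 87, 79.4, 79.5)
)

# Ordinal: keep only the rank order (1 = highest score)
scores$rank <- rank(-scores$score)

# Nominal: keep only group membership (top three vs. bottom three)
scores$group <- ifelse(scores$rank <= 3, "Group 1", "Group 2")

print(scores)
```

Sarah and Thom differ by only 0.1 points yet sit a full rank apart, while Ashley and Elizabeth differ by 9.5 points yet share Group 1. The scores themselves are unchanged, but each step down in scale hides more of the distances between them.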
Organizing the datasets
We will examine three types of datasets, each n = 300.
R² ≈ .7
R² ≈ .4
R² ≈ .15
Each dataset is evenly divided into 2, 3, 5, and 10 groups.
new.data<-new.data[order(new.data$iv),]
new.data$g2<-rep(1:2, each=150)
new.data$g3<-rep(1:3, each=100)
new.data$g5<-rep(1:5, each=60)
new.data$g10<-rep(1:10, each=30)
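The slides do not show how the three datasets were generated. One way to build a comparable dataset is sketched below, assuming standard-normal data; `make_data` is a hypothetical helper, and the seed is arbitrary. The noise term is made exactly uncorrelated with `iv`, so the sample R-squared hits the target value rather than only approximating it.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical helper: n cases whose sample R-squared for dv ~ iv equals r2
make_data <- function(n, r2) {
  iv <- rnorm(n)
  # Regression residuals are exactly uncorrelated with iv in the sample
  e <- residuals(lm(rnorm(n) ~ iv))
  z_iv <- as.numeric(scale(iv))
  z_e  <- as.numeric(scale(e))
  dv <- sqrt(r2) * z_iv + sqrt(1 - r2) * z_e
  data.frame(iv = iv, dv = dv)
}

new.data <- make_data(300, 0.7)
r2_check <- summary(lm(dv ~ iv, new.data))$r.squared

# Same split as above: sort by iv, then label equal-sized blocks
new.data <- new.data[order(new.data$iv), ]
new.data$g2 <- rep(1:2, each = 150)
table(new.data$g2)
```

Sorting by `iv` before assigning labels matters: it guarantees that every case in a lower group has a smaller `iv` than every case in a higher group, so the group codes preserve the order of the original variable.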
Interval-Scaled Data Regression
> summary(m1<-lm(dv~iv, new.data))
Residuals:
Min 1Q Median 3Q Max
-1.64470 -0.34441 0.02374 0.41289 1.54452
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 16.33400 3.17303 5.148 4.8e-07 ***
iv 0.83666 0.03173 26.369 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5486 on 298 degrees of freedom
Multiple R-squared: 0.7, Adjusted R-squared: 0.699
F-statistic: 695.3 on 1 and 298 DF, p-value: < 2.2e-16
Plotting the Interval-Scaled Data
plot(new.data$iv, new.data$dv)
abline(m1)
This plot shows a strong linear relationship, consistent with the multiple R-squared of 0.7.
Nominal-Scaled Data Regression
> summary(m2<-lm(dv~g2, new.data))
Residuals:
Min 1Q Median 3Q Max
-51.589 -9.741 -0.175 10.354 44.829
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 60.305 2.739 22.02 <2e-16 ***
g2 26.464 1.732 15.28 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 15 on 298 degrees of freedom
Multiple R-squared: 0.4392, Adjusted R-squared: 0.4373
F-statistic: 233.4 on 1 and 298 DF, p-value: < 2.2e-16
Plotting the Nominal-Scaled Data
plot(new.data$g2, new.data$dv)
abline(m2)
This plot shows how the dataset looks when split into two groups, resulting in a multiple R-squared of .44.
Another common mistake: ANOVA
So far, we have observed the problem with under-scaling data. Data was re-scaled, then regression was run as if it were still interval.
There is another common mistake: splitting the data into groups, converting the groups to factors, and then running an ANOVA.
Data should be nominal when running an ANOVA. But this data was just re-scaled from interval to ordinal.
The grouping variables were treated as factors and run through an ANOVA. Let’s compare the effect sizes.
Factor, ANOVA, and eta-squared
Factor the groups
new.data$g2<-factor(new.data$g2)
new.data$g3<-factor(new.data$g3, levels=1:3)
new.data$g5<-factor(new.data$g5, levels=1:5)
new.data$g10<-factor(new.data$g10, levels=1:10)
Run the ANOVA
> anova(aov(dv~new.data$g2, new.data))
Analysis of Variance Table
Response: dv
Df Sum Sq Mean Sq F value Pr(>F)
new.data$g2 1 52524 52524 233.35 < 2.2e-16 ***
Residuals 298 67076 225
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Eta-squared: (52524)/(52524 + 67076) = 0.4391639
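The eta-squared arithmetic above can also be read directly off the ANOVA table instead of being typed by hand. The sketch below uses a hypothetical helper, `eta_squared`, and R's built-in `PlantGrowth` dataset as a stand-in, since the data from this report is not available here.

```r
# Hypothetical helper: eta-squared for a one-way ANOVA fitted with aov()
eta_squared <- function(fit) {
  ss <- anova(fit)[["Sum Sq"]]  # effect sum of squares first, residual last
  ss[1] / sum(ss)               # SS_effect / SS_total
}

# The hand calculation from the ANOVA table above
eta_manual <- 52524 / (52524 + 67076)
eta_manual

# Stand-in example on a built-in dataset (not the data from this report)
fit <- aov(weight ~ group, data = PlantGrowth)
eta_squared(fit)
```

For a one-way design, this ratio is the same quantity that `summary(lm(...))$r.squared` reports when the predictor is a factor, which is why eta-squared and R-squared can be compared directly.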
Comparison of effect sizes
        IV     G2     G3     G5     G10
R²      .7     .44    .56    .63    .66
η²      x      .44    .56    .64    .69
R²      .4     .26    .30    .34    .37
η²      x      .26    .30    .34    .39
R²      .15    .08    .11    .12    .14
η²      x      .08    .11    .14    .16
x: The IV is still interval-scaled, so no ANOVA needed.
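A comparison like the one in this table can be generated in a single loop. The sketch below runs on simulated data, so its numbers will not match the table; the seed and the noise level (`sd = 0.65`, chosen to give an R-squared somewhere near .7) are assumptions.

```r
set.seed(2)  # arbitrary seed
n  <- 300
iv <- rnorm(n)
dv <- iv + rnorm(n, sd = 0.65)  # assumed noise level
d  <- data.frame(iv = iv, dv = dv)
d  <- d[order(d$iv), ]          # sort by iv before assigning groups

compare <- sapply(c(2, 3, 5, 10), function(k) {
  g <- rep(1:k, each = n / k)   # equal-sized groups along the sorted iv
  # Group code treated as a number: one slope, 1 degree of freedom
  r2 <- summary(lm(d$dv ~ g))$r.squared
  # Group code treated as a factor: k - 1 degrees of freedom (ANOVA)
  eta2 <- summary(lm(d$dv ~ factor(g)))$r.squared
  c(groups = k, R2 = r2, eta2 = eta2)
})

round(t(compare), 2)
```

With two groups the two measures are identical, because a two-level factor is a single dummy variable. With more groups the factor model estimates a separate mean per group, so eta-squared can never fall below the R-squared from the numeric group code; this is the same pattern seen in the G5 and G10 columns.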
Conclusions
Re-scaling a dataset to a lower scale discards information and reduces the observed effect size.
For this reason, behavioral researchers try to avoid re-scaling.
In the R plots, we saw the decrease in effect size.
As the original dataset was split into more and more groups, the effect size increased and approached that of the original data.
The table shows how running an ANOVA on the nominal data can be another mistake.
The eta-squared is also a reduction of the original data’s effect size.