BRETT GRAYSON SOUTHERN METHODIST UNIVERSITY 2 MAY 2013 Treating Interval-Scaled Data as Nominal


Treating Interval-Scaled Data as Nominal

Interval Scale vs. Nominal Scale

Nominal scales differentiate items only by name or qualitative category, with no implied order

Examples: Dichotomous: Male/Female

Non-dichotomous: Ethnicity

Interval scales rank the data and measure the size of the difference between any two data points

Examples: Thermometer readings

IQ Scores

Dollars and cents

Treating Interval-Scaled Data as Nominal

This report examines how the effect size of an interval-scaled dataset changes when the data is re-scaled.

Generally, data should be collected at the highest scale possible (Thompson, 2006).

While higher-scaled (interval) data can be converted to a lower scale (nominal), some information about the data will inevitably be lost.

When does re-scaling occur?

Test Scores

Name        Interval   Ordinal   Nominal
Ashley      99         1st       Group 1
Elizabeth   89.5       3rd       Group 1
Evelyn      91         2nd       Group 1
Miriam      87         4th       Group 2
Sarah       79.4       6th       Group 2
Thom        79.5       5th       Group 2

While re-scaling does not change the data, it does affect our interpretation of the data.
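The re-scaling in the test-score table can be reproduced in a few lines of R. This is a minimal sketch using the six scores from the table; the variable names are illustrative, and the "top half vs. bottom half" rule is an assumption about how the two groups were formed.

```r
# Test scores from the table (interval scale).
scores <- c(Ashley = 99, Elizabeth = 89.5, Evelyn = 91,
            Miriam = 87, Sarah = 79.4, Thom = 79.5)

# Ordinal: rank from highest (1) to lowest (6).
# The distances between scores are dropped.
ordinal <- rank(-scores)

# Nominal: top three vs. bottom three (assumed grouping rule).
# The order within each group is dropped as well.
nominal <- ifelse(ordinal <= 3, "Group 1", "Group 2")

data.frame(interval = scores, ordinal = ordinal, nominal = nominal)
```

Sarah (79.4) and Thom (79.5) differ by a tenth of a point, while Ashley (99) and Evelyn (91) differ by eight points, yet each pair lands in the same group and becomes indistinguishable once only the group label is kept.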

Organizing the datasets

We will examine three types of datasets, each n = 300.

R² ≈ .7

R² ≈ .4

R² ≈ .15
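The slides do not show how the three datasets were generated. One possible way, offered only as an assumption, is to draw a standard-normal predictor and add normal noise scaled so that the population R² equals the target. The helper name simulate_data and the seed are hypothetical.

```r
set.seed(2013)  # arbitrary seed, only for reproducibility

# Hypothetical helper: n cases whose population R-squared equals r2.
# iv and the noise are both standard normal, so
# var(dv) = r2 + (1 - r2) = 1 and cor(iv, dv)^2 = r2.
simulate_data <- function(n, r2) {
  iv <- rnorm(n)
  dv <- sqrt(r2) * iv + sqrt(1 - r2) * rnorm(n)
  data.frame(iv = iv, dv = dv)
}

targets  <- c(0.70, 0.40, 0.15)
datasets <- lapply(targets, function(r2) simulate_data(300, r2))

# Sample R-squared for each dataset: near the targets, not exactly on them.
sample_r2 <- sapply(datasets, function(d) summary(lm(dv ~ iv, d))$r.squared)
sample_r2
```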

Each dataset is evenly divided into 2, 3, 5, and 10 groups.

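# Sort by the interval-scaled predictor so that consecutive rows form ordered groups.
new.data <- new.data[order(new.data$iv), ]

# Assign equal-sized groups (n = 300): 2 x 150, 3 x 100, 5 x 60, 10 x 30.
new.data$g2  <- rep(1:2,  each = 150)
new.data$g3  <- rep(1:3,  each = 100)
new.data$g5  <- rep(1:5,  each = 60)
new.data$g10 <- rep(1:10, each = 30)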
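The same grouping can be done without sorting the data frame first. The sketch below uses a hypothetical helper, equal_groups, and stand-in data (not the author's dataset) to show that each group has the same size and that group 1 holds the lowest values of iv.

```r
set.seed(1)
new.data <- data.frame(iv = rnorm(300))    # stand-in data, not the author's
new.data$dv <- new.data$iv + rnorm(300)

# Hypothetical helper: k equal-sized groups, from lowest to highest x.
equal_groups <- function(x, k) {
  stopifnot(length(x) %% k == 0)           # group sizes must divide evenly
  ceiling(rank(x, ties.method = "first") / (length(x) / k))
}

for (k in c(2, 3, 5, 10)) {
  new.data[[paste0("g", k)]] <- equal_groups(new.data$iv, k)
}

table(new.data$g5)                         # 60 cases per group
tapply(new.data$iv, new.data$g2, range)    # group 1 holds the lower half of iv
```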

Interval-Scaled Data Regression

> summary(m1<-lm(dv~iv, new.data))

Residuals:

Min 1Q Median 3Q Max

-1.64470 -0.34441 0.02374 0.41289 1.54452

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 16.33400 3.17303 5.148 4.8e-07 ***

iv 0.83666 0.03173 26.369 < 2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5486 on 298 degrees of freedom

Multiple R-squared: 0.7, Adjusted R-squared: 0.699

F-statistic: 695.3 on 1 and 298 DF, p-value: < 2.2e-16

Plotting the Interval-Scaled Data

plot(new.data$iv, new.data$dv)

abline(m1)

This plot shows a strong linear relationship, consistent with the multiple R-squared of 0.7.
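For a regression with a single predictor, the multiple R-squared reported by summary() is the share of the outcome's variance that the fitted line accounts for, and it equals the squared correlation between iv and dv. A short sketch on made-up data (not the author's) shows the three calculations agreeing.

```r
set.seed(42)
iv <- rnorm(300)
dv <- 0.8 * iv + rnorm(300, sd = 0.5)      # made-up data
m  <- lm(dv ~ iv)

ss_res <- sum(resid(m)^2)                  # variation left around the line
ss_tot <- sum((dv - mean(dv))^2)           # total variation in dv
r2_by_hand <- 1 - ss_res / ss_tot

c(by_hand     = r2_by_hand,
  from_lm     = summary(m)$r.squared,
  cor_squared = cor(iv, dv)^2)
```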

Nominal-Scaled Data Regression

> summary(m2<-lm(dv~g2, new.data))

Residuals:

Min 1Q Median 3Q Max

-51.589 -9.741 -0.175 10.354 44.829

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 60.305 2.739 22.02 <2e-16 ***

g2 26.464 1.732 15.28 <2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 15 on 298 degrees of freedom

Multiple R-squared: 0.4392, Adjusted R-squared: 0.4373

F-statistic: 233.4 on 1 and 298 DF, p-value: < 2.2e-16

Plotting the Nominal-Scaled Data

plot(new.data$g2, new.data$dv)

abline(m2)

This plot shows how the dataset looks when split into two groups, resulting in a multiple R-squared of .44.
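When the predictor is a two-level group code, the regression line simply connects the two group means, so everything that varies within a group is left unexplained. The sketch below illustrates this on made-up data; the coefficients and seed are assumptions chosen only for the demonstration.

```r
set.seed(7)
iv <- sort(rnorm(300))
dv <- 0.8 * iv + rnorm(300, sd = 0.5)      # made-up data
g2 <- rep(1:2, each = 150)                 # lower half vs. upper half of iv

m_iv <- lm(dv ~ iv)
m_g2 <- lm(dv ~ g2)

# With a 1/2 code, the slope is the difference between the two group means.
slope     <- coef(m_g2)[["g2"]]
mean_diff <- unname(diff(tapply(dv, g2, mean)))
c(slope = slope, mean_diff = mean_diff)

# The two-group model explains less of dv than the interval model does.
c(interval   = summary(m_iv)$r.squared,
  two_groups = summary(m_g2)$r.squared)
```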

Nominal vs. interval R² by number of groups (plots not shown)

Dataset with interval R² ≈ .7
Groups   Nominal R²
2        .44
3        .56
5        .63
10       .66

Dataset with interval R² ≈ .4
Groups   Nominal R²
2        .26
3        .30
5        .34
10       .37

Dataset with interval R² ≈ .15
Groups   Nominal R²
2        .08
3        .11
5        .12
10       .14
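The pattern across group counts can be explored with a small loop. This is a sketch under stated assumptions: the data are simulated here with a normal predictor (the slides do not say how the original data were produced), and r2_by_groups is a hypothetical helper. The exact values will differ from the slides, but the R² should climb toward the interval value as the number of groups grows.

```r
set.seed(2013)

# Hypothetical helper: simulate one dataset with population R-squared r2,
# then fit the interval model and one numeric-coded group model per k.
r2_by_groups <- function(r2, n = 300, ks = c(2, 3, 5, 10)) {
  iv <- sort(rnorm(n))
  dv <- sqrt(r2) * iv + sqrt(1 - r2) * rnorm(n)
  out <- c(interval = summary(lm(dv ~ iv))$r.squared)
  for (k in ks) {
    g <- rep(seq_len(k), each = n / k)     # same grouping rule as the slides
    out[paste0("g", k)] <- summary(lm(dv ~ g))$r.squared
  }
  round(out, 2)
}

res <- t(sapply(c(0.70, 0.40, 0.15), r2_by_groups))
res
```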

Another common mistake: ANOVA

So far, we have observed the problem with under-scaling data. The data were re-scaled, and then the regression was run as if the grouping variable were still interval.

There is another common mistake: splitting the data into groups, converting the grouping variable to a factor, and then running an ANOVA.

Data should be nominal when running an ANOVA. But this data was just re-scaled to ordinal from interval.

The grouping variables were treated as factors and run through an ANOVA. Let’s compare the effect sizes.

Factor, ANOVA, and eta-squared

Factor the groups

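# Convert each numeric group code to a factor with explicit levels.
new.data$g2  <- factor(new.data$g2,  levels = 1:2)
new.data$g3  <- factor(new.data$g3,  levels = 1:3)
new.data$g5  <- factor(new.data$g5,  levels = 1:5)
new.data$g10 <- factor(new.data$g10, levels = 1:10)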
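Converting the group code to a factor changes the model that R fits. A numeric code gets a single slope, which assumes the groups are equally spaced; a factor gets one mean per group, with no spacing assumed. The sketch below, on made-up data, shows the difference in the number of estimated coefficients.

```r
set.seed(3)
g5 <- rep(1:5, each = 60)
dv <- 2 * g5 + rnorm(300, sd = 3)          # made-up outcome

m_numeric <- lm(dv ~ g5)                   # group code treated as a number
m_factor  <- lm(dv ~ factor(g5))           # group code treated as categories

length(coef(m_numeric))                    # 2: intercept + one slope
length(coef(m_factor))                     # 5: intercept + four group contrasts
```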

Run the ANOVA

> anova(aov(dv~new.data$g2, new.data))

Analysis of Variance Table

Response: dv

Df Sum Sq Mean Sq F value Pr(>F)

new.data$g2 1 52524 52524 233.35 < 2.2e-16 ***

Residuals 298 67076 225

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

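Eta-squared: η² = SS(groups) / (SS(groups) + SS(residuals)) = 52524 / (52524 + 67076) = 0.4391639, the same value as the multiple R-squared (0.4392) from the two-group regression.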
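The eta-squared calculation can be wrapped in a small function. This is a sketch for the one-way case only; eta_squared is a hypothetical helper, and the example data are made up. It also re-checks the arithmetic from the ANOVA table.

```r
# Check of the arithmetic, using the sums of squares from the ANOVA table.
slide_eta2 <- 52524 / (52524 + 67076)
slide_eta2

# Hypothetical helper for a one-way model: groups SS over total SS.
eta_squared <- function(fit) {
  ss <- anova(fit)[["Sum Sq"]]             # first entry: groups; last: residuals
  ss[1] / sum(ss)
}

set.seed(5)
g   <- factor(rep(1:3, each = 100))
dv  <- as.numeric(g) + rnorm(300)          # made-up data
fit <- aov(dv ~ g)

eta_squared(fit)
summary(lm(dv ~ g))$r.squared              # same value: the factor model's R-squared
```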

Comparison of effect sizes

Effect size   IV     G2     G3     G5     G10
R²            .7     .44    .56    .63    .66
η²            x      .44    .56    .64    .69
R²            .4     .26    .30    .34    .37
η²            x      .26    .30    .34    .39
R²            .15    .08    .11    .12    .14
η²            x      .08    .11    .14    .16

x: The IV is still interval-scaled, so no ANOVA is needed.
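In the table, η² matches R² for two groups and is equal or slightly larger for more groups. That is expected: the factor model estimates a separate mean for every group, and the numeric-coded regression is a special case of it, so the factor model can never explain less. With two groups the two models are identical. The sketch below checks this on simulated data (an assumption; not the author's dataset).

```r
set.seed(11)
n  <- 300
iv <- sort(rnorm(n))
dv <- sqrt(0.15) * iv + sqrt(0.85) * rnorm(n)   # population R-squared of .15

compare <- sapply(c(2, 3, 5, 10), function(k) {
  g <- rep(seq_len(k), each = n / k)
  c(groups = k,
    R2     = summary(lm(dv ~ g))$r.squared,          # numeric code, 1 df
    eta2   = summary(lm(dv ~ factor(g)))$r.squared)  # factor, k - 1 df
})

round(t(compare), 3)
```

Because the factor model spends k - 1 degrees of freedom, part of its advantage with many groups can come from fitting noise rather than from recovering the original relationship.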

Conclusions

Re-scaling a dataset to a lower scale reduces the observed effect size.

For this reason, behavioral researchers try to avoid re-scaling.

In the R plots, we saw the decrease in effect size.

As the original dataset was split into more and more groups, the R-squared increased and approached that of the original data.

The table shows how running an ANOVA on the nominal data can be another mistake.

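In most cases the eta-squared is also a reduction of the original data’s effect size, although with 10 groups it comes close to the interval R-squared and, in the weakest dataset, slightly exceeds it (.16 vs. .15).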

References

Thompson, B. (2006). Foundations of behavioral statistics: An insight-based approach. New York, NY: The Guilford Press.