RStats Statistics and Research Camp 2014


Welcome!

Helen Reid, PhD

Dean of the College of Health and Human Services

Missouri State University

Welcome

• Goal: keeping up with advances in quantitative methods

• Best practices: use familiar tools in new situations
  – Avoid common mistakes

Todd Daniel, PhD
RStats Institute

Coffee Advice

Ronald A. Fisher

Cambridge, England 1920

Dr. Muriel Bristol

Box, 1976

Familiar Tools

• Null Hypothesis Significance Testing (NHST)

• a.k.a. Null Hypothesis Decision Making

• a.k.a. Statistical Hypothesis Inference Testing

p < .05: The probability of finding these results (or results more extreme) given that the null hypothesis is true

Benefits of NHST

• All students trained in NHST
• Easier to engage researchers
• Results are not complex

Everybody is doing it

Statistically Significant Difference

What does p < .05 really mean?
1. There is a 95% chance that the alternative hypothesis is true
2. This finding will replicate 95% of the time
3. If the study were repeated, the null would be rejected 95% of the time

The Earth is Round, p < .05

What you want…
• The probability that a hypothesis is true given the evidence

What you get…
• The probability of the evidence assuming that the (null) hypothesis is true

Cohen, 1994

Pigs Might Fly

• Null: There are no flying pigs
  H0: P = 0
• Random sample of 30 pigs
• One can fly: 1/30 = .033
• What kind of test?
  – Chi-Square?
  – Fisher exact test?
  – Binomial?

Why do you even need a test?
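Under the slide's null (H0: P = 0), a single flying pig is impossible, so the probability of the observed data under H0 is exactly zero and no significance test is needed. A minimal sketch of the binomial tail probability (the helper name `binomial_tail` is illustrative, not from any package):

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n_pigs, fliers = 30, 1
print(f"Observed proportion: {fliers / n_pigs:.3f}")  # 0.033

# Under H0: P = 0, every term with i >= 1 is zero, so P(X >= 1) is exactly 0.
print(binomial_tail(fliers, n_pigs, 0.0))  # 0.0
```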

Daryl Bem and ESP

• Assumed random guessing: p = .50
• Found subject success of 53%, p < .05
• Too much power?
  – Everything counts in large amounts
• What if Bem set p = 0?
  – One clairvoyant vs. a group that guesses 53%

In the real world, the null is always false.
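"Everything counts in large amounts" can be made concrete: the same 53% hit rate against a 50% guessing rate goes from non-significant to highly significant as n grows. The slide does not give Bem's sample sizes, so the n values below are hypothetical, and the sketch uses the normal approximation to the binomial:

```python
from math import erfc, sqrt

def one_sample_prop_z(rate, n, p0=0.5):
    """Normal-approximation z test of an observed hit rate against chance p0.
    Returns (z, two-tailed p)."""
    se = sqrt(p0 * (1 - p0) / n)
    z = (rate - p0) / se
    p_two_tailed = erfc(abs(z) / sqrt(2))
    return z, p_two_tailed

# The same 53% hit rate at hypothetical sample sizes
for n in (100, 1000, 5000):
    z, p = one_sample_prop_z(0.53, n)
    print(f"n = {n:>5}: z = {z:5.2f}, p = {p:.4f}")
```

The effect (3 percentage points above chance) never changes; only the sample size does.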

Problems with NHST

• Cohen (1994) reservations
• Non-replicable findings
• Poor basis for policy decisions
• False sense of confidence

Cohen, 1994

NHST is “a potent but sterile intellectual rake who leaves in his merry path a long train of ravished maidens but no viable scientific offspring”

- Paul Meehl

What to do then?

• Learn basic methods that improve your research (learn NHST)

• Learn advanced techniques and apply them to your research (RStats Camp)

• Make professional connections and access professional resources

Agenda

9:30 Best Practices (and Poor Practices) in Data Analysis
11:00 Moderated Regression
12:00 – 12:45 Lunch (with Faculty Writing Retreat)
1:00 Effect Size and Power Analysis
2:00 Meta-Analysis
3:00 Structural Equation Modeling


Best Practices and Poor Practices

Session 1
R. Paul Thomlinson, PhD

Burrell

Poor Practices
Common Mistakes

No Blueprint
Mistake #1

Ignoring Assumptions

Pre-Checking Data Before Analysis

Mistake #2

Assumptions Matter

• Data: I call you and you don’t answer.
• Conclusion: you are mad at me.
• Assumption: you had your phone with you.

If my assumptions are wrong, I cannot see the world accurately.

Assumptions for Parametric Tests

• "Assumptions behind models are rarely articulated, let alone defended. The problem is exacerbated because journals tend to favor a mild degree of novelty in statistical procedures. Modeling, the search for significance, the preference for novelty, and the lack of interest in assumptions -- these norms are likely to generate a flood of nonreproducible results."

– David Freedman, Chance 2008, v. 21 No 1, p. 60

Assumptions for Parametric Tests

• "... all models are limited by the validity of the assumptions on which they ride."
  – Collier, Sekhon, and Stark, Preface (p. xi) to Freedman, David A., Statistical Models and Causal Inference: A Dialogue with the Social Sciences.

• Parametric tests based on the normal distribution assume:
  – Interval or Ratio Level Data
  – Independent Scores
  – Normal Distribution of the Population
  – Homogeneity of Variance

Assessing the Assumptions

• Assumption of Interval or Ratio Data
  – Look at your data to make sure you are measuring using scale-level data
  – This is common and easily verified

Independence
• Techniques are least likely to be robust to departures from assumptions of independence.

• Sometimes a rough idea of whether or not model assumptions might fit can be obtained by either plotting the data or plotting residuals obtained from a tentative use of the model. Unfortunately, these methods are typically better at telling you when the model assumption does not fit than when it does.

Independence

• Assumption of Independent Scores
  – Addressed during research design
  – Each individual in the sample should be independent of the others
• The errors in your model should not be related to each other.
• If this assumption is violated:
  – Confidence intervals and significance tests will be invalid.

Assumption of Normality
• You want your distribution not to be skewed
• You want your distribution not to have excess kurtosis
  – At least, not too much of either

Normally Distributed Something or Other

• The normal distribution is relevant to:
  – Parameters
  – Confidence intervals around a parameter
  – Null hypothesis significance testing

• This assumption tends to get incorrectly translated as ‘your data need to be normally distributed’.

Assumption of Normality
• SPSS computes both skew and kurtosis for you
  – Values exceeding +3 or -3 indicate a very skewed distribution
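To see what the skew and kurtosis numbers measure, here is a sketch using the simple moment-based formulas. SPSS applies small-sample adjustments (and reports standard errors alongside), so its values will differ somewhat, especially for small n:

```python
from statistics import fmean

def skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (0 for a symmetric sample)."""
    m = fmean(xs)
    m2 = fmean([(x - m) ** 2 for x in xs])
    m3 = fmean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5

def excess_kurtosis(xs):
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal curve)."""
    m = fmean(xs)
    m2 = fmean([(x - m) ** 2 for x in xs])
    m4 = fmean([(x - m) ** 4 for x in xs])
    return m4 / m2 ** 2 - 3

scores = [1, 2, 3, 4, 5]
print(skewness(scores), excess_kurtosis(scores))  # 0.0 and about -1.3 (flat, not peaked)
```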

Assessing Normality with Numbers

• Kolmogorov-Smirnov Test
  – Tests whether data differ from a normal distribution
  – Significant = non-normal data
  – Non-significant = no evidence that the data depart from normality
• Non-significant is the ideal
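The K-S statistic D is the largest vertical gap between the data's cumulative distribution and the fitted normal curve. A sketch that computes D only. Because the mean and SD are estimated from the same data, a proper p value needs the Lilliefors-corrected distribution, which this sketch does not compute:

```python
from statistics import NormalDist, fmean, stdev

def ks_statistic_normal(xs):
    """One-sample Kolmogorov-Smirnov D against a normal curve with the
    sample's own mean and SD (statistic only, no p value)."""
    data = sorted(xs)
    n = len(data)
    ref = NormalDist(fmean(data), stdev(data))
    d = 0.0
    for i, x in enumerate(data, start=1):
        f = ref.cdf(x)
        # Gap between the empirical CDF just after (i/n) and just before ((i-1)/n) x
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

bimodal = [0] * 10 + [10] * 10
print(round(ks_statistic_normal(bimodal), 3))  # a large D: far from normal
```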

Tests of Normality

SPSS: Exam.sav

The P-P Plot

(Panels: Normal, Not Normal)

Histograms & Stem-and-Leaf Plots

(Panels: Bi-Modal, Normal-ish)

Double-click the histogram in the Output window to add the normal curve

SPSS: Exam.sav

When does the Assumption of Normality Matter?

• Normality matters most in small samples
  – The central limit theorem allows us to forget about this assumption in larger samples.
• In practical terms, as long as your sample is fairly large, outliers are a much more pressing concern than normality
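The central limit theorem claim can be checked by simulation: raw scores from a strongly skewed (exponential) population stay skewed, but means of samples of 50 are close to symmetric. The population, sample size, and seed are arbitrary choices for illustration:

```python
import random
from statistics import fmean

def skewness(xs):
    m = fmean(xs)
    m2 = fmean([(x - m) ** 2 for x in xs])
    m3 = fmean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5

rng = random.Random(2014)                             # fixed seed so the run is repeatable
raw = [rng.expovariate(1.0) for _ in range(5000)]     # strongly right-skewed scores
means = [fmean(rng.expovariate(1.0) for _ in range(50)) for _ in range(2000)]

print(f"skew of raw scores:   {skewness(raw):.2f}")    # near 2
print(f"skew of sample means: {skewness(means):.2f}")  # much closer to 0
```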

Assessing the Assumptions

• Assumption of Homogeneity of Variance
  – Only necessary when comparing groups
  – Levene’s Test

Assessing Homogeneity of Variance: Graphs

(Panels: Homogeneous, Heterogeneous; outcome: number of hours of ringing in ears after a concert)

Assessing Homogeneity of Variance: Numbers

• Levene’s Test
  – Tests whether variances in different groups are the same
  – Significant = variances not equal
  – Non-significant = no evidence that the variances differ
• Non-significant is ideal
• Variance Ratio (VR)
  – With 2 or more groups
  – VR = largest variance / smallest variance
  – If VR < 2, homogeneity can be assumed
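Levene's test is a one-way ANOVA run on each score's absolute distance from its group mean; the variance ratio is a plain quotient. A sketch of both statistics on hypothetical groups. It returns F only; for a p value, `scipy.stats.levene(..., center='mean')` computes the same mean-centred version:

```python
from statistics import fmean, variance

def levene_f(*groups):
    """Levene's F (mean-centred): one-way ANOVA on |score - group mean|."""
    devs = [[abs(x - fmean(g)) for x in g] for g in groups]
    k = len(devs)
    n_total = sum(len(d) for d in devs)
    grand = fmean([v for d in devs for v in d])
    ss_between = sum(len(d) * (fmean(d) - grand) ** 2 for d in devs)
    ss_within = sum((v - fmean(d)) ** 2 for d in devs for v in d)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def variance_ratio(*groups):
    """Largest group variance divided by the smallest."""
    vs = [variance(g) for g in groups]
    return max(vs) / min(vs)

a = [1, 2, 3, 4, 5]      # hypothetical group, variance 2.5
b = [2, 4, 6, 8, 10]     # hypothetical group, variance 10
print(round(levene_f(a, b), 3), variance_ratio(a, b))  # 2.057 4.0
```

Here VR = 4 exceeds the rule-of-thumb cutoff of 2.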

Spotting problems with Linearity or Homoscedasticity

Ignoring Missing Data

Mistake #3

Missing Data

It is the lion you don’t see that eats you

Missing Data

Amount of Missing Data
• APA Task Force on Statistical Inference (1999) recommended that researchers report patterns of missing data and the statistical techniques used to address the problems such data create
• Report the amount of missing data as a percentage of the total
  – “Missing data ranged from a low of 4% for attachment anxiety to a high of 12% for depression.”
• If calculating total or scale scores, impute the values for the items first, then calculate the scale score
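Reporting the amount of missing data starts with a per-variable percentage. A minimal sketch, assuming records are dictionaries with `None` marking a skipped item (the variable names are hypothetical):

```python
def percent_missing(rows, variables):
    """Percent of cases missing (None) on each variable."""
    n = len(rows)
    return {v: 100 * sum(r[v] is None for r in rows) / n for v in variables}

# Hypothetical survey records; None marks a skipped item
rows = [
    {"anxiety": 3, "depression": None},
    {"anxiety": 4, "depression": 12},
    {"anxiety": None, "depression": None},
    {"anxiety": 2, "depression": 9},
]
print(percent_missing(rows, ["anxiety", "depression"]))  # {'anxiety': 25.0, 'depression': 50.0}
```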

Pattern of Missing Data

• Missing Completely At Random (MCAR)
  – No pattern; not related to any variables
  – Accidentally skipped one; got distracted
• Missing At Random (MAR)
  – Missingness is related to other observed variables, but not to the missing values themselves
• Not Missing At Random (NMAR)
  – Missingness is related to the missing value itself
  – Parents who feel competent are more likely to skip the question about interest in parenting classes

Pattern of Missing Data: Distinguish between MCAR and MAR

• Create a dummy variable with two values: missing and non-missing
  – SPSS: recode into a new variable
• Test the relation between the dummy variable and the variables of interest
  – If not related: data are either MCAR or NMAR
  – If related: data are MAR or NMAR
• Little’s (1988) MCAR Test
  – Missing Values Analysis add-on module in SPSS 20
  – A non-significant p value is consistent with the data being MCAR
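The dummy-variable check can be sketched as: recode missingness to 1/0, then compare another variable between the missing and non-missing cases. The data and variable names below are hypothetical, and the sketch reports a Welch t statistic only (a p value would need the t distribution):

```python
from math import sqrt
from statistics import fmean, variance

def missing_dummy(values):
    """1 = missing, 0 = observed (the recoded dummy variable)."""
    return [1 if v is None else 0 for v in values]

def welch_t(a, b):
    """Welch t statistic for the difference in means of two groups."""
    return (fmean(a) - fmean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

# Hypothetical data: does missingness on depression relate to age?
depression = [None, 14, None, 9, 11, 8, None, 12]
age = [19, 41, 22, 38, 45, 40, 24, 36]

dummy = missing_dummy(depression)
age_missing = [a for a, d in zip(age, dummy) if d == 1]
age_observed = [a for a, d in zip(age, dummy) if d == 0]
print(dummy, round(welch_t(age_missing, age_observed), 2))
```

A large |t| here says the missing cases are younger, so these data would not be MCAR.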

What if my Data are NMAR?

• You’re not screwed
• Report the pattern and amount of missing data

Listwise Deletion
• Cases with any missing values are deleted from the analysis
  – Default procedure in SPSS
• Problems
  – If the cases are not MCAR, the remaining cases are a biased subsample of the total sample
  – Analysis will be biased
  – Loss of statistical power
• Example: a dataset of 302 respondents dropped to 154 cases
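Listwise deletion loses cases faster than the per-variable rates suggest, because a case is dropped if it is missing on any variable. A sketch with hypothetical rows. The 0.9 ** 5 line assumes, hypothetically, that each of 5 variables is independently missing for 10% of cases:

```python
def complete_cases(rows):
    """Listwise deletion: keep only rows with no missing (None) values."""
    return [r for r in rows if all(v is not None for v in r)]

rows = [(1, 2, 3), (4, None, 6), (7, 8, 9), (None, 1, 2)]
print(len(complete_cases(rows)), "of", len(rows), "cases kept")  # 2 of 4 cases kept

# Hypothetical: 5 variables, each independently missing for 10% of cases
print(round(0.9 ** 5, 3))  # 0.59, so about 41% of cases are lost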

Deletion

Pairwise Deletion
• Cases are excluded only if data are missing on a required variable
  – Correlating five variables: a case missing data on one variable would still be used for correlations among the other four
• Problems
  – Uses different cases for each correlation (n fluctuates)
  – Difficult to compare correlations
  – May cause problems in multivariate analyses
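The fluctuating n under pairwise deletion is easy to see by counting usable cases for each pair of variables. A sketch with hypothetical columns:

```python
from itertools import combinations

def pairwise_n(columns):
    """Pairwise deletion: number of cases usable for each pair of variables."""
    result = {}
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        result[(name_a, name_b)] = sum(x is not None and y is not None for x, y in zip(a, b))
    return result

# Hypothetical variables with scattered missing values
columns = {
    "x": [1, 2, None, 4, 5],
    "y": [2, None, None, 5, 6],
    "z": [1, 2, 3, 4, 5],
}
print(pairwise_n(columns))  # {('x', 'y'): 3, ('x', 'z'): 4, ('y', 'z'): 3}
```

Each correlation in the resulting matrix would rest on a different subset of cases.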

Deletion

Mean Substitution
• Missing values are imputed with the mean value of that variable
• Problems
  – Produces biased means with data that are MAR or NMAR
  – Underestimates variance and correlations
• Experts strongly advise against this method
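Why mean substitution underestimates variance: every filled-in value sits exactly at the mean, adding cases without adding spread. A sketch on hypothetical scores:

```python
from statistics import fmean, variance

def mean_substitute(values):
    """Replace each None with the mean of the observed values."""
    m = fmean(v for v in values if v is not None)
    return [m if v is None else v for v in values]

scores = [2, 4, None, 6, None, 8]
observed = [v for v in scores if v is not None]
filled = mean_substitute(scores)

print(fmean(observed), fmean(filled))        # 5.0 5.0: mean unchanged
print(variance(observed), variance(filled))  # about 6.67 vs 4.0: variance shrinks
```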

Imputation

Regression Substitution
• Existing scores are used to predict missing values
  – Produces unbiased means under MCAR or MAR
• Problem
  – Produces biased variances
• Experts advise against this method
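The variance problem with regression substitution comes from the imputed values falling exactly on the fitted line, with none of the scatter real scores have. A sketch using ordinary least squares on hypothetical data:

```python
from statistics import fmean

def regression_impute(x, y):
    """Fill missing y (None) with predictions from an OLS line fit to complete pairs."""
    pairs = [(a, b) for a, b in zip(x, y) if b is not None]
    mx = fmean(a for a, _ in pairs)
    my = fmean(b for _, b in pairs)
    slope = sum((a - mx) * (b - my) for a, b in pairs) / sum((a - mx) ** 2 for a, _ in pairs)
    intercept = my - slope * mx
    return [intercept + slope * a if b is None else b for a, b in zip(x, y)]

x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.5, None, 7.5, None, 13.0]   # hypothetical scores with two missing
print([round(v, 2) for v in regression_impute(x, y)])
```

Both imputed points lie on the same straight line, so they carry no residual variation.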

Imputation

Pattern-Matching Imputation

• Hot-Deck Imputation
  – Values are imputed by finding participants who match the case with missing data on other variables
• Cold-Deck Imputation
  – Information from external sources is used to determine the matching variables
• Does not require specialized programs
• Has been used with survey data
• Reduces the amount of variation in the data
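A hot-deck can be as simple as copying the value from the most similar complete case. A sketch of nearest-neighbour hot-deck on one hypothetical matching variable (real applications usually match on several variables):

```python
def hot_deck(rows, target, match_on):
    """Nearest-neighbour hot-deck: copy `target` from the complete case whose
    `match_on` value is closest. Rows are dicts; None marks missing."""
    donors = [r for r in rows if r[target] is not None]
    filled = []
    for r in rows:
        if r[target] is None:
            donor = min(donors, key=lambda d: abs(d[match_on] - r[match_on]))
            r = {**r, target: donor[target]}
        filled.append(r)
    return filled

# Hypothetical survey: impute income from the respondent with the closest age
rows = [
    {"age": 25, "income": 31000},
    {"age": 47, "income": 58000},
    {"age": 45, "income": None},
    {"age": 30, "income": 35000},
]
print(hot_deck(rows, "income", "age")[2])  # {'age': 45, 'income': 58000}
```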

Imputation

Stochastic Imputation Methods

• Stochastic = random
  – Does not systematically change the mean; gives unbiased variance estimates
• Maximum Likelihood (ML) Strategies
  – Observed data are used to estimate parameters, which are then used to estimate the missing scores
  – Provides “unbiased and efficient” parameters
  – Useful for exploratory factor analysis and internal consistency calculations

Stochastic Imputation

Multiple Imputation (MI)

• Create several imputed data sets (3–5)
• Analyze each data set and save the parameter estimates
• Average the parameter estimates to get an unbiased parameter estimate
  – Most complex procedure
  – Computer-intensive
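The averaging step is usually done with the standard pooling rules for multiple imputation (often called Rubin's rules), which also combine the standard errors so that the pooled SE reflects uncertainty between imputations. A sketch with hypothetical estimates from m = 3 imputed data sets:

```python
from statistics import fmean, variance

def pool_rubin(estimates, std_errors):
    """Pool m imputed-data results with Rubin's rules.
    Returns (pooled estimate, pooled standard error)."""
    m = len(estimates)
    q_bar = fmean(estimates)                         # average of the m estimates
    within = fmean([se ** 2 for se in std_errors])   # average sampling variance
    between = variance(estimates)                    # spread across imputations
    total = within + (1 + 1 / m) * between
    return q_bar, total ** 0.5

# Hypothetical regression slopes and SEs from m = 3 imputed data sets
est, se = pool_rubin([1.0, 1.2, 1.1], [0.1, 0.1, 0.1])
print(round(est, 3), round(se, 3))
```

The pooled SE is larger than any single data set's SE of 0.1, because it adds the between-imputation variance.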

Stochastic Imputation

Handling Missing Data
• Read: examine published literature to find similar situations
• Choose an appropriate method
  – Expectation maximization
  – Multiple imputation
  – Maximum likelihood
• Report the method chosen to handle the data and give a brief rationale for that selection

Ignoring Outliers

Mistake #4

Outliers

Extreme example

Outliers can change the nature of the relationship
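One extreme case can reverse a relationship. A sketch with hypothetical data: five points on a perfect positive line, then the same points plus a single outlier:

```python
from math import sqrt
from statistics import fmean

def pearson_r(x, y):
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x, y = [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]
print(pearson_r(x, y))                 # 1.0: perfect positive relationship
print(pearson_r(x + [6], y + [-10]))   # negative once one extreme case is added
```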

Outliers

• Univariate outlier
  – “Outliers are people, too.”
  – Check for
• Multivariate outlier
  – Should be removed
  – Find with Mahalanobis distance
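Mahalanobis distance measures how far a case is from the centroid while accounting for the correlation between variables, so it flags cases that are unremarkable on each variable alone but break the joint pattern. A two-variable sketch on hypothetical data. A common convention compares D² to a chi-square critical value with df equal to the number of variables (e.g., at p < .001):

```python
from statistics import fmean

def mahalanobis_sq(points):
    """Squared Mahalanobis distance of each (x, y) point from the centroid,
    using the sample covariance matrix (n - 1 divisor)."""
    n = len(points)
    mx = fmean(p[0] for p in points)
    my = fmean(p[1] for p in points)
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Inverse of the 2 x 2 covariance matrix is [[syy, -sxy], [-sxy, sxx]] / det
    return [((x - mx) ** 2 * syy - 2 * (x - mx) * (y - my) * sxy + (y - my) ** 2 * sxx) / det
            for x, y in points]

# Hypothetical data: the last case is not extreme on x or y alone,
# but it breaks the strong positive x-y pattern.
points = [(2, 2.1), (3, 2.9), (4, 4.2), (5, 5.0), (6, 5.8), (4, 0.5)]
d2 = mahalanobis_sq(points)
print([round(v, 2) for v in d2])  # the last case has the largest D²
```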

Spotting Outliers With Graphs

Outlier

SPSS: MusicFestival.sav

(Panels: Before, After)

Ignoring Effect Size

Mistake #5

Ignoring Effect Size

• Effect size is the magnitude of the findings
• Post hoc: easy, non-controversial
• A priori: used for statistical power analysis
• Power: probability of rejecting the null when the null is false
  – The ability to find a difference where one exists

More Significant?

• Imagine you are comparing two tests.
  – The first test is significant: z = 2.01, p < .05, two-tailed
  – The second is significant: z = 8.37, p < .0001, two-tailed
• Is the second more significant than the first?
  – No, it is only a less likely result under the null hypothesis. We want to know how BIG the effect was.
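The z statistic mixes effect size with sample size. For a two-group comparison, z is roughly d multiplied by the square root of n/2, so a small effect in a huge study can produce a far larger z than a large effect in a small one. A sketch with hypothetical studies (normal approximation, SD treated as known):

```python
from math import erfc, sqrt

def z_two_groups(d, n_per_group):
    """Approximate z for a two-group comparison with standardized difference d."""
    return d * sqrt(n_per_group / 2)

def two_tailed_p(z):
    return erfc(abs(z) / sqrt(2))

# Hypothetical studies: a large effect in a small study vs a small effect in a huge one
for d, n in [(0.8, 13), (0.2, 3500)]:
    z = z_two_groups(d, n)
    print(f"d = {d}, n = {n} per group: z = {z:.2f}, p = {two_tailed_p(z):.2g}")
```

The second study has the smaller p but only a quarter of the effect size.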

How does Significance Differ From Effect Size

You failed to record a 25¢ charge to your checking account. Was your 25¢ deficit due to random variation or was it a real mistake? Real mistake.

Will that mistake have a big effect? No. Real effect but a small effect.

You recorded a $200 payment as a $200 deposit. Was your $400 deficit due to random variation or was it a real mistake? Real mistake.

Will that mistake have a big effect? Yes. Real effect and a large effect size.

Effect Size

• How big was the effect the treatment had?
  – The critical value does not tell you the effect size
• Hypothesis testing tells you whether an effect is significant
  – You should also report the effect size
  – r
  – d

Cohen’s d Effect Size

• r = .1, d = .2: small effect; the effect explains 1% of the total variance
• r = .3, d = .5: medium effect; the effect accounts for 9% of the total variance
• r = .5, d = .8: large effect; the effect accounts for 25% of the total variance

Free effect size calculator at: http://www.missouristate.edu/rstats/110161.htm
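A sketch of the two quantities behind these benchmarks: Cohen's d as a mean difference divided by the pooled standard deviation, and variance explained as r². Note that the d and r benchmarks are separate rules of thumb rather than exact equivalents: with equal group sizes the usual conversion r = d / sqrt(d² + 4) turns d = .5 into r ≈ .24, not .3. The group scores are hypothetical:

```python
from math import sqrt
from statistics import fmean, variance

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (fmean(a) - fmean(b)) / sqrt(pooled_var)

def variance_explained(r):
    """Proportion of variance accounted for by a correlation r."""
    return r ** 2

print(cohens_d([2, 4, 6], [1, 3, 5]))                    # 0.5
print([variance_explained(r) for r in (0.1, 0.3, 0.5)])  # about 0.01, 0.09, 0.25
```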

Making Continuous Categorical

Mean or Median Splits

Mistake #6

Making Continuous Categorical

Bad Idea: Don’t Do It

• Results in:
  – Lost information: why throw away all those data?
  – Reduced statistical power
  – Increased likelihood of Type II error
• Only justified when:
  – Distribution of the variable is highly skewed
  – The variable’s relationship to another variable is non-linear.
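The cost of a median split can be simulated: dichotomizing a normally distributed predictor at its median shrinks its correlation with the outcome by roughly a fifth (theoretically by a factor of about 0.8), which is where the lost power comes from. The population correlation of .5, sample size, and seed are hypothetical choices:

```python
import random
from math import sqrt
from statistics import fmean, median

def pearson_r(x, y):
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

rng = random.Random(42)                 # fixed seed so the run is repeatable
rho = 0.5                               # hypothetical population correlation
x = [rng.gauss(0, 1) for _ in range(5000)]
y = [rho * a + sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x]

cut = median(x)
x_split = [1 if a > cut else 0 for a in x]  # "high" vs "low" groups

print(f"continuous r:   {pearson_r(x, y):.2f}")        # near 0.50
print(f"median-split r: {pearson_r(x_split, y):.2f}")  # near 0.40
```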

Misunderstood Analysis

“But that’s how we have always done it.”

Mistake #7

MANOVA then ANOVA

• Study of 222 MANOVAs in six journals

• Common: MANOVA followed by ANOVAs

• MANOVA controls for Type I error
• Protected F Test
• “A significant MANOVA difference need not imply that any significant ANOVA effect or effects exist…”

Huberty & Morris, 1989

When to Use ANOVA

• Outcome variables are conceptually independent
  – Effects of using clickers, teacher interaction, and student ability on Algebra concept attainment, Geometry concept attainment, Musical concept attainment, and classroom interaction?
  – Use four 3-way ANOVAs

When to Use ANOVA

• Research is exploratory
  – Study of new treatment or outcome variables
  – Non-confirmatory
• Reexamine bivariate relationships in a multivariate context
  – Outcome variables were previously studied in univariate contexts
  – Useful for comparisons

When to Use ANOVA

• Selecting a comparison group
  – Demonstrate that two or more groups are similar on a number of descriptors
• Problem
  – If both IV-1 and IV-2 are significant, but IV-2 is highly correlated with IV-1, then IV-2 is not really contributing
  – MANOVA can control for this

When to Use MANOVA

• Are there any overall interactions or main effects present?
• Variable Selection
  – Do I need all these DVs?
  – Find the parsimonious DV combination
• Variable Ordering
  – Assess the relative contribution of DVs to group differences
• Variable System Structure

When to Use MANOVA

• Variable System Structure
  – Identify a construct that underlies the DVs
• More of an art than statistical science
  – System: collection of conceptually related variables that underlies a construct
  – Five attitude DVs, reduced to 2 (Watterson, Joe, Cole, & Sells, 1980)
  – Reduced 21 DVs on student performance to 2 constructs: academic performance and personal growth (Hackman & Taber, 1979)

So

• MANOVA and ANOVA address different research questions
  – One may have little bearing on the other
• Controlling for Type I errors with a preliminary MANOVA is a myth
• Whether using MANOVA or multiple ANOVAs, report the intercorrelations of the variables.

Confidence Intervals

Confidence Intervals

• When estimating the population mean, the best guess is the sample mean
• The sample mean is very precise, but it is unlikely to be 100% accurate
  – Any estimate has some sampling error

Confidence Intervals
• Another way to estimate the population value is a confidence interval
  – The mean should be between this and that
• The confidence interval is less specific, but we can be more confident that the real mean is contained within its range
  – The average movie ticket is $6.83
  – Tickets will probably cost between $5 and $8
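A confidence interval for a mean is the sample mean plus or minus a critical value times the standard error. A large-sample sketch using the normal critical value 1.96; for small samples like this hypothetical one, a t critical value (slightly larger) would be more appropriate:

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def mean_ci(xs, level=0.95):
    """Large-sample confidence interval for a mean: mean +/- z * SD / sqrt(n).
    (For small samples a t critical value should replace z.)"""
    n = len(xs)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    se = stdev(xs) / sqrt(n)
    m = fmean(xs)
    return m - z * se, m + z * se

# Hypothetical ticket prices
prices = [5.50, 6.00, 6.25, 6.75, 7.00, 7.25, 7.50, 8.40]
low, high = mean_ci(prices)
print(f"mean = {fmean(prices):.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```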

Confidence Intervals
• Dugong, et al. (2008)
  – Plankton consumption by sharks at National Aquarium
• True Mean (all basking sharks)
  – 15 million
• Sample Mean (sharks at National Aquarium)
  – 17 million
• Confidence Interval estimate
  – 12 to 22 million (contains true value)
  – 16 to 18 million (misses true value)
  – CIs are constructed such that 95% of the CIs contain the true value.

(Image: basking shark)

FIGURE 2: The confidence intervals of the number of plankton consumed by a basking shark at one time (horizontal axis: plankton, in millions) for 50 different samples (vertical axis)
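The "95% of the CIs contain the true value" idea can be checked by simulation, in the spirit of the 50-sample figure: draw many samples from a population with a known mean, build a 95% interval from each, and count how many capture the true mean. The population values below are hypothetical, and sigma is treated as known so the z interval is exact:

```python
import random
from math import sqrt
from statistics import fmean

rng = random.Random(2008)            # fixed seed so the run is repeatable
true_mean, sigma, n = 15.0, 4.0, 25  # hypothetical population (millions of plankton)
reps = 1000
z = 1.959964                         # 97.5th percentile of the standard normal

hits = 0
for _ in range(reps):
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    m = fmean(sample)
    half_width = z * sigma / sqrt(n)  # sigma treated as known, so z is exact
    if m - half_width <= true_mean <= m + half_width:
        hits += 1

coverage = hits / reps
print(f"{hits} of {reps} intervals contain the true mean ({coverage:.1%})")
```

Any single interval either contains the true mean or misses it; the 95% describes the long-run procedure.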

Moving Beyond NHST: Next Steps

The Four Parameters
1. Alpha significance criterion (p < .05)
2. The sample size
3. The population effect size
4. The power of the test

Any one is a function of the other three.
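Because any one of the four is a function of the other three, we can solve for sample size from alpha, effect size, and power, or for power from the other three. A sketch for a two-tailed, two-group comparison using the normal approximation; exact t-based software (e.g., G*Power) gives a slightly larger n:

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-tailed, two-group comparison
    (normal approximation; an exact t-based calculation gives slightly more)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

def power_two_groups(d, n, alpha=0.05):
    """Approximate power for effect size d with n per group (ignores the far tail)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(d * sqrt(n / 2) - z_alpha)

n = n_per_group(0.5)
print(n, round(power_two_groups(0.5, n), 3))  # 63 per group, power about 0.80
```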

1. Power

• Before conducting a study, you should do a power analysis
• Power is the probability of not making a Type II error
  – Power = 1 - β
• We find the effect when it is truly there
  – We want to maximize power

D. Wayne Mitchell PhD

Type I and Type II Errors
• Type I Error
  – Occurs when we believe that there is a genuine effect in our population, when in fact there isn’t.
  – The probability is the α-level (usually .05)
• Type II Error
  – Occurs when we believe that there is no effect in the population when, in reality, there is.
  – The probability is the β-level (often .2)

(Images: Pinocchio, Dunce Cap)

2. Effect Size

• A significant result tells us the results were (most likely) not accidental

• Effect size tells us whether the effect was large or small
  – Gas prices

• Effect size can be used in meta-analysis

Melissa Meier PhD

3. Complex Relationships

• NHST tells us that differences exist between groups

• Complex relationships can exist among variables

• Structural Equation Modeling

Kayla Jordan, RStats

4. Mediation and Moderation

• NHST tells us what differences exist
• Mediation tells us how (through which intervening variables) relationships between variables occur
• Moderation tells us when relationships between variables exist

Todd Daniel PhD

Take a Break
