RStats Statistics and Research Camp 2014
Transcript of RStats Statistics and Research Camp 2014
Slide 1
RStats Statistics and Research Camp 2014
Welcome!
Slide 2
Helen Reid, PhD
Dean of the College of Health and Human Services
Missouri State University
Slide 3
Welcome
• Goal: keeping up with advances in quantitative methods
• Best practices: use familiar tools in new situations
– Avoid common mistakes
Todd Daniel, PhD
RStats Institute
Slide 4
Coffee Advice
Ronald A. Fisher
Slide 5
Cambridge, England 1920
Dr. Muriel Bristol
Box, 1976
Slide 6
Familiar Tools
• Null Hypothesis Significance Testing (NHST)
• a.k.a. Null Hypothesis Decision Making
• a.k.a. Statistical Hypothesis Inference Testing
p < .05
The probability of finding these results (or more extreme ones), given that the null hypothesis is true
Slide 7
Benefits of NHST
• All students trained in NHST
• Easier to engage researchers
• Results are not complex
Everybody is doing it
Slide 8
Statistically Significant Difference
What does p < .05 really mean?
1. There is a 95% chance that the alternative hypothesis is true
2. This finding will replicate 95% of the time
3. If the study were repeated, the null would be rejected 95% of the time
Slide 9
The Earth is Round, p < .05
What you want…
• The probability that a hypothesis is true given the evidence
What you get…
• The probability of the evidence assuming that the (null) hypothesis is true.
Cohen, 1994
Slide 10
Pigs Might Fly
Slide 11
• Null: There are no flying pigs
H0: P = 0
• Random sample of 30 pigs
• One can fly: 1/30 = .033
• What kind of test?
– Chi-Square?
– Fisher exact test?
– Binomial?
Why do you even need a test?
Slide 12
Daryl Bem and ESP
• Assumed random guessing p = .50
• Found subject success of 53%, p < .05
• Too much power?
– Everything counts in large amounts
• What if Bem set p = 0?
– One clairvoyant v. group that guesses 53%
In the real world, the null is always false.
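The "too much power" point can be illustrated with a small sketch. The trial counts below are hypothetical (the slide does not report Bem's sample size), and the normal-approximation z test for a proportion is used only because it needs nothing beyond the standard library.

```python
import math

def two_tailed_p_for_proportion(observed_rate, n, null_rate=0.5):
    """Two-tailed p value from a normal-approximation z test of a proportion."""
    standard_error = math.sqrt(null_rate * (1 - null_rate) / n)
    z = (observed_rate - null_rate) / standard_error
    # erfc(|z| / sqrt(2)) is the two-tailed area under the standard normal curve
    return math.erfc(abs(z) / math.sqrt(2))

# Same 53% hit rate, different (hypothetical) numbers of trials
for n in (100, 500, 2000):
    print(n, round(two_tailed_p_for_proportion(0.53, n), 4))
```

With the hit rate held at 53%, the p value depends only on how many trials were run, which is the sense in which "everything counts in large amounts."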
Slide 13
Problems with NHST
• Cohen (1994) reservations
• Non-replicable findings
• Poor basis for policy decisions
• False sense of confidence
Cohen, 1994
NHST is “a potent but sterile intellectual rake who leaves in his merry path a long train of ravished maidens but no viable scientific offspring”
- Paul Meehl
Slide 14
What to do then?
• Learn basic methods that improve your research (learn NHST)
• Learn advanced techniques and apply them to your research (RStats Camp)
• Make professional connections and access professional resources
Slide 15
Agenda
9:30 Best Practices (and Poor Practices) In Data Analysis
11:00 Moderated Regression
12:00 – 12:45 Lunch (with Faculty Writing Retreat)
1:00 Effect Size and Power Analysis
2:00 Meta-Analysis
3:00 Structural Equation Modeling
Slide 16
RStats Statistics and Research Camp 2014
Best Practices and Poor Practices
Session 1
R. Paul Thomlinson, PhD
Burrell
Slide 17
Poor Practices
Common Mistakes
Slide 18
No Blueprint
Mistake #1
Slide 19
Ignoring Assumptions
Pre-Checking Data Before Analysis
Mistake #2
Slide 20
Assumptions Matter
• Data: I call you and you don’t answer.
• Conclusion: you are mad at me.
• Assumption: you had your phone with you.
If my assumptions are wrong, it prevents me from looking at the world accurately
Slide 21
Assumptions for Parametric Tests
• "Assumptions behind models are rarely articulated, let alone defended. The problem is exacerbated because journals tend to favor a mild degree of novelty in statistical procedures. Modeling, the search for significance, the preference for novelty, and the lack of interest in assumptions -- these norms are likely to generate a flood of nonreproducible results."
– David Freedman, Chance 2008, v. 21 No 1, p. 60
Slide 22
Assumptions for Parametric Tests
• "... all models are limited by the validity of the assumptions on which they ride."
– Collier, Sekhon, and Stark, Preface (p. xi) to Freedman, David A., Statistical Models and Causal Inference: A Dialogue with the Social Sciences.
• Parametric tests based on the normal distribution assume:
– Interval or Ratio Level Data
– Independent Scores
– Normal Distribution of the Population
– Homogeneity of Variance
Slide 23
Assessing the Assumptions
• Assumption of Interval or Ratio Data
– Look at your data to make sure you are measuring using scale-level data
– This is common and easily verified
Slide 24
Independence
• Techniques are least likely to be robust to departures from assumptions of independence.
• Sometimes a rough idea of whether or not model assumptions might fit can be obtained by either plotting the data or plotting residuals obtained from a tentative use of the model. Unfortunately, these methods are typically better at telling you when the model assumption does not fit than when it does.
Slide 25
Independence
• Assumption of Independent Scores
– Done during research construction
– Each individual in the sample should be independent of the others
• The errors in your model should not be related to each other.
• If this assumption is violated:
– Confidence intervals and significance tests will be invalid.
Slide 26
Assumption of Normality
• You want your distribution to not be skewed
• You want your distribution to not have kurtosis
– At least, not too much of either
Slide 27
Normally Distributed Something or Other
• The normal distribution is relevant to:
– Parameters
– Confidence intervals around a parameter
– Null hypothesis significance testing
• This assumption tends to get incorrectly translated as ‘your data need to be normally distributed’.
Slide 28
Assumption of Normality
• Both skew and kurtosis can be measured with a simple test run for you in SPSS
– Values exceeding +3 or -3 indicate very skewed
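For readers working outside SPSS, a rough sketch of the two statistics follows. It uses the simple moment-based formulas, which is an assumption on my part: SPSS reports bias-adjusted versions, so its numbers will differ somewhat, and the slide does not say whether the ±3 rule applies to the raw statistic or to the statistic divided by its standard error.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (0 and 0 for a normal curve)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skew, excess_kurtosis

# Hypothetical scores: one symmetric set, one with a long right tail
print(skewness_and_excess_kurtosis([1, 2, 3, 4, 5]))
print(skewness_and_excess_kurtosis([1, 1, 1, 2, 2, 3, 10]))
```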
Slide 29
Assessing Normality with Numbers
Slide 30
• Kolmogorov-Smirnov Test
– Tests if data differ from a normal distribution
– Significant = non-Normal data
– Non-Significant = Normal data
• Non-Significant is the ideal
Tests of Normality
SPSS Exam.sav
The P-P Plot
Normal Not Normal
Slide 32
Histograms & Stem-and Leaf Plots
Bi-Modal Normal-ish
Double-click on Histogram in Output window to add the normal curve
SPSS Exam.sav
Slide 33
When does the Assumption of Normality Matter?
• Normality matters most in small samples
– The central limit theorem allows us to forget about this assumption in larger samples.
• In practical terms, as long as your sample is fairly large, outliers are a much more pressing concern than normality
Slide 34
Assessing the Assumptions
• Assumption of Homogeneity of Variance
– Only necessary when comparing groups
– Levene’s Test
Slide 35
Assessing Homogeneity of Variance
Graphs
Homogeneous Heterogeneous
Number of hours of ringing in ears after a concert
Slide 36
Assessing Homogeneity of Variance
Numbers
• Levene’s Test
– Tests if variances in different groups are the same.
– Significant = Variances not equal
– Non-Significant = Variances are equal
• Non-Significant is ideal
• Variance Ratio (VR)
– With 2 or more groups
– VR = Largest variance/Smallest variance
– If VR < 2, homogeneity can be assumed.
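The variance ratio is simple enough to compute by hand. A minimal sketch, assuming sample variances (n − 1 denominator) and made-up group scores:

```python
from statistics import variance

def variance_ratio(groups):
    """Largest sample variance divided by the smallest, across groups."""
    variances = [variance(group) for group in groups]
    return max(variances) / min(variances)

# Hypothetical scores for two groups
group_a = [1, 2, 3, 4, 5]     # sample variance 2.5
group_b = [2, 4, 6, 8, 10]    # sample variance 10
vr = variance_ratio([group_a, group_b])
print(vr, "homogeneity can be assumed" if vr < 2 else "variances look unequal")
```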
Slide 37
Spotting problems with Linearity or Homoscedasticity
Slide 38
Ignoring Missing Data
Mistake #3
Slide 39
Missing Data
It is the lion you don’t see that eats you
Missing Data
Slide 40
Amount of Missing Data
• APA Task Force on Statistical Inference (1999) recommended that researchers report patterns of missing data and the statistical techniques used to address the problems such data create
• Report as a percentage of complete data
– “Missing data ranged from a low of 4% for attachment anxiety to a high of 12% for depression.”
• If calculating total or scale scores, impute the values for the items first, then calculate the scale score
Slide 41
Pattern of Missing Data
• Missing Completely At Random (MCAR)
– No pattern; not related to variables
– Accidentally skipped one; got distracted
• Missing At Random (MAR)
– Pattern does not differ between groups
• Not Missing At Random (NMAR)
– Parents who feel competent are more likely to skip the question about interest in parenting classes
Slide 42
Pattern of Missing Data
Distinguish between MCAR and MAR
• Create a dummy variable with two values: missing and non-missing
– SPSS: recode new variable
• Test the relation between dummy variable and the variables of interest
– If not related: data are either MCAR or NMAR
– If related: data are MAR or NMAR
• Little’s (1988) MCAR Test
– Missing Values Analysis add-on module in SPSS 20
– If the p value for this test is not significant, indicates data are MCAR
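The dummy-variable step can be sketched outside SPSS as well. The records and variable names below are invented for illustration, and the comparison of group means stands in for whatever formal test (t test, chi-square) a real analysis would use.

```python
from statistics import mean

# Hypothetical survey records; None marks a skipped item
records = [
    {"age": 21, "depression": 10},
    {"age": 25, "depression": 12},
    {"age": 30, "depression": None},
    {"age": 41, "depression": 9},
    {"age": 52, "depression": None},
    {"age": 58, "depression": None},
]

def missingness_dummy(rows, variable):
    """Return 1 where the variable is missing and 0 where it is present."""
    return [1 if row[variable] is None else 0 for row in rows]

def mean_by_missingness(rows, dummy, other_variable):
    """Mean of another variable for the missing and non-missing groups."""
    missing = [row[other_variable] for row, flag in zip(rows, dummy) if flag == 1]
    present = [row[other_variable] for row, flag in zip(rows, dummy) if flag == 0]
    return mean(missing), mean(present)

dummy = missingness_dummy(records, "depression")
print(dummy)
print(mean_by_missingness(records, dummy, "age"))
```

If the two means differ by more than chance would explain, missingness is related to the other variable, so MCAR is ruled out.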
Slide 43
What if my Data are NMAR?
• You’re not screwed
• Report the pattern and amount of missing data
Slide 44
Listwise Deletion
• Cases with any missing values are deleted from analysis
– Default procedure for SPSS
• Problems
– If the cases are not MCAR, remaining cases are a biased subsample of the total sample
– Analysis will be biased
– Loss of statistical power
• Dataset of 302 respondents dropped to 154 cases
Deletion
Slide 45
Pairwise Deletion
• Cases are excluded only if data are missing on a required variable
– Correlating five variables: case that was missing data on one variable would still be used on the other four
• Problems
– Uses different cases for each correlation (n fluctuates)
– Difficult to compare correlations
– May mess with multivariate analyses
Deletion
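The difference between the two deletion strategies is easiest to see by counting usable cases. A minimal sketch with a made-up four-case dataset:

```python
# Hypothetical dataset; None marks a missing value
rows = [
    {"x": 1.0, "y": 2.0, "z": 3.0},
    {"x": 2.0, "y": None, "z": 1.0},
    {"x": 3.0, "y": 5.0, "z": None},
    {"x": 4.0, "y": 4.0, "z": 6.0},
]

def listwise(data):
    """Keep only cases with no missing values on any variable."""
    return [row for row in data if all(value is not None for value in row.values())]

def pairwise(data, a, b):
    """Keep cases that have both variables needed for one correlation."""
    return [(row[a], row[b]) for row in data
            if row[a] is not None and row[b] is not None]

print("listwise n:", len(listwise(rows)))
for a, b in (("x", "y"), ("x", "z"), ("y", "z")):
    print(f"pairwise n for {a}-{b}:", len(pairwise(rows, a, b)))
```

Listwise deletion leaves one n for every analysis, while the pairwise n changes from one correlation to the next.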
Slide 46
Mean Substitution
• Missing values are imputed with the mean value of that variable
• Problems
– Produces biased means with data that are MAR or NMAR
– Underestimates variance and correlations
• Experts strongly advise against this method
Imputation
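Why mean substitution underestimates variance can be shown with a few numbers. The scores below are made up; the sketch simply fills two missing values with the observed mean and compares the variance before and after.

```python
from statistics import mean, variance

observed = [2.0, 4.0, 6.0, 8.0]   # hypothetical observed scores
n_missing = 2                      # two cases skipped the item

# Mean substitution: every missing value becomes the observed mean
imputed = observed + [mean(observed)] * n_missing

print("mean before/after:", mean(observed), mean(imputed))
print("variance before/after:", variance(observed), variance(imputed))
```

The mean is unchanged, but the imputed cases add to n without adding any spread, so the variance shrinks.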
Slide 47
Regression Substitution
• Existing scores are used to predict missing values
• Problems
– Produces unbiased means under MCAR or MAR
– Produces biases in the variances
• Experts advise against this method
Imputation
Slide 48
Pattern-Matching Imputation
• Hot-Deck Imputation
– Values are imputed by finding participants who match the case with missing data on other variables
• Cold-Deck Imputation
– Information from external sources is used to determine the matching variables
• Does not require specialized programs
• Has been used with survey data
• Reduces the amount of variation in the data
Imputation
Slide 49
Stochastic Imputation Methods
• Stochastic = random
– Does not systematically change the mean; gives unbiased variance estimates
• Maximum Likelihood (ML) Strategies
– Observed data are used to estimate parameters, which are then used to estimate the missing scores
– Provides “unbiased and efficient” parameters
– Useful for exploratory factor analysis and internal consistency calculations
Stochastic Imputation
Slide 50
Multiple Imputation (MI)
• Create several imputed data sets (3 – 5)
• Analyze each data set and save the parameter estimates
• Average the parameter estimates to get an unbiased parameter estimate
– Most complex procedure
– Computer-intensive
Stochastic Imputation
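The three steps (impute several times, analyze each data set, average the estimates) can be sketched as a toy. This is not how production MI software works: here each missing value is filled with a random draw from the observed values, which is an assumption made purely to keep the example short, and only the point estimate is pooled (real procedures also combine the variances).

```python
import random
from statistics import mean

def pooled_mean(values, n_imputations=5, seed=0):
    """Toy multiple imputation: impute, estimate the mean, then average."""
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    estimates = []
    for _ in range(n_imputations):
        # Step 1: create one completed data set with random draws
        completed = [v if v is not None else rng.choice(observed) for v in values]
        # Step 2: analyze it and save the parameter estimate
        estimates.append(mean(completed))
    # Step 3: average the estimates across imputed data sets
    return mean(estimates), estimates

scores = [3.0, None, 5.0, 7.0, None, 9.0]   # hypothetical scores
pooled, per_dataset = pooled_mean(scores)
print(per_dataset, pooled)
```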
Slide 51
Handling Missing Data
• Read: Examine published literature to find similar situations
• Choose an appropriate method
– Expectation maximization
– Multiple imputation
– Maximum likelihood
• Report the method chosen to handle the data and give a brief rationale for that selection
Slide 52
Ignoring Outliers
Mistake #4
Slide 53
Outliers
Extreme example
Outliers can change the nature of the relationship
Slide 54
Outliers
• Univariate outlier
– “Outliers are people, too.”
– Check for
• Multivariate outlier
– Should be removed
– Find with Mahalanobis distance
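A Mahalanobis distance can be computed by hand for two variables. The sketch below is a two-variable special case with invented data; it returns squared distances from the sample centroid and leaves the choice of cutoff to the analyst.

```python
from statistics import mean

def squared_mahalanobis(points):
    """Squared Mahalanobis distance of each (x, y) point from the centroid."""
    n = len(points)
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    # Sample variances and covariance (n - 1 denominator)
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form using the inverse of the 2x2 covariance matrix
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances

# Hypothetical data: six points near the line y = x, plus one that breaks the pattern
points = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1), (6, 6.0), (3, 6.5)]
for point, d2 in zip(points, squared_mahalanobis(points)):
    print(point, round(d2, 2))
```

The last point is unremarkable on either variable alone, yet it has the largest distance because it departs from the joint pattern, which is what makes it a multivariate outlier.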
Slide 55
Spotting Outliers With Graphs
Outlier
MusicFestival.sav
Slide 56
Before
After
Slide 57
Ignoring Effect Size
Mistake #5
Slide 58
Ignoring Effect Size
• Effect size is the magnitude of the findings
• Post hoc: easy, non-controversial
• A priori: used for statistical power analysis
• Power: probability of rejecting the null when the null is false
– The ability to find a difference where one exists
Slide 59
More Significant?
• Imagine you are comparing two tests.
The first test is significant, z = 2.01, p < .05, two-tailed
The second is significant, z = 8.37, p < .0001, two-tailed
• Is the second more significant than the first?
– No, it is only a less likely result. We want to know how BIG the effect was
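A small sketch shows why a larger z does not imply a larger effect. It assumes a one-sample z test, where z = d × √n, and the effect sizes and sample sizes are hypothetical values chosen to land near the two z scores on the slide.

```python
import math

def two_tailed_p(z):
    """Two-tailed p value for a standard normal z score."""
    return math.erfc(abs(z) / math.sqrt(2))

def z_from_effect(d, n):
    """z for a one-sample test with standardized effect d and sample size n."""
    return d * math.sqrt(n)

# Hypothetical studies: a medium effect in a small sample, a small effect in a huge one
for d, n in ((0.5, 16), (0.1, 7000)):
    z = z_from_effect(d, n)
    print(f"d = {d}, n = {n}, z = {z:.2f}, p = {two_tailed_p(z):.2g}")
```

Here the study with the far smaller p value is the one with the smaller effect.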
Slide 60
How does Significance Differ From Effect Size
You failed to record a 25¢ charge to your checking account
Was your 25¢ deficit due to random variation or was it a real mistake? Real mistake
Will that mistake have a big effect? No. Real effect but a small effect
You recorded a $200 payment as a $200 deposit
Was your $400 deficit due to random variation or was it a real mistake? Real mistake
Will that mistake have a big effect? Yes. Real effect and a large effect size
Slide 61
Effect Size
• How big was the effect the treatment had
– Critical value does not tell you effect size
• Hypothesis testing tells if an effect is significant
– You should also report the effect size
– r
– d
Slide 62
Cohen’s d Effect Size
r = .1, d = .2: small effect
the effect explains 1% of the total variance
r = .3, d = .5: medium effect
the effect accounts for 9% of the total variance
r = .5, d = .8: large effect
the effect accounts for 25% of the variance
Free effect size calculator at:
http://www.missouristate.edu/rstats/110161.htm
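Both effect sizes are quick to compute by hand. The sketch below assumes the pooled-standard-deviation form of Cohen's d for two independent groups and uses invented scores; the variance-explained figures on the slide are simply r squared.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_variance = ((na - 1) * variance(group_a)
                       + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_variance)

def variance_explained(r):
    """Proportion of variance accounted for by a correlation of size r."""
    return r ** 2

treatment = [2, 4, 6, 8]   # hypothetical scores
control = [1, 3, 5, 7]
print("d =", round(cohens_d(treatment, control), 3))
for r in (0.1, 0.3, 0.5):
    print(f"r = {r}: {variance_explained(r):.0%} of variance")
```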
Slide 63
Making Continuous Categorical
Mean or Median Splits
Mistake #6
Slide 64
Making Continuous Categorical
Slide 65
Bad Idea—Don’t Do It
• Results in:
– Lost information: why throw away all those data??
– Reduced statistical power
– Increased likelihood of Type II error
• Only justified when:
– Distribution of the variable is highly skewed
– The variable’s relationship to another variable is non-linear.
Slide 66
Misunderstood Analysis
“But that’s how we have always done it.”
Mistake #7
Slide 67
MANOVA then ANOVA
• Study of 222 MANOVAs in six journals
• Common: MANOVA followed by ANOVAs
• MANOVA controls for Type I error
• Protected F Test
• “A significant MANOVA difference need not imply that any significant ANOVA effect or effects exist…”
Huberty & Morris, 1989
Slide 68
When to Use ANOVA
• Outcome variables are conceptually independent
– Effects of using clickers, teacher interaction, and student ability on Algebra concept attainment, Geometry concept attainment, Musical concept attainment, and classroom interaction?
– Use four 3-Way ANOVAs
Slide 69
When to Use ANOVA
• Research is exploratory
– Study of new treatment or outcome variables
– Non-confirmatory
• Reexamine bivariate relationships in multivariate context
– Outcome variables were previously studied in univariate contexts
– Useful for comparisons
Slide 70
When to Use ANOVA
• Selecting a comparison group
– Demonstrate that two or more groups are similar on a number of descriptors
• Problem
– If both IV-1 and IV-2 are significant, but IV-2 is highly correlated with IV-1, then IV-2 is not really contributing
– MANOVA can control for this
Slide 71
When to Use MANOVA
• Are there any overall interactions or main effects present?
• Variable Selection
– Do I need all these DVs?
– Find the parsimonious DV combination
• Variable Ordering
– Assess the relative contribution of DVs to group differences
• Variable System Structure
Slide 72
When to Use MANOVA
• Variable System Structure
– Identify a construct that underlies the DVs
• More of an art than statistical science
– System: collection of conceptually related variables that underlies a construct
– Five attitude DVs, reduced to 2 (Watterson, Joe, Cole, & Sells, 1980)
– Reduced 21 DVs on student performance to 2 constructs: academic performance and personal growth (Hackman & Taber, 1979)
Slide 73
So
• MANOVA and ANOVA address different research questions
– One may have little bearing on the other
• Controlling for Type I errors with preliminary MANOVA is a myth
• Whether using MANOVA or multiple ANOVAs, report the intercorrelation of the variables.
Slide 74
Confidence Intervals
Slide 75
Confidence Intervals
• When estimating the population mean, the best guess is the sample mean
• The sample mean is very precise, but it is unlikely to be 100% accurate
– Any outcome has some measurement error
Slide 76
Confidence Intervals
• Another way to estimate the population value is a Confidence Interval
– The mean should be between this and that
• The confidence interval is not very specific but we are very confident that the real mean is contained within its range
– The average movie ticket is $6.83
– Tickets will probably cost between $5 and $8
Slide 77
Confidence Intervals
• Dugong, et al. (2008)
– Plankton consumption by sharks at National Aquarium
• True Mean (all basking sharks)
– 15 Million
• Sample Mean (sharks at National Aquarium)
– 17 Million
• Confidence Interval estimate
– 12 to 22 million (contains true value)
– 16 to 18 million (misses true value)
– CIs constructed such that 95% of the CIs contain the true value.
Basking Shark
Slide 78
FIGURE 2
The confidence intervals of the number of plankton consumed by a basking shark at one time (horizontal axis) for 50 different samples (vertical axis)
plankton (in millions)
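The idea behind the figure, that about 95% of intervals built this way contain the true value, can be checked by simulation. The true mean of 15 (million) comes from the shark example; the standard deviation, sample size, and number of samples are assumptions made for the sketch, and a known-sigma z interval is used for simplicity.

```python
import math
import random

def ci_coverage(true_mean=15.0, sigma=5.0, n=25, n_samples=1000, seed=1):
    """Share of 95% z intervals (known sigma) that contain the true mean."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(n_samples):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        # The interval "catches" the truth if the true mean lies inside it
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / n_samples

print("proportion of intervals containing the true mean:", ci_coverage())
```

Any single interval either contains the true mean or misses it; the 95% describes the procedure across many samples.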
Slide 79
Moving Beyond NHSTNext Steps
Slide 80
The Four Parameters
1. Alpha significance criterion (p < .05)
2. The sample size
3. The population effect size
4. The power of the test
Any one is a function of the other three
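"Any one is a function of the other three" can be made concrete with a sketch. It assumes a two-group comparison with equal group sizes and uses the normal approximation (ignoring the negligible far tail), so the numbers are close to, but not identical with, what a t-based power program would report.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_two_group(d, n_per_group, alpha=0.05):
    """Approximate power given effect size d, n per group, and two-tailed alpha."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(d * math.sqrt(n_per_group / 2) - z_crit)

def n_per_group_for_power(d, power=0.80, alpha=0.05):
    """Approximate n per group needed to reach the requested power."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_crit + z_power) / d) ** 2)

print("power with d = .5, n = 64 per group:", round(power_two_group(0.5, 64), 3))
print("n per group for 80% power at d = .5:", n_per_group_for_power(0.5))
```

Fixing alpha, effect size, and n yields the power; fixing alpha, effect size, and power yields the required n.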
Slide 81
1. Power
• Before conducting a study, you should do a power analysis
• Power is the probability of not making a Type II error
– Power = 1 − β
• We find the effect when it is truly there
– We want to maximize power
D. Wayne Mitchell PhD
Slide 82
Type I and Type II Errors
• Type I Error
– Occurs when we believe that there is a genuine effect in our population, when in fact there isn’t.
– The probability is the α-level (usually .05)
• Type II Error
– Occurs when we believe that there is no effect in the population when, in reality, there is.
– The probability is the β-level (often .2)
Pinocchio
Dunce CapΨ
Slide 83
2. Effect Size
• A significant result tells us the results were (most likely) not accidental
• Effect size tells us whether the effect was large or small– Gas prices
• Effect size can be used in meta-analysis
Melissa Meier PhD
Slide 84
3. Complex Relationships
• NHST tells us that differences exist between groups
• Complex relationships can exist among variables
• Structural Equation Modeling
Kayla Jordan, RStats
Slide 85
4. Mediation and Moderation
• NHST tells us what differences exist
• Mediation tells us how relationships between variables change
• Moderation tells us when relationships between variables exist
Todd Daniel PhD
Slide 86
Take a Break