
Transcript of Defining and Evaluating ‘Study Quality’

Page 1: Defining and Evaluating  ‘Study Quality’

Defining and Evaluating ‘Study Quality’

Luke Plonsky
Current Developments in Quantitative Research Methods
LOT Winter School, 2014

Page 2: Defining and Evaluating  ‘Study Quality’

Study Quality Matters? YES!

- Building theory (or a house): studies = 2x4s, bricks, etc.
- Self-evident?
- Rarely discussed in linguistics research, but lack of attention to quality ≠ low quality
- Implication: study quality needs to be examined, not assumed

Page 3: Defining and Evaluating  ‘Study Quality’

Defining ‘Study Quality’

- How was SQ defined in Plonsky & Gass (2011) and Plonsky (2013)?
- How was SQ operationalized?
- Do you agree with this definition & operationalization?

Now consider your (sub-)domain of interest:
- How would you operationalize SQ?
- How would you weight or prioritize different features?

Page 4: Defining and Evaluating  ‘Study Quality’

Missing data

Page 5: Defining and Evaluating  ‘Study Quality’

Data type | Primary research | Secondary / meta-analytic research
SDs | Sample variability | Calculate ESs (d); exclusion?
Reliability | Small effects due to treatment or dependent measure? | Inform instrument design; adjust ESs for attenuation
Effect sizes | Interpret magnitude of effects; future power analyses | Compare/combine results; power for moderator analysis

→ LIMITED INFLUENCE ON L2 THEORY, PRACTICE, AND FUTURE RESEARCH; INEFFICIENT
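To make the SD and reliability rows concrete, here is a brief sketch using standard formulas (my illustration, not from the slides): a primary study's SDs are needed to compute a standardized mean difference at all, and its reliability estimate lets a meta-analyst correct that effect size for attenuation due to measurement error.

```latex
d = \frac{M_1 - M_2}{SD_{pooled}}, \qquad
SD_{pooled} = \sqrt{\frac{(n_1 - 1)\,SD_1^2 + (n_2 - 1)\,SD_2^2}{n_1 + n_2 - 2}}, \qquad
d_{corrected} = \frac{d}{\sqrt{r_{xx}}}
```

Here r_xx is the reliability of the dependent measure. A study that reports neither SDs nor reliability can contribute neither d nor the corrected estimate, which is how missing data turn into exclusion.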

Page 6: Defining and Evaluating  ‘Study Quality’

Sources/Considerations for an Instrument to Measure Study Quality?

1. (Over 400) Existing measures of study quality from the meta-analysis literature, usually for weighting ESs (e.g., sample: Valentine & Cooper, 2008, Table 2)

2. Society guidelines (e.g., APA, APS; sample: JARS Working Group, 2008, Table 1; AERA 2006 reporting standards; LSA??; AAAL/AILA??)

3. Journal guidelines (e.g., Chapelle & Duff, 2003)

4. Methodological syntheses from other social sciences (e.g., Skidmore & Thompson, 2010)

5. Previous reviews / meta-analyses (e.g., Chaudron, 2001; Norris & Ortega, 2000; Plonsky, 2011)

6. Methods/stats textbooks (Larson-Hall, 2010; Porte, 2010)

7. Others?

Page 7: Defining and Evaluating  ‘Study Quality’

(Only) two studies in this area have addressed study quality empirically:

Plonsky & Gass (2011); Plonsky (2013, in press)

Rationale & Motivations
- Study quality needs to be measured, not assumed
- Concerns expressed about research and reporting practices
- "Respect for the field of SLA can come only through sound scientific progress" (Gass, Fleck, Leder, & Svetics, 1998)
- No previous reviews of this nature

Page 8: Defining and Evaluating  ‘Study Quality’

Plonsky & Gass (2011) & Plonsky (2013)

Two common goals:
1. Describe and evaluate quantitative research practices
2. Inform future research practices

Page 9: Defining and Evaluating  ‘Study Quality’

Methods (very meta-analytic, but focused on methods rather than substance/effects/outcomes)

Plonsky & Gass (2011)
- Domain: interactionist L2 research; quantitative only
- Across 16 journals & 2 books (all published, 1980-2009)
- K = 174
- Coded for: designs, analyses, reporting practices
- Analyses: frequencies/%s

Plonsky (2013)
- Domain: all areas of L2 research; quantitative only
- Two journals: LL & SSLA (all published, 1990-2010)
- K = 606
- Coded for: designs, analyses, reporting practices (sample scheme)
- Analyses: frequencies/%s

How would you define your domain? Where would you search for primary studies?

Page 10: Defining and Evaluating  ‘Study Quality’

RESULTS

Page 11: Defining and Evaluating  ‘Study Quality’

Results: Designs

Major Designs across Research Settings

Design | Plonsky (2013) Class | Plonsky (2013) Lab | P&G (2011) (all)
Observational | 20% | 80% | 65%
Experimental | 45% | 55% | 35%

Page 12: Defining and Evaluating  ‘Study Quality’

Results: Designs

Samples

Study | Average n | Total N | Groups
P&G (2011) | 22 | 7,951 | 365
Plonsky (2013) | 19 | 181,255 | 1,732

Page 13: Defining and Evaluating  ‘Study Quality’

Results: Designs

Feature | Plonsky (2013) Class | Plonsky (2013) Lab | P&G (2011) All
Random assignment | 23% | 48% | 32%
Control/comparison group | 90% | 84% | 55%
Pretest | 78% | 59% | 39%
Delayed posttest | 50% | 29% | 79%

Page 14: Defining and Evaluating  ‘Study Quality’

Results: Analyses

Analysis | P&G (2011) % | P (2013) %
ANOVA | 54 | 56
t test | 69 | 43
Correlation | 18 | 31
Chi-square | 50 | 19
Regression | 8 | 15
MANOVA | 7 | 7
ANCOVA | 7 | 5
Factor analysis | 2 | 5
SEM | - | 2
Other | - | 7
Nonparametrics | - | 5

Page 15: Defining and Evaluating  ‘Study Quality’

Results: Analyses

Number of Unique Statistical Analyses Used in L2 Research

Number of analyses | P&G (2011) % | P (2013) %
Zero | 6 | 12
One | 32 | 28
Multiple | 62 | 60

Tests of Statistical Significance in L2 Research, Plonsky (2013)

M | 35
SD | 64
Median | 18
95% CI | 30-40
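A worked aside (my own arithmetic, not a figure from the slides): if a single study runs 35 independent significance tests, each at alpha = .05, and all null hypotheses are true, the chance of at least one spurious 'significant' result is

```latex
1 - (1 - 0.05)^{35} = 1 - 0.95^{35} \approx 0.83
```

which is one reason the sheer number of analyses per study matters for interpreting results and for the experiment-wise error and power issues raised later in the discussion.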

Page 16: Defining and Evaluating  ‘Study Quality’

Results: Descriptive Statistics

Item | P&G (2011) % | P (2013) %
Percentage | 62 | 68
Frequency | 71 | 48
Correlation | - | 30
Mean | 64 | 77
Standard deviation | 52 | 60
Mean without SD | - | 31
Effect size | 18 | 26
Confidence interval | 3 | 5

Page 17: Defining and Evaluating  ‘Study Quality’

Results: Inferential Statistics

Item | P&G (2011) % | P (2013) %
F | 26 | 61
t | 32 | 36
χ² | - | 17
p = | 44 | 49
p < or > | 61 | 80
p either = or </> | - | 44
p = and p < or > | - | 42
ANOVA / t test without M | - | 20
ANOVA / t test without SD | - | 35
ANOVA / t test without F or t | - | 24

Page 18: Defining and Evaluating  ‘Study Quality’

Results: Other Reporting Practices

Item P&G (2011) %

P (2013) %

RQs or hypotheses - 80Visual displays of data

- 53

Reliability 64 45Pre-set alpha 25 22Assumptions checked 3 17Power analysis 2 1

Plonsky (2013)

?

Page 19: Defining and Evaluating  ‘Study Quality’

[Bar chart: Studies excluded due to missing data (usually SDs), as a percentage of each meta-analyzed sample. Meta-analyses shown: Li (2010); Pennock-Roman & Rivera (2011); Bowles (2010); Kieffer et al. (2009); Abraham (2008); Goldschneider & DeKeyser (2003); Norris & Ortega (2000); Plonsky (2011); Wa-Mbaleka (2006); Lin et al. (2013); Grgurović et al. (2013); Biber et al. (2011); Russell & Spada (2006); Nekrasova & Becker (2009); Dinsmore (2006); Wu (1991).]

Median K = 16 (Plonsky & Oswald, under review)

Page 20: Defining and Evaluating  ‘Study Quality’

[Bar chart: Data missing in meta-analyzed studies (missing M, missing SD, missing test statistic), as a percentage of each total sample. Meta-analyses shown: Nekrasova & Becker (2009); Norris & Ortega (2000); Plonsky & Gass (2011); Plonsky (in press); Keck et al. (2006); Wang (2010); Lee & Huang (2008).]

Page 21: Defining and Evaluating  ‘Study Quality’

[Bar chart: Reporting of reliability coefficients, as a percentage of each meta-analyzed sample. Meta-analyses shown: Nekrasova & Becker (2009); Mackey & Goo (2007); Norris & Ortega (2000); Russell & Spada (2006); Jeon & Kaya (2006); Plonsky (2011); Ziegler (2013); Plonsky (in press); Adesope et al. (2010); Adesope et al. (2011); Plonsky & Gass (2011).]

Page 22: Defining and Evaluating  ‘Study Quality’

[Bar chart: Reporting of effect sizes & CIs, as a percentage of each meta-analyzed sample. Meta-analyses shown: Keck et al. (2006); Norris & Ortega (2000); Wang (2010); Mackey & Goo (2007); Plonsky & Gass (2011); Plonsky (in press); Ziegler (2013).]

Page 23: Defining and Evaluating  ‘Study Quality’

[Bar chart: Other data associated with quality/transparency and recommended or required by APA (RQs, visual displays, pre-set alpha, assumption checks, power analyses), as a percentage of each meta-analyzed sample. Meta-analyses shown: Plonsky & Gass (2011); Plonsky (in press); Mackey & Goo (2007); Nekrasova & Becker (2009); Norris & Ortega (2000); Ziegler (2013); Keck et al. (2006).]

Page 24: Defining and Evaluating  ‘Study Quality’

Elsewhere in the social sciences…

[Bar chart: Reporting of effect sizes, CIs, reliability, assumption checks, and power analyses in methodological syntheses from other social sciences. Syntheses shown: Keselman et al. (1998-a, b, c, d); Kieffer et al. (2001-a, b); Bangert & Baumberger (2005); Thompson & Snyder (1997); Vacha-Haase et al. (1999); Willson (1980); Sedlmeier & Gigerenzer (1989); Cashen & Geiger (2004).]

Page 25: Defining and Evaluating  ‘Study Quality’

Results: Changes over time

Meara (1995): “[When I was in graduate school], anyone who could explain the difference between a one-tailed and two-tailed test of significance was regarded as a dangerous intellectual; admitting to a knowledge of one-way analyses of variance was practically the same as admitting to witchcraft in 18th century Massachusetts” (p. 341).

Page 26: Defining and Evaluating  ‘Study Quality’

Changes Over Time: Designs
Plonsky & Gass (2011); Plonsky (in press)

Page 27: Defining and Evaluating  ‘Study Quality’

Changes Over Time: Designs
Plonsky (in press)

Page 28: Defining and Evaluating  ‘Study Quality’

Changes Over Time: Analyses
Plonsky & Gass (2011)

Page 29: Defining and Evaluating  ‘Study Quality’

Changes Over Time: Analyses
Plonsky (in press)

Page 30: Defining and Evaluating  ‘Study Quality’

Changes Over Time: Reporting Practices

Plonsky & Gass (2011)

Page 31: Defining and Evaluating  ‘Study Quality’

Changes Over Time: Reporting Practices

Plonsky (in press)

Page 32: Defining and Evaluating  ‘Study Quality’

Relationship between quality and outcomes?

Plonsky (2011)

Plonsky & Gass (2011): larger effects for studies that include delayed posttests

Page 33: Defining and Evaluating  ‘Study Quality’

Discussion (Or: So what?)

General:
- Few strengths and numerous methodological weaknesses are present (common, even) in quantitative L2 research
- Quality (and certainly methodological features) varies across subdomains AND over time
- Possible relationship between methodological practices and the outcomes they produce

Three common themes:
- Means-based analyses
- Missing data, NHST, and the 'Power Problem'
- Design preferences

Page 34: Defining and Evaluating  ‘Study Quality’

Discussion: Means-based analyses

ANOVAs and t tests dominate, increasingly so. Not problematic as long as:
- Assumptions are checked (17% of Plonsky, 2013)
- Data are reported thoroughly
- Tests are most appropriate for the RQs (i.e., not the default)

Benefits of increased use of regression analyses (see Cohen, 1968):
- Less categorization of continuous variables (e.g., proficiency, working memory) just to use ANOVA → loss of variance! (see the sketch below)
- More precise results (R²s are more informative than an overall p or eta²)
- Fewer tests → preservation of experiment-wise power
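A minimal simulation of the categorization point (my own sketch; the variable names and numbers are invented for illustration): splitting a continuous predictor such as working memory into high/low groups for a two-group comparison typically explains less variance than keeping the predictor continuous in a regression.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
working_memory = rng.normal(50, 10, n)                  # hypothetical continuous predictor
outcome = 0.5 * working_memory + rng.normal(0, 10, n)   # hypothetical continuous outcome

# Variance explained when the predictor stays continuous (simple regression R^2)
r_continuous = np.corrcoef(working_memory, outcome)[0, 1]
print(f"R^2 with continuous predictor:  {r_continuous ** 2:.3f}")

# Variance explained after a median split (for two groups, eta^2 equals r_pb^2)
high_group = (working_memory > np.median(working_memory)).astype(float)
r_split = np.corrcoef(high_group, outcome)[0, 1]
print(f"eta^2 with median-split groups: {r_split ** 2:.3f}")  # typically noticeably smaller
```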

Page 35: Defining and Evaluating  ‘Study Quality’

Discussion: Missing data, NHST, & Power

In general: lots of missing and inconsistently reported data! BUT we're getting better!

The "Power Problem" (a quick illustration follows this list):
- Small samples
- Heavy reliance on NHST
- Effects not generally very large
- Omission of non-statistical results → inflated summary results
- Rarely check assumptions
- Rarely use multivariate statistics
- Rarely analyze power
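To illustrate the arithmetic behind the "Power Problem" (a hedged sketch: the effect size of d = 0.4 and the sample sizes are illustrative assumptions, not figures from the slides), here is what an a priori power analysis looks like:

```python
# Requires statsmodels (assumed available): pip install statsmodels
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Participants needed per group to detect d = 0.4 with 80% power (two-tailed, alpha = .05)
n_needed = analysis.solve_power(effect_size=0.4, alpha=0.05, power=0.80,
                                ratio=1.0, alternative='two-sided')
print(f"n needed per group: {n_needed:.0f}")

# Power actually achieved with a small sample of n = 20 per group (illustrative value)
achieved = analysis.power(effect_size=0.4, nobs1=20, alpha=0.05,
                          ratio=1.0, alternative='two-sided')
print(f"power with n = 20 per group: {achieved:.2f}")
```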

Page 36: Defining and Evaluating  ‘Study Quality’

Discussion: Design Preferences

Signs of domain maturity?
- + classroom-based studies
- + experimental studies
- + delayed posttests

Page 37: Defining and Evaluating  ‘Study Quality’

Discussion: Summary

Causes/explanations:
- Inconsistencies among reviewers
- Lack of standards
- Lack of familiarity with design and appropriate data analysis and reporting
- Inadequate training (Lazaraton et al., 1987)
- Non-synthetic-mindedness
- Publication bias

Effects:
- Limited interpretability
- Limited meta-analyzability
- Overestimation of effects
- Overreliance on p values

Slower progress

Page 38: Defining and Evaluating  ‘Study Quality’

Study Quality in Secondary/Meta-analytic Research?

Page 39: Defining and Evaluating  ‘Study Quality’

Intro
- M-As = high visibility and impact on theory and practice → quality is critical
- Several instruments proposed for assessing M-A quality: Stroup et al. (2000); Shea et al. (2007); JARS/MARS (APA, 2008); Plonsky (2012)

Page 40: Defining and Evaluating  ‘Study Quality’

Plonsky’s (2012) Instrument for Assessing M-A Quality

Goal 1: Assess transparency and thoroughness as a means to
- Clearly delineate the domain under investigation
- Enable replication
- Evaluate the appropriateness of the methods in addressing/answering the study's RQs

Goal 2: Set a tentative, field-specific standard
- Inform meta-analysts and reviewers/editors of M-As

Organization: Lit review/intro, Methods, Discussion

What items would you include?

Page 41: Defining and Evaluating  ‘Study Quality’

Plonsky’s (2012) Instrument for Assessing M-A Quality—Section I

Combine?

Page 42: Defining and Evaluating  ‘Study Quality’

Plonsky’s (2012) Instrument for Assessing M-A Quality—Section II

Page 43: Defining and Evaluating  ‘Study Quality’

Plonsky’s (2012) Instrument for Assessing M-A Quality—Section III

Page 44: Defining and Evaluating  ‘Study Quality’

Looking FORWARD

Recommendations for:
- Individual researchers
- Journal editors
- Meta-researchers
- Researcher trainers
- Learned societies

Page 45: Defining and Evaluating  ‘Study Quality’

Dear individual researchers,

- Consider power before AND after a study (but especially before)
- p is overrated (meaningless?), especially when working with (a) small samples, (b) large samples, (c) small effects, (d) large effects
- Report and interpret data thoroughly (EFFECT SIZES!) (see the sketch after this letter)
- Consider regression and multivariate analyses
- Calculate and report instrument reliability
- Team up with an experimental (or observational) researcher
- Develop expertise in one or more novel (to you) methods/analyses

Love,
Luke
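As a small illustration of the "report and interpret thoroughly" point (my own sketch with invented scores; the CI uses a standard large-sample approximation), this is what reporting an effect size alongside the significance test might look like:

```python
import numpy as np
from scipy import stats

# Invented scores for two groups of ten learners each
treatment = np.array([72, 80, 68, 75, 83, 77, 70, 79, 74, 81], dtype=float)
control   = np.array([70, 73, 65, 71, 76, 69, 68, 74, 66, 72], dtype=float)

n1, n2 = len(treatment), len(control)
sd_pooled = np.sqrt(((n1 - 1) * treatment.var(ddof=1) +
                     (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
d = (treatment.mean() - control.mean()) / sd_pooled

# Approximate standard error and 95% CI for d (large-sample formula)
se_d = np.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
ci_low, ci_high = d - 1.96 * se_d, d + 1.96 * se_d

t_stat, p_val = stats.ttest_ind(treatment, control)
print(f"d = {d:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}], "
      f"t({n1 + n2 - 2}) = {t_stat:.2f}, p = {p_val:.3f}")
```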

Page 46: Defining and Evaluating  ‘Study Quality’

Dear journal editors,

- Use your influence to improve rigor, transparency, and consistency
- It's not enough to require reporting (of… ESs, SDs, reliability, etc.); require interpretation too!
- Develop field-wide and field-specific standards
- Include special methodological reviews (see Magnan, 1994)
- Devote (precious) journal space to methodological discussions and reports

Love,

Luke

Page 47: Defining and Evaluating  ‘Study Quality’

Dear meta-researchers,

- Use your voice!
- Guide interpretations of effect sizes in your domains
- Evaluate and make known methodological strengths, weaknesses, and gaps; encourage effective practices and expose weak ones
- Don't just summarize: explain variability in effects, not just means (e.g., due to small samples, heterogeneous samples or treatments) (see the sketch after this letter)
- Examine substantive and methodological changes over time and as they relate to outcomes
- Cast the net wide in searching for primary studies

Love,

Luke
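A hedged sketch of where "explaining variability, not just means" can start (the effect sizes and variances below are invented; the Q and I² formulas are the standard fixed-effect heterogeneity statistics):

```python
import numpy as np
from scipy import stats

d = np.array([0.20, 0.45, 0.90, 0.35, 0.60])   # hypothetical study effect sizes
v = np.array([0.05, 0.08, 0.04, 0.10, 0.06])   # hypothetical sampling variances

w = 1 / v                                       # inverse-variance weights
d_bar = np.sum(w * d) / np.sum(w)               # weighted mean effect
Q = np.sum(w * (d - d_bar) ** 2)                # Cochran's Q heterogeneity statistic
df = len(d) - 1
I2 = max(0.0, (Q - df) / Q) * 100               # % of variability beyond sampling error
p_Q = stats.chi2.sf(Q, df)

print(f"Weighted mean d = {d_bar:.2f}; Q({df}) = {Q:.2f}, p = {p_Q:.3f}; I^2 = {I2:.0f}%")
```

When Q and I² point to substantial heterogeneity, moderator analyses (e.g., by sample type, treatment length, or methodological features) are the natural next step rather than reporting the mean alone.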

Page 48: Defining and Evaluating  ‘Study Quality’

Dear researcher trainers,

- Lots of emphasis on the basics: descriptive statistics; sample size + power + effect size + p; a synthetic approach; ANOVA
- Encourage more specialized courses, in other departments if necessary

Love,

Luke

Page 49: Defining and Evaluating  ‘Study Quality’

Dear learned societies (AILA/AAAL, LSA, etc.),

Designate a task force or committee to establish field-specific standards for research and reporting practices, including:
(a) at least one member of the executive committee,
(b) members from the editorial boards of relevant journals,
(c) a few quantitatively- and qualitatively-minded researchers,
(d) and one or more methodologists in other disciplines

Love,
Luke

Page 50: Defining and Evaluating  ‘Study Quality’

Closure

- Content objectives: conceptual and practical (but mostly conceptual)
- Inform participants' current and future research efforts
- Motivate future inquiry with a methodological focus

Happy to consult or collaborate on projects related to these discussions

Page 51: Defining and Evaluating  ‘Study Quality’

THANK YOU!