How to Publish a Paper: Statistics and Writing Tips

Kristin Sainani, PhD, Stanford University


Outline

1. Making a Data Analysis Plan
2. Analyzing Your Data
3. Creating Tables and Figures
4. Writing Up Your Findings

1. Make a data analysis plan before analyzing the data.

Why make a plan?

• Makes your analysis easier and more efficient.
• Ensures completeness.
• Keeps your analysis focused on the main hypotheses of interest.
• Helps you avoid “p-value shopping” (type I errors!).
• Makes it easier to write a clear statistical methods section.
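To see why “p-value shopping” inflates type I errors, here is a minimal sketch. It assumes the tests are independent and that every null hypothesis is true; the function name and the numbers of tests are illustrative, not from the talk.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across n_tests independent tests,
    assuming every null hypothesis is true (an idealized assumption)."""
    return 1 - (1 - alpha) ** n_tests


# Illustrative numbers of unplanned comparisons
for k in (1, 5, 20):
    print(f"{k:>2} tests: {familywise_error_rate(k):.0%} chance of a false positive")
```

Under these assumptions the chance of at least one spurious “significant” result grows from 5% with one test to roughly 64% with twenty, which is the risk a pre-specified plan helps limit.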

The data analysis plan—explanatory aims

What goes in your plan? (explanatory goal)

• How will I clean and check my data?
• How will I deal with missing data?
• How will I visualize my data?
• How will I define my primary exposures and outcomes?
• What descriptive analyses will best inform the primary hypothesis?
• How will I test my primary hypothesis?
• How will I visualize my primary hypothesis?
• How will I approach multivariate model building?
• What interactions/subgroup analyses will I be testing?
• Will I be performing any sensitivity analyses?
• Will I be performing any secondary/exploratory analyses?

The data analysis plan—predictive aims

What goes in your plan? (predictive goal)

• How will I clean and check my data?
• How will I deal with missing data?
• How will I visualize my data?
• How will I reduce my set of candidate variables?
• How will I select variables for the final prediction model?
• How will I assess model performance?
• How will I validate my final model?
• How will I address clinical utility?

2. Tips for analyzing your data

• Follow your statistical analysis plan.
• Document all your steps. Aim for reproducible research!
• Use common sense.
• Draw lots of pictures.
• Focus on effect sizes, not p-values.
• If it’s a controlled study, focus on between-group comparisons.
• Calculate absolute risks, not just relative risks.
• Ask for help from a statistician.
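As an illustration of the absolute-versus-relative-risk tip, the sketch below computes both from a 2×2 summary. The function name and the event counts are hypothetical, chosen only to show how a large relative effect can correspond to a small absolute one.

```python
def risk_summary(events_treated: int, n_treated: int,
                 events_control: int, n_control: int) -> dict:
    """Absolute and relative measures of effect from two-group event counts."""
    if events_control == 0:
        raise ValueError("relative risk is undefined when the control risk is zero")
    risk_t = events_treated / n_treated
    risk_c = events_control / n_control
    risk_diff = risk_t - risk_c
    return {
        "risk_treated": risk_t,
        "risk_control": risk_c,
        "relative_risk": risk_t / risk_c,
        "risk_difference": risk_diff,
        # Number needed to treat: reciprocal of the absolute risk difference
        "nnt": 1 / abs(risk_diff) if risk_diff != 0 else float("inf"),
    }


# Hypothetical trial: 10 events in 1000 treated vs. 20 events in 1000 controls
summary = risk_summary(10, 1000, 20, 1000)
print(f"Relative risk:   {summary['relative_risk']:.2f}")
print(f"Risk difference: {summary['risk_difference']:.3f}")
print(f"NNT:             {summary['nnt']:.0f}")
```

With these made-up counts the relative risk is 0.50 (a “50% reduction”), yet the absolute risk falls by only one percentage point, so about 100 people would need treatment to prevent one event.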

First rules of statistics…

• Use common sense!
• Draw lots of pictures!

What’s wrong with this?

• Study with sample size of 10 (N=10)
• Results: “Objective scoring by blinded investigators indicated that the treatment resulted in improvement in all (100%) of the subjects. Of patients showing overall improvement, 78% were graded as having either excellent or moderate improvement.”
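One common-sense check that can be automated is whether a reported percentage is even possible for the stated sample size. The helper below is a hypothetical sketch (name and tolerance are assumptions): it lists the whole-number counts out of n that would round to the reported percentage.

```python
def possible_counts(percent: float, n: int, tolerance: float = 0.5) -> list[int]:
    """Whole-number counts k (0..n) for which 100*k/n is within
    `tolerance` percentage points of the reported percent."""
    return [k for k in range(n + 1) if abs(100 * k / n - percent) <= tolerance]


print(possible_counts(100, 10))  # [10]
print(possible_counts(78, 10))   # []  (no count out of 10 gives 78%)
print(possible_counts(78, 9))    # [7] (shown for comparison only)
```

An empty list means the percentage cannot arise from that denominator, which suggests the numbers deserve a second look before anything else is interpreted.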

Take-home message?

Reproduced with permission from: JAMA. 2010;303(12):1173-1179. doi:10.1001/jama.2010.312

Do the three groups differ meaningfully in weight change over time?

Table 2. Outcome volume for the experimental and standard groups; mean (SD).

                     Week 0                        Week 12                       Change (Week 0 – Week 12)
Location             experimental   standard       experimental   standard       experimental   standard
Affected side        3135 (748)*    3333 (1368)*   2982 (715)*    3331 (1383)*   –154 (168)     –2 (306)
Contralateral side   2595 (672)     2654 (761)     2553 (606)     2631 (736)     –42 (193)      –23 (219)

* p < .05, greater than the contralateral side

What is the only comparison that matters here?

Omit extraneous p-values

For controlled studies, focus on between-group comparisons

From the abstract:

• Design: A double-blinded, randomized, controlled trial.
• Results: The treatment group showed significant increases in walking speed and stride length after the intervention but showed no significant changes in peak hip extension or anterior pelvic tilt during comfortable and fast-paced walking. The treatment group also showed significantly increased passive hip extension range of motion.

But the comparison that matters in an RCT is treatment vs. control! This isn’t reported in the paper.

Table 1. Comfortable walking primary parameters (mean ± SD) pre- and postintervention

                                     Control (n = 41)                        Treatment (n = 33)
Parameter                            Pre-           Post-          P Value   Pre-           Post-          P Value
Cadence, steps/min                   108.1 ± 9.5    107.6 ± 10.9   .70       106.5 ± 9.4    108.9 ± 9.0    .04
Walking speed, m/s                   1.10 ± 0.2     1.10 ± 0.2     .83       1.15 ± 0.2     1.20 ± 0.2     .02
Stride length, m                     1.22 ± 0.2     1.22 ± 0.2     .99       1.30 ± 0.2     1.32 ± 0.2     .05
Peak hip extension, degree           8.0 ± 8.4      8.8 ± 8.4      .24       10.9 ± 7.3     10.5 ± 6.9     .67
Peak anterior pelvic tilt, degree    11.6 ± 5.7     10.8 ± 5.8     .28       10.1 ± 6.1     10.5 ± 5.6     .63
Passive peak hip extension, degree   20.5 ± 8.6     22.3 ± 8.6     .10       19.9 ± 8.6     25.2 ± 7.7     <.001
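The within-group p-values in a table like this do not answer the trial’s question. A between-group comparison works on each participant’s change score and compares the mean change in treatment vs. control. The sketch below is one possible approach (Welch’s t statistic on change scores), not the analysis from the paper; the function name and the walking-speed values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def between_group_change(pre_t, post_t, pre_c, post_c) -> dict:
    """Compare mean change (post - pre) between treatment and control
    using Welch's t statistic (unequal variances)."""
    change_t = [post - pre for pre, post in zip(pre_t, post_t)]
    change_c = [post - pre for pre, post in zip(pre_c, post_c)]
    # Squared standard error of each group's mean change
    vt = variance(change_t) / len(change_t)
    vc = variance(change_c) / len(change_c)
    diff = mean(change_t) - mean(change_c)
    se = sqrt(vt + vc)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (vt + vc) ** 2 / (vt ** 2 / (len(change_t) - 1) + vc ** 2 / (len(change_c) - 1))
    return {"diff_in_change": diff, "se": se, "t": diff / se, "df": df}


# Hypothetical walking speeds (m/s) for five participants per group
result = between_group_change(
    pre_t=[1.10, 1.20, 1.05, 1.25, 1.15], post_t=[1.18, 1.22, 1.12, 1.30, 1.20],
    pre_c=[1.08, 1.12, 1.15, 1.05, 1.10], post_c=[1.10, 1.11, 1.17, 1.06, 1.09],
)
print(f"Difference in mean change: {result['diff_in_change']:.3f} m/s")
print(f"Welch t = {result['t']:.2f} on about {result['df']:.1f} df")
```

The p-value would come from a t distribution with the reported degrees of freedom, which a statistics package can supply. Analysis of covariance adjusting for the baseline value is another common choice for this comparison.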

Situations where you may need a statistician:

• Correlated data
   • Repeated measures
   • Two hands/two sides of the face/two knees/two legs from the same person
   • 1:1 matched cases and controls
• Sparse data
• Multiple hypotheses
• Multivariate models

3. Tips for Tables and Figures

Data presentation matters!

Recommended order for writing an original manuscript

1. Tables and Figures
2. Results
3. Methods
4. Introduction
5. Discussion
6. Abstract

Figures and tables should stand alone and tell a complete story.

Each figure and table should have a take-home point.

The reader should not need to refer back to the main text.

Tables and Figures are the foundation of your story!

Tables and Figures are the story!

“An article about computational science in a scientific publication isn’t the scholarship itself, it’s merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.”—Jon Claerbout, Stanford

Tips on Tables and Figures

• Use the fewest figures and tables needed to tell the story.

• Do not present the same data in both a figure and a table.

Tables vs. Figures

Figures
• Visual impact
• Show trends and patterns
• Tell a quick story
• Tell the whole story
• Highlight a particular result

Tables
• Give precise values
• Display many values/variables

Example table:

Table 1. Descriptive characteristics of the study groups, means ± SD or N (%).

Characteristic        Bad Witches   Good Witches
N                     13            12
Age (yrs)             45 ± 5        36 ± 6*
Female                11 (85%)      10 (83%)
BMI (kg/m²)           21 ± 6        23 ± 3
Systolic BP (mmHg)    140 ± 10      120 ± 9*
Exercise (min/day)    30 ± 20       60 ± 30*
Employment status
  Unemployed          4 (31%)       0 (0%)
  Part time           3 (23%)       4 (33%)
  Full time           6 (46%)       8 (66%)
Smoker (yes/no)       6 (50%)       0 (0%)*

*p<.05, t-test or Fisher’s exact test, as appropriate.

Three horizontal lines


What not to do!

Table 1. Descriptive characteristics of the study groups, means ± SD or N (%).

Characteristic        Bad Witches   Good Witches
N                     13            12
Age (yrs)             45 ± 5        36 ± 6*
Female                11 (85%)      10 (83%)
BMI (kg/m²)           21 ± 6        23 ± 3
Systolic BP (mmHg)    140 ± 10      120 ± 9*
Exercise (min/day)    30 ± 20       60 ± 30*
Employment status
  Unemployed          4 (31%)       0 (0%)
  Part time           3 (23%)       4 (33%)
  Full time           6 (46%)       8 (66%)
Smoker (yes/no)       6 (50%)       0 (0%)*

*p<.05, t-test or Fisher’s exact test, as appropriate.

Remove grid lines!

What not to do!

Table 1. Descriptive characteristics of the study groups, means ± SD or N (%).

Characteristic        Bad Witches   Good Witches
N                     13            12
age (yrs)             45 ± 5        36 ± 6*
female                11 (85%)      10 (83%)
BMI (kg/m²)           21 ± 6        23 ± 3
Systolic BP (mmHg)    140 ± 10      120 ± 9*
Exercise (min/day)    30 ± 20       60 ± 30*
Employment status
  Unemployed          4 (31%)       0 (0%)
  Part time           3 (23%)       4 (33%)
  Full time           6 (46%)       8 (66%)
Smoker (yes/no)       6 (50%)       0 (0%)*

*p<.05, t-test or Fisher’s exact test, as appropriate.

Make sure everything lines up and looks professional!

What not to do!

Table 1. Descriptive characteristics of the study groups, means ± SD or N (%).

Characteristic        Bad Witches       Good Witches
N                     13                12
Age (yrs)             45.076 ± 5.032    36.007 ± 6.032*
Female                11 (85%)          10 (83%)
BMI (kg/m²)           21.223 ± 6.332    23.331 ± 3.333
Systolic BP (mmHg)    140.23 ± 10.23    120.23 ± 9.23*
Exercise (min/day)    30.244 ± 20.345   60.123 ± 30.32*
Employment status
  Unemployed          4 (31%)           0 (0%)
  Part time           3 (23%)           4 (33%)
  Full time           6 (46%)           8 (66%)
Smoker (yes/no)       6 (50%)           0 (0%)*

*p<.05, t-test or Fisher’s exact test, as appropriate.

Use a reasonable number of significant figures.
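A small formatting helper can keep reported precision consistent across a table. The sketch below is illustrative (the function name is an assumption); the first two calls use values from the table above, and the third uses a made-up walking-speed value to show a case where two decimals are reasonable.

```python
def mean_sd(mean: float, sd: float, decimals: int = 0) -> str:
    """Format a mean and SD with the same, deliberately chosen, number of decimals."""
    return f"{mean:.{decimals}f} ± {sd:.{decimals}f}"


print(mean_sd(45.076, 5.032))      # 45 ± 5
print(mean_sd(21.223, 6.332))      # 21 ± 6
print(mean_sd(1.1532, 0.2041, 2))  # 1.15 ± 0.20 (hypothetical value)
```

Choosing the number of decimals per variable, based on how precisely it was measured, avoids implying accuracy the data do not have.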

What not to do!

Table 1. Descriptive characteristics of the study groups, means ± SD or N (%).

Characteristic        Bad Witches   Good Witches
N                     13            12
age                   45 ± 5        36 ± 6*
female                11 (85%)      10 (83%)
BMI                   21 ± 6        23 ± 3
Systolic BP           140 ± 10      120 ± 9*
Exercise              30 ± 20       60 ± 30*
Employment status
  Unemployed          4 (31%)       0 (0%)
  Part time           3 (23%)       4 (33%)
  Full time           6 (46%)       8 (66%)
Smoking               6 (50%)       0 (0%)*

*p<.05, t-test or Fisher’s exact test, as appropriate.

Give units!

What not to do!

Table 1. Descriptive characteristics overall and by group, means ± SD or N (%), and p-values for the comparison between the groups.

Characteristic        Overall     Bad Witches   Good Witches   P-value
N                     25          13            12             n/a
Age (yrs)             41 ± 6      45 ± 5        36 ± 6         0.0005
Female                21 (84%)    11 (85%)      10 (83%)       0.80
BMI (kg/m²)           22 ± 5      21 ± 6        23 ± 3         0.31
Systolic BP (mmHg)    131 ± 12    140 ± 10      120 ± 9        0.0001
Exercise (min/d)      45 ± 40     30 ± 20       60 ± 30        0.0069
Employment status
  Unemployed          4 (16%)     4 (31%)       0 (0%)         0.17
  Part time           7 (28%)     3 (23%)       4 (33%)
  Full time           14 (56%)    6 (46%)       8 (66%)
Smoker (yes/no)       6 (24%)     6 (50%)       0 (0%)         0.01

Omit unnecessary columns!

Types of Figures

1. Primary evidence
• electron micrographs, gels, photographs, pathology slides, X-rays, etc.
• indicates data quality
• “Seeing is believing”

2. Graphs
• line graphs, bar graphs, scatter plots, histograms, boxplots, etc.

3. Drawings and diagrams
• illustrate an experimental set-up or work-flow
• indicate flow of participants
• illustrate cause and effect relationships or cycles
• give a hypothetical model
• represent microscopic particles or microorganisms as cartoons

Types of graphs:

• line graphs

• scatter plots

• bar graphs

• individual-value bar graphs

• histograms

• box plots

• survival curves

4. Writing up your findings

• Tips for writing the Results section
• Tips for writing in general!

Results ≠ Raw Data

The results section should:

• Summarize what the data show
• Point out simple relationships
• Describe big-picture trends
• Cite figures or tables that present supporting data
• Avoid simply repeating the numbers that are already available in tables and figures.

Hypothetical Example

The characteristics of the bad witches and the good witches are shown in Table 1. There was a significant difference in age between the groups. The mean age of the bad witches was 45 ± 5; and the mean age of the good witches was 36 ± 6. There was no significant difference in gender between the groups, with the bad witches having 85% females and the good witches having 83% females. BMI was not significantly different between the groups, which both had normal BMIs. Systolic blood pressure and exercise were significantly different. The bad witches had a mean blood pressure of 140 ± 10, whereas the good witches had a mean blood pressure of 120 ± 9. Estimated daily exercise was higher in the good witches (60 ± 30) than the bad witches (30 ± 20). Employment was not significantly different between the two groups…

Edited version…

Original:

The characteristics of the bad witches and the good witches are shown in Table 1. There was a significant difference in age between the groups. The mean age of the bad witches was 45 ± 5; and the mean age of the good witches was 36 ± 6. There was no significant difference in gender between the groups, with the bad witches having 85% females and the good witches having 83% females. BMI was not significantly different between the groups, which both had normal BMIs. Systolic blood pressure and exercise were significantly different. The bad witches had a mean blood pressure of 140 ± 10, whereas the good witches had a mean blood pressure of 120 ± 9. Estimated daily exercise was higher in the good witches (60 ± 30) than the bad witches (30 ± 20). Employment was not significantly different between the two groups…

Revised:

The witches were, on average, lean and predominantly female (Table 1). Bad witches were significantly older, had higher blood pressures, exercised less, and were more likely to smoke than good witches. More bad witches were unemployed, but this difference did not reach statistical significance.

What verb tense do I use?

• Use past tense for completed actions:
   We found that…
   Women were more likely to…
   Men smoked more cigarettes than…
   The average reaction time was…

• Use the present tense for assertions that continue to be true, such as what the tables show, what you believe, and what the data suggest:
   Figure 1 shows…
   The findings confirm…
   The data suggest…
   We believe that this shows…

Say what you mean clearly and succinctly!

Example:

“This paper provides a review of the basic tenets of cancer biology study design, using as examples studies that illustrate the methodologic challenges or that demonstrate successful solutions to the difficulties inherent in biological research.”

Revised:

“This paper reviews cancer biology study design, using examples that illustrate specific challenges and solutions.”

Say what you mean clearly and succinctly!

Example:

“As it is well known, increased athletic activity has been related to a profile of lower cardiovascular risk, lower blood pressure levels, and improved muscular and cardio-respiratory performance.”

Revised:

“Increased athletic activity is associated with lower cardiovascular risk, lower blood pressure, and improved fitness.”

“Increased athletic activity lowers cardiovascular risk and blood pressure, and improves fitness.” (stronger level of evidence)


Good references on manuscript writing

Clinical Chemistry Guide to Scientific Writing: http://www.aacc.org/publications/clin_chem/ccgsw/Pages/default.aspx#

Mimi Zeiger. Essentials of Writing Biomedical Research Papers, McGraw Hill Professional

For more training on writing or statistics, see my MOOCs:

My Massive Open Online Courses (MOOCs):

• Writing in the Sciences: https://class.stanford.edu/courses/Medicine/Sci-Write/Fall2014/about
• Statistics in Medicine: https://class.stanford.edu/courses/Medicine/MedStats/Summer2014/about