Statistical Guideline of Nature Ji-Qian Fang School of Public Health Sun Yat-Sen University 2008.10.

Statistical Guideline of Nature

Ji-Qian Fang
School of Public Health
Sun Yat-Sen University

2008.10

Challenge to Nature Medicine

• An editorial in Nature Medicine (2005), "Statistically significant":

“Some of the articles published in Nature and Nature Medicine were criticized due to the deficiency in statistical issues”.

What happened?

• Emili García-Berthou and Carles Alcaraz (Girona Univ., Spain) published an article in BMC Medical Research Methodology (May 2004). They reviewed 181 research papers published in Nature (2001) and found that 38% of them had at least one statistical mistake.

• Since then, a series of critical articles has been published. One of them, written by Robert Matthews (The Financial Times), analyzed the statistical methodology of articles in Nature Medicine (2000). It found that 31% of the authors had misunderstood the meaning of the P value, and some even reported P values with unnecessary precision (e.g., 0.002387).

Independent statistical “audit”

• Nature Medicine invited two experts from Columbia University to carry out a “statistical audit”, evaluating 21 articles published in 2003 against a consolidated list of statistical criteria.

• They found that some papers contained almost no quantitative analysis, and some involved very complicated statistical and mathematical issues. Most, however, used only a little statistical testing, and described it so incompletely that one could hardly assess whether it was appropriate.

Checklist of statistical adequacy

1. Reported n at start of study and for each analysis

2. Provided sample size calculation or justification

Examples

We believed that . . . the incidence of symptomatic deep venous thrombosis or pulmonary embolism or death would be 4% in the placebo group and 1.5% in the ardeparin sodium group. Based on 0.9 power to detect a significant difference (P=0.05, two-sided), 976 patients were required for each study group. To compensate for non-evaluable patients, we planned to enroll 1000 patients per group.
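The arithmetic behind a statement like this can be sketched as follows. The quoted example does not say which formula was used; this sketch assumes the normal-approximation formula for two proportions followed by a continuity correction, and the function name `n_per_group_two_proportions` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.90):
    """Patients per group to compare two proportions with a two-sided test.

    Assumption: normal approximation plus a continuity correction.
    The quoted example does not state which formula its authors used.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    delta = abs(p1 - p2)
    p_bar = (p1 + p2) / 2

    # Uncorrected sample size per group.
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / delta ** 2

    # Continuity-corrected sample size per group.
    n_cc = n / 4 * (1 + sqrt(1 + 4 / (n * delta))) ** 2
    return ceil(n_cc)


# Inputs taken from the example: 4% vs 1.5%, power 0.9, two-sided 0.05.
print(n_per_group_two_proportions(0.04, 0.015))
```

Under these assumptions the sketch gives 976 per group, which agrees with the figure in the example. A reader can only repeat this check because the example reports the expected rates, the power and the significance level.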

• To have an 85% chance of detecting as significant (at the two-sided 5% level) a five-point difference between the two groups in the mean SF-36 general health perception scores, with an assumed standard deviation of 20 and a loss to follow-up of 20%, 360 women in each group (720 in total) were required.
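A minimal sketch of the calculation for two means, assuming the usual normal-approximation formula and a simple inflation for loss to follow-up; the example itself does not state its formula, and the function name `n_per_group_two_means` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_means(delta, sd, alpha=0.05, power=0.85, loss=0.0):
    """Subjects per group to detect a mean difference `delta` (two-sided test).

    Assumption: equal group sizes, common standard deviation `sd`,
    normal approximation, and recruitment inflated by 1 / (1 - loss).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2  # evaluable subjects needed
    return ceil(n / (1 - loss))                     # subjects to recruit


# Inputs taken from the example: difference 5, SD 20, power 85%, loss 20%.
print(n_per_group_two_means(5, 20, loss=0.20))
```

With these inputs the sketch gives 360 per group, in line with the quoted 360 women in each group.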

3. Identified all statistical methods unambiguously

4. If statistical methods were described adequately, were any of them clearly inappropriate?

Example

All data analysis was carried out according to a pre-established analysis plan. Proportions were compared by $\chi^2$ tests with continuity correction or Fisher’s exact test when appropriate. Mean serum retinol concentrations were compared by t test. . . Two-sided significance tests were used throughout.

• Multivariate analyses were conducted with logistic regression. The durations of episodes and signs of disease were compared by using proportional hazards regression.

Methods for additional analyses, such as subgroup analyses and adjusted analyses:

Example

Proportions of patients responding were compared between treatment groups with the Mantel-Haenszel chi-squared test, adjusted for the stratification variable, methotrexate use.

• . . . it was planned to assess the relative benefit of CHART in an exploratory manner in subgroups: age, sex, performance status, stage, site, and histology. To test for differences in the effect of CHART, a chi-squared test for interaction was performed, or when appropriate a chi-squared test for trend (131).

5. Provided alpha for all statistical tests

6. Specified whether tests were one-sided or two-sided

7. Stated whether the data met the assumptions of the test

8. Reported actual P values for primary analyses

Example

The data of the two samples were adequately normally distributed (Shapiro-Wilk test: P1 = 0.466; P2 = 0.482) and the two population variances did not differ significantly at the 0.10 significance level (F = 1.345; P = 0.261), so the two independent samples t test was used (t = 4.137; df = 18; P = 0.001). The results indicated a statistically significant difference between the effects of the two drugs at the two-tailed 0.05 significance level, and the average increase in Hb concentration was higher in patients taking the new drug, which could also be seen from the 95% confidence interval for the difference between the two population means (3.829, 11.731).
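Reporting the actual statistics lets a reader cross-check them. As a rough sketch, the t statistic can be recovered from the reported confidence interval; the critical value 2.101 (two-sided 5%, 18 df) is taken from a standard t table and is an assumption here, as is the function name `t_from_ci`.

```python
def t_from_ci(lower, upper, t_crit):
    """Recover the t statistic from a confidence interval for a mean difference.

    The interval is diff +/- t_crit * SE, so SE = half-width / t_crit
    and t = diff / SE.
    """
    diff = (lower + upper) / 2
    se = (upper - lower) / 2 / t_crit
    return diff / se


# 95% CI from the example; t_crit = 2.101 is the tabled value for 18 df.
t_stat = t_from_ci(3.829, 11.731, t_crit=2.101)
print(round(t_stat, 2))
```

The recovered value is close to the reported t = 4.137, so the interval and the test statistic in the example are mutually consistent.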

9. Were the statistical measures (mean, standard error, standard deviation, etc.) reported, and were they clearly labeled?

Example

The results show that the mean ± SD of IL-2 was 16.00 IU/mL ± 7.50 IU/mL for the experimental group (n=31) and 20.00 IU/mL ± 8.00 IU/mL for the control group (n=30); the difference between the two group means was 4.00 IU/mL, and the 95% CI of the difference was (0.0304, 7.9696) IU/mL.
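Because the means, standard deviations and group sizes are all labeled, the interval can be recomputed from the summary statistics. The sketch below assumes a pooled-variance interval and a critical value of 2.000 (roughly the tabled t value for about 60 degrees of freedom); both are assumptions, since the example does not state them, and `pooled_ci` is a hypothetical name.

```python
from math import sqrt


def pooled_ci(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """Confidence interval for mean2 - mean1 using a pooled variance."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))  # SE of the difference
    diff = mean2 - mean1
    return diff - t_crit * se, diff + t_crit * se


# Summary statistics from the example (experimental group first).
low, high = pooled_ci(16.00, 7.50, 31, 20.00, 8.00, 30, t_crit=2.000)
print(round(low, 4), round(high, 4))
```

Under these assumptions the result matches the reported interval (0.0304, 7.9696) to four decimals.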

10. Was the unit of analysis clearly stated in all comparisons?

11. Are mean and standard deviation used to describe data sets that may be non-normally distributed or when the sample size is very small?

Results of Blood Gas Analysis (values are $\bar{X} \pm S$)

Group        n    Age (year)     pH           PaCO2          PaO2         SaO2
Experiment   12   63.00±15       7.36±0.17    63.00±15       9.25±1.91    85.12±5.99
Control      10   62.50±12.49    7.38±0.19    63.00±13.69    9.16±1.96    86.45±7.11

What are the problems?

12. Explanation of unusual or complex statistical methods

Example

In order to compare the effects of common feed, feed with plasma protein and feed with bioprotein on the weight gain of weanling pigs, 30 weanling pigs were matched into 10 blocks by gender, age in days and baseline weight. The 3 pigs in each block were then randomly assigned, one to each of the 3 treatment groups. After 10 days, the changes in weight from baseline were measured.

---- Randomized block design
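The allocation step of such a design can be sketched as below: within each block the three treatments are put in random order, so every block receives each treatment exactly once. The function name `randomize_blocks` and the seed value are hypothetical, and the sketch is not the procedure the original investigators necessarily used.

```python
import random


def randomize_blocks(n_blocks, treatments, seed=None):
    """Return, for each block, the treatments in a random order.

    Position i of a block's list is the treatment given to the
    i-th animal of that block.
    """
    rng = random.Random(seed)  # seeded so the plan can be reproduced
    plan = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)
        plan.append(order)
    return plan


treatments = ["common feed", "plasma protein", "bioprotein"]
plan = randomize_blocks(10, treatments, seed=2008)
for block, order in enumerate(plan, start=1):
    print(block, order)
```

Recording the seed along with the plan is one simple way to make the assignment auditable.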

The mean ± SD change in weight was 3.33 kg ± 0.48 kg for the common feed group, 3.83 kg ± 0.61 kg for the plasma protein group, and 4.10 kg ± 0.68 kg for the bioprotein group. Two-way ANOVA at the 0.05 significance level indicated statistically significant differences among the 3 treatment groups (F=6.8112, P=0.0063). Significant differences were also found among the 10 blocks (F=2.7407, P=0.0328).

---- Results of ANOVA
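The reported treatment P value can be checked by hand. With 3 treatments and 10 blocks, the treatment F statistic has 2 and (3-1)(10-1) = 18 degrees of freedom, and for a numerator of 2 degrees of freedom the upper tail of the F distribution has the closed form (1 + 2F/df2)^(-df2/2). The sketch below uses that identity; the function name is hypothetical.

```python
def f_upper_tail_df1_2(f_stat, df2):
    """P(F > f_stat) for an F distribution with 2 and df2 degrees of freedom.

    Valid only when the numerator degrees of freedom equal 2.
    """
    return (1 + 2 * f_stat / df2) ** (-df2 / 2)


# Treatment effect from the example: F = 6.8112 with (2, 18) df.
p_treatment = f_upper_tail_df1_2(6.8112, 18)
print(round(p_treatment, 4))
```

The result rounds to 0.0063, in agreement with the reported P value. The block P value has 9 numerator degrees of freedom and needs a general F distribution routine, which is outside this sketch.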

13. Explanation of data exclusions, if any

Example

• The primary analysis was intention-to-treat and involved all patients who were randomly assigned.

• One patient in the alendronate group was lost to follow-up; thus data from 31 patients were available for the intention-to-treat analysis. Five patients were considered protocol violators . . . Consequently, 26 patients remained for the per-protocol analyses.

Protocol deviations

• Authors should report all departures from the protocol, including unplanned changes to interventions, examinations, data collection, and methods of analysis.

• The nature of the protocol deviation and the exact reason for excluding participants after randomization should always be reported.

14. Explained reasons for any discrepancy between initial n and n for each analysis

Example

Initially, the 60 rats were randomly divided into 3 groups, 15 for each, to receive 3 levels of doses respectively. However, at the end of the first week, 2 rats in the low-dose group escaped; on the 40th day, 1 rat in the high-dose group and 1 in the control group escaped …

15. Explained method of treatment assignment (randomization, if any)

Example

Determination of whether a patient would be treated by streptomycin and bed-rest (S case) or by bed-rest alone (C case) was made by reference to a statistical series based on random sampling numbers drawn up for each sex at each centre by Prof. Bradford Hill; the details of the series were unknown to any of the investigators or to the coordinator and were contained in a set of sealed envelopes, each bearing on the outside only the name of the hospital and a number. After acceptance of a patient by the panel, the envelope was opened at the central office; the card inside told the medical officer of the centre if the patient was to be an S or a C case.

16. Explained any data transformation

Example

18 patients with acute encephalitis B in a clinic were randomly allocated to 3 groups. Each group received a different treatment (A, B or C), and the number of days with fever was measured as the treatment effect.

$S^2_{\min} = 13.4667$

• Consider the two assumptions of one-way ANOVA. The fever days are positively skewed relative to the normal distribution, and the ratio of $S^2_{\max}$ to $S^2_{\min}$ is close to 10, so the assumption of homogeneity of variances is also violated. Therefore, a square root transformation of the fever days is applied…

• The transformed values were used in the one-way ANOVA, which showed no significant difference in the mean fever days (on the square root scale) among the three treatments.
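The effect of a square root transformation on the variance ratio can be illustrated as follows. The fever-day values below are invented purely for illustration and are not the data from the example; the helper name `variance_ratio` is also hypothetical.

```python
from math import sqrt
from statistics import variance


def variance_ratio(groups):
    """Largest sample variance divided by the smallest."""
    variances = [variance(values) for values in groups]
    return max(variances) / min(variances)


# Hypothetical fever days for treatments A, B, C (NOT the original data).
fever_days = [
    [2, 3, 3, 4, 5, 4],
    [5, 7, 9, 6, 12, 8],
    [10, 15, 25, 12, 30, 18],
]

ratio_before = variance_ratio(fever_days)
ratio_after = variance_ratio([[sqrt(x) for x in g] for g in fever_days])
print(round(ratio_before, 1), round(ratio_after, 1))
```

For right-skewed data whose spread grows with the mean, the ratio after transformation is typically much smaller, which is the reason for reporting the transformation along with the ANOVA.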

17. Discussed adjustments for multiple testing

Example

Multiple comparisons with Bonferroni adjustment (alpha level of 0.0167) revealed that the effects of the two treatments with protein were significantly higher than that of common feed, while the difference between the two treatments with protein was not statistically significant.

---- Multiple comparison
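The adjusted alpha level of 0.0167 follows from dividing 0.05 by the 3 pairwise comparisons among 3 groups. A short sketch, with hypothetical function names; the familywise error formula additionally assumes independent tests, which is only an approximation for pairwise comparisons on the same data.

```python
from math import comb


def bonferroni_alpha(alpha, n_groups):
    """Per-comparison alpha for all pairwise comparisons among n_groups."""
    n_comparisons = comb(n_groups, 2)
    return alpha / n_comparisons


def familywise_error(alpha, n_comparisons):
    """Chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** n_comparisons


print(round(bonferroni_alpha(0.05, 3), 4))   # per-comparison alpha
print(round(familywise_error(0.05, 3), 4))   # unadjusted familywise error
```

Without adjustment, three tests at 0.05 each carry a familywise error rate of about 14% under the independence assumption, which is why the example tests each pair at 0.0167.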

For graphs

18. Were effect sizes distorted? (by truncation of y axis, etc.)

What is the problem?

[Figure: bar charts of the number of Grade 3A hospitals in Beijing, Tianjin, Hebei, Shanxi and Inner Mongolia; y-axis label: Number of hospitals; y-axis ticks shown: 20, 30, 40, 50]

19. Were error bars unlabeled?

20. Were error bars absent?

• What is the height for?

• What are the bars for?

• What are the stars for?

[Figure: charts of cholesterol (mg/dL) for the Normal and Patient groups]

Summary

Three errors are particularly common

• Multiple comparisons: When making multiple statistical comparisons on a single data set, authors should explain how they adjusted the alpha level to avoid an inflated Type I error rate, or they should select statistical tests appropriate for multiple groups (such as ANOVA rather than a series of t-tests).

• Normal distribution: Many statistical tests require that the data be approximately normally distributed; when using these tests, authors should explain how they tested their data for normality. If the data do not meet the assumptions of the test, then a non-parametric alternative should be used instead.

• Small sample size: When the sample size is small (less than about 10), authors should use tests appropriate to small samples or justify their use of large-sample tests.

Thanks