Explore, Analyze and Present your data

Post on 27-Jan-2015


Explore, analyze, present your data

Guillaume Calmettes

“Bonjour”, I am Guillaume!

Sacre Bleu!

gcalmettes@mednet.ucla.edu

Office: MRL 3645

Bordeaux

Disclaimer

I am not a statistician

Statistics are scary

Statistics

(You at the beginning of the talk)

Statistics are not so scary

(You at the middle of the talk)

Statistics are cool

(You at the end of the talk)

We have to deal with them anyways, so we had better enjoy them!

Press the t-test button and you’ll be done!

Did you check the normality of your data first?

Why should you care about statistics?

http://www.nature.com/nature/authors/gta/2e_Statistical_checklist.pdf

Why should you care about statistics?

Advances in Physiology Education

“Explorations in Statistics” series (2008-present) (Douglas Curran-Everett)

Why should you care about statistics?

http://jp.physoc.org/cgi/collection/stats_reporting

The Journal of Physiology, Experimental Physiology, The British Journal of Pharmacology, Microcirculation, The British Journal of Nutrition

“Statistical Perspectives” series (2011-present) (Gordon Drummond)

Why should you care about statistics?

http://blogs.nature.com/methagora/2013/08/giving_statistics_the_attention_it_deserves.html

Significance, P values and t-tests – November 2013 Introduction to the concept of statistical significance and the one-sample t-test.

Error Bars – October 2013 The use of error bars to represent uncertainty and advice on how to interpret them.

Importance of being uncertain – September 2013 How samples are used to estimate population statistics and what this means in terms of uncertainty.

Why should you care about statistics?

“Nature research journals will introduce editorial measures to address the problem by improving the consistency and quality of reporting in life-sciences articles”

“We will examine statistics more closely and encourage authors to be transparent, for example by including their raw data”

“Journals […] fail to exert sufficient scrutiny over the results that they publish”

Look at your data

A picture is worth a thousand words

Location of deaths in the 1854 London Cholera Epidemic

John Snow (1813-1858)

Dataset #1       Dataset #2       Dataset #3       Dataset #4
x      y         x      y         x      y         x      y
10     8.04      10     9.14      10     7.46      8      6.58
8      6.95      8      8.14      8      6.77      8      5.76
13     7.58      13     8.74      13     12.74     8      7.71
9      8.81      9      8.77      9      7.11      8      8.84
11     8.33      11     9.26      11     7.81      8      8.47
14     9.96      14     8.10      14     8.84      8      7.04
6      7.24      6      6.13      6      6.08      8      5.25
4      4.26      4      3.10      4      5.39      19     12.50
12     10.84     12     9.13      12     8.15      8      5.56
7      4.82      7      7.26      7      6.42      8      7.91
5      5.68      5      4.74      5      5.73      8      6.89

Why visualize your data?

Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician 27 (1): 17–21

The Anscombe’s quartet example

Why visualize your data?

Property in each case       Value
Mean of x                   9 (exact)
Variance of x               11 (exact)
Mean of y                   7.5
Variance of y               4.122 or 4.127
Correlation of x and y      0.816
Linear regression line      y = 3.00 + 0.500x

The Anscombe’s quartet example

Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician 27 (1): 17–21
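The summary table can be checked directly from the four datasets. The following is a minimal sketch, assuming plain Python and hand-written least-squares formulas; the function name `summarize` and the dictionary layout are illustrative choices, not part of the original slides.

```python
from statistics import mean, variance

# Anscombe's quartet, typed in from the data table of the talk.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    1: (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarize(x, y):
    """Return the summary statistics listed in the table for one dataset."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # least-squares slope
    return {
        "mean_x": mx,
        "var_x": variance(x),              # sample variance (n - 1)
        "mean_y": my,
        "var_y": variance(y),
        "r": sxy / (sxx * syy) ** 0.5,     # Pearson correlation
        "slope": slope,
        "intercept": my - slope * mx,
    }

summaries = {k: summarize(x, y) for k, (x, y) in quartet.items()}
for k, s in summaries.items():
    print(f"Dataset #{k}:", {name: round(v, 3) for name, v in s.items()})
```

The four printed lines are nearly identical, which is the point of the example: the summary statistics alone cannot tell the datasets apart, only a plot can.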

[Figure: Anscombe’s quartet, scatter plots of Dataset #1, Dataset #2, Dataset #3 and Dataset #4]

Visualize your data in their raw form!

Aim for revelation rather than mere summary

A great graphic with raw data will reveal unexpected patterns and invite us to make comparisons we might not have thought of beforehand.

If you are still not convinced …

Mean: 16 / Stdv: 5


[Figure from Daniel’s Journal Club paper: bar graph of donor engraftment (%), axis 0 to 80, WBM secondary transplantation (16 weeks), mH19 flDMR/+ and 6DMR/+ groups, P < 0.05]

Avoid making bar graphs

Rockman H.A. (2012). "Great expectations". J Clin Invest 122 (4): 1133

“To maintain the highest level of trustworthiness of data, we are encouraging authors to display data in their raw form and not in a fashion that conceals their variance.

Presenting data as columns with error bars (dynamite plunger plots) conceals data. We recommend that individual data be presented as dot plots shown next to the average for the group with appropriate error bars (Figure 1).”

Avoid making bar graphs

Cumming, G. et al. (2007). "Error bars in experimental biology". J Cell Biol 177 (1): 7–11

[Figure: bar graph with a 0 to 100 axis, captioned “SORRY, WE JUST CAN'T TRUST YOU...”]

Error bars

Different types, different meanings

• descriptive statistics (Range, SD)

• inferential statistics (SE, CI)

Often, they also imply a symmetrical distribution of the data.

[Figure: normal distribution curve centered on µ, with 95% of the area between µ − 2σ and µ + 2σ]

Avoid making bar graphs

95% of a normal distribution lies within two standard deviations (σ) of the mean (µ)

Mean and Standard deviation are only useful in the context of a “normal distribution”
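The “two standard deviations” rule can be checked numerically. This is a small sketch using only the standard library; the function name `normal_coverage` is an illustrative choice. It relies on the standard identity that the fraction of a normal distribution within k standard deviations of the mean is erf(k / √2). Two standard deviations cover slightly more than 95% (about 95.4%); exactly 95% corresponds to about 1.96 standard deviations.

```python
from math import erf, sqrt

def normal_coverage(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 1.96, 2, 3):
    print(f"within {k} SD of the mean: {normal_coverage(k):.4f}")
```

None of this holds for a skewed distribution, which is why the mean and SD are poor descriptors in that case.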

Avoid making bar graphs

skewed distribution

symmetrical distribution

Data presentation to reveal the distribution of the data

• Display data in their raw form.

• A dot plot is a good start.

• “Dynamite plunger plots” conceal data.

• Check the pattern of distribution of the values.

Avoid making bar graphs

• First set: Gaussian (or normal) distribution (symmetrically distributed)

• Second set: right skewed, lognormal (few large values)

“This type of distribution of values is quite common in biology (ex: plasma concentrations of immune or inflammatory mediators)”

“Plunger plots only: who would know that the values were skewed ... and that the common statistical tests would be inappropriate?”

Avoid making bar graphs

Bar graph Dynamite plunger

Don't tell me no one warned you before!

Summary

Looking for patterns and relationships

Providing a narrative for the reader

Summarizing complex data structures

Helping to avoid erroneous conclusions based upon questionable or unexpected data

For others ...

But primarily for you ...

Why visualize your data?

Choose the right descriptor for your data

Averages can be misleading


Is the mean always a good descriptor?

http://www.globalhealthfacts.org/data/topic/map.aspx?ind=87

# of children per household in China (2012)

• mean: 1.35

• median: 1

more representative of the “typical” family (One child policy)
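A toy example makes the difference between the two descriptors concrete. The sample below is hypothetical (it is NOT the survey data behind the slide); it was simply chosen so that its mean equals the 1.35 quoted above.

```python
from statistics import mean, median

# Hypothetical sample of 100 households (illustrative only, not real data):
# 70 with one child, 25 with two children, 5 with three children.
children = [1] * 70 + [2] * 25 + [3] * 5

print("mean:  ", mean(children))    # pulled upward by the few larger families
print("median:", median(children))  # the "typical" household
```

No household can have 1.35 children, whereas the median describes a family that actually exists in the sample.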

Any measure is wrong!

http://www.youtube.com/watch?v=JUxHebuXviM

Walter Lewin (MIT)

“Whenever you make a measurement, you must know the uncertainty otherwise it is meaningless”

183.3cm 185.7cm

The same concept applies when you report your data!

Provide the uncertainty of your descriptor (hint: this is NOT the standard deviation)

Report the Confidence Interval of your descriptor

The Bootstrap: origin

Efron B. and Tibshirani R. (1991), Science, Jul 26;253(5018):390-5

Modern electronic computation has encouraged a host of new statistical methods that require fewer distributional assumptions than their predecessors and can be applied to more complicated statistical estimators. These methods allow [...] to explore and describe data and draw valid statistical inferences without the usual concerns for mathematical tractability.

Computing the bootstrap 95% CI

[Diagram: the original sample A0 = (a1, a2, a3, a4, a5, ..., an), with mean m0, is resampled with replacement into pseudo-samples A1, A2, A3, A4, ... of the same size (for example a2, a4, a1, a2, a3, a1), each giving a pseudo-mean mA1, mA2, mA3, mA4, ...]

Calmettes G. et al. (2012), “Making do with what we have: use your bootstrap”, J Physiol, 590(15):3403-3406

5.18 [4.91, 4.47]
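The resampling procedure can be sketched in a few lines. This is a minimal percentile-bootstrap sketch under stated assumptions, not the code used for the talk: the data values, the function name `bootstrap_ci` and the fixed seed are all hypothetical, and the percentile method is only one of several ways to build a bootstrap interval.

```python
import random

def bootstrap_ci(sample, n_boot=10000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    n = len(sample)
    boot_means = []
    for _ in range(n_boot):
        # Draw a pseudo-sample of the same size, WITH replacement.
        pseudo = rng.choices(sample, k=n)
        boot_means.append(sum(pseudo) / n)
    boot_means.sort()
    # With 10000 resamples these are roughly the 250th and 9750th sorted values.
    lo = boot_means[int(n_boot * alpha / 2)]
    hi = boot_means[int(n_boot * (1 - alpha / 2)) - 1]
    return sum(sample) / n, lo, hi

# Hypothetical measurements, for illustration only.
data = [4.1, 5.6, 5.0, 4.8, 6.2, 5.3, 4.9, 5.7, 5.1, 5.4]
m0, ci_low, ci_high = bootstrap_ci(data)
print(f"{m0:.2f} [{ci_low:.2f}, {ci_high:.2f}]")
```

The result is reported in the same "descriptor [lower bound, upper bound]" form as on the slide.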

Analyze your data

Choose your statistical test wisely

http://www.nature.com/nature/authors/gta/#a5.6

Every paper that contains statistical testing should state [...] a justification for the use of that test (including, for example, a discussion of the normality of the data when the test is appropriate only for normal data), [...], whether the tests were one-tailed or two-tailed, and the actual P value for each test (not merely "significant" or "P < 0.05").

Authors Guidelines

The simple case (How to)

[Figure: distributions of the Male and Female samples]

mean/std 135.9 ± 19.0

mean/std 187.0 ± 19.8

difference/ci 51.2 [50.4, 51.9]

Distribution of the data?

Visual inspection:

• fit of the histogram

• QQ plot: the ith point plots the ordered value A(i) against the theoretical quantile of the distribution, $\Phi^{-1}\left(\frac{i - 3/8}{n + 1/4}\right)$

[Figure: example QQ plot of data that are not “normal”, followed by the QQ plots of the Male and Female samples]

Test:

• Shapiro-Wilk test

Null Hypothesis for the SW test: Data are normally distributed

Female p-value: 0.9195

Male p-value: 0.3866

Normally distributed

Statistical test?

t-test

Null Hypothesis for the t-test: Data belong to the same population

t-test p-value < 2.2e-16
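The theoretical quantiles used for the QQ plot can be computed with the standard library alone. This is a sketch, assuming a comparison against the standard normal distribution; the function name `normal_qq_points` and the sample values are hypothetical. Plotting the returned pairs and looking for a straight line is the visual check described above.

```python
from statistics import NormalDist

def normal_qq_points(sample):
    """Pair each ordered value A(i) with its theoretical normal quantile.

    The plotting position (i - 3/8) / (n + 1/4) is the one shown on the slide.
    """
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [
        std_normal.inv_cdf((i - 3 / 8) / (n + 1 / 4)) for i in range(1, n + 1)
    ]
    return list(zip(theoretical, ordered))

# Hypothetical measurements, for illustration only.
points = normal_qq_points([3.1, 2.4, 5.0, 4.2, 3.7])
for q, value in points:
    print(f"theoretical quantile {q:+.3f}  ->  observed {value}")
```

A formal test such as Shapiro-Wilk is not in the standard library; it is available in statistical packages, which this sketch deliberately does not depend on.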

Usually it is not so simple

The “not so simple” case

[Figure: distributions of the samples S1 and S2]

Shapiro-Wilk test:

S1 p-value: 7.4e-05

S2 p-value: 6.7e-06

What to do?

Non parametric alternatives

For the t-test:

• Mann-Whitney U (independent)

• Wilcoxon (dependent)

Choose a new statistical hero: Bootstrapman

t-test

Computing the bootstrap p-value

Are the two samples different?

Observed difference = 0.44

If the two samples were from the same population, what would be the probability that the observed difference arose from chance alone?

[Diagram: the two original samples A0 = (a1, a2, ..., an) and B0 = (b1, b2, ..., bn), with observed difference D0 = mA-mB (0.44), are pooled. Pseudo-samples A1 and B1 are drawn from the pool, with means mA1 and mB1, and give the pseudo-difference D1 = mA1-mB1. For example, D1 = -0.83 and D2 = 0.84.]

Repeat 10000 times (D1 ... D10000)

How many pseudo-differences are greater than or equal to the observed difference D0?

9829 < D0 and 171 > D0

p = 171 / 10000 = 0.0171 (one-tailed)

Mann-Whitney (MW): p = 0.0169
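The counting procedure above can be sketched as follows. This is an illustration under stated assumptions, not the author's code: the pseudo-samples are drawn with replacement from the pooled data, the function name `bootstrap_p_value`, the seed and the data values are hypothetical, and the result is a one-tailed p-value as on the slide.

```python
import random

def _mean(values):
    return sum(values) / len(values)

def bootstrap_p_value(a, b, n_boot=10000, seed=0):
    """One-tailed resampling p-value for the difference of means mA - mB."""
    rng = random.Random(seed)
    d0 = _mean(a) - _mean(b)            # observed difference D0
    pooled = list(a) + list(b)          # null hypothesis: one common population
    count = 0
    for _ in range(n_boot):
        pseudo_a = rng.choices(pooled, k=len(a))   # drawn with replacement
        pseudo_b = rng.choices(pooled, k=len(b))
        if _mean(pseudo_a) - _mean(pseudo_b) >= d0:
            count += 1                  # pseudo-difference at least as large as D0
    return d0, count / n_boot

# Hypothetical samples, for illustration only.
sample_a = [10, 11, 12, 13, 14, 15, 16, 17]
sample_b = [1, 2, 3, 4, 5, 6, 7, 8]
d0, p = bootstrap_p_value(sample_a, sample_b)
print(f"D0 = {d0}, p = {p} (one-tailed)")
```

Because the two hypothetical samples barely overlap, almost no pseudo-difference reaches D0 and the p-value is very small. Permuting the pooled values without replacement is a common alternative to the scheme sketched here.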

Summary

What do my data look like?

Distribution?

• visual inspection (hist. / QQ plot)

• normality test

What do I want to compare?

Right statistical test?

• parametric test

• non parametric test

• resampling statistics

The dark side of the p-value

Statistical significance

“The effect of the drug was statistically significant.”

so what?

Statistical significance (example)

“The percentage of neurons showing cue-related activity increased with training in the mutant mice (P<0.05) but not in the control mice (P>0.05).”

Training has a larger effect in the mutant mice than in the control mice!

Extreme scenario: training-induced activity barely reaches significance in mutant mice (e.g., 0.049) and barely fails to reach significance for control mice (e.g., 0.051)

[Figure: activity in control and mutant mice, without (-) and with (+) training; * marks the significant comparison]

This does not test whether the training effect for mutant mice differs statistically from that for control mice.

When making a comparison between two effects, always report the statistical significance of their difference rather than the difference between significance levels.

Nieuwenhuis S. et al. (2011), “Erroneous analyses of interactions in neuroscience: a problem of significance”, Nat Neurosci, 14(9):1105-1107

P-values do not convey information

Difference = 4

Mean: 16 SD: 5

Mean: 20 SD: 5

p-value = 0.1090

[Figure: two further pairs of samples with the same means, SD and difference as above; p-value = 0.0367 and p-value = 0.0009]

Fact: Most applied scientists use p-values as a measure of evidence and of the size of the effect

[Figure: “Manhattan plot”, -log10(P) from 0 to 8 against positions 1 to 20]

- This topic has renewed importance with the advent of the massive multiple testing often seen in genomics studies

- The probability of hypotheses depends on much more than just the p-value.

Ioannidis JP, (2005) PLoS Med 2(8):e124

Report effect size and CIs instead

The p-value is a function of the sample size

Hentschke, H. et al. (2011). "Computation of measures of effect size for neuroscience data sets". Eur J Neurosci. 34(12):1887–94

[Figure: amplitude (mV) of control (n=6777) and atropine (n=5272) recordings; scale bars 0.5 mV and 100 ms]

Measured Effect Size: difference = 0.018 mV

p = 10^-5

[Figure: P (t-test) and Hedges' g as a function of sample size (10^1 to 10^3), with the “significant” and “not significant” regions marked, for the 0.018 mV difference]
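The dependence of the p-value on the sample size can be illustrated with the difference of 4 and the SD of 5 quoted earlier in the talk. This is a rough sketch under explicit assumptions: it uses a normal (z) approximation instead of a t-test, equal group sizes, and hypothetical sample sizes, so it does not reproduce the p-values shown on the slides. The function name `approx_two_sided_p` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def approx_two_sided_p(diff, sd, n):
    """Approximate two-sided p-value for a difference of means.

    Assumes two groups of n values each, a common standard deviation `sd`,
    and a normal approximation (reasonable for large n, crude for small n).
    """
    standard_error = sd * sqrt(2 / n)
    z = abs(diff) / standard_error
    return 2 * (1 - NormalDist().cdf(z))

# Same effect size every time; only the (hypothetical) sample size changes.
p_by_n = {n: approx_two_sided_p(diff=4, sd=5, n=n) for n in (5, 10, 20, 50, 100)}
for n, p in p_by_n.items():
    print(f"n = {n:3d} per group  ->  p = {p:.4g}")
```

The effect size never changes in this loop, yet the p-value shrinks steadily, which is why a p-value alone says little about the size of an effect.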

Bootstrap effect size and 95% CIs

[Diagram: the samples A = (a1, a2, ..., an) and B = (b1, b2, ..., bn) are each resampled 10000 times (for example a5 a1 a5 a3 and b4 b2 b2 b1), giving the pseudo-means mA1, mA2, mA3, ... and mB1, mB2, mB3, ... The pseudo-effect sizes E1 (mA1-mB1), E2 (mA2-mB2), ..., E10000 (mA10000-mB10000) are sorted around the observed effect size (0.44), and the 250th and 9750th values bound the 95% CI.]

Eff. size = 0.44

0.44 [0.042, 0.853]

Do the 95% confidence intervals of the observed effect size include zero (no difference)?
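The same resampling idea gives a confidence interval for the effect size. This is a minimal sketch, assuming the effect size is the difference of means and that each sample is resampled independently with replacement; the function name `bootstrap_effect_ci`, the seed and the data values are hypothetical.

```python
import random

def _mean(values):
    return sum(values) / len(values)

def bootstrap_effect_ci(a, b, n_boot=10000, seed=0):
    """Percentile bootstrap 95% CI for the effect size mA - mB."""
    rng = random.Random(seed)
    effects = sorted(
        _mean(rng.choices(a, k=len(a))) - _mean(rng.choices(b, k=len(b)))
        for _ in range(n_boot)
    )
    # With 10000 resamples these are roughly the 250th and 9750th sorted values.
    lo = effects[int(0.025 * n_boot)]
    hi = effects[int(0.975 * n_boot) - 1]
    return _mean(a) - _mean(b), lo, hi

# Hypothetical samples, for illustration only.
group_a = [10, 11, 12, 13, 14, 15, 16, 17]
group_b = [1, 2, 3, 4, 5, 6, 7, 8]
effect, lo, hi = bootstrap_effect_ci(group_a, group_b)
includes_zero = lo <= 0 <= hi
print(f"{effect:.2f} [{lo:.2f}, {hi:.2f}]  includes zero: {includes_zero}")
```

An interval that excludes zero answers the question on the slide, and its width also shows how precisely the effect is estimated, which a p-value does not.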

Statistical vs Biological significance

Statistical vs Biological significance

“Statistical significance suggests but does not imply biological significance.”

“The P value reported by tests is a probabilistic significance, not a biological one.”

Krzywinski M and Altman N (2013) "Points of significance: Significance, P values and t-tests”. Nature Methods 10, 1041–1042

Statistical significance has a meaning in a specific context

Biological consequences?

• No change

• Small change

• Large change

Statistical vs Biological significance

Schulz D.J. et al. (2006) "Variable channel expression in identified single and electrically coupled neurons in different animals". Nat Neurosci. 9: 356–362

[Figure: conductances at +15 mV (µS/nF, axis 0 to 0.60) for the Kd, KCa and A-type currents, and mRNA copy numbers (axis 0 to 1,600) for shab, BK-KC and shal, in the neurons LP 1 and LP 2]

[Figure: stomatogastric ganglion with the AB, LP, PD and PY neurons]

“Good enough” solutions

Statistical vs Biological significance

Madhvani R.V. et al. (2011) "Shaping a new Ca2+ conductance to suppress early afterdepolarizations in cardiac myocytes". J Physiol 589(Pt 24):6081-92

Statistical vs Biological significance

Breast cancer study: difference in cancer returning between control vs low-fat diet groups.

Authors' conclusion: people with low-fat diets had a 25% lower chance of cancer returning

Actual return rates:

- control: 12.4%

- low-fat diet: 9.8%

Difference: 2.6%

2.6 / 9.8 = 26.5%
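The arithmetic behind the slide is worth spelling out, because the same 2.6-point gap can be reported in several ways. This short sketch uses only the two return rates given above; the variable names are illustrative.

```python
control_rate = 12.4   # % of patients whose cancer returned, control group
low_fat_rate = 9.8    # % of patients whose cancer returned, low-fat diet group

# Absolute difference, in percentage points.
absolute_difference = control_rate - low_fat_rate

# The same gap expressed relative to each group's rate.
relative_to_low_fat = absolute_difference / low_fat_rate * 100
relative_to_control = absolute_difference / control_rate * 100

print(f"absolute difference:        {absolute_difference:.1f} percentage points")
print(f"relative to low-fat group:  {relative_to_low_fat:.1f}%")
print(f"relative to control group:  {relative_to_control:.1f}%")
```

A relative figure above 20% sounds far more impressive than an absolute difference of 2.6 percentage points, although both describe the same data.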

Beware of false positives

Bennett C. et al. (2010) “Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic Salmon: An Argument For Proper Multiple Comparisons Correction”. JSUR, 2010. 1(1):1-5

(from the authors)


Beware of false positives

http://xkcd.com/882/
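The risk of false positives grows quickly with the number of tests. As a sketch, assuming independent tests that are each run at a significance level of 0.05, the probability of at least one false positive among m tests is 1 − (1 − 0.05)^m; the function name and the choice of m values are illustrative.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance of one or more false positives among independent tests
    when every null hypothesis is actually true."""
    return 1 - (1 - alpha) ** n_tests

for m in (1, 5, 20, 100):
    print(f"{m:3d} tests -> {prob_at_least_one_false_positive(m):.3f}")
```

With 20 uncorrected tests the chance of at least one spurious “significant” result is already close to two in three, which is the argument for proper multiple comparisons correction.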

Present your data

Know your audience

Who? who is my audience? level of understanding? what do they already know?

What? what do I want my audience to know? which story will captivate the audience?

Why? why am I presenting? what does my audience want to achieve?

How? what medium will support the message the best? what format/layout will appeal to the audience?

Color blindness is a common disease

Males: one in 12 (8%) / Females: one in 200 (0.5%)

“Anyone who needs to be convinced that making scientific images more accessible is a worthwhile task [...]: if your next grant or manuscript submission contains color figures, what if some of your reviewers are color blind? Will they be able to appreciate your figures? Considering the competition for funding and for publication, can you afford the possibility of frustrating your audience? The solution is at hand."

Clarke, M. (2007). "Making figures comprehensible for color-blind readers" Nature blog (http://blogs.nature.com/nautilus/2007/02/post_4.html)

Making figures for color blind people

Wong, B. (2011). "Points of view: Color blindness". Nature Methods 8, 441

Making figures for color blind people

http://colororacle.org/


Telling stories with data

http://vis.stanford.edu/files/2010-Narrative-InfoVis.pdf

“The Martini Glass Structure”

[Diagram of “The Martini Glass Structure”: START, GUIDED NARRATIVE, then EXPLORE]

Aesthetic minimalism

Suda B. (2010). "A practical guide to Designing with Data"


Common mistakes in data reporting

Welcome to the FOX “Dishonest Charts” gallery

E. Tufte’s “Lie Factor”

Make things appear to be “better” than they are by fiddling with the scales of things
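Tufte defines the Lie Factor as the size of the effect shown in the graphic divided by the size of the effect in the data, where each effect is the relative change between two values. The sketch below applies that definition to hypothetical numbers (a value that rises from 10 to 12, drawn as bars that grow from 1 cm to 3 cm); the function name and the numbers are illustrative only.

```python
def lie_factor(shown_first, shown_second, data_first, data_second):
    """Tufte's Lie Factor: effect shown in the graphic / effect in the data."""
    effect_in_graphic = (shown_second - shown_first) / shown_first
    effect_in_data = (data_second - data_first) / data_first
    return effect_in_graphic / effect_in_data

# Hypothetical chart: the data go from 10 to 12 (+20%),
# but the bars are drawn 1 cm and 3 cm tall (+200%).
factor = lie_factor(shown_first=1, shown_second=3, data_first=10, data_second=12)
print(f"Lie Factor: {factor:.1f}")
```

A Lie Factor close to 1 means the graphic represents the data faithfully; a truncated or stretched axis pushes it well above 1.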


“We found that relative to WT mice, the luminal microbiota of Il10−/− mice exhibited a ~100-fold increase in E. coli (Fig. 1I)”

Arthur et al, (2012) Science 5;338(6103):120-3

Fig 1I

[Figure: pie chart with the slices A, B, C, D and E, each labeled 20%]

[Figure: “Percent Return on Investment” (0 to 40) for Group A and Group B over year1 to year4, shown in two different layouts]

Thank you!

“The important thing is not to stop questioning. Curiosity has its own reason for existing”

- Albert Einstein