Explore, Analyze and Present your data

Guillaume Calmettes


Page 1: Explore, Analyze and Present your data

Explore, analyze and present your data

Guillaume Calmettes

Page 2: Explore, Analyze and Present your data

“Bonjour”, I am Guillaume!

Sacré bleu!

[email protected]: MRL 3645

Bordeaux

Page 3: Explore, Analyze and Present your data

Disclaimer

I am not a statistician

Page 4: Explore, Analyze and Present your data

Statistics are scary

Statistics

(You at the beginning of the talk)

Page 5: Explore, Analyze and Present your data

Statistics are scary

Statistics

not so

(You in the middle of the talk)

Page 6: Explore, Analyze and Present your data

Statistics are scary

Statistics

cool

(You at the end of the talk)

Page 7: Explore, Analyze and Present your data

Statistics are scary

Statistics

cool

We have to deal with them anyway, so we had better enjoy them!

(You at the end of the talk)

Page 8: Explore, Analyze and Present your data

Press the t-test button and you’ll be done!

Did you check the normality of your data first?

Page 9: Explore, Analyze and Present your data

Why should you care about statistics?

http://www.nature.com/nature/authors/gta/2e_Statistical_checklist.pdf

Page 10: Explore, Analyze and Present your data

Why should you care about statistics?

Advances in Physiology Education

“Explorations in Statistics” series (2008-present) (Douglas Curran-Everett)

Page 11: Explore, Analyze and Present your data

Why should you care about statistics?

http://jp.physoc.org/cgi/collection/stats_reporting

The Journal of Physiology, Experimental Physiology, The British Journal of Pharmacology, Microcirculation, The British Journal of Nutrition

“Statistical Perspectives” series (2011-present) (Gordon Drummond)

Page 12: Explore, Analyze and Present your data

Why should you care about statistics?

http://blogs.nature.com/methagora/2013/08/giving_statistics_the_attention_it_deserves.html

Significance, P values and t-tests – November 2013 Introduction to the concept of statistical significance and the one-sample t-test.

Error Bars – October 2013 The use of error bars to represent uncertainty and advice on how to interpret them.

Importance of being uncertain – September 2013 How samples are used to estimate population statistics and what this means in terms of uncertainty.

Page 13: Explore, Analyze and Present your data

Why should you care about statistics?

“Nature research journals will introduce editorial measures to address the problem by improving the consistency and quality of reporting in life-sciences articles”

“We will examine statistics more closely and encourage authors to be transparent, for example by including their raw data”

“Journals […] fail to exert sufficient scrutiny over the results that they publish”

Page 14: Explore, Analyze and Present your data

Look at your data

Page 15: Explore, Analyze and Present your data

A picture is worth a thousand words

Location of deaths in the 1854 London Cholera Epidemic

John Snow (1813-1858)

Page 16: Explore, Analyze and Present your data

Dataset #1     Dataset #2     Dataset #3     Dataset #4
x     y        x     y        x     y        x     y
10    8.04     10    9.14     10    7.46     8     6.58
8     6.95     8     8.14     8     6.77     8     5.76
13    7.58     13    8.74     13    12.74    8     7.71
9     8.81     9     8.77     9     7.11     8     8.84
11    8.33     11    9.26     11    7.81     8     8.47
14    9.96     14    8.1      14    8.84     8     7.04
6     7.24     6     6.13     6     6.08     8     5.25
4     4.26     4     3.1      4     5.39     19    12.5
12    10.84    12    9.13     12    8.15     8     5.56
7     4.82     7     7.26     7     6.42     8     7.91
5     5.68     5     4.74     5     5.73     8     6.89

Why visualize your data?

Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician 27 (1): 17–21

Example: Anscombe’s quartet

Page 17: Explore, Analyze and Present your data

Why visualize your data?

Property (in each dataset)     Value
Mean of x                      9 (exact)
Variance of x                  11 (exact)
Mean of y                      7.5
Variance of y                  4.122 or 4.127
Correlation of x and y         0.816
Linear regression line         y = 3.00 + 0.500x

Example: Anscombe’s quartet

Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician 27 (1): 17–21
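The table of identical properties can be checked directly from the raw values above. A minimal sketch using only Python's standard library; the helper name `summarize` is illustrative, variances are sample variances (n − 1 denominator) and the regression line is ordinary least squares:

```python
from statistics import mean, variance

# Anscombe's quartet, transcribed from the table above
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    1: (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarize(x, y):
    """Mean/variance of x and y, Pearson correlation, and OLS intercept/slope."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    return {
        "mean_x": mx, "var_x": variance(x),
        "mean_y": my, "var_y": variance(y),
        "r": sxy / (sxx * syy) ** 0.5,
        "intercept": my - slope * mx, "slope": slope,
    }

for k, (x, y) in quartet.items():
    s = summarize(x, y)
    print(k, {name: round(float(v), 3) for name, v in s.items()})
```

The four printed rows are practically identical, which is exactly why the scatter plots on the next slides are needed.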

Page 18: Explore, Analyze and Present your data

Why visualize your data?

[Figure: scatter plots of Dataset #1, Dataset #2, Dataset #3 and Dataset #4]

Example: Anscombe’s quartet

Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician 27 (1): 17–21

Page 19: Explore, Analyze and Present your data

Why visualize your data?

[Figure: scatter plots of Dataset #1, Dataset #2, Dataset #3 and Dataset #4]

Example: Anscombe’s quartet

Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician 27 (1): 17–21

Page 20: Explore, Analyze and Present your data

Visualize your data in their raw form!

Aim for revelation rather than mere summary

A great graphic with raw data reveals unexpected patterns and invites us to make comparisons we might not have thought of beforehand.

Page 21: Explore, Analyze and Present your data

If you are still not convinced …

Mean: 16 / Stdv: 5
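A summary such as "Mean: 16 / SD: 5" can describe very different datasets. As a minimal sketch (the three raw shapes and the helper `rescale` are hypothetical, chosen only for illustration), any set of values can be shifted and scaled to have exactly this mean and standard deviation while keeping its shape:

```python
from statistics import mean, stdev

def rescale(values, target_mean, target_sd):
    """Linearly transform values to the requested mean and sample SD.
    The shape of the distribution (skew, clusters, outliers) is unchanged."""
    m, s = mean(values), stdev(values)
    return [target_mean + (v - m) * target_sd / s for v in values]

# Hypothetical raw shapes: evenly spread, two clusters, one outlier
spread = list(range(1, 21))
two_clusters = [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 10, 11, 11, 12, 12, 12, 13, 13, 14, 14]
one_outlier = [5] * 19 + [40]

for name, raw in [("spread", spread), ("two clusters", two_clusters), ("one outlier", one_outlier)]:
    data = rescale(raw, 16, 5)
    print(f"{name:13s} mean={mean(data):.2f} SD={stdev(data):.2f} "
          f"min={min(data):.1f} max={max(data):.1f}")
```

All three rows print the same mean and SD, but only the raw values show the clusters or the outlier.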

Page 22: Explore, Analyze and Present your data

If you are still not convinced …

Mean: 16 / Stdv: 5

Page 23: Explore, Analyze and Present your data

If you are still not convinced …

Mean: 16 / Stdv: 5

[Figure: donor engraftment (%, axis 0 to 80) after WBM secondary transplantation (16 weeks), mH19 flDMR/+ vs. 6DMR/+, P < 0.05; from Daniel’s Journal Club paper]

Page 24: Explore, Analyze and Present your data

Avoid making bar graphs

Rockman H.A. (2012). "Great expectations". J Clin Invest 122 (4): 1133

“To maintain the highest level of trustworthiness of data, we are encouraging authors to display data in their raw form and not in a fashion that conceals their variance.

Presenting data as columns with error bars (dynamite plunger plots) conceals data. We recommend that individual data be presented as dot plots shown next to the average for the group with appropriate error bars (Figure 1).”

Page 25: Explore, Analyze and Present your data

Avoid making bar graphs

Cumming, G. et al. (2007). "Error bars in experimental biology". J Cell Biol 177 (1): 7–11


Different types, different meanings

• descriptive statistics (Range, SD)

• inferential statistics (SE, CI)

Error bars

Page 26: Explore, Analyze and Present your data

Avoid making bar graphs

Cumming, G. et al. (2007). "Error bars in experimental biology". J Cell Biol 177 (1): 7–11

Different types, different meanings

• descriptive statistics (Range, SD)

• inferential statistics (SE, CI)

Often, they also imply a symmetrical distribution of the data.

Error bars
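To make the distinction between descriptive and inferential error bars concrete, the sketch below computes SD, SE and an approximate 95% CI for the same sample. The sample values are hypothetical, and the CI uses the large-sample normal value (about 1.96) for simplicity; for small samples a Student t critical value (for example from `scipy.stats.t.ppf`) would be more appropriate.

```python
import math
from statistics import NormalDist, mean, stdev

def error_bars(sample, level=0.95):
    """Return SD (descriptive) plus SE and a normal-approximation CI (inferential)."""
    n = len(sample)
    m = mean(sample)
    sd = stdev(sample)                          # spread of the individual values
    se = sd / math.sqrt(n)                      # uncertainty of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    return {"mean": m, "sd": sd, "se": se, "ci": (m - z * se, m + z * se)}

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements
print(error_bars(sample))
```

The SD describes the data and does not shrink with more observations, whereas the SE and the CI width shrink roughly as 1/√n.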

Page 27: Explore, Analyze and Present your data

[Figure: normal curve centered on µ, with marks at ±σ and ±2σ and the central region labeled 95%]

Avoid making bar graphs

About 95% of a normal distribution lies within two standard deviations (σ) of the mean (µ); more precisely, 95% lies within 1.96σ.

Mean and Standard deviation are only useful in the context of a “normal distribution”
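The "two standard deviations" rule of thumb can be checked with the normal cumulative distribution function from Python's standard library (`statistics.NormalDist`); the helper `coverage` is an illustrative name:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

print(f"within 1.00 SD: {coverage(1.0):.4f}")   # 0.6827
print(f"within 1.96 SD: {coverage(1.96):.4f}")  # 0.9500
print(f"within 2.00 SD: {coverage(2.0):.4f}")   # 0.9545
```

These proportions only hold when the data really are normal, which is why the shape of the distribution has to be checked first.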

Page 28: Explore, Analyze and Present your data

Avoid making bar graphs

skewed distribution

symmetrical distribution

Data presentation to reveal the distribution of the data

• Display data in their raw form.

• A dot plot is a good start.

• “Dynamite plunger plots” conceal data.

• Check the pattern of distribution of the values.

Page 29: Explore, Analyze and Present your data

Avoid making bar graphs

• First set: Gaussian (or normal) distribution (symmetrically distributed)

skewed distribution

symmetrical distribution

• Second set: right-skewed, lognormal (a few large values)

“This type of distribution of values is quite common in biology (e.g., plasma concentrations of immune or inflammatory mediators).”

“Plunger plots only: who would know that the values were skewed ... and that the common statistical tests would be inappropriate?”
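The effect of right skew can be seen numerically. The sketch below draws a hypothetical lognormal sample (seeded for reproducibility): the mean is pulled above the median by the few large values, while the log-transformed values are roughly symmetric. The log transform is shown only as one common illustration, not as a procedure prescribed by the slide.

```python
import math
import random
from statistics import mean, median

rng = random.Random(42)   # fixed seed so the sketch is reproducible
# Hypothetical right-skewed sample: the log of each value is normally distributed
values = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]

# Raw scale: a few large values pull the mean above the median
print(f"raw: mean={mean(values):.2f}  median={median(values):.2f}")

# Log scale: mean and median nearly coincide (roughly symmetric)
logs = [math.log(v) for v in values]
print(f"log: mean={mean(logs):.2f}  median={median(logs):.2f}")
```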

Page 30: Explore, Analyze and Present your data

Avoid making bar graphs

Bar graph Dynamite plunger

Don't tell me no one warned you before!

Page 31: Explore, Analyze and Present your data

Summary

Looking for patterns and relationships

Providing a narrative for the reader

Summarizing complex data structures

Helping avoid erroneous conclusions based on questionable or unexpected data

For others ...

But primarily for you ...

Why visualize your data?

Page 32: Explore, Analyze and Present your data

Choose the right descriptor for your data

Page 33: Explore, Analyze and Present your data

Averages can be misleading

Page 34: Explore, Analyze and Present your data

Averages can be misleading

Page 35: Explore, Analyze and Present your data

Averages can be misleading

Page 36: Explore, Analyze and Present your data

Averages can be misleading

Page 37: Explore, Analyze and Present your data

Is the mean always a good descriptor?

http://www.globalhealthfacts.org/data/topic/map.aspx?ind=87

# of children per household in China (2012)

• mean: 1.35

Page 38: Explore, Analyze and Present your data

Is the mean always a good descriptor?

http://www.globalhealthfacts.org/data/topic/map.aspx?ind=87

# of children per household in China (2012)

• mean: 1.35

• median: 1

more representative of the “typical” family (one-child policy)
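The same effect appears with any count data where most values are small and a few are larger. The household counts below are hypothetical (they are NOT the 2012 China data); they only illustrate why the median can describe the "typical" case better than the mean:

```python
from statistics import mean, median, mode

# Hypothetical households (children per household), not real survey data:
# most families have one child, a few have more.
children = [1] * 14 + [2] * 4 + [3, 4]

print(f"mean   = {mean(children):.2f}")   # pulled up by the few larger families
print(f"median = {median(children)}")     # the 'typical' household
print(f"mode   = {mode(children)}")       # the most frequent value
```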

Page 39: Explore, Analyze and Present your data

Any measure is wrong!

http://www.youtube.com/watch?v=JUxHebuXviM

Walter Lewin (MIT)

“Whenever you make a measurement, you must know the uncertainty, otherwise it is meaningless”

183.3 cm / 185.7 cm

Page 40: Explore, Analyze and Present your data

Any measure is wrong!

Walter Lewin (MIT)

“Whenever you make a measurement, you must know the uncertainty, otherwise it is meaningless”

The same concept applies when you report your data!

Provide the uncertainty of your descriptor (hint: this is NOT the standard deviation)

Page 41: Explore, Analyze and Present your data

Any measure is wrong!

Walter Lewin (MIT)

“Whenever you make a measurement, you must know the uncertainty, otherwise it is meaningless”

The same concept applies when you report your data!

Provide the uncertainty of your descriptor (hint: this is NOT the standard deviation)

Report the Confidence Interval of your descriptor

Page 42: Explore, Analyze and Present your data

The Bootstrap: origin

Efron B. and Tibshirani R. (1991), Science, Jul 26;253(5018):390-5

Modern electronic computation has encouraged a host of new statistical methods that require fewer distributional assumptions than their predecessors and can be applied to more complicated statistical estimators. These methods allow [...] to explore and describe data and draw valid statistical inferences without the usual concerns for mathematical tractability.

Page 43: Explore, Analyze and Present your data

Computing the bootstrap 95% CI

[Diagram: original sample A0 = {a1, a2, …, an}, with mean m0]

Calmettes G. et al. (2012), “Making do with what we have: use your bootstrap”, J Physiol, 590(15):3403-3406

Page 44: Explore, Analyze and Present your data

Computing the bootstrap 95% CI

[Diagram: from the original sample A0 = {a1, …, an} (mean m0), resamples A1, A2, A3, A4, … of the same size n are drawn with replacement, so a value can appear several times or not at all; the mean of each resample is computed (mA1, mA2, mA3, mA4, …)]

Calmettes G. et al. (2012), “Making do with what we have: use your bootstrap”, J Physiol, 590(15):3403-3406

Page 45: Explore, Analyze and Present your data

Computing the bootstrap 95% CI

[Diagram (continued): resamples A1, A2, A3, A4, … drawn with replacement from A0 (mean m0), each giving a mean mA1, mA2, mA3, mA4, …; the process is repeated many times]

Calmettes G. et al. (2012), “Making do with what we have: use your bootstrap”, J Physiol, 590(15):3403-3406

Page 46: Explore, Analyze and Present your data

Computing the bootstrap 95% CI

[Diagram (continued): resamples A1, A2, A3, A4, … drawn with replacement from A0 (mean m0), each giving a mean mA1, mA2, mA3, mA4, …]

Calmettes G. et al. (2012), “Making do with what we have: use your bootstrap”, J Physiol, 590(15):3403-3406

Page 47: Explore, Analyze and Present your data

Computing the bootstrap 95% CI

[Diagram (continued): resamples A1, A2, A3, A4, … drawn with replacement from A0 (mean m0), each giving a mean mA1, mA2, mA3, mA4, …]

Calmettes G. et al. (2012), “Making do with what we have: use your bootstrap”, J Physiol, 590(15):3403-3406

Result, mean [95% CI]: 5.18 [4.91, 5.47]
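The resampling scheme in the diagrams can be sketched in a few lines. This is a minimal percentile bootstrap under stated assumptions (hypothetical data, 10000 resamples, fixed seed); it is not necessarily the exact variant used in Calmettes et al. (2012), and refinements such as bias-corrected intervals exist.

```python
import random
from statistics import fmean

def bootstrap_ci(sample, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)   # seeded so the result is reproducible
    n = len(sample)
    # Each resample has the original size n and is drawn WITH replacement,
    # so some values appear several times and others not at all.
    boot_means = sorted(fmean(rng.choices(sample, k=n)) for _ in range(n_boot))
    tail = (1 - level) / 2
    lo = boot_means[max(round(n_boot * tail) - 1, 0)]   # 250th of 10,000 for 95%
    hi = boot_means[round(n_boot * (1 - tail)) - 1]     # 9750th of 10,000 for 95%
    return fmean(sample), (lo, hi)

sample = [4.2, 5.1, 4.8, 6.3, 5.5, 4.9, 5.0, 6.1, 4.4, 5.6]   # hypothetical data
m, (lo, hi) = bootstrap_ci(sample)
print(f"{m:.2f} [{lo:.2f}, {hi:.2f}]")
```

The interval is read directly from the sorted bootstrap means, so no normality assumption is needed for the distribution of the mean.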

Page 48: Explore, Analyze and Present your data

your dataAnalyze

Page 49: Explore, Analyze and Present your data

Choose your statistical test wisely

http://www.nature.com/nature/authors/gta/#a5.6

Every paper that contains statistical testing should state [...] a justification for the use of that test (including, for example, a discussion of the normality of the data when the test is appropriate only for normal data), [...], whether the tests were one-tailed or two-tailed, and the actual P value for each test (not merely "significant" or "P < 0.05").

Author guidelines

Page 50: Explore, Analyze and Present your data

The simple case (How to)

mean/std 135.9 ± 19.0

mean/std 187.0 ± 19.8

Male / Female

Page 51: Explore, Analyze and Present your data

The simple case (How to)

mean/std 135.9 ± 19.0

mean/std 187.0 ± 19.8

Male / Female

Distribution of the data?

Page 52: Explore, Analyze and Present your data

The simple case (How to)

mean/std 135.9 ± 19.0

mean/std 187.0 ± 19.8

difference/ci 51.2 [50.4, 51.9]

Male / Female

Distribution of the data?

Page 53: Explore, Analyze and Present your data

The simple case (How to)

mean/std 135.9 ± 19.0

mean/std 187.0 ± 19.8

difference/ci 51.2 [50.4, 51.9]

Distribution of the data?

• fit of the histogram

Male / Female

Page 54: Explore, Analyze and Present your data

The simple case (How to)

mean/std 135.9 ± 19.0

mean/std 187.0 ± 19.8

difference/ci 51.2 [50.4, 51.9]

Distribution of the data?

• fit of the histogram

Male / Female

Page 55: Explore, Analyze and Present your data

The simple case (How to)

mean/std 135.9 ± 19.0

mean/std 187.0 ± 19.8

difference/ci 51.2 [50.4, 51.9]

Distribution of the data?

• fit of the histogram

• QQ plot

Theoretical quantiles of the distribution: $\Phi^{-1}\left(\frac{i - 3/8}{n + 1/4}\right)$, plotted against $A_{(i)}$, the $i$th ordered data point

Male / Female
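The theoretical quantiles in the formula above can be computed with the inverse normal CDF from Python's standard library. A minimal sketch (hypothetical data, illustrative helper name `qq_points`); the actual plotting, for example with matplotlib, is left out to keep it dependency-free:

```python
from statistics import NormalDist

def qq_points(data):
    """Pair each theoretical normal quantile Phi^-1((i - 3/8) / (n + 1/4)),
    i = 1..n, with the i-th ordered data value A(i)."""
    ordered = sorted(data)
    n = len(ordered)
    inv_phi = NormalDist().inv_cdf
    return [(inv_phi((i - 3 / 8) / (n + 1 / 4)), a)
            for i, a in enumerate(ordered, start=1)]

for theoretical, observed in qq_points([12.1, 9.8, 11.4, 10.2, 10.9]):   # hypothetical values
    print(f"{theoretical:+.3f}  {observed}")
```

If the points lie close to a straight line, the data are consistent with a normal distribution; systematic curvature suggests skew.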

Page 56: Explore, Analyze and Present your data

The simple case (How to)

mean/std 135.9 ± 19.0

mean/std 187.0 ± 19.8

difference/ci 51.2 [50.4, 51.9]

Distribution of the data?

• fit of the histogram

• QQ plot

not “normal”

Male / Female

Page 57: Explore, Analyze and Present your data

The simple case (How to)

mean/std 135.9 ± 19.0

mean/std 187.0 ± 19.8

difference/ci 51.2 [50.4, 51.9]

Male / Female

Distribution of the data?

• fit of the histogram

• QQ plot

[QQ plots for Male and Female]

Page 58: Explore, Analyze and Present your data

The simple case (How to)

mean/std 135.9 ± 19.0

mean/std 187.0 ± 19.8

difference/ci 51.2 [50.4, 51.9]

Distribution of the data?

Visual inspection:

• fit of the histogram

• QQ plot

Male / Female

[QQ plots for Male and Female]

Page 59: Explore, Analyze and Present your data

The simple case (How to)

mean/std 135.9 ± 19.0

mean/std 187.0 ± 19.8

difference/ci 51.2 [50.4, 51.9]

Distribution of the data?

Visual inspection:

• fit of the histogram

• QQ plot

Test:

• Shapiro-Wilk test

Male / Female

Page 60: Explore, Analyze and Present your data

The simple case (How to)

mean/std 135.9 ± 19.0

mean/std 187.0 ± 19.8

difference/ci 51.2 [50.4, 51.9]

Distribution of the data?

Visual inspection:

• fit of the histogram

• QQ plot

Test:

• Shapiro-Wilk test

Null hypothesis of the Shapiro-Wilk test: the data are normally distributed

Female p-value: 0.9195

Male p-value: 0.3866

Male / Female

Page 61: Explore, Analyze and Present your data

The simple case (How to)

mean/std 135.9 ± 19.0

mean/std 187.0 ± 19.8

difference/ci 51.2 [50.4, 51.9]

Distribution of the data?

Normally distributed

Male / Female

Page 62: Explore, Analyze and Present your data

The simple case (How to)

mean/std 135.9 ± 19.0

mean/std 187.0 ± 19.8

difference/ci 51.2 [50.4, 51.9]

Distribution of the data?

Normally distributed

Male / Female

Page 63: Explore, Analyze and Present your data

The simple case (How to)

mean/std 135.9 ± 19.0

mean/std 187.0 ± 19.8

difference/ci 51.2 [50.4, 51.9]

Distribution of the data?

Normally distributed

Male / Female

Page 64: Explore, Analyze and Present your data

The simple case (How to)

mean/std 135.9 ± 19.0

mean/std 187.0 ± 19.8

difference/ci 51.2 [50.4, 51.9]

Distribution of the data?

Normally distributed

Statistical test?

t-test

Male / Female

Page 65: Explore, Analyze and Present your data

The simple case (How to)

mean/std 135.9 ± 19.0

mean/std 187.0 ± 19.8

difference/ci 51.2 [50.4, 51.9]

Distribution of the data?

Normally distributed

Statistical test?

t-test

t-test p-value < 2.2e-16

Null hypothesis of the t-test: the two groups come from populations with the same mean

Male / Female
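The core of an unpaired t-test is the ratio of the difference between means to its standard error. The sketch below computes Welch's t statistic (which does not assume equal variances) and the Welch-Satterthwaite degrees of freedom with the standard library only; the data are hypothetical. Turning t into a p-value needs the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`, which runs the whole Welch test if SciPy is available.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

group_1 = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]   # hypothetical values
group_2 = [4.1, 4.5, 3.9, 4.8, 4.4, 4.0]
t, df = welch_t(group_1, group_2)
print(f"t = {t:.2f}, df = {df:.1f}")
```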

Page 66: Explore, Analyze and Present your data

Usually it is not so simple

Page 67: Explore, Analyze and Present your data

The “not so simple” case

S1 S2

Page 68: Explore, Analyze and Present your data

The “not so simple” case

S1 S2

Page 69: Explore, Analyze and Present your data

The “not so simple” case

S1 S2

S1 S2

Page 70: Explore, Analyze and Present your data

The “not so simple” case

S1 S2

S1 S2

Shapiro-Wilk test:

S2 p-value: 6.7e-06

S1 p-value: 7.4e-05

Page 71: Explore, Analyze and Present your data

What to do?

Page 72: Explore, Analyze and Present your data

What to do?

Non-parametric alternatives

For the t-test:

• Mann-Whitney U (independent samples)

• Wilcoxon signed-rank (dependent, i.e. paired, samples)
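The Mann-Whitney test works on ranks rather than raw values: its U statistic counts how often a value from one sample exceeds a value from the other. A minimal sketch with hypothetical data computes U only; the p-value needs the exact null distribution or a normal approximation, which `scipy.stats.mannwhitneyu` (and `scipy.stats.wilcoxon` for paired data) provide if SciPy is available.

```python
def mann_whitney_u(a, b):
    """U statistic for sample `a`: the number of (a_i, b_j) pairs with a_i > b_j,
    counting ties as one half. U_a + U_b always equals len(a) * len(b)."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

a = [1.9, 2.4, 3.1, 3.3, 4.0]   # hypothetical independent samples
b = [1.2, 1.5, 2.4, 2.8]
u_a, u_b = mann_whitney_u(a, b), mann_whitney_u(b, a)
print(u_a, u_b, u_a + u_b == len(a) * len(b))
```

Because only the ordering matters, a few extreme values cannot dominate the statistic the way they can dominate a mean.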

Page 73: Explore, Analyze and Present your data

Choose a new statistical hero: Bootstrapman

t-test

Page 74: Explore, Analyze and Present your data

Computing the bootstrap p-value

Are the two samples different?

Observed difference = 0.44

Page 75: Explore, Analyze and Present your data

Computing the bootstrap p-value

Are the two samples different?

If the two samples came from the same population, what would be the probability of observing a difference at least this large by chance alone?

Observed difference = 0.44

Page 76: Explore, Analyze and Present your data

Computing the bootstrap p-value

[Diagram: sample A0 = {a1, …, an} and sample B0 = {b1, …, bn}; observed difference D0 = mA − mB (0.44)]

Page 77: Explore, Analyze and Present your data

Computing the bootstrap p-value

[Diagram: the values of A0 and B0 are pooled into a single set {a1, …, an, b1, …, bn}, as if both samples came from the same population; D0 = mA − mB (0.44)]

Page 78: Explore, Analyze and Present your data

Computing the bootstrap p-value

[Diagram: from the pooled set, pseudo-samples A1 and B1 (same sizes as A0 and B0) are drawn with replacement; their means mA1 and mB1 give the pseudo-difference D1 = mA1 − mB1; D0 = mA − mB (0.44)]

Page 79: Explore, Analyze and Present your data

Computing the bootstrap p-value

[Diagram: from the pooled set, pseudo-samples A1 and B1 are drawn with replacement; D1 = mA1 − mB1; D0 = mA − mB (0.44)]

D0 = 0.44

D1 = −0.83

Page 80: Explore, Analyze and Present your data

Computing the bootstrap p-value

[Diagram: a second pair of pseudo-samples A2 and B2 is drawn with replacement from the pooled set, giving D2 = mA2 − mB2; D0 = mA − mB (0.44)]

D0 = 0.44

D1 = −0.83, D2 = 0.84

Page 81: Explore, Analyze and Present your data

Computing the bootstrap p-value

[Diagram: pseudo-samples A1 and B1 drawn with replacement from the pooled set; D1 = mA1 − mB1; D0 = mA − mB (0.44)]

Repeat 10000 times (D1 ... D10000)

Page 82: Explore, Analyze and Present your data

Computing the bootstrap p-value

[Diagram: pseudo-samples A1 and B1 drawn with replacement from the pooled set; D1 = mA1 − mB1; D0 = mA − mB (0.44)]

Repeat 10000 times (D1 ... D10000)

How many pseudo-differences are greater than or equal to the observed difference D0 (0.44)?

Page 83: Explore, Analyze and Present your data

Computing the bootstrap p-value

[Diagram: pseudo-samples A1 and B1 drawn with replacement from the pooled set; D1 = mA1 − mB1; D0 = mA − mB (0.44)]

Repeat 10000 times (D1 ... D10000)

How many pseudo-differences are greater than or equal to the observed difference D0 (0.44)?

9829 pseudo-differences < D0; 171 ≥ D0

Page 84: Explore, Analyze and Present your data

Computing the bootstrap p-value

[Diagram: pseudo-samples A1 and B1 drawn with replacement from the pooled set; D1 = mA1 − mB1; D0 = mA − mB (0.44)]

Repeat 10000 times (D1 ... D10000)

How many pseudo-differences are greater than or equal to the observed difference D0 (0.44)?

9829 pseudo-differences < D0; 171 ≥ D0

p = 171 / 10000 = 0.0171 (one-tailed)

Page 85: Explore, Analyze and Present your data

Computing the bootstrap p-value

[Diagram: pseudo-samples A1 and B1 drawn with replacement from the pooled set; D1 = mA1 − mB1; D0 = mA − mB (0.44)]

Repeat 10000 times (D1 ... D10000)

How many pseudo-differences are greater than or equal to the observed difference D0 (0.44)?

9829 pseudo-differences < D0; 171 ≥ D0

p = 171 / 10000 = 0.0171 (one-tailed)

Mann-Whitney (MW): p = 0.0169
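The procedure in the diagrams (pool, resample both groups from the pool, count pseudo-differences at least as large as D0) can be sketched as follows. The data are hypothetical and the function name is illustrative; this is one straightforward reading of the slides, not a reference implementation.

```python
import random
from statistics import fmean

def bootstrap_p_value(a, b, n_boot=10_000, seed=0):
    """One-tailed bootstrap p-value for D0 = mean(a) - mean(b), under the null
    hypothesis that both samples come from the same population."""
    rng = random.Random(seed)
    observed = fmean(a) - fmean(b)       # D0
    pooled = list(a) + list(b)           # null hypothesis: one common population
    count = 0
    for _ in range(n_boot):
        # Pseudo-samples of the original sizes, drawn with replacement from the pool
        pseudo_a = rng.choices(pooled, k=len(a))
        pseudo_b = rng.choices(pooled, k=len(b))
        if fmean(pseudo_a) - fmean(pseudo_b) >= observed:   # D_i >= D0
            count += 1
    return observed, count / n_boot

a = [2.1, 2.5, 1.8, 3.0, 2.7, 2.2, 2.9, 2.4]   # hypothetical values
b = [1.9, 2.0, 1.6, 2.3, 2.2, 1.7, 2.1, 1.8]
d0, p = bootstrap_p_value(a, b)
print(f"D0 = {d0:.2f}, one-tailed p = {p:.4f}")
```

A two-tailed version would instead count pseudo-differences whose absolute value is at least |D0|.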

Page 86: Explore, Analyze and Present your data

Summary

What do my data look like?

Distribution?

• visual inspection (histogram / QQ plot)

• normality test

What do I want to compare?

Right statistical test?

• parametric test

• non-parametric test

• resampling statistics

Page 87: Explore, Analyze and Present your data

The dark side of the p-value

Page 88: Explore, Analyze and Present your data

Statistical significance

“The effect of the drug was statistically significant.”

Page 89: Explore, Analyze and Present your data

Statistical significance

“The effect of the drug was statistically significant.”

so what?

Page 90: Explore, Analyze and Present your data

Statistical significance (example)

“The percentage of neurons showing cue-related activity increased with training in the mutant mice (P<0.05) but not in the control mice (P>0.05).”

Page 91: Explore, Analyze and Present your data

Statistical significance (example)

“The percentage of neurons showing cue-related activity increased with training in the mutant mice (P<0.05) but not in the control mice (P>0.05).”

Training has a larger effect in the mutant mice than in the control mice!

Page 92: Explore, Analyze and Present your data

Statistical significance (example)

“The percentage of neurons showing cue-related activity increased with training in the mutant mice (P<0.05) but not in the control mice (P>0.05).”

Training has a larger effect in the mutant mice than in the control mice!

Page 93: Explore, Analyze and Present your data

Statistical significance (example)

“The percentage of neurons showing cue-related activity increased with training in the mutant mice (P<0.05) but not in the control mice (P>0.05).”

Extreme scenario: training-induced activity barely reaches significance in mutant mice (e.g., P = 0.049) and barely fails to reach significance in control mice (e.g., P = 0.051).

[Figure: activity with (+) and without (−) training in control and mutant mice; an asterisk marks the significant comparison]

This does not test whether the training effect in mutant mice differs statistically from that in control mice.

Page 94: Explore, Analyze and Present your data

Statistical significance (example)

“The percentage of neurons showing cue-related activity increased with training in the mutant mice (P<0.05) but not in the control mice (P>0.05).”

When making a comparison between two effects, always report the statistical significance of their difference rather than the difference between significance levels.
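A small numerical example shows why "significant vs. not significant" is not a test of the difference. The effect sizes and standard errors below are hypothetical, the groups are assumed independent, and a normal (z) approximation is used for simplicity; in a real analysis the formal test would typically be an interaction term in an ANOVA or regression model.

```python
import math
from statistics import NormalDist

def two_sided_p(z):
    """Two-sided p-value for a z statistic under a standard normal null."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical training effects (estimate, standard error) in two independent groups
mutant_effect, mutant_se = 2.0, 1.0
control_effect, control_se = 1.9, 1.0

p_mutant = two_sided_p(mutant_effect / mutant_se)      # about 0.046: "significant"
p_control = two_sided_p(control_effect / control_se)   # about 0.057: "not significant"

# The comparison that matters: is the difference between the two effects non-zero?
diff = mutant_effect - control_effect
diff_se = math.sqrt(mutant_se ** 2 + control_se ** 2)  # assumes independent estimates
p_difference = two_sided_p(diff / diff_se)             # about 0.94

print(f"{p_mutant:.3f} {p_control:.3f} {p_difference:.3f}")
```

One effect crosses the 0.05 line and the other does not, yet the two effects are nearly identical.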

Nieuwenhuis S. et al. (2011), “Erroneous analyses of interactions in neuroscience: a problem of significance”, Nat Neurosci, 14(9):1105-1107

Page 95: Explore, Analyze and Present your data

P-values do not convey information

Difference = 4

Mean: 16 SD: 5

Mean: 20 SD: 5

p-value = 0.1090

Page 96: Explore, Analyze and Present your data

P-values do not convey information

Difference = 4

p-value = 0.1090, 0.0367

Mean: 16 SD: 5

Mean: 20 SD: 5

Page 97: Explore, Analyze and Present your data

P-values do not convey information

Difference = 4

p-value = 0.1090, 0.0367, 0.0009

Mean: 16 SD: 5

Mean: 20 SD: 5
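The slides do not state the sample sizes behind the three p-values, so the sketch below only illustrates the mechanism, with hypothetical group sizes: for a fixed difference (4) and SD (5), the two-sample t statistic grows as √n, so the p-value keeps shrinking even though the effect itself never changes.

```python
import math

def t_statistic(mean_1, mean_2, sd, n):
    """Two-sample t statistic for equal group sizes n and a common SD."""
    standard_error = sd * math.sqrt(2 / n)   # SE of the difference between means
    return (mean_2 - mean_1) / standard_error

# Same means (16 and 20) and SD (5) as on the slide; only n changes (hypothetical values)
for n in (5, 10, 20, 40, 80):
    print(n, round(t_statistic(16, 20, 5, n), 2))
```

Quadrupling n doubles t, which is why the effect size and its CI, not the p-value, should carry the message.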

Page 98: Explore, Analyze and Present your data

P-values do not convey information

Most applied scientists use p-values as a measure of evidence and of the size of the effect.

Fact:

[Figure: “Manhattan plot”, −log10(P) (0 to 8) plotted along chromosomes 1 to 20]

- This topic has renewed importance with the advent of the massive multiple testing often seen in genomics studies

- The probability of hypotheses depends on much more than just the p-value.

Ioannidis JP (2005) PLoS Med 2(8):e124

Page 99: Explore, Analyze and Present your data

Report effect size and CIs instead

Page 100: Explore, Analyze and Present your data

P-value is a function of the sample size

Hentschke, H. et al. (2011). "Computation of measures of effect size for neuroscience data sets". Eur J Neurosci. 34(12):1887–94

Measured effect size: difference = 0.018 mV

[Figure: (A) amplitude (mV) in control (n = 6777) and atropine (n = 5272) conditions, with example Control and Atropine traces; scale bars 0.5 mV, 100 ms]

Page 101: Explore, Analyze and Present your data

P-value is a function of the sample size

Hentschke, H. et al. (2011). "Computation of measures of effect size for neuroscience data sets". Eur J Neurosci. 34(12):1887–94

p = 10⁻⁵

Measured effect size: difference = 0.018 mV

[Figure: (A) amplitude (mV) in control (n = 6777) and atropine (n = 5272) conditions, with example Control and Atropine traces; scale bars 0.5 mV, 100 ms]

Page 102: Explore, Analyze and Present your data

P-value is a function of the sample size

Hentschke, H. et al. (2011). "Computation of measures of effect size for neuroscience data sets". Eur J Neurosci. 34(12):1887–94

[Figure: t-test P value (10⁻⁴ to 10⁰, with “significant” and “not significant” regions) and Hedges' g (−0.4 to 0.4) plotted against sample size (10¹ to 10³), for the measured difference of 0.018 mV]
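Hedges' g expresses the difference between means in units of the pooled standard deviation, so, unlike the p-value, it does not grow systematically with sample size. A minimal sketch of the common textbook form, including the approximate small-sample correction J = 1 − 3/(4(n₁ + n₂) − 9); this is not necessarily the exact implementation used by Hentschke et al. (2011):

```python
import math
from statistics import mean, variance

def hedges_g(a, b):
    """(mean(a) - mean(b)) / pooled SD, times the approximate
    small-sample correction J = 1 - 3 / (4 * (n_a + n_b) - 9)."""
    na, nb = len(a), len(b)
    pooled_sd = math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    d = (mean(a) - mean(b)) / pooled_sd
    correction = 1 - 3 / (4 * (na + nb) - 9)
    return d * correction

control = [0.30, 0.35, 0.28, 0.33, 0.31, 0.36]   # hypothetical amplitudes (mV)
treated = [0.29, 0.34, 0.27, 0.33, 0.30, 0.35]
print(round(hedges_g(control, treated), 3))
```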

Page 103: Explore, Analyze and Present your data

Bootstrap effect size and 95% CIs

[Diagram: sample A = {a1, …, an} and sample B = {b1, …, bn} are each resampled separately, with replacement, 10000 times, giving means mA1, mA2, mA3, … and mB1, mB2, mB3, …; each pair gives an effect size E1 = mA1 − mB1, E2 = mA2 − mB2, …, E10000 = mA10000 − mB10000]

Page 104: Explore, Analyze and Present your data

Bootstrap effect size and 95% CIs

[Diagram: A and B are each resampled with replacement 10000 times; each pair of resample means gives an effect size E1 = mA1 − mB1, E2 = mA2 − mB2, …, E10000 = mA10000 − mB10000; observed effect size (0.44)]

Page 105: Explore, Analyze and Present your data

Bootstrap effect size and 95% CIs

[Diagram: A and B are each resampled with replacement 10000 times; each pair of resample means gives an effect size E1 = mA1 − mB1, E2 = mA2 − mB2, …, E10000 = mA10000 − mB10000; observed effect size (0.44)]

Page 106: Explore, Analyze and Present your data

Bootstrap effect size and 95% CIs

[Diagram: A and B are each resampled with replacement 10000 times; each pair of resample means gives an effect size E1 = mA1 − mB1, E2 = mA2 − mB2, …, E10000 = mA10000 − mB10000; observed effect size (0.44)]

[The 10000 effect sizes are sorted; the 250th and 9750th values are marked]

Page 107: Explore, Analyze and Present your data

Bootstrap effect size and 95% CIs

[Figure: groups A and B; distribution of the bootstrap effect sizes with the 250th and 9750th values marked]

Eff. size = 0.44

0.44 [0.042, 0.853]

Does the 95% confidence interval of the observed effect size include zero (no difference)?
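The effect-size interval differs from the p-value procedure in one step: each group is resampled from itself rather than from a pooled set. A minimal percentile-bootstrap sketch with hypothetical data (10000 resamples, fixed seed); the 250th and 9750th sorted values bound the 95% CI, as on the slides:

```python
import random
from statistics import fmean

def bootstrap_effect_ci(a, b, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap CI for the effect size mean(a) - mean(b).
    Each group is resampled separately (no pooling)."""
    rng = random.Random(seed)
    effects = sorted(
        fmean(rng.choices(a, k=len(a))) - fmean(rng.choices(b, k=len(b)))
        for _ in range(n_boot)
    )
    tail = (1 - level) / 2
    lo = effects[max(round(n_boot * tail) - 1, 0)]   # 250th of 10,000
    hi = effects[round(n_boot * (1 - tail)) - 1]     # 9750th of 10,000
    return fmean(a) - fmean(b), (lo, hi)

a = [2.1, 2.5, 1.8, 3.0, 2.7, 2.2, 2.9, 2.4]   # hypothetical values
b = [1.9, 2.0, 1.6, 2.3, 2.2, 1.7, 2.1, 1.8]
effect, (lo, hi) = bootstrap_effect_ci(a, b)
print(f"{effect:.2f} [{lo:.2f}, {hi:.2f}]", "excludes 0" if lo > 0 or hi < 0 else "includes 0")
```

Reporting the effect with its interval answers both questions at once: how large the difference is, and whether zero is a plausible value.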

Page 108: Explore, Analyze and Present your data

Statistical vs Biological significance

Page 109: Explore, Analyze and Present your data

Statistical vs Biological significance

“Statistical significance suggests but does not imply biological significance.”

“The P value reported by tests is a probabilistic significance, not a biological one.”

Krzywinski M and Altman N (2013) “Points of significance: Significance, P values and t-tests”. Nature Methods 10, 1041–1042

Page 110: Explore, Analyze and Present your data

Statistical vs Biological significance

Statistical significance has a meaning in a specific context

No change

Biological consequences?

Small change

Large change

Page 111: Explore, Analyze and Present your data

Statistical vs Biological significance

[Figure: two LP neurons (LP 1, LP 2); conductances at +15 mV (µS/nF) of the Kd, KCa and A-type currents; mRNA copy numbers of shab, BK-KCa and shal in the AB, LP, PD and PY neurons of the stomatogastric ganglion]

Schulz D.J. et al. (2006) "Variable channel expression in identified single and electrically coupled neurons in different animals". Nat Neurosci. 9: 356–362

“Good enough” solutions

Page 112: Explore, Analyze and Present your data

Statistical vs Biological significance

Madhvani R.V. et al. (2011) "Shaping a new Ca2+ conductance to suppress early afterdepolarizations in cardiac myocytes". J Physiol 589(Pt 24):6081-92

Page 113: Explore, Analyze and Present your data

Statistical vs Biological significance

Breast cancer study: difference in cancer recurrence between the control and low-fat diet groups.

Authors' conclusions: people with low-fat diets had a 25% lower chance of their cancer returning

Page 114: Explore, Analyze and Present your data

Statistical vs Biological significance

Authors' conclusions: people with low-fat diets had a 25% lower chance of their cancer returning

Actual return rates:

- control: 12.4%

- low-fat diet: 9.8%

Difference: 2.6 percentage points

$\frac{2.6}{9.8} \approx 26.5\%$

Breast cancer study: difference in cancer recurrence between the control and low-fat diet groups.
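The gap between "25% less chance" and "2.6 points" is the gap between a relative and an absolute difference. The sketch below reproduces the arithmetic from the rates on the slide; note that a relative figure also depends on the denominator chosen (the conventional relative risk reduction divides by the control rate), which is one more reason to report the absolute rates.

```python
control_rate = 12.4   # % of patients whose cancer returned, control group
diet_rate = 9.8       # % of patients whose cancer returned, low-fat diet group

absolute_difference = control_rate - diet_rate            # in percentage points
relative_to_diet = absolute_difference / diet_rate        # ratio computed on the slide
relative_to_control = absolute_difference / control_rate  # conventional relative risk reduction

print(f"absolute difference: {absolute_difference:.1f} percentage points")
print(f"relative (divided by low-fat rate): {relative_to_diet:.1%}")
print(f"relative (divided by control rate): {relative_to_control:.1%}")
```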

Page 115: Explore, Analyze and Present your data

Beware of false positives

Bennett C. et al. (2010) “Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic Salmon: An Argument For Proper Multiple Comparisons Correction”. JSUR, 2010. 1(1):1-5

(from the authors)

Page 116: Explore, Analyze and Present your data

Beware of false positives

Bennett C. et al. (2010) “Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic Salmon: An Argument For Proper Multiple Comparisons Correction”. JSUR, 2010. 1(1):1-5

Page 117: Explore, Analyze and Present your data

Beware of false positives

2012

Bennett C. et al. (2010) “Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic Salmon: An Argument For Proper Multiple Comparisons Correction”. JSUR, 2010. 1(1):1-5

Page 118: Explore, Analyze and Present your data

Beware of false positives

http://xkcd.com/882/
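The xkcd comic tests 20 jelly bean colors at α = 0.05. If all 20 null hypotheses are true and the tests are independent, the chance of at least one false positive is 1 − 0.95²⁰, well above 50%. The sketch below computes this family-wise error rate and the Bonferroni-corrected threshold (helper names are illustrative):

```python
def family_wise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive among n independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests

# 20 tests, as with the 20 jelly bean colors in the comic
print(f"{family_wise_error_rate(20):.2f}")   # 0.64
print(f"{bonferroni_threshold(20):.4f}")     # 0.0025
```

Bonferroni is simple and conservative; with thousands of tests (fMRI voxels, genomics), false discovery rate methods are often preferred.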

Page 119: Explore, Analyze and Present your data

Present your data

Page 120: Explore, Analyze and Present your data

Know your audience

Page 121: Explore, Analyze and Present your data

Know your audience

Who?

What?

Why?

How?

Page 122: Explore, Analyze and Present your data

Know your audience

Who?

What?

Why?

How?

who is my audience? level of understanding? what do they already know?

Page 123: Explore, Analyze and Present your data

Know your audience

Who?

What?

Why?

How?

who is my audience? level of understanding? what do they already know?

why am I presenting? what does my audience want to achieve?

Page 124: Explore, Analyze and Present your data

Know your audience

Who?

What?

Why?

How?

why am I presenting? what does my audience want to achieve?

what do I want my audience to know? which story will captivate the audience?

who is my audience? level of understanding? what do they already know?

Page 125: Explore, Analyze and Present your data

Know your audience

Who?

What?

Why?

How?

what medium will best support the message? what format/layout will appeal to the audience?

who is my audience? level of understanding? what do they already know?

why am I presenting? what does my audience want to achieve?

what do I want my audience to know? which story will captivate the audience?

Page 126: Explore, Analyze and Present your data

Color blindness is a common disease

Males: one in 12 (8%) / Females: one in 200 (0.5%)

Page 127: Explore, Analyze and Present your data

Color blindness is a common disease

“Anyone who needs to be convinced that making scientific images more accessible is a worthwhile task [...]: if your next grant or manuscript submission contains color figures, what if some of your reviewers are color blind? Will they be able to appreciate your figures? Considering the competition for funding and for publication, can you afford the possibility of frustrating your audience? The solution is at hand.”

Clarke, M. (2007). "Making figures comprehensible for color-blind readers". Nature blog (http://blogs.nature.com/nautilus/2007/02/post_4.html)

Page 128: Explore, Analyze and Present your data

Making figures for color-blind people

Wong, B. (2011). "Points of view: Color blindness". Nature Methods 8, 441
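Wong's column recommends a palette that stays distinguishable under the common forms of color blindness. The sketch below stores the eight colors as RGB triples and converts them to hex strings, which most plotting libraries accept. The values are the Okabe-Ito colors as they are commonly reproduced from Wong (2011), so please check them against the article before relying on them. The names `WONG_PALETTE`, `to_hex` and `palette_hex` are hypothetical.

```python
# Okabe-Ito colors as commonly reproduced from Wong (2011), stored as 0-255 RGB triples.
WONG_PALETTE = {
    "black": (0, 0, 0),
    "orange": (230, 159, 0),
    "sky blue": (86, 180, 233),
    "bluish green": (0, 158, 115),
    "yellow": (240, 228, 66),
    "blue": (0, 114, 178),
    "vermillion": (213, 94, 0),
    "reddish purple": (204, 121, 167),
}


def to_hex(rgb):
    """Convert an (r, g, b) triple of 0-255 integers to a '#RRGGBB' string."""
    r, g, b = rgb
    return f"#{r:02X}{g:02X}{b:02X}"


def palette_hex(names=None):
    """Hex codes for the named colors, or for the whole palette in its listed order."""
    if names is None:
        names = list(WONG_PALETTE)
    return [to_hex(WONG_PALETTE[name]) for name in names]


if __name__ == "__main__":
    print(palette_hex(["orange", "sky blue", "bluish green", "vermillion"]))
```

You can pass the resulting list to a plotting library as its color cycle. It is still worth checking the finished figure with a simulator such as Color Oracle.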

Page 129: Explore, Analyze and Present your data

Making figures for color-blind people

http://colororacle.org/

Page 130: Explore, Analyze and Present your data

Making figures for color-blind people

http://colororacle.org/

Page 131: Explore, Analyze and Present your data

Telling stories with data

http://vis.stanford.edu/files/2010-Narrative-InfoVis.pdf

“The Martini Glass Structure”

Page 132: Explore, Analyze and Present your data

Telling stories with data

http://vis.stanford.edu/files/2010-Narrative-InfoVis.pdf

“The Martini Glass Structure”

[Diagram: START → GUIDED NARRATIVE → EXPLORE]

Page 133: Explore, Analyze and Present your data

Aesthetic minimalism

Suda B. (2010). "A practical guide to Designing with Data"

Page 134: Explore, Analyze and Present your data

Aesthetic minimalism

Suda B. (2010). "A practical guide to Designing with Data"

Page 135: Explore, Analyze and Present your data

Aesthetic minimalism

Suda B. (2010). "A practical guide to Designing with Data"

Page 136: Explore, Analyze and Present your data

Aesthetic minimalism

Suda B. (2010). "A practical guide to Designing with Data"

Page 137: Explore, Analyze and Present your data

Aesthetic minimalism

Suda B. (2010). "A practical guide to Designing with Data"

Page 138: Explore, Analyze and Present your data

Aesthetic minimalism

Suda B. (2010). "A practical guide to Designing with Data"

Page 139: Explore, Analyze and Present your data

Common mistakes in data reporting

Welcome to the FOX “Dishonest Charts” gallery

Page 140: Explore, Analyze and Present your data

Common mistakes in data reporting

Page 141: Explore, Analyze and Present your data

Common mistakes in data reporting

E. Tufte’s “Lie Factor”: making things appear “better” than they are by fiddling with the scales
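Tufte defines the Lie Factor as the size of the effect shown in the graphic divided by the size of the effect in the data. The size of an effect is the relative change |second - first| / first, and a faithful graphic has a Lie Factor of 1. One common way to inflate it is to start the value axis above zero. The sketch below assumes a bar chart and uses hypothetical numbers: a rise from 50 to 55 drawn on an axis that starts at 45. This shows how a 10% change in the data becomes a 100% change in bar length. The function names are also hypothetical.

```python
def effect_size(first, second):
    """Relative change from `first` to `second`; `first` must be positive."""
    return abs(second - first) / first


def lie_factor(data_first, data_second, shown_first, shown_second):
    """Tufte's Lie Factor: size of effect shown in the graphic / size of effect in the data."""
    return effect_size(shown_first, shown_second) / effect_size(data_first, data_second)


def bar_heights(values, baseline):
    """Drawn bar lengths when the value axis starts at `baseline` instead of 0."""
    return [v - baseline for v in values]


if __name__ == "__main__":
    data = [50, 55]                             # hypothetical values: a 10% increase
    truncated = bar_heights(data, baseline=45)  # axis starts at 45
    honest = bar_heights(data, baseline=0)      # axis starts at 0
    print("truncated axis:", lie_factor(*data, *truncated))
    print("zero baseline: ", lie_factor(*data, *honest))
```

With these numbers, the truncated axis gives a Lie Factor of about 10, while the zero baseline gives 1.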

Page 142: Explore, Analyze and Present your data

Common mistakes in data reporting

Page 143: Explore, Analyze and Present your data

Common mistakes in data reporting

Page 144: Explore, Analyze and Present your data

Common mistakes in data reporting

Page 145: Explore, Analyze and Present your data

Common mistakes in data reporting

Page 146: Explore, Analyze and Present your data

Common mistakes in data reporting

Page 147: Explore, Analyze and Present your data

Common mistakes in data reporting

“We found that relative to WT mice, the luminal microbiota of Il10−/− mice exhibited a ~100-fold increase in E. coli (Fig. 1I)”

Arthur et al. (2012). Science 5;338(6103):120-3

Fig 1I

Page 148: Explore, Analyze and Present your data

Common mistakes in data reporting

[Pie chart with five slices, labelled A to E]

Page 149: Explore, Analyze and Present your data

Common mistakes in data reporting

[Same pie chart with values shown: each slice, A to E, is 20%]

Page 150: Explore, Analyze and Present your data

Common mistakes in data reporting

Page 151: Explore, Analyze and Present your data

Common mistakes in data reporting

Page 152: Explore, Analyze and Present your data

Common mistakes in data reporting

[Two versions of the same chart: Percent Return on Investment (0 to 40) for Group A and Group B, year1 to year4]

Page 153: Explore, Analyze and Present your data

Thank you!

“The important thing is not to stop questioning. Curiosity has its own reason for existing.”

- Albert Einstein