Basic data analysis skills for science research

The Dos and Don’ts!! Prepared by Law HL



Statistics

• The practice or science of collecting and analysing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample.

• Used to communicate research findings, to support hypotheses, and to give credibility to research methodology and conclusions.


Two Branches of Statistics: descriptive statistics and inferential statistics


Example 1: Is the lipase concentration significantly different among the various fruits?

Lipase concentration (U/100 µL)

Fruit sample   1st sample   2nd sample   3rd sample   Average
Lime           0.564        0.585        0.606        0.585
Lemon          0.104        0.101        0.107        0.104
Grapefruit     0.182        0.183        0.181        0.182
Avocado        0.415        0.637        0.550        0.534
Peanut         0.182        0.328        0.405        0.367


[Bar chart: Average lipase concentration in various fruits. x-axis: Fruits (Lime, Lemon, Grapefruit, Avocado, Peanut); y-axis: Average lipase concentration (U/100 µL)]

No observable difference between the average lipase concentration of lime and avocado

No significant difference between the average lipase concentration of lime and avocado!!


Student’s Conclusion:

Lime has a significantly higher ?? lipase concentration than the other fruit samples.


Error Bars

Overlap – no observable difference

Overlap – no significant difference if inferential statistics are used

No overlap – observable difference

No overlap – significant difference if inferential statistics are used
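A minimal Python sketch of the visual overlap check, using the replicate readings for lime, lemon and avocado from Example 1. It assumes the error bars show the mean ± 1 sample standard deviation (the slides do not say which kind of error bar is drawn), and the helper names `error_bar` and `bars_overlap` are hypothetical. It only answers the "observable difference" question; it is not a test of significance.

```python
from statistics import mean, stdev

# Replicate lipase readings from Example 1 (U/100 uL)
readings = {
    "Lime": [0.564, 0.585, 0.606],
    "Lemon": [0.104, 0.101, 0.107],
    "Avocado": [0.415, 0.637, 0.550],
}


def error_bar(values):
    """Return (lower, upper) ends of a mean +/- 1 sample SD bar (assumed bar type)."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation, n - 1 in the denominator
    return m - s, m + s


def bars_overlap(a, b):
    """True if the two error bars share any range of values."""
    low_a, high_a = error_bar(a)
    low_b, high_b = error_bar(b)
    return low_a <= high_b and low_b <= high_a


lime_vs_avocado = bars_overlap(readings["Lime"], readings["Avocado"])
lime_vs_lemon = bars_overlap(readings["Lime"], readings["Lemon"])
print(lime_vs_avocado)  # True: overlap, no observable difference
print(lime_vs_lemon)    # False: no overlap, observable difference
```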


Example 2: Is the average distance travelled by the shuttlecock significantly different among the various shots?

Distance travelled by the shuttlecock (m)

Shot     Set 1   Set 2   Set 3   Set 4   Set 5   Set 6   Average
Shot 1   4.921   4.698   4.598   4.822   5.171   5.096   4.884
Shot 2   4.879   4.772   4.772   4.787   4.808   4.596   4.769
Shot 3   4.483   4.536   4.565   4.430   4.760   4.594   4.561
Shot 4   4.392   4.268   4.096   4.162   4.388   4.462   4.295
Shot 5   4.180   4.122   4.142   4.092   4.238   3.712   4.081
Shot 6   3.612   3.698   3.612   3.962   3.788   3.928   3.767
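The Average column is descriptive statistics: it shows observable differences between shots but, on its own, cannot justify the word "significant". As a sketch, the Python below reproduces the averages from the six sets and adds the sample standard deviation that an error bar would typically show (assuming SD bars; the variable names are illustrative only).

```python
from statistics import mean, stdev

# Distances (m) from the Example 2 table: six sets per shot
distances = {
    "Shot 1": [4.921, 4.698, 4.598, 4.822, 5.171, 5.096],
    "Shot 2": [4.879, 4.772, 4.772, 4.787, 4.808, 4.596],
    "Shot 3": [4.483, 4.536, 4.565, 4.430, 4.760, 4.594],
    "Shot 4": [4.392, 4.268, 4.096, 4.162, 4.388, 4.462],
    "Shot 5": [4.180, 4.122, 4.142, 4.092, 4.238, 3.712],
    "Shot 6": [3.612, 3.698, 3.612, 3.962, 3.788, 3.928],
}

# Descriptive summary only: mean and sample standard deviation per shot
summary = {shot: (mean(v), stdev(v)) for shot, v in distances.items()}

for shot, (avg, sd) in summary.items():
    print(f"{shot}: mean = {avg:.3f} m, SD = {sd:.3f} m")
```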


[Bar chart: Average distance travelled by the shuttlecock for each of the six shots. x-axis: Shots (Shot 1 to Shot 6); y-axis: Average distance (m)]


Student’s Conclusion:

There is a significant difference?? in the average distance travelled by the shuttlecock among the six shots.


Statistical Significance

Observed results that are unlikely to have arisen by chance alone, and so are taken as evidence of REAL treatment effects.


The P-Value approach

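P-Value – the probability, assuming the null hypothesis (no real difference or effect) is true, of obtaining a result at least as extreme as the one observed.

The smaller the P-Value, the stronger the evidence against the null hypothesis, and the more likely the results are statistically significant.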
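To make the definition concrete, here is a minimal Python sketch of a permutation test, a technique swapped in purely for illustration (the slides do not use it). Under the null hypothesis the group labels do not matter, so the labels are shuffled many times and the P-value is estimated as the proportion of shuffles giving a difference in means at least as large as the observed one. The brass and glass colony counts from Example 4 are reused; the function name, the 10,000 shuffles and the fixed seed are arbitrary choices.

```python
import random

# Colony counts from Example 4
brass = [405, 310, 196, 63, 167, 312, 675, 465, 78]
glass = [412, 231, 89, 567, 134, 253, 423, 134, 231]


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(a, b, n_shuffles=10_000, seed=1):
    """Estimate a two-sided P-value for a difference in means by shuffling group labels."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # relabel the data as if the null hypothesis were true
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            at_least_as_extreme += 1
    # the +1 terms count the observed labelling itself and keep P above zero
    return (at_least_as_extreme + 1) / (n_shuffles + 1)


p = permutation_p_value(brass, glass)
print(f"P = {p:.3f}")
```

A large P here means that differences as big as the observed one turn up often when the labels are assigned at random.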


So… what is the P-Value for a statistically significant result?

Generally……

P < 0.05 (results are statistically significant)

P < 0.001 (results are extremely statistically significant)


Example 3: Is there a significant difference in the absorbance of the reaction mixture of papain at various concentrations?

Concentration of papain (%)   Absorbance readings
2                             0.40   0.52   0.51   0.49   0.42
5                             0.35   0.42   0.44   0.53   0.31
10                            0.41   0.36   0.21   0.21   0.33

P = 0.04

There is a significant difference in the average absorbance among the three concentrations of papain.


Various Statistical tools for generating P-Values

Statistical Analyses

• Group comparisons

• Establishing linear relationships between variables


Various Statistical tools for generating P-Values (I)

Group comparisons

• 2 groups
  – Sample size 5 ≤ n ≤ 15: Mann-Whitney U-Test
  – Sample size n > 15: T-Test

• More than 2 groups
  – Sample size 5 ≤ n ≤ 15: Kruskal-Wallis H-Test
  – Sample size n > 15: ANOVA, followed by a post hoc test (Multiple Comparisons)
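The decision chart above can be written as a small lookup. This Python sketch simply encodes the slide's rule of thumb; treating `n_per_group` as the size of the smallest group is an assumption, and the function name is hypothetical.

```python
def choose_group_test(n_groups, n_per_group):
    """Encode the slide's rule of thumb; n_per_group is taken as the smallest group size.

    Returns None when none of the listed tools applies (n < 5).
    """
    if n_groups < 2 or n_per_group < 5:
        return None
    small_sample = n_per_group <= 15  # 5 <= n <= 15
    if n_groups == 2:
        return "Mann-Whitney U-Test" if small_sample else "T-Test"
    if small_sample:
        return "Kruskal-Wallis H-Test"
    return "ANOVA, then Multiple Comparisons (post hoc)"


print(choose_group_test(2, 9))   # Example 4: two groups, n = 9
print(choose_group_test(4, 5))   # Example 8: four groups, n = 5
print(choose_group_test(3, 4))   # hypothetical 3 groups of 4, like Example 5's n < 5
```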


Example 4: An experiment was conducted to find out if the survival of E. coli differed between bacteria grown in brass pots and in glass pots.

Number of bacterial colonies in each pot

Brass pot   Glass pot
405         412
310         231
196         89
63          567
167         134
312         253
675         423
465         134
78          231

Since there are two groups to be compared and n = 9, use the Mann-Whitney U-Test.

Results: P > 0.05

There is no significant difference in the number of bacterial colonies between the two types of pot.
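As a sketch of what the Mann-Whitney U-Test does with these counts, the Python below ranks all 18 values together (tied values share an average rank), computes U from the brass rank sum, and obtains an exact two-sided P-value by enumerating all 48,620 ways of splitting the ranks into two groups of nine. This is one textbook way of getting an exact P-value; statistics packages may use a normal approximation instead, so their P-value can differ slightly. The function names are hypothetical.

```python
from itertools import combinations

# Colony counts from Example 4
brass = [405, 310, 196, 63, 167, 312, 675, 465, 78]
glass = [412, 231, 89, 567, 134, 253, 423, 134, 231]


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_exact(a, b):
    """Return (U, two-sided exact P) by enumerating every split of the pooled ranks."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:n1])
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    u = min(u_a, n1 * n2 - u_a)
    expected = n1 * (n1 + n2 + 1) / 2  # mean rank sum of group a under the null
    observed = abs(rank_sum_a - expected)
    extreme = total = 0
    for idx in combinations(range(n1 + n2), n1):
        total += 1
        if abs(sum(ranks[i] for i in idx) - expected) >= observed - 1e-9:
            extreme += 1
    return u, extreme / total


u, p = mann_whitney_exact(brass, glass)
print(f"U = {u:g}, P = {p:.3f}")
```

With these counts the sketch gives U = 40 and a P-value far above 0.05, in line with the slide's result.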


Example 5: Experiment to find out if temperature readings differ among the various layers.

Since n < 5 for each group, none of our statistical tools is appropriate for the analysis.


Example 6: Experiment to find out if the mean concentration of ethanol produced differed significantly between the two methods.

If n > 15 for both groups, use a T-Test with the significance level set at α = 5%.


Example 7: Experiment to find out if the mean amount of ion adsorbed by mango peels differed significantly among the three groups.

3 T-Tests??

T-Test: P = 0.00003

T-Test: P = 0.00005

T-Test: P = 0.00379


Example 7:

For comparing more than 2 means with n > 15 for each treatment group, use ANOVA.

DO NOT USE MULTIPLE T-TESTS, as the overall error rate (the chance of at least one false positive) gets INFLATED!!

If ANOVA shows a significant difference in the means among the groups, use Tukey’s Multiple Comparisons to determine where the difference lies.


Example 8: Experiment to determine if there is a significant difference in the average acid concentration among the four preparations.

Comparing averages among three or more groups with 5 ≤ n ≤ 15 for each group: use the Kruskal-Wallis Test.

Preparation A   Preparation B   Preparation C   Preparation D
0.45            0.35            0.24            0.34
0.35            0.56            0.12            0.56
0.46            0.24            0.13            0.53
0.24            0.56            0.17            0.43
0.56            0.24            0.45            0.21
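A minimal Python sketch of the Kruskal-Wallis calculation on the Example 8 data: all 20 readings are ranked together (ties share an average rank), H is computed from each preparation's rank sum and corrected for ties, and P comes from the usual chi-square approximation with (number of groups − 1) degrees of freedom. The function names are hypothetical, and with only five readings per group an exact or software-computed P-value may differ a little from this approximation.

```python
import math
from collections import Counter

# Acid concentrations from Example 8, one list per preparation
preparations = [
    [0.45, 0.35, 0.46, 0.24, 0.56],  # A
    [0.35, 0.56, 0.24, 0.56, 0.24],  # B
    [0.24, 0.12, 0.13, 0.17, 0.45],  # C
    [0.34, 0.56, 0.53, 0.43, 0.21],  # D
]


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the series for the incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def kruskal_wallis(groups):
    """Return (tie-corrected H, P from the chi-square approximation with k - 1 df)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)  # correction for tied readings
    return h, chi2_sf(h, len(groups) - 1)


h, p = kruskal_wallis(preparations)
print(f"H = {h:.2f}, P = {p:.3f}")
```

On these readings the sketch gives H of about 5.5 and P of about 0.14, which would be above a 5% significance level.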


Various Statistical tools for generating P-Values (II)

Establishing linear relationships between variables

• Functional dependence of one variable on another: Simple linear regression

• No functional dependence between the variables: Simple linear correlation


Simple Linear Regression

• Two variables

• One variable (dependent/response variable) depends on the other (independent/predictor variable)

• Represented by scatterplots

• Reported with r² and P-value


r² and P-value in regression analysis

r² – coefficient of determination

Measures the proportion of the variation in the dependent variable that is explained by its linear relationship with the independent variable.

0% ≤ r² ≤ 100%


P-Value – the probability of obtaining a sample slope at least as far from zero as the one observed if the actual (population) slope is zero.

Always report both r² and P-value.

[Figure: sample slope vs. population slope; n = 5, r² = 0.80]
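A minimal Python sketch that fits a simple linear regression by least squares and reports r² together with a P-value for the slope, as the slides recommend. The P-value uses the standard t-test of "population slope = 0" with n − 2 degrees of freedom; the t-distribution tail is computed with a textbook continued-fraction routine for the incomplete beta function so that no third-party package is needed. All function names are hypothetical. As a check on the n = 5, r² = 0.80 illustration: t = √(0.80 × 3 / 0.20) = √12 ≈ 3.46 with 3 degrees of freedom, which gives P ≈ 0.04, so a high r² from only five points is only just significant at 5%.

```python
import math
from statistics import mean


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the regularized incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((a - 1 + m2) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1 + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def t_two_sided_p(t, df):
    """Two-sided P-value of a t statistic, via the regularized incomplete beta function."""
    x = df / (df + t * t)
    if x >= 1.0:
        return 1.0  # t = 0
    a, b = df / 2, 0.5
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1 - x))
    if x < (a + 1) / (a + b + 2):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1 - x) / b


def simple_linear_regression(x, y):
    """Least-squares fit; returns (slope, intercept, r2, P-value for slope = 0)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    df = len(x) - 2
    if r2 >= 1.0:
        return slope, intercept, r2, 0.0  # points lie exactly on a straight line
    t = math.sqrt(r2 * df / (1 - r2))  # t statistic for H0: population slope = 0
    return slope, intercept, r2, t_two_sided_p(t, df)


# The slide's illustration: n = 5 and r2 = 0.80 give t = sqrt(12) with 3 df
print(f"{t_two_sided_p(math.sqrt(0.80 * 3 / 0.20), 3):.4f}")  # about 0.0405
```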


Simple Linear Correlation

• Two variables

• Neither of the two is functionally dependent on the other

• Represented by scatterplots

r (Pearson correlation coefficient) – measures the strength of the linear relationship between two variables.

Always report both r and P-value.


Guidelines for interpreting the correlation coefficient, r

Strength of association   Positive r     Negative r
Small                     0.1 to 0.3     -0.1 to -0.3
Medium                    0.3 to 0.5     -0.3 to -0.5
Large                     0.5 to 1.0     -0.5 to -1.0
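The guideline table can be written as a small lookup. In this Python sketch a value sitting on a shared boundary (such as 0.3) is put in the higher band, which is an assumption since the table lists each boundary twice; the function name is hypothetical. The label describes strength only and says nothing about significance, which still needs the P-value.

```python
def association_strength(r):
    """Label r using the guideline table; shared boundaries go to the higher band (assumption)."""
    size = abs(r)
    if size > 1:
        raise ValueError("r must lie between -1 and 1")
    if size >= 0.5:
        label = "Large"
    elif size >= 0.3:
        label = "Medium"
    elif size >= 0.1:
        label = "Small"
    else:
        return "Below the table's range (|r| < 0.1)"
    return f"{label} {'positive' if r > 0 else 'negative'}"


print(association_strength(0.2))    # Small positive
print(association_strength(-0.4))   # Medium negative
```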


Caution……………………..

It is not appropriate to analyze a non-linear relationship using the Pearson correlation coefficient.
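A quick Python illustration of why: the points below follow the exact curve y = x², yet Pearson's r comes out as zero because the relationship is not linear. The data are constructed for illustration only and the function name is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient r for paired samples x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]  # y depends exactly on x, but not linearly
r = pearson_r(x, y)
print(r)  # 0.0
```

A scatterplot of these points would show the curve at once, which is why plotting the data before computing r matters.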


Example 10: Experiment to find out if there is a significant correlation between the percentage of DPPH reacted and the concentration of fruit peel extract.

• P-Value?

• Scatterplot?


1. The Don’ts……………………

For n < 5, DO NOT analyze your data with inferential statistics.

E.g. trying to determine if the amount of heavy metal ion removed differed among the three methods:

Concentration of heavy metal ion removed

Method 1   Method 2   Method 3
0.421      0.324      0.534
0.521      0.512      0.342
0.654      0.526      0.523


2. The Don’ts………………

When no statistical analysis is being performed on the data sets, refrain from using the word ‘Significant’!

You can, however, claim that ‘there is an observable difference…’


3. The Don’ts…………………

Data analyses DO NOT PROVE hypotheses.

The results either support or do not support the hypotheses.

Refrain from using the words ‘Prove’ or ‘Discover’!!


4. The Don’ts…………………

Do not attempt to analyze too many variables at the same time!

Analyzing multiple variables at the same time calls for Multivariate Statistical Analyses!!


The Dos…………

Decide on the appropriate significance level (e.g. 5%) before performing statistical analyses

Always factor in the appropriate statistical tool for analyzing your data at the planning stage

Always report your significance level and P-value!

Consult your teachers or Mr Law if you have any queries
