Basic data analyses skills for science research

The Dos and Don’ts!!

Prepared by Law HL

Statistics

The practice or science of collecting and analysing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample.

Used to communicate research findings, to support hypotheses, and to give credibility to research methodology and conclusions.

Two Branches of Statistics

Example 1: Is the lipase concentration significantly different among the various fruits?

| Fruit samples | 1st Sample | 2nd Sample | 3rd Sample | Average lipase concentration |
|---|---|---|---|---|
| Lime | 0.564 | 0.585 | 0.606 | 0.585 |
| Lemon | 0.104 | 0.101 | 0.107 | 0.104 |
| Grapefruit | 0.182 | 0.183 | 0.181 | 0.182 |
| Avocado | 0.415 | 0.637 | 0.550 | 0.534 |
| Peanut | 0.182 | 0.328 | 0.405 | 0.367 |

[Bar chart: Average lipase concentration in various fruits. x-axis: Fruits (Lime, Lemon, Grapefruit, Avocado, Peanut); y-axis: Average lipase concentration (U/100 uL).]

No observable difference between the average lipase concentration of lime and avocado

No significant difference between the average lipase concentration of lime and avocado!!

Student’s Conclusion:

Lime has a significantly higher ?? lipase concentration than the other fruit samples.

Error Bars

Overlap – no observable difference

Overlap – no significant difference if inferential statistics are used

No overlap – observable difference

No overlap – significant difference if inferential statistics are used
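The overlap check can be sketched in a few lines of Python using the lime and avocado readings from the Example 1 table. This is only an illustration: it assumes the error bars show the mean ± 1 sample standard deviation, which the slides do not state, and the helper names are hypothetical.

```python
from statistics import mean, stdev

# Lipase readings for two fruits, copied from the Example 1 table.
lime = [0.564, 0.585, 0.606]
avocado = [0.415, 0.637, 0.550]


def error_bar(values):
    """Return (lower, upper) for mean ± 1 sample standard deviation.

    Assumption: the bars show ±1 SD; the slides do not say which
    measure of spread was plotted.
    """
    centre = mean(values)
    spread = stdev(values)
    return centre - spread, centre + spread


def bars_overlap(a, b):
    """True if the two error-bar intervals share at least one point."""
    low_a, high_a = error_bar(a)
    low_b, high_b = error_bar(b)
    return low_a <= high_b and low_b <= high_a


print(error_bar(lime))
print(error_bar(avocado))
print(bars_overlap(lime, avocado))
```

With these readings the two intervals overlap, so only "no observable difference" can be claimed for lime and avocado.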

Example 2: Is the average distance travelled by the shuttlecock significantly different among the various shots?

| Trials | Set 1 | Set 2 | Set 3 | Set 4 | Set 5 | Set 6 | Average |
|---|---|---|---|---|---|---|---|
| Shot 1 | 4.921 | 4.698 | 4.598 | 4.822 | 5.171 | 5.096 | 4.884 |
| Shot 2 | 4.879 | 4.772 | 4.772 | 4.787 | 4.808 | 4.596 | 4.769 |
| Shot 3 | 4.483 | 4.536 | 4.565 | 4.430 | 4.760 | 4.594 | 4.561 |
| Shot 4 | 4.392 | 4.268 | 4.096 | 4.162 | 4.388 | 4.462 | 4.295 |
| Shot 5 | 4.180 | 4.122 | 4.142 | 4.092 | 4.238 | 3.712 | 4.081 |
| Shot 6 | 3.612 | 3.698 | 3.612 | 3.962 | 3.788 | 3.928 | 3.767 |

[Bar chart: Average distance travelled by the shuttlecock for each of the six shots. x-axis: Shots (Shot 1 to Shot 6); y-axis: Average distance (m).]

Student’s Conclusion:

There is a significant difference?? in the average distance travelled by the shuttlecock among the six shots.

Statistical Significance

Results are statistically significant when the observed effects are due to REAL treatment effects and NOT due to chance.

The P-Value approach

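P-Value – the probability of obtaining results at least as extreme as those observed if the null hypothesis (no real effect) is true.

The smaller the P-Value, the stronger the evidence against the null hypothesis, and the more likely the results are to be judged statistically significant.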

So…what is the P-Value for a statistically significant result?

Generally……

P < 0.05 (Results are statistically significant)

P < 0.001 (Results are extremely statistically significant)
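As a small illustration, the two cut-offs above can be written as a helper that labels a P-value. The function name and the wording of the labels are hypothetical; only the thresholds 0.05 and 0.001 come from the slide.

```python
def describe_p_value(p, alpha=0.05):
    """Label a P-value using the thresholds given on the slide.

    alpha is the significance level chosen before the analysis (5% here);
    0.001 is the slide's cut-off for 'extremely' significant results.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("a P-value must lie between 0 and 1")
    if p < 0.001:
        return "extremely statistically significant"
    if p < alpha:
        return "statistically significant"
    return "not statistically significant"


print(describe_p_value(0.04))
print(describe_p_value(0.2))
```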

Example 3: Is there a significant difference in the absorbance of reaction mixture of papain at various concentrations?

| Concentrations of Papain (%) | Absorbance readings |
|---|---|
| 2 | 0.40, 0.52, 0.51, 0.49, 0.42 |
| 5 | 0.35, 0.42, 0.44, 0.53, 0.31 |
| 10 | 0.41, 0.36, 0.21, 0.21, 0.33 |

P = 0.04

There is a significant difference in the average absorbance among the three concentrations of Papain.

Various Statistical tools for generating P-Values.

Statistical Analyses

Group comparisons

Establishing linear relationships between variables

Group comparisons

2 groups
    Sample size n = 5 - 15: Mann-Whitney U-Test
    Sample size n > 15: T-Test

More than 2 groups
    Sample size n = 5 - 15: Kruskal-Wallis H-Test
    Sample size n > 15: ANOVA
        Post hoc test: Multiple Comparisons
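The decision tree above can be sketched as a small lookup function. This is a minimal sketch, not a prescribed procedure: the function name is hypothetical, it assumes the smallest group decides which branch applies, and it adds the n < 5 rule stated elsewhere in these slides.

```python
def choose_test(group_sizes):
    """Suggest a group-comparison test from the sample size of each group."""
    if len(group_sizes) < 2:
        raise ValueError("need at least two groups to compare")
    smallest = min(group_sizes)
    if smallest < 5:
        # The slides rule out inferential statistics for n < 5.
        return "none (n < 5: descriptive statistics only)"
    # Assumption: the smallest group decides which branch applies.
    small_sample = smallest <= 15
    if len(group_sizes) == 2:
        return "Mann-Whitney U-Test" if small_sample else "T-Test"
    if small_sample:
        return "Kruskal-Wallis Test"
    return "ANOVA, then post hoc multiple comparisons"


print(choose_test([9, 9]))        # two small groups
print(choose_test([20, 20, 20]))  # three larger groups
```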


Various Statistical tools for generating P-Values (I)

Example 4: An experiment was conducted to find out if the survival of E. coli differed between those grown using brass and glass pots.

| Number of bacterial colonies in each brass pot | Number of bacterial colonies present in each glass pot |
|---|---|
| 405 | 412 |
| 310 | 231 |
| 196 | 89 |
| 63 | 567 |
| 167 | 134 |
| 312 | 253 |
| 675 | 423 |
| 465 | 134 |
| 78 | 231 |

Since there are two groups to be compared and n = 9, use the Mann-Whitney Test.

Results: P > 0.05

There is no significant difference in the average number of bacterial colonies between the two samples.
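To show where such a P-value comes from, here is a minimal pure-Python sketch of the Mann-Whitney U statistic for the brass and glass counts. It is not the software used for the slides: the P-value relies on the normal approximation without tie or continuity correction (an assumption made to keep the example short), so a statistics package may report a slightly different figure.

```python
import math

# Colony counts copied from the Example 4 table.
brass = [405, 310, 196, 63, 167, 312, 675, 465, 78]
glass = [412, 231, 89, 567, 134, 253, 423, 134, 231]


def mann_whitney_u(x, y):
    """Return (U for x, approximate two-sided P-value).

    U counts the (x, y) pairs in which the x value is larger; a tie
    between an x and a y value counts 0.5. The P-value uses the normal
    approximation, so treat it as a rough figure for small samples.
    """
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return u, p


u_brass, p_value = mann_whitney_u(brass, glass)
print(f"U = {u_brass}, approximate P = {p_value:.2f}")
print("significant at 5%" if p_value < 0.05 else "not significant at 5%")
```

U is close to its expected value of 40.5 (half of the 81 possible pairs), which is why the P-value is far above 0.05.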

Example 5: Experiment to find out if temperature readings differ among the various layers.

Since n < 5 for each group, none of our statistical tools is appropriate for the analysis.

Example 6: Experiment to find out if the mean concentration of ethanol produced differed significantly between the two methods.

If n > 15 for both groups, use a T-Test with the significance level set at α = 5%.

Example 7: Experiment to find out if the mean amount of ion adsorbed by mango peels differed significantly among the three groups.

3 T-Tests??

T-Test: P = 0.00003

T-Test: P = 0.00005

T-Test: P = 0.00379

Example 7:

For comparing more than 2 means with n > 15 for each treatment group, use ANOVA.

DO NOT USE MULTIPLE T-TESTS as the error rate gets INFLATED!!

If ANOVA shows a significant difference in the means among the groups, use Tukey’s Multiple Comparisons to determine where the difference lies.
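The inflation can be illustrated with a short calculation. Assuming the comparisons are independent (a simplification, since pairwise tests on the same groups are not fully independent), the chance of at least one false positive across k tests at level α is 1 - (1 - α)^k. The function name below is hypothetical.

```python
def familywise_error_rate(alpha, comparisons):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** comparisons


# Three groups need three pairwise T-Tests (A-B, A-C, B-C).
rate = familywise_error_rate(0.05, 3)
print(f"{rate:.3f}")  # prints 0.143
```

So three T-Tests at α = 5% carry roughly a 14% chance of a false positive, not 5%.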

Example 8: Experiment to determine if there is a significant difference in the average acid concentration among the four preparations.

Comparing averages among three or more groups with 5 ≤ n ≤ 15 for each group: Kruskal-Wallis Test

| Preparation A | Preparation B | Preparation C | Preparation D |
|---|---|---|---|
| 0.45 | 0.35 | 0.24 | 0.34 |
| 0.35 | 0.56 | 0.12 | 0.56 |
| 0.46 | 0.24 | 0.13 | 0.53 |
| 0.24 | 0.56 | 0.17 | 0.43 |
| 0.56 | 0.24 | 0.45 | 0.21 |
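As an illustration of how the Kruskal-Wallis statistic is built from ranks, the sketch below computes the tie-corrected H for the four preparations in pure Python. The P-value uses the chi-square approximation with 3 degrees of freedom (4 groups minus 1), which is only approximate for groups this small; the function names are hypothetical and a statistics package would normally be used instead.

```python
import math

# Acid concentrations copied column by column from the Example 8 table.
preparations = {
    "A": [0.45, 0.35, 0.46, 0.24, 0.56],
    "B": [0.35, 0.56, 0.24, 0.56, 0.24],
    "C": [0.24, 0.12, 0.13, 0.17, 0.45],
    "D": [0.34, 0.56, 0.53, 0.43, 0.21],
}


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    rank_of = {}   # value -> midrank (average of the positions it occupies)
    tie_term = 0   # sum of t^3 - t over each run of t tied values
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        rank_of[pooled[i]] = (i + 1 + j) / 2
        tie_term += t ** 3 - t
        i = j
    rank_part = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_part - 3 * (n_total + 1)
    return h / (1 - tie_term / (n_total ** 3 - n_total))


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 d.f."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


h = kruskal_wallis_h(list(preparations.values()))
p_value = chi2_sf_df3(h)  # valid here because 4 groups give 3 degrees of freedom
print(f"H = {h:.2f}, approximate P = {p_value:.2f}")
```

Note that `chi2_sf_df3` is a closed form for 3 degrees of freedom only; a different number of groups needs a general chi-square routine.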

Establishing linear relationships between variables

Functional dependence of one variable on another: Simple linear regression

No functional dependence between variables: Simple linear correlation


Various Statistical tools for generating P-Values (II)

Simple Linear Regression

Two variables

One variable (dependent/response variable) depends on the other (independent/predictor variable)

Represented by scatterplots

Reported with r² and P-value

r² and P-value in regression analysis

r² – coefficient of determination

Measures how much of the variation in the dependent variable is explained by the independent variable.

0% ≤ r² ≤ 100%

P-Value – the probability of obtaining a slope at least as extreme as that of the fitted regression line if the actual (population) slope is zero.

Always report both r² and P-value.
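A minimal sketch of how the slope and r² are obtained by least squares is given below. The readings are hypothetical, invented purely for illustration, and the function name is an assumption. The P-value for the slope is not computed here because it needs the t distribution, which a statistics package provides.

```python
def linear_fit(x, y):
    """Least-squares slope, intercept and r² for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r² = 1 - (variation left unexplained) / (total variation in y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical readings, invented for illustration only.
concentration = [1, 2, 3, 4, 5]
response = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r_squared = linear_fit(concentration, response)
print(f"slope = {slope:.2f}, r² = {r_squared:.1%}")
```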

r² and P-value in regression analysis

[Figure: sample slope compared with population slope; n = 5, r² = 0.80]

Simple Linear Correlation

Two variables

Neither of the two is functionally dependent on the other

Represented by scatterplots

r (Pearson correlation coefficient) – measures the strength of the linear relationship between two variables.

Always report both r and P-value

Guidelines for interpreting r

| Strength of Association | Coefficient, r (Positive) | Coefficient, r (Negative) |
|---|---|---|
| Small | 0.1 to 0.3 | -0.1 to -0.3 |
| Medium | 0.3 to 0.5 | -0.3 to -0.5 |
| Large | 0.5 to 1.0 | -0.5 to -1.0 |
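The guideline table can be turned into a small helper, shown here with a plain-Python Pearson r. Two assumptions are made: a value on a boundary (0.3 or 0.5) is placed in the stronger category, and values below 0.1 are labelled as outside the table, since the slide does not cover them. The data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def strength_of_association(r):
    """Label r using the guideline table; the sign only gives the direction."""
    size = abs(r)
    if size > 1:
        raise ValueError("r must lie between -1 and 1")
    if size >= 0.5:
        return "large"
    if size >= 0.3:  # assumption: boundary values go to the stronger category
        return "medium"
    if size >= 0.1:
        return "small"
    return "below the table's range"


# Hypothetical paired readings, invented for illustration only.
r = pearson_r([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
print(r, strength_of_association(r))
```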

Caution……………………..

It is not appropriate to analyze a non-linear relationship using the Pearson correlation coefficient.

Example 10: Experiment to find out if there is a significant correlation between percentage of DPPH reacted and concentration of fruit peel extract.

•P-Value?

•Scatterplot?

1. The Don’ts……………………

For n < 5, DO NOT analyze your data with inferential statistics.

E.g. Trying to determine if the amount of heavy metal ion removed differed among the three methods

| Method 1 | Method 2 | Method 3 |
|---|---|---|
| 0.421 | 0.324 | 0.534 |
| 0.521 | 0.512 | 0.342 |
| 0.654 | 0.526 | 0.523 |
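For data like this, one option is to report descriptive statistics only. The sketch below computes the mean for each method and flags that the groups are too small for an inferential test; the names `MIN_N` and `summarise` are hypothetical.

```python
from statistics import mean

# Readings copied column by column from the table above.
removed = {
    "Method 1": [0.421, 0.521, 0.654],
    "Method 2": [0.324, 0.512, 0.526],
    "Method 3": [0.534, 0.342, 0.523],
}

MIN_N = 5  # smallest group size the slides accept for inferential statistics


def summarise(groups):
    """Return per-group means and whether an inferential test is allowed."""
    means = {name: round(mean(values), 3) for name, values in groups.items()}
    testable = all(len(values) >= MIN_N for values in groups.values())
    return means, testable


means, testable = summarise(removed)
print(means)
if not testable:
    print("n < 5: report an observable difference only, not a significant one")
```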

[Chart axis label: Concentration of heavy metal ion removed]

2. The Don’ts………………

When no statistical analysis is being performed on the data sets, refrain from using the word ‘Significant’!

You can, however, claim that ‘there is an observable difference…’

3. The Don’ts…………………

Data analyses DO NOT PROVE hypotheses.

The results either support or do not support the hypotheses.

Refrain from using the words ‘Prove’ or ‘Discover’!!

4. The Don’ts…………………

Do not attempt to analyze too many variables at the same time!

Analyses of multiple variables at the same time require Multivariate Statistical Analyses!!

The Dos…………

Decide on the appropriate significance level before statistical analyses (e.g. 5%)

Always factor in the appropriate statistical tool for analyzing your data at the planning stage

Always report your significance level and P-value!

Consult your teachers or Mr Law if you have any queries
