AoE RPG Workshop - Basic Statistics for Research

8/11/2019 AoE RPG Workshop - Basic Statistics for Research

1/106


2/106

Why do we

need statistics?


3/106

Statistics

Derived from the Latin for state - governmentaldata collection and analysis.

Study of data (branch of mathematics dealing

with numerical facts i.e. data). The analysis and interpretation of data with a

view toward objective evaluation of the

reliability of the conclusions based on the

data.

Three major types:Descriptive, Inferential and

Predictive Statistics


4/106

Variation- Why statistical

methods are neededhttp://www.youtube.com/watch?v=fsRY

kRqQqgg&feature=related

By UCMSCI
http://www.youtube.com/watch?v=fsRYkRqQqgg&feature=relatedhttp://www.youtube.com/watch?v=fsRYkRqQqgg&feature=related


5/106

3 Major Types of Stats

Descriptive statistics (i.e., data distribution

central tendency and data dispersion)

Inferential statistics (i.e., hypothesistesting)

Predictive statistics (i.e., modelling)


6/106

http://www.censtatd.gov.hk

Descriptive Stats


7/106

From observation to scientific questioning:

Why do females generally live longer than males in

human and other mammals?

Setting hypothesis (theory) for testing:

Hypothesis: Metabolic rate of males is faster than that

of females, leading to shorter life span in males.

Hypothesis: Males consume more food than females,

leading to a higher chance of exposure to toxic

substances.

Inferential StatsHypothesis Testing


8/106

A Hypothesis

A statement relating to an observation that

may be true but for which a proof (or

disproof) has not been found. The results of a well-designed experiment

may lead to the proof or disproof of a

hypothesis (i.e. accept or reject of thecorresponding null hypothesis).

Inferential Stats


9/106

For example, Heights of male vs. female at age of 30.Our observations: male H > female H; it may be

linked to genetics, consumption and exercise etc.

Is that true for the hypothesis (HA): male H > female H?A corresponding Null hypothesis (Ho): male H female H

Scenario I: Randomly select 1 person from each sex.Male: 170Female: 175Then, Female H> Male H. Why?

Scenario II: Randomly select 3 persons from each sex.Male: 171, 163, 168Female: 160, 172, 173

What is your conclusion then?

Inferential Stats


10/106

Population

Samples

Sub-samples

Inferential Stats


11/106

0.00

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

0.10

140 150 160 170 180 190

Height (cm)

Probability

density

Inferential Stats

After taking 100 random samples, the

two distributions are uncovered.


12/106

Important Take-Home Messages:

(1) Sample size is very important and will affect yourconclusion.

(2) Measurement results vary among samples (orsubjects)that is variation or uncertainty.

(3) Variation can be due to measurement errors(random or systematic errors) and variation

inherent within samples; e.g., at age 30, femaleheight varies between 148 and 189 cm. Why?

(4) Therefore, we always deal with distributions ofdatarather than a single point of measurementor event.

Inferential Stats


13/106

How many samples are needed?

Mean values

Sample size

True mean

Minimum

sample

size

*Assuming data follow the normal distribution


14/106

Determine the minimum sample size by plotting

the running means

Stabilization of mean and SD

I f i l S


15/106Zimmer 2001

1 2 3 4 5 6 7

Which one do you prefer

Inferential Stats

I f ti l St t


16/106

1 2 3 4

Zimmer 2001

We can infer if the observed preference frequencies are identical to

the hypothetical preference frequencies (e.g. 1:2:10:11:3:2:1) using

a Chi-square test.

Chi-square = (Oi-Ei)2/Ei

Inferential Stats

I f ti l St t H th i T ti


17/106

Inferential StatsHypothesis Testing

Ho 1:The water sample A is cleaner than the water

sample B in terms of E. colicount.

Ho 2:Water quality in Site A is better than Site B interms of E. colicount during the swimming

season.

Ho 3:Water quality in Site A is better than Site B interms of E. colicount at all times.

How can we test the following hypotheses?

P di ti St t


18/106

b: Sullivans method c: A regression model

Predictive Stats

P di ti St t


19/106

Predictive Stats

Source: Hong Kong Observatory


20/106

Basic Descriptive Statistics


21/106

Measurement Theory

Environmental scientists use measurements

routinely in Lab or field work by assigning

numbers or groups (classes).

Mathematical operations may be applied to

the data, e.g. predicting fish mass by their

length through an established regression

Different levels of measurements:

nominal, ordinal, interval scale, ratio


22/106

1

2

3 0 1 10 100 1000

0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0

Nominal

Ordinal

Scale


23/106

How to Describe the Data Distribution?

Central tendency

Mean for normally distributed data

Median for non-normally distributed data

Dispersal pattern

Standard deviation for normally distributeddata

Range and/or Quartiles for non-normallydistributed data


24/106

Normality Check

Frequency histogram

(Skewness & Kurtosis)

Probability plot, K-S

test

Descriptive

statistics

Measurements

(data)

Mean, SD, SEM,95% confidence

interval

YES

Check the

Homogeneity

of Variance

Data transformation

NO

Data transformation

NO

Median, range,Q1 and Q3

Non-Parametric

Test(s)

For 2 samples: Mann-

Whitney

For 2-paired samples:

Wilcoxon

For >2 samples:

Kruskal-Wallis

Sheirer-Ray-Hare

Parametric Tests

Students t tests for

2 samples; ANOVA

for 2 samples; posthoc tests for

multiple comparison

of means

YES

alls Flowchart


25/106

Measurements of Central Tendency

Mean = Sum of values/n = Xi/ne.g. length of 8 fish larvae at day 3 after hatching:

0.6, 0.7, 1.2, 1.5, 1.7, 2.0, 2.2, 2.5 mm

mean length = (0.6+0.7+1.2+1.5+1.7+2.0+2.2+2.5)/8

= 1.55 mm

0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 mm

mean


26/106

Median, Percenti les and Quart i les Order = (n+1)/2

e.g. 0.6, 0.7, 1.2, 1.5, 1.7, 2.0, 2.2, 2.5 mm

order 1 2 3 4 5 6 7 8

order = (8+1)/2 = 4.5

Median = 50th percentile = (1.5 + 1.7)/2 = 1.6 mm

order for Q1 = 25th percentile = (8+1)/4 = 2.25

then Q1 = 0.7 + (1.2 - 0.7)/4 = 0.825 mm

0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 mm

mean

median


27/106

0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 mm

mean

median

Median is often used with mean.

Mean is, however, used much more frequent.

Median is a better measure of central tendency for data

with skewed distribution or outliers.

0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 mm

mean

median


28/106

Other Measures of Central Tendency

Range midpoint or range = (Max value - Min value)/2

not a good estimate of the mean and seldom-used

Geometric mean = n(x1x2x3x4.xn)= 10^[mean of log10(xi)]

Only for positive ratio scaledata

If data are not all equal, geometric mean < arithmetic mean

Use in averaging ratios where it is desired to give each ratio

equal weight


29/106

Measurements of Dispersion

Rangee.g. length of 8 fish larvae at day 3 after hatching:

0.6, 0.7, 1.2, 1.5, 1.7, 2.0, 2.2, 2.5 mm

Range = 2.5 - 0.6 = 1.9 mm (or say from 0.6 t 2.5mm)

Percent i le and quart iles


30/106

Population Standard Deviation (

)

Averaged measurement of deviation from mean

xi- x

e.g. five rainfall measurements, whose mean is 7

Rainfall (mm) xi- x (xi- x)2

12 12 - 7 = 5 25

0 0 - 7 = -7 49

2 2 - 7 = -5 25

5 5 - 7 = -2 4

16 16 - 7 = 9 81

Sum = 184 Sum = 184 Population variance: 2= (xi- x)2/n= 184/5 = 36.8 Population SD: = (xi- x)2/n= 6.1


31/106

Sample SD (s)

s= [(xi- x)2]/ (n- 1)

s= [xi2((xi)2 /n)]/ (n- 1)

Two modifications:

by dividing [(xi- x)2] by (n-1) rather than n, gives a betterunbiased estimate of (however, when nincreases,difference between sand declines rapidly)

the sum of squared (SS) deviations can be calculated as

(xi2)- (xi)2/ n


32/106

Sample SD (s)

e.g. five rainfall measurements, whose mean is 7.0

Rainfall (mm) xi2 xi

12 144 12

0 0 0

2 4 2

5 25 516 256 16

(xi2) = 429 xi= 35(xi)2= 1225

s2= [xi2- (xi)2 /n]/ (n - 1) = [429 - (1225/5)]/ (5 - 1)= 46.0 mm

s = (46.0) = 6.782 = 6.8 mm


33/106

Basic Experimental Design for

Environmental Research

1. Setting environmental questions into statisticalquestions [e.g. spatial and temporal variations]

2. Setting hypotheses and then statistical null hypotheses

4. Statistical consideration (treatment groups, sample size,

true replication, confounding factors etc.)5. Sampling design (independent, random, samples)

6. Data collection & measurement (Quality Control andQuality Assurance Procedures)

7. Data analysis

Too few data: cannot obtain reliable conclusions

Too many data: extra effort (time and money) indata collection


34/106

Generalized scheme of logical components of a

research programme (Underwood 1997)

Weapon size versus body

size as a predictor ofwinning fights

Carcinus maenas

Reference: Sneddon et

al. 1997, in

Behav. Ecol . Sociobiol.

41: 237 - 242

Reject Ho

Supporthypothesisand model

Retain HoRefutehypothesisand model

Interpretation

Don't end here

Experiment

Critical test of null hypothesis

Null Hypothesis

Logical opposite to hypothesis

HypothesisPredictions based on model

Models

Explanations or theories

Start hereObservations

Patterns in space or time


35/106

Randomized Sampling

Lucky Draw Concept

To randomly select 30 out of 200 sampling stations inHong Kong waters, you may perform a lucky draw.

So, the chance for selecting each one of them for eachtime of drawing would be more or less equal (unbiased).It can be done with or without replacement.

Sampling with Transects and a Random Number Table

Randomly lay down the transects based on random nos.

Randomly take samples along each transect.


36/106

Randomized Sampling

Spatial ComparisonClustered RandomSampling

S1 S2 S3 S4 S5 S6 S7

S8 S9 S10 S11 S12 S13 S14

S15 S16 S17 S18 S19 S20 S21

Temporal Comparison

Wet Season vs. Dry Season

Randomly select sampling days within eachseason (assuming each day is independent toother days) covering both neap and spring tides.

Transitional period should not be selected toensure independency of the two seasons.

Randomly take

e.g. 10 samples

from each

randomlyselected site


37/106

Study Sites (HK map)


38/106

Spatial


39/106


40/106

Stratified (Random) Sampling

The population is first divided into a number of parts or'strata' according to some characteristic, chosen to be

related to the major variables being studied.

Water samples from three different water depths (1 m from the

surface, mid-depth, 1 m above seabed).

Water samples from a point source of pollution using a transect

(set away from the source to open sea) with fixed sampling

intervals (e.g. 1, 5, 10, 20, 50, 100, 500, 1000, 2000 m).

Sediment samples from the high (2 m of Chart Datum), mid (1 mCD) and low intertidal zones (0.5 m CD).

Sediment and water samples from different beneficial uses in

Hong Kong waters.


41/106

Precision and Accuracy

Neitherprecise

nor

accurate

Highly

precise

but not

accurate

Moderately

precise

and

accurate

Highly

preciseand

accurate


42/106

Quality Control & Quality Assurance

e.g. Total phosphate measurements for a water sample

Step 1: Pipette 1 ml

sample to a cuvette

Step 2: Pipette 0.5 ml

colour reagent

Step 3: Reaction for

15 minutes

Conc.

Abs

Accuracy can be

checked with certified

standard reference

solutions.

Precision can be

estimated using

procedure replicates.


43/106

Lead 0.065 0.007

QC & QA:

Control Chart

The measured mean value can be

compared with the certified mean

value using one-sample t-test.


44/106

Why is it so important to use the mean of the

means in the experimental design?

Central Limit Theorem

The mean will remain the same if a

mean of the means is used instead of

taking a simple mean but the SD of the

means will be substantially smallerthan the original sample SD.

For each water body, 50 samples are

taken. It is advantageous if they are

grouped into 5 groups of 10 samples tocompute the mean of the means. This

will increase the power for subsequent

comparison with other sites.


45/106

True Replication vs. Pseudo- Replication

Treatment BControl Treatment A

Will it be correct to say that there are four replicatesper group? If not, why?


46/106


Treatment BControl Treatment A

Will it be correct to say that there are three replicates pergroup? If yes, why?

Mean 1 Mean 2 Mean 3

With the same replication

arrangement as those in

the Control.

Strom drainageWave breaker


47/106

How can we obtain a statistically soundfigure of E. co licount for this bathing

beach?

Strom drainage

outfallA Bathing BeachWave breaker

Sea


48/106


Site CSite A Site B

Five replicates per group and each replicate with threeprocedure replicates to ascertain the measurement precision.

With the same replication

arrangement as those in

the Site A.


49/106


Site A

Three replicated sites per site, each replicated site with threereplicate samples and each sample with three procedure

replicates to ascertain the measurement precision.

3 Sub-sites


50/106

Inferential Statistics

Frequency Distribution


51/106

e.g. The particle sizes (m) of 37 grainsfrom a sample of sediment from an

estuary

8.2 6.3 6.8 6.4 8.1 6.3

5.3 7.0 6.8 7.2 7.2 7.1

5.2 5.3 5.4 6.3 5.5 6.0

5.5 5.1 4.5 4.2 4.3 5.14.3 5.8 4.3 5.7 4.4 4.1

4.2 4.8 3.8 3.8 4.1 4.0 4.0

Defineconvenient

classes (equal

width) and

class intervalse.g. 1 m

Sediment grain sizes

Frequency Distribution


52/106

e.g.A frequency distribution table for the size of particles

collected from the estuary

Particle size (m) Frequency3.0 to under 4.0 2

4.0 to under 5.0 125.0 to under 6.0 10

6.0 to under 7.0 7

7.0 to under 8.0 48.0 to under 9.0 2


53/106

e.g. A frequency distribution for the size of particles

collected from the estuary

Frequency Histogram

3 to


54/106

>149-153>153-157>157-161>161-165>165-169>169-173>173-177>177-181>181-1850

2

4

6

8

10

12

14

Height (cm)

Frequen

cy

e.g.A frequency distribution of height of the 30

years old people (n = 52: 30 females & 22 males)

Why bimodal-like ?

Th N l C


55/106

0.00

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

0.10

140 150 160 170 180 190

Height (cm)

Probability

dens

ity

Parameters and determinethe position of the curve on the

x-axis and its shape.

Normal curve was firstexpressed on paper (for

astronomy) by A. de Moivre in

1733.

Until 1950s, it was then appliedto environmental problems.

(P.S. non-parametric statistics

were developed in the 20th

century)

female

male

The Normal Curve f(x) = [1/(2)]exp[(x )2/(22)]


56/106

The Standard Normal Curve with a Mean = 0


57/106

= 0, = 1 and with the total area under the curve = 1 units along x-axis are measured in units Figures: (a) for 1 , area = 0.6826 (68.26%); (b) for 2

95.44%; (c) the shaded area = 100% - 95.44%

The Standard Normal Curve with a Mean = 0

(Pentecost 1999)


58/106

Alternatively, we can state the null hypothesisas that a random observation ofZwill lie

outside the limit -1.96 or +1.96.

There are 2 possibilities:

Either we have chosen an unlikely value

ofZ, or our hypothesis is incorrect.

Conventionally, when performing a

significant test, we make the rule that if

Z values lies outside the range 1.96, then the null hypothesis is rejected andtheZvalue is termedsignificant at the 5% level or = 0.05 (or p < 0.05) -critical value of the statistics.

ForZ= 2.58, the value is termed significant at the 1% level.

Inferential statistics - testing the null hypothesis


59/106

Accept Ho Reject Ho

Accept Ho Reject Ho


60/106


61/106

Statistical Errors in Hypothesis Testing

Consider court judgements where the accused is

presumed innocent until proved guilty beyond

reasonable doubt (I.e. Ho = innocent).

If the accused is

truly innocent(Ho is true)

If the accused is

truly guilty(Ho is false)

Courts

decision:Guilty

Wrongjudgement

OK

Courts

decision:Innocent

OK Wrongjudgement


62/106



63/106


For example, Ho: The average ammonia concentrations are similar

between the suspected polluted Site A and the reference cleanSite B, i.e. A = B

If Ho is indeed a true statementabout a statistical population,

it will be concluded (erroneously) to be false 5% of time (in case

= 0.05).

Rejection of Ho when it is in fact true is a Typ e I erro r (also

called an error).

If Ho is indeed false, our test may occasionally not detect thisfact, and we accept the Ho.

Acceptance of Ho when it is in fact false is a Type II erro r

(also called a error).

Minimization of Type II error is vitally essential for environmental management.


64/106

Power of a Statistical Test

Power is defined as 1-. is the probability to have Type II error.

Power (1- ) is the probability of rejectingthe null hypothesis when it is in fact false

and should be rejected.

Probability of Type I error is specified as . But is a value that we neither specify nor

known.


65/106


66/106

What is next?

1. Group Discussion on the Experimental

Design for a Case Study

2. Introduction to Two Classes of Basic

Statistical Techniques:

(1) correlation based methods and

(2) group comparison methods

3. Power Analysis

Measurements

(data)


67/106

Normality Check

Frequency histogram



test

Descriptive

statistics

(data)

Mean, SD, SEM,

95% confidence

interval

YES

Check the

Homogeneity

of Variance

Data transformation

NO

Data transformation

NO

Median, range,

Q1 and Q3

Non-Parametric

Test(s)


Whitney


Wilcoxon

For >2 samples:Kruskal-Wallis

Sheirer-Ray-Hare

Parametric Tests


2 samples; ANOVA


multiple comparison

of means

YES

Ball

alls Flowchart


68/106

A Comparing Two Samples

Independent Samples t test

B Comparing More than 2 Samples

Analysis of Variance (ANOVA)

Power Analysis with G*Power


69/106

G*Power 3Free Software

http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/


70/106

Photo source: http://www-groups.dcs.st-and.ac.uk/~history/PictDisplay/Gosset.html

Mr. Student = Mr. William Sealey Gosset (18761937)

Measurements

(data)


71/106

Normality Check

Frequency histogram



test

Descriptive

statistics

(data)

Mean, SD, SEM,

95% confidence

interval

YES

Check the

Homogeneity

of Variance

Data transformation

NO

Data transformation

NO

Median, range,

Q1 and Q3

Non-Parametric

Test(s)


Whitney


Wilcoxon


Sheirer-Ray-Hare

Parametric Tests


2 samples; ANOVA


multiple comparison

of means

YES

Ball

alls Flowchart

The difference between two sample


72/106

The difference between two sample

means with limited data

If n


73/106

A B

Meanmeasuredunit

Comparison of 2 Independent Samples

C i f 2 I d d t S l


74/106

Error bar = 95% C.I.

A B

Measuredunit


C i f 2 I d d t S l


75/106

A B

Measuredunit





76/106

A B

Measured

unit



Power and sample size for Students t test


77/106

Power and sample size for Student s ttest

We can estimate the minimum sample size to use to

achieve desired test characteristics: n (2SP

2/2)(t,+ t(1),)

2

where is the smallest population difference we wishto detect: = 1- 2

Required sample size depends on , populationvariance(2), , and power(1-)

If we want to detect a very small , we need a largersample.

If the variability within samples is great, a large nisrequired. The results of pilot study or pervious studyof this type would provide such an information.

E ti ti f i i d t t bl diff


78/106

Estimation of minimum detectable difference

n (2SP2/2)(t,+ t(1),)2

The above equation can be rearranged to

ask how small a population difference ()is detectable with a given sample size:

[(2SP2/n)](t,+ t(1),)

S N t b t Eff t Si


79/106

If aliens were to land on earth, howlong would it take for them torealise that, on average, humanmales are taller than females?

The answer relates to the effectsize (ES) of the difference in heightbetween men and women.

The larger the ES, the quicker theywould suspect that men are taller.

Cohen (1992) suggested where 0.2 isindicative of a small ES, 0.5 amedium ES and 0.8 a large ES.

Some Notes about Effect Size

http://spss.wikia.com/wiki/Sample_Size,_Effect_Size,_and_Power

ww

w.myspace.com

/mtkchronicles

A Students tTest with na= nb

Example 1
http://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/jimmywenger


80/106

e.g. The chemical oxygen demand (COD) is measured at two

industrial effluent outfalls, a and b, as part of consent procedure.

Test the null hypothesis: Ho: a= bwhile HA: ab

a b

3.48 3.89

2.99 3.19

3.32 2.80

4.17 4.31

3.78 3.42

4.00 3.41

3.20 3.55

4.40 2.40

3.85 2.99

4.52 3.08

3.09 3.31

3.62 4.52

n 12 12

mean 3.701 3.406

S2 0.257 0.366

sp2= (SS1+ SS2) / (1+ 2)= [(0.25711) + (0.366 11)]/(11+11)

= 0.312

sX1X2= (sp2/n1+ sp

2/n2)= (0.312/12)2 = 0.228

t = (X1X2) /sX1X2=(3.7013.406) / 0.228 = 1.294

df = 2n- 2 = 22

t= 0.05, df = 22, 2-tailed= 2.074 > t observed= 1.294,p> 0.05

The calculated t-value < the critical tvalue.

Thus, accept Ho.

Need to check Power.

SS = sum of square = S2

Remember to always check the homogeneity of variance before running the ttest.

Example 1


81/106

Example 1


82/106

N = 2 x 48 = 96


83/106

Ho: transgenicnon-transgenic Example 2


84/106

HA: transgenic> non-transgenicGiven that mass (g) of tilapia are normally distributed.

n 8 8

mean 625.0 306.25

S2 6028.6 5798.2

transgenic non-transgenic700 305

680 280

500 275

510 250

670 490

670 275

620 275650 300

sp2= (SS1+ SS2) / (1+ 2)= 5913.4

sX1X2= (sp2/n1+ sp

2/n2)= 38.45

t = (X1X2) /sX1X2=8.29

df = 2n- 2 = 14

t= 0.05, df = 14, 1-tailed= 1.761


85/106

Example 2


86/106

N = 2 x 3 = 6

Comparison of [PBDEs] in tissues oft l t d l ll t d f 6 it

Example 3


87/106

transplanted mussels collected from 6 sitesalong a anticipated pollution gradient

Expected that high[PBDEs] in samples from

polluted sites than clean

sites

Ha: unequal means Ho: equal means

[PBDEs] in mussels from various sites

(ng/g)

P1 P2 P3 P4 C5 C6

4.25 3.50 7.20 4.00 0.50 2.50

3.45 3.80 6.50 5.50 5.50 2.50

4.75 4.70 4.00 2.20 2.25 2.30

5.60 1.01 2.20 1.70 3.00 3.30

3.20 6.00 3.50 6.00 5.00 4.50

Example 3

Comparison of [PBDEs] in tissues of transplanted mussels collected


88/106

ANOVA

Source of Variation SS df MS F P-value

Between Groups 9.465417 5 1.893083 0.650981 0.663547

Within Groups 69.79308 24 2.908045 common SD 1.705299

Total 79.2585 29

Comparison of [PBDEs] in tissues of transplanted mussels collectedfrom 6 sites along a anticipated pollution gradient

0.00

1.00

2.00

3.00

4.00

5.00

6.00

7.00

8.00

1 2 3 4 5 6

Sites

[PBDEs]inmus

sels(ng/g

= 2.908

Example 3


89/106

Example 3


90/106

N = 6 x 21 = 126

2-Way ANOVA: Effects of dietary PCBsand sex on heart rate in birds

Example 4


91/106

and sex on heart rate in birds

There was a significant effectof chemical treatment on theheart rate in the birds (P 0.05

PCB x Sex 4.900 1 4.90 0.21 4.49 > 0.05

Within cells (error) 366.4 16 22.90

0

5

10

15

20

25

30

35

40

Heartrate(beat/min)

Female

Male

Control PCB treated

Example 4


92/106

Source of variance SS DF MS = SS/DF F F critical, 0.05(1), 1, 16 P

Total 1827.7 19

Cells 1461.3 3

PCB 1386.1 1 1386.10 60.53 4.49 < 0.001

Sex 70.31 1 70.31 3.07 4.49 > 0.05 PCB x Sex 4.900 1 4.90 0.21 4.49 > 0.05

Within cells (error) 366.4 16 22.90

For the sex effect

Variance for sex = 70.31

Error variance = 22.90

N should be 2 x 2 x 7 = 28

Measurements

(data)


93/106

Normality Check

Frequency histogram



test

Descriptive

statistics

Mean, SD, SEM,

95% confidenceinterval

YES

Check the

Homogeneity

of Variance

Data transformation

NO

Data transformation

NO

Median, range,

Q1 and Q3

Non-Parametric

Test(s)


Whitney


Wilcoxon


Sheirer-Ray-Hare

Parametric Tests


2 samples; ANOVA


multiple comparison

of means

YES

Alternatives to Hypothesis

testing exist

Due to

shortcomings of

inferential stats


94/106

There are problems in the conventional hypothesis testing:

http://www.youtube.com/watch?v=ez4DgdurRPg
http://www.youtube.com/watch?v=ez4DgdurRPghttp://www.youtube.com/watch?v=ez4DgdurRPg


95/106

YouTube - Bayes' Formula

http://www.youtube.com/watch?v=pPTLK5hF

GnQ&feature=channel

By bionicturtledotcom
http://www.youtube.com/watch?v=pPTLK5hFGnQ&feature=channelhttp://www.youtube.com/watch?v=pPTLK5hFGnQ&feature=channelhttp://www.youtube.com/watch?v=pPTLK5hFGnQ&feature=channelhttp://www.youtube.com/watch?v=pPTLK5hFGnQ&feature=channelhttp://www.youtube.com/watch?v=pPTLK5hFGnQ&feature=channel


96/106

YouTube - Bayes' Theorem -

Part 2http://www.youtube.com/watch?v=bcA

LcVmLva8&feature=related

By westofvideo

A Simple Example
http://www.youtube.com/watch?v=bcALcVmLva8&feature=relatedhttp://www.youtube.com/watch?v=bcALcVmLva8&feature=relatedhttp://www.youtube.com/watch?v=bcALcVmLva8&feature=relatedhttp://www.youtube.com/watch?v=bcALcVmLva8&feature=relatedhttp://www.youtube.com/watch?v=bcALcVmLva8&feature=relatedhttp://www.youtube.com/watch?v=bcALcVmLva8&feature=relatedhttp://www.youtube.com/watch?v=bcALcVmLva8&feature=relatedhttp://www.youtube.com/watch?v=bcALcVmLva8&feature=relatedhttp://www.youtube.com/watch?v=bcALcVmLva8&feature=related


97/106

8 Sick 2 Fine 95 Sick 895 Fine

10 Exposed

1000 People

990 Non-Exposed

What is the chance to be sick after eating scallops (i.e. exposed)?

Probability = 8 exposed with illness/(total of 103 with illness)

= 0.078

1000 P l (P 1)

A Probability DiagramBayesian Approach


98/106

P=0.800

Sick

P=0.200

Fine

P=0.096

Sick

P=0.904

Fine

P=0.010 Exposed

1000 People (P=1)

P=0.990 Non-Exposed

What is the chance to be sick after eating scallops (i.e. exposed)?

P(Exposed

Sick) =

P(Exposed) P(Sick

Exposed)

P(Sick)

=(0.010)(0.800)

(0.010*0.800+0.990*0.096)= 0.078

Example: Fishkills


99/106

This figure illustrates how

the natural frequency

approach can lead tothese same inferences

using the p(Pfiesteria)

estimate of 0.205. From

the figure, the likelihood

ratio can be calculated.

Mike Newman, et al. 2007.

Coastal and estuarine

ecological risk

assessment: the need fora more formal approach

to stressor identification.

Hydrobiologia 577: 31-40.

LargeFish Kill

HighPfiesteria

HighPfiesteria

LargeFish Kill

LowOxygen

LowOxygen

Yes0.081

(810)

No

0.919(9190)

Yes0.520

No0.480

Yes0.205

No0.795

421 Casesof Kills with Pfiesteria

389 Cases

of Kills without Pfiesteria

1884 Cases of no Killswith Pfiesteria

7306 Cases

of no Killswithout Pfiesteria

Yes0.081(810)

No0.919

(9190)

Yes0.220 No

0.780Yes0.095

No0.905

178 Casesof Kills with Low DO

632 casesof Kills without Low DO

873 Casesof no Kills

with Low DO

8317 Casesof no Kills without Low DO

Example: Fishkills

Credit: M.C. Newman

22346.0arg1884

arg421

levelsPfiesteriahighwithkillsfishelnoofCases

levelsPfiesteriahighwithkillsfishelofCases


100/106

096.120389.0

22346.0

20389.0arg873

arg178

g

RatioLikelihood

ionsconcentratoxygendissolvedlowwithkillsfishelnoofCases

ionsconcentratoxygendissolvedlowwithkillsfishelofCases

fgff

095.1)|(

)|(

DOLowKillFishp

PfiesteriaKillFishp

LargeFish Kill

High

Pfiesteria

High

Pfiesteria

LargeFish Kill

LowOxygen

LowOxygen

Yes0.081(810)

No0.919(9190)

Yes0.520

No0.480

Yes0.205

No0.795

421 Casesof Kills with Pfiesteria

389 Casesof Kills without Pfiesteria

1884 Cases of no Killswith Pfiesteria

7306 Cases of no Killswithout Pfiesteria

Yes0.081(810)

No0.919(9190)

Yes0.220 No

0.780Yes0.095

No0.905

178 Casesof Kills with Low DO

632 cases

of Kills without Low DO

873 Cases

of no Killswith Low DO

8317 Casesof no Kills without Low DO

Credit: M.C. Newman

Urbanization


101/106

Fish ageFish sex

Fish liver

lesions

Sediment

concentrations-inorganics

Fish

mortality

Sediment

concentrations-organochlorines

(DDTs, chlordane)

Sediment

concentrations-PAHs

Stomach

concentrations-

Inorganics

Stomach

concentrations-

PAHs

Stomach

concentrations-

organochlorines

Fish liver tissue

concentrations-

inorganics

Fish liver tissue

concentrations-

PAHs

Fish liver tissue

concentrations-

organochlorines

English sole (Pleuronectes vetulus) from Puget Sound

Marine Environmental Research 45: 47-67 (1998).

Credit: M.C. Newman

Software Exists for More Complex Situations


102/106

Credit: M.C. Newman


103/106

Credit: M.C. Newman

Supplemental Readings


104/106

pp gAven, T. & J.T. Kvaly, 2002. Implementing the Bayesian paradigm in risk analysis.

Reliability Engineering and System Safety78: 195-201.

Bacon, P.J., J.D. Cain & D.C. Howard, 2002. Belief network models of land manager

decisions and land use change. Journal of Environmental Management65: 1-23.Belousek, D.W., 2004. Scientific consensus and public policy: the case of Pfiesteria.

Journal Philosophy, Science & Law4: 1-33.

Borsuk, M.E., 2004. Predictive assessment of fish health and fish kills in the Neuse

River estuary using elicited expert judgment, Human and Ecological Assessment

10: 415-434.

Borsuk, M.E., D. Higdon, C.A. Stow & K.H. Reckhow, 2001. A Bayesian hierarchical

model to predict benthic oxygen demand from organic matter loading in estuaries

and coastal zones. Ecological Modelling143: 165-181.

Garbolino, P. and F. Taroni. 2002. Evaluation of scientific evidence using Bayesian

networks. Forensic Sci Intern.125:149-155.

Newman, M.C. and D. Evans. 2002. Causal inference in risk assessments: Cognitive

idols or Bayesian theory? In:Coastal and Estuarine Risk Assessment. CRC Press

LLC, Boca Raton, FL, pp. 73-96.Newman, M.C., Zhao, Y., and J.F. Carriger. 2007. Coastal and estuarine ecological

risk assessment: the need for a more formal approach to stressor identification.

Hydrobiologia577: 31-40.

Uusitalo, L. 2007. Advantages and challenges of Bayesian networks in environmental

modeling. Ecol. Modelling203:312-318.

Credit: M.C. Newman


105/106

YouTube - Bayes' Theorem

Introductionhttp://www.youtube.com/watch?v=0NG

mrwu_BkY&feature=related

By westofvideo
http://www.youtube.com/watch?v=0NGmrwu_BkY&feature=relatedhttp://www.youtube.com/watch?v=0NGmrwu_BkY&feature=relatedhttp://www.youtube.com/watch?v=0NGmrwu_BkY&feature=relatedhttp://www.youtube.com/watch?v=0NGmrwu_BkY&feature=relatedhttp://www.youtube.com/watch?v=0NGmrwu_BkY&feature=relatedhttp://www.youtube.com/watch?v=0NGmrwu_BkY&feature=relatedhttp://www.youtube.com/watch?v=0NGmrwu_BkY&feature=related


106/106

AoE RPG Workshop - Basic Statistics for Research

Documents

Transcript of AoE RPG Workshop - Basic Statistics for Research