AoE RPG Workshop - Basic Statistics for Research

download AoE RPG Workshop - Basic Statistics for Research

of 106

Transcript of AoE RPG Workshop - Basic Statistics for Research

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    1/106

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    2/106

    Why do we

    need statistics?

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    3/106

    Statistics

    Derived from the Latin for state - governmentaldata collection and analysis.

    Study of data (branch of mathematics dealing

    with numerical facts i.e. data). The analysis and interpretation of data with a

    view toward objective evaluation of the

    reliability of the conclusions based on the

    data.

    Three major types:Descriptive, Inferential and

    Predictive Statistics

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    4/106

    Variation- Why statistical

    methods are neededhttp://www.youtube.com/watch?v=fsRY

    kRqQqgg&feature=related

    By UCMSCI

    http://www.youtube.com/watch?v=fsRYkRqQqgg&feature=relatedhttp://www.youtube.com/watch?v=fsRYkRqQqgg&feature=related
  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    5/106

    3 Major Types of Stats

    Descriptive statistics (i.e., data distribution

    central tendency and data dispersion)

    Inferential statistics (i.e., hypothesistesting)

    Predictive statistics (i.e., modelling)

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    6/106

    http://www.censtatd.gov.hk

    Descriptive Stats

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    7/106

    From observation to scientific questioning:

    Why do females generally live longer than males in

    human and other mammals?

    Setting hypothesis (theory) for testing:

    Hypothesis: Metabolic rate of males is faster than that

    of females, leading to shorter life span in males.

    Hypothesis: Males consume more food than females,

    leading to a higher chance of exposure to toxic

    substances.

    Inferential StatsHypothesis Testing

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    8/106

    A Hypothesis

    A statement relating to an observation that

    may be true but for which a proof (or

    disproof) has not been found. The results of a well-designed experiment

    may lead to the proof or disproof of a

    hypothesis (i.e. accept or reject of thecorresponding null hypothesis).

    Inferential Stats

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    9/106

    For example, Heights of male vs. female at age of 30.Our observations: male H > female H; it may be

    linked to genetics, consumption and exercise etc.

    Is that true for the hypothesis (HA): male H > female H?A corresponding Null hypothesis (Ho): male H female H

    Scenario I: Randomly select 1 person from each sex.Male: 170Female: 175Then, Female H> Male H. Why?

    Scenario II: Randomly select 3 persons from each sex.Male: 171, 163, 168Female: 160, 172, 173

    What is your conclusion then?

    Inferential Stats

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    10/106

    Population

    Samples

    Sub-samples

    Inferential Stats

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    11/106

    0.00

    0.01

    0.02

    0.03

    0.04

    0.05

    0.06

    0.07

    0.08

    0.09

    0.10

    140 150 160 170 180 190

    Height (cm)

    Probability

    density

    Inferential Stats

    After taking 100 random samples, the

    two distributions are uncovered.

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    12/106

    Important Take-Home Messages:

    (1) Sample size is very important and will affect yourconclusion.

    (2) Measurement results vary among samples (orsubjects)that is variation or uncertainty.

    (3) Variation can be due to measurement errors(random or systematic errors) and variation

    inherent within samples; e.g., at age 30, femaleheight varies between 148 and 189 cm. Why?

    (4) Therefore, we always deal with distributions ofdatarather than a single point of measurementor event.

    Inferential Stats

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    13/106

    How many samples are needed?

    Mean values

    Sample size

    True mean

    Minimum

    sample

    size

    *Assuming data follow the normal distribution

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    14/106

    Determine the minimum sample size by plotting

    the running means

    Stabilization of mean and SD

    I f i l S

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    15/106Zimmer 2001

    1 2 3 4 5 6 7

    Which one do you prefer

    Inferential Stats

    I f ti l St t

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    16/106

    1 2 3 4

    Zimmer 2001

    We can infer if the observed preference frequencies are identical to

    the hypothetical preference frequencies (e.g. 1:2:10:11:3:2:1) using

    a Chi-square test.

    Chi-square = (Oi-Ei)2/Ei

    Inferential Stats

    I f ti l St t H th i T ti

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    17/106

    Inferential StatsHypothesis Testing

    Ho 1:The water sample A is cleaner than the water

    sample B in terms of E. colicount.

    Ho 2:Water quality in Site A is better than Site B interms of E. colicount during the swimming

    season.

    Ho 3:Water quality in Site A is better than Site B interms of E. colicount at all times.

    How can we test the following hypotheses?

    P di ti St t

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    18/106

    b: Sullivans method c: A regression model

    Predictive Stats

    P di ti St t

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    19/106

    Predictive Stats

    Source: Hong Kong Observatory

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    20/106

    Basic Descriptive Statistics

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    21/106

    Measurement Theory

    Environmental scientists use measurements

    routinely in Lab or field work by assigning

    numbers or groups (classes).

    Mathematical operations may be applied to

    the data, e.g. predicting fish mass by their

    length through an established regression

    Different levels of measurements:

    nominal, ordinal, interval scale, ratio

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    22/106

    1

    2

    3 0 1 10 100 1000

    0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0

    Nominal

    Ordinal

    Scale

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    23/106

    How to Describe the Data Distribution?

    Central tendency

    Mean for normally distributed data

    Median for non-normally distributed data

    Dispersal pattern

    Standard deviation for normally distributeddata

    Range and/or Quartiles for non-normallydistributed data

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    24/106

    Normality Check

    Frequency histogram

    (Skewness & Kurtosis)

    Probability plot, K-S

    test

    Descriptive

    statistics

    Measurements

    (data)

    Mean, SD, SEM,95% confidence

    interval

    YES

    Check the

    Homogeneity

    of Variance

    Data transformation

    NO

    Data transformation

    NO

    Median, range,Q1 and Q3

    Non-Parametric

    Test(s)

    For 2 samples: Mann-

    Whitney

    For 2-paired samples:

    Wilcoxon

    For >2 samples:

    Kruskal-Wallis

    Sheirer-Ray-Hare

    Parametric Tests

    Students t tests for

    2 samples; ANOVA

    for 2 samples; posthoc tests for

    multiple comparison

    of means

    YES

    alls Flowchart

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    25/106

    Measurements of Central Tendency

    Mean = Sum of values/n = Xi/ne.g. length of 8 fish larvae at day 3 after hatching:

    0.6, 0.7, 1.2, 1.5, 1.7, 2.0, 2.2, 2.5 mm

    mean length = (0.6+0.7+1.2+1.5+1.7+2.0+2.2+2.5)/8

    = 1.55 mm

    0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 mm

    mean

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    26/106

    Median, Percenti les and Quart i les Order = (n+1)/2

    e.g. 0.6, 0.7, 1.2, 1.5, 1.7, 2.0, 2.2, 2.5 mm

    order 1 2 3 4 5 6 7 8

    order = (8+1)/2 = 4.5

    Median = 50th percentile = (1.5 + 1.7)/2 = 1.6 mm

    order for Q1 = 25th percentile = (8+1)/4 = 2.25

    then Q1 = 0.7 + (1.2 - 0.7)/4 = 0.825 mm

    0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 mm

    mean

    median

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    27/106

    0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 mm

    mean

    median

    Median is often used with mean.

    Mean is, however, used much more frequent.

    Median is a better measure of central tendency for data

    with skewed distribution or outliers.

    0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 mm

    mean

    median

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    28/106

    Other Measures of Central Tendency

    Range midpoint or range = (Max value - Min value)/2

    not a good estimate of the mean and seldom-used

    Geometric mean = n(x1x2x3x4.xn)= 10^[mean of log10(xi)]

    Only for positive ratio scaledata

    If data are not all equal, geometric mean < arithmetic mean

    Use in averaging ratios where it is desired to give each ratio

    equal weight

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    29/106

    Measurements of Dispersion

    Rangee.g. length of 8 fish larvae at day 3 after hatching:

    0.6, 0.7, 1.2, 1.5, 1.7, 2.0, 2.2, 2.5 mm

    Range = 2.5 - 0.6 = 1.9 mm (or say from 0.6 t 2.5mm)

    Percent i le and quart iles

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    30/106

    Population Standard Deviation (

    )

    Averaged measurement of deviation from mean

    xi- x

    e.g. five rainfall measurements, whose mean is 7

    Rainfall (mm) xi- x (xi- x)2

    12 12 - 7 = 5 25

    0 0 - 7 = -7 49

    2 2 - 7 = -5 25

    5 5 - 7 = -2 4

    16 16 - 7 = 9 81

    Sum = 184 Sum = 184 Population variance: 2= (xi- x)2/n= 184/5 = 36.8 Population SD: = (xi- x)2/n= 6.1

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    31/106

    Sample SD (s)

    s= [(xi- x)2]/ (n- 1)

    s= [xi2((xi)2 /n)]/ (n- 1)

    Two modifications:

    by dividing [(xi- x)2] by (n-1) rather than n, gives a betterunbiased estimate of (however, when nincreases,difference between sand declines rapidly)

    the sum of squared (SS) deviations can be calculated as

    (xi2)- (xi)2/ n

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    32/106

    Sample SD (s)

    e.g. five rainfall measurements, whose mean is 7.0

    Rainfall (mm) xi2 xi

    12 144 12

    0 0 0

    2 4 2

    5 25 516 256 16

    (xi2) = 429 xi= 35(xi)2= 1225

    s2= [xi2- (xi)2 /n]/ (n - 1) = [429 - (1225/5)]/ (5 - 1)= 46.0 mm

    s = (46.0) = 6.782 = 6.8 mm

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    33/106

    Basic Experimental Design for

    Environmental Research

    1. Setting environmental questions into statisticalquestions [e.g. spatial and temporal variations]

    2. Setting hypotheses and then statistical null hypotheses

    4. Statistical consideration (treatment groups, sample size,

    true replication, confounding factors etc.)5. Sampling design (independent, random, samples)

    6. Data collection & measurement (Quality Control andQuality Assurance Procedures)

    7. Data analysis

    Too few data: cannot obtain reliable conclusions

    Too many data: extra effort (time and money) indata collection

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    34/106

    Generalized scheme of logical components of a

    research programme (Underwood 1997)

    Weapon size versus body

    size as a predictor ofwinning fights

    Carcinus maenas

    Reference: Sneddon et

    al. 1997, in

    Behav. Ecol . Sociobiol.

    41: 237 - 242

    Reject Ho

    Supporthypothesisand model

    Retain HoRefutehypothesisand model

    Interpretation

    Don't end here

    Experiment

    Critical test of null hypothesis

    Null Hypothesis

    Logical opposite to hypothesis

    HypothesisPredictions based on model

    Models

    Explanations or theories

    Start hereObservations

    Patterns in space or time

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    35/106

    Randomized Sampling

    Lucky Draw Concept

    To randomly select 30 out of 200 sampling stations inHong Kong waters, you may perform a lucky draw.

    So, the chance for selecting each one of them for eachtime of drawing would be more or less equal (unbiased).It can be done with or without replacement.

    Sampling with Transects and a Random Number Table

    Randomly lay down the transects based on random nos.

    Randomly take samples along each transect.

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    36/106

    Randomized Sampling

    Spatial ComparisonClustered RandomSampling

    S1 S2 S3 S4 S5 S6 S7

    S8 S9 S10 S11 S12 S13 S14

    S15 S16 S17 S18 S19 S20 S21

    Temporal Comparison

    Wet Season vs. Dry Season

    Randomly select sampling days within eachseason (assuming each day is independent toother days) covering both neap and spring tides.

    Transitional period should not be selected toensure independency of the two seasons.

    Randomly take

    e.g. 10 samples

    from each

    randomlyselected site

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    37/106

    Study Sites (HK map)

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    38/106

    Spatial

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    39/106

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    40/106

    Stratified (Random) Sampling

    The population is first divided into a number of parts or'strata' according to some characteristic, chosen to be

    related to the major variables being studied.

    Water samples from three different water depths (1 m from the

    surface, mid-depth, 1 m above seabed).

    Water samples from a point source of pollution using a transect

    (set away from the source to open sea) with fixed sampling

    intervals (e.g. 1, 5, 10, 20, 50, 100, 500, 1000, 2000 m).

    Sediment samples from the high (2 m of Chart Datum), mid (1 mCD) and low intertidal zones (0.5 m CD).

    Sediment and water samples from different beneficial uses in

    Hong Kong waters.

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    41/106

    Precision and Accuracy

    Neitherprecise

    nor

    accurate

    Highly

    precise

    but not

    accurate

    Moderately

    precise

    and

    accurate

    Highly

    preciseand

    accurate

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    42/106

    Quality Control & Quality Assurance

    e.g. Total phosphate measurements for a water sample

    Step 1: Pipette 1 ml

    sample to a cuvette

    Step 2: Pipette 0.5 ml

    colour reagent

    Step 3: Reaction for

    15 minutes

    Conc.

    Abs

    Accuracy can be

    checked with certified

    standard reference

    solutions.

    Precision can be

    estimated using

    procedure replicates.

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    43/106

    Lead 0.065 0.007

    QC & QA:

    Control Chart

    The measured mean value can be

    compared with the certified mean

    value using one-sample t-test.

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    44/106

    Why is it so important to use the mean of the

    means in the experimental design?

    Central Limit Theorem

    The mean will remain the same if a

    mean of the means is used instead of

    taking a simple mean but the SD of the

    means will be substantially smallerthan the original sample SD.

    For each water body, 50 samples are

    taken. It is advantageous if they are

    grouped into 5 groups of 10 samples tocompute the mean of the means. This

    will increase the power for subsequent

    comparison with other sites.

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    45/106

    True Replication vs. Pseudo- Replication

    Treatment BControl Treatment A

    Will it be correct to say that there are four replicatesper group? If not, why?

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    46/106

    True Replication vs. Pseudo- Replication

    Treatment BControl Treatment A

    Will it be correct to say that there are three replicates pergroup? If yes, why?

    Mean 1 Mean 2 Mean 3

    With the same replication

    arrangement as those in

    the Control.

    Strom drainageWave breaker

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    47/106

    How can we obtain a statistically soundfigure of E. co licount for this bathing

    beach?

    Strom drainage

    outfallA Bathing BeachWave breaker

    Sea

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    48/106

    True Replication vs. Pseudo- Replication

    Site CSite A Site B

    Five replicates per group and each replicate with threeprocedure replicates to ascertain the measurement precision.

    With the same replication

    arrangement as those in

    the Site A.

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    49/106

    True Replication vs. Pseudo- Replication

    Site A

    Three replicated sites per site, each replicated site with threereplicate samples and each sample with three procedure

    replicates to ascertain the measurement precision.

    3 Sub-sites

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    50/106

    Inferential Statistics

    Frequency Distribution

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    51/106

    e.g. The particle sizes (m) of 37 grainsfrom a sample of sediment from an

    estuary

    8.2 6.3 6.8 6.4 8.1 6.3

    5.3 7.0 6.8 7.2 7.2 7.1

    5.2 5.3 5.4 6.3 5.5 6.0

    5.5 5.1 4.5 4.2 4.3 5.14.3 5.8 4.3 5.7 4.4 4.1

    4.2 4.8 3.8 3.8 4.1 4.0 4.0

    Defineconvenient

    classes (equal

    width) and

    class intervalse.g. 1 m

    Sediment grain sizes

    Frequency Distribution

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    52/106

    e.g.A frequency distribution table for the size of particles

    collected from the estuary

    Particle size (m) Frequency3.0 to under 4.0 2

    4.0 to under 5.0 125.0 to under 6.0 10

    6.0 to under 7.0 7

    7.0 to under 8.0 48.0 to under 9.0 2

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    53/106

    e.g. A frequency distribution for the size of particles

    collected from the estuary

    Frequency Histogram

    3 to

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    54/106

    >149-153>153-157>157-161>161-165>165-169>169-173>173-177>177-181>181-1850

    2

    4

    6

    8

    10

    12

    14

    Height (cm)

    Frequen

    cy

    e.g.A frequency distribution of height of the 30

    years old people (n = 52: 30 females & 22 males)

    Why bimodal-like ?

    Th N l C

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    55/106

    0.00

    0.01

    0.02

    0.03

    0.04

    0.05

    0.06

    0.07

    0.08

    0.09

    0.10

    140 150 160 170 180 190

    Height (cm)

    Probability

    dens

    ity

    Parameters and determinethe position of the curve on the

    x-axis and its shape.

    Normal curve was firstexpressed on paper (for

    astronomy) by A. de Moivre in

    1733.

    Until 1950s, it was then appliedto environmental problems.

    (P.S. non-parametric statistics

    were developed in the 20th

    century)

    female

    male

    The Normal Curve f(x) = [1/(2)]exp[(x )2/(22)]

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    56/106

    The Standard Normal Curve with a Mean = 0

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    57/106

    = 0, = 1 and with the total area under the curve = 1 units along x-axis are measured in units Figures: (a) for 1 , area = 0.6826 (68.26%); (b) for 2

    95.44%; (c) the shaded area = 100% - 95.44%

    The Standard Normal Curve with a Mean = 0

    (Pentecost 1999)

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    58/106

    Alternatively, we can state the null hypothesisas that a random observation ofZwill lie

    outside the limit -1.96 or +1.96.

    There are 2 possibilities:

    Either we have chosen an unlikely value

    ofZ, or our hypothesis is incorrect.

    Conventionally, when performing a

    significant test, we make the rule that if

    Z values lies outside the range 1.96, then the null hypothesis is rejected andtheZvalue is termedsignificant at the 5% level or = 0.05 (or p < 0.05) -critical value of the statistics.

    ForZ= 2.58, the value is termed significant at the 1% level.

    Inferential statistics - testing the null hypothesis

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    59/106

    Accept Ho Reject Ho

    Accept Ho Reject Ho

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    60/106

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    61/106

    Statistical Errors in Hypothesis Testing

    Consider court judgements where the accused is

    presumed innocent until proved guilty beyond

    reasonable doubt (I.e. Ho = innocent).

    If the accused is

    truly innocent(Ho is true)

    If the accused is

    truly guilty(Ho is false)

    Courts

    decision:Guilty

    Wrongjudgement

    OK

    Courts

    decision:Innocent

    OK Wrongjudgement

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    62/106

    Statistical Errors in Hypothesis Testing

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    63/106

    Statistical Errors in Hypothesis Testing

    For example, Ho: The average ammonia concentrations are similar

    between the suspected polluted Site A and the reference cleanSite B, i.e. A = B

    If Ho is indeed a true statementabout a statistical population,

    it will be concluded (erroneously) to be false 5% of time (in case

    = 0.05).

    Rejection of Ho when it is in fact true is a Typ e I erro r (also

    called an error).

    If Ho is indeed false, our test may occasionally not detect thisfact, and we accept the Ho.

    Acceptance of Ho when it is in fact false is a Type II erro r

    (also called a error).

    Minimization of Type II error is vitally essential for environmental management.

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    64/106

    Power of a Statistical Test

    Power is defined as 1-. is the probability to have Type II error.

    Power (1- ) is the probability of rejectingthe null hypothesis when it is in fact false

    and should be rejected.

    Probability of Type I error is specified as . But is a value that we neither specify nor

    known.

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    65/106

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    66/106

    What is next?

    1. Group Discussion on the Experimental

    Design for a Case Study

    2. Introduction to Two Classes of Basic

    Statistical Techniques:

    (1) correlation based methods and

    (2) group comparison methods

    3. Power Analysis

    Measurements

    (data)

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    67/106

    Normality Check

    Frequency histogram

    (Skewness & Kurtosis)

    Probability plot, K-S

    test

    Descriptive

    statistics

    (data)

    Mean, SD, SEM,

    95% confidence

    interval

    YES

    Check the

    Homogeneity

    of Variance

    Data transformation

    NO

    Data transformation

    NO

    Median, range,

    Q1 and Q3

    Non-Parametric

    Test(s)

    For 2 samples: Mann-

    Whitney

    For 2-paired samples:

    Wilcoxon

    For >2 samples:Kruskal-Wallis

    Sheirer-Ray-Hare

    Parametric Tests

    Students t tests for

    2 samples; ANOVA

    for 2 samples; posthoc tests for

    multiple comparison

    of means

    YES

    Ball

    alls Flowchart

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    68/106

    A Comparing Two Samples

    Independent Samples t test

    B Comparing More than 2 Samples

    Analysis of Variance (ANOVA)

    Power Analysis with G*Power

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    69/106

    G*Power 3Free Software

    http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    70/106

    Photo source: http://www-groups.dcs.st-and.ac.uk/~history/PictDisplay/Gosset.html

    Mr. Student = Mr. William Sealey Gosset (18761937)

    Measurements

    (data)

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    71/106

    Normality Check

    Frequency histogram

    (Skewness & Kurtosis)

    Probability plot, K-S

    test

    Descriptive

    statistics

    (data)

    Mean, SD, SEM,

    95% confidence

    interval

    YES

    Check the

    Homogeneity

    of Variance

    Data transformation

    NO

    Data transformation

    NO

    Median, range,

    Q1 and Q3

    Non-Parametric

    Test(s)

    For 2 samples: Mann-

    Whitney

    For 2-paired samples:

    Wilcoxon

    For >2 samples:Kruskal-Wallis

    Sheirer-Ray-Hare

    Parametric Tests

    Students t tests for

    2 samples; ANOVA

    for 2 samples; posthoc tests for

    multiple comparison

    of means

    YES

    Ball

    alls Flowchart

    The difference between two sample

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    72/106

    The difference between two sample

    means with limited data

    If n

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    73/106

    A B

    Meanmeasuredunit

    Comparison of 2 Independent Samples

    C i f 2 I d d t S l

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    74/106

    Error bar = 95% C.I.

    A B

    Measuredunit

    Comparison of 2 Independent Samples

    C i f 2 I d d t S l

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    75/106

    A B

    Measuredunit

    Comparison of 2 Independent Samples

    Error bar = 95% C.I.

    Comparison of 2 Independent Samples

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    76/106

    A B

    Measured

    unit

    Comparison of 2 Independent Samples

    Error bar = 95% C.I.

    Power and sample size for Students t test

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    77/106

    Power and sample size for Student s ttest

    We can estimate the minimum sample size to use to

    achieve desired test characteristics: n (2SP

    2/2)(t,+ t(1),)

    2

    where is the smallest population difference we wishto detect: = 1- 2

    Required sample size depends on , populationvariance(2), , and power(1-)

    If we want to detect a very small , we need a largersample.

    If the variability within samples is great, a large nisrequired. The results of pilot study or pervious studyof this type would provide such an information.

    E ti ti f i i d t t bl diff

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    78/106

    Estimation of minimum detectable difference

    n (2SP2/2)(t,+ t(1),)2

    The above equation can be rearranged to

    ask how small a population difference ()is detectable with a given sample size:

    [(2SP2/n)](t,+ t(1),)

    S N t b t Eff t Si

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    79/106

    If aliens were to land on earth, howlong would it take for them torealise that, on average, humanmales are taller than females?

    The answer relates to the effectsize (ES) of the difference in heightbetween men and women.

    The larger the ES, the quicker theywould suspect that men are taller.

    Cohen (1992) suggested where 0.2 isindicative of a small ES, 0.5 amedium ES and 0.8 a large ES.

    Some Notes about Effect Size

    http://spss.wikia.com/wiki/Sample_Size,_Effect_Size,_and_Power

    ww

    w.myspace.com

    /mtkchronicles

    A Students tTest with na= nb

    Example 1

    http://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/mtkchronicleshttp://www.myspace.com/jimmywenger
  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    80/106

    e.g. The chemical oxygen demand (COD) is measured at two

    industrial effluent outfalls, a and b, as part of consent procedure.

    Test the null hypothesis: Ho: a= bwhile HA: ab

    a b

    3.48 3.89

    2.99 3.19

    3.32 2.80

    4.17 4.31

    3.78 3.42

    4.00 3.41

    3.20 3.55

    4.40 2.40

    3.85 2.99

    4.52 3.08

    3.09 3.31

    3.62 4.52

    n 12 12

    mean 3.701 3.406

    S2 0.257 0.366

    sp2= (SS1+ SS2) / (1+ 2)= [(0.25711) + (0.366 11)]/(11+11)

    = 0.312

    sX1X2= (sp2/n1+ sp

    2/n2)= (0.312/12)2 = 0.228

    t = (X1X2) /sX1X2=(3.7013.406) / 0.228 = 1.294

    df = 2n- 2 = 22

    t= 0.05, df = 22, 2-tailed= 2.074 > t observed= 1.294,p> 0.05

    The calculated t-value < the critical tvalue.

    Thus, accept Ho.

    Need to check Power.

    SS = sum of square = S2

    Remember to always check the homogeneity of variance before running the ttest.

    Example 1

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    81/106

    Example 1

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    82/106

    N = 2 x 48 = 96

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    83/106

    Ho: transgenicnon-transgenic Example 2

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    84/106

    HA: transgenic> non-transgenicGiven that mass (g) of tilapia are normally distributed.

    n 8 8

    mean 625.0 306.25

    S2 6028.6 5798.2

    transgenic non-transgenic700 305

    680 280

    500 275

    510 250

    670 490

    670 275

    620 275650 300

    sp2= (SS1+ SS2) / (1+ 2)= 5913.4

    sX1X2= (sp2/n1+ sp

    2/n2)= 38.45

    t = (X1X2) /sX1X2=8.29

    df = 2n- 2 = 14

    t= 0.05, df = 14, 1-tailed= 1.761

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    85/106

    Example 2

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    86/106

    N = 2 x 3 = 6

    Comparison of [PBDEs] in tissues oft l t d l ll t d f 6 it

    Example 3

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    87/106

    transplanted mussels collected from 6 sitesalong a anticipated pollution gradient

    Expected that high[PBDEs] in samples from

    polluted sites than clean

    sites

    Ha: unequal means Ho: equal means

    [PBDEs] in mussels from various sites

    (ng/g)

    P1 P2 P3 P4 C5 C6

    4.25 3.50 7.20 4.00 0.50 2.50

    3.45 3.80 6.50 5.50 5.50 2.50

    4.75 4.70 4.00 2.20 2.25 2.30

    5.60 1.01 2.20 1.70 3.00 3.30

    3.20 6.00 3.50 6.00 5.00 4.50

    Example 3

    Comparison of [PBDEs] in tissues of transplanted mussels collected

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    88/106

    ANOVA

    Source of Variation SS df MS F P-value

    Between Groups 9.465417 5 1.893083 0.650981 0.663547

    Within Groups 69.79308 24 2.908045 common SD 1.705299

    Total 79.2585 29

    Comparison of [PBDEs] in tissues of transplanted mussels collectedfrom 6 sites along a anticipated pollution gradient

    0.00

    1.00

    2.00

    3.00

    4.00

    5.00

    6.00

    7.00

    8.00

    1 2 3 4 5 6

    Sites

    [PBDEs]inmus

    sels(ng/g

    = 2.908

    Example 3

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    89/106

    Example 3

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    90/106

    N = 6 x 21 = 126

    2-Way ANOVA: Effects of dietary PCBsand sex on heart rate in birds

    Example 4

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    91/106

    and sex on heart rate in birds

    There was a significant effectof chemical treatment on theheart rate in the birds (P 0.05

    PCB x Sex 4.900 1 4.90 0.21 4.49 > 0.05

    Within cells (error) 366.4 16 22.90

    0

    5

    10

    15

    20

    25

    30

    35

    40

    Heartrate(beat/min)

    Female

    Male

    Control PCB treated

    Example 4

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    92/106

    Source of variance SS DF MS = SS/DF F F critical, 0.05(1), 1, 16 P

    Total 1827.7 19

    Cells 1461.3 3

    PCB 1386.1 1 1386.10 60.53 4.49 < 0.001

    Sex 70.31 1 70.31 3.07 4.49 > 0.05 PCB x Sex 4.900 1 4.90 0.21 4.49 > 0.05

    Within cells (error) 366.4 16 22.90

    For the sex effect

    Variance for sex = 70.31

    Error variance = 22.90

    N should be 2 x 2 x 7 = 28

    Measurements

    (data)

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    93/106

    Normality Check

    Frequency histogram

    (Skewness & Kurtosis)

    Probability plot, K-S

    test

    Descriptive

    statistics

    Mean, SD, SEM,

    95% confidenceinterval

    YES

    Check the

    Homogeneity

    of Variance

    Data transformation

    NO

    Data transformation

    NO

    Median, range,

    Q1 and Q3

    Non-Parametric

    Test(s)

    For 2 samples: Mann-

    Whitney

    For 2-paired samples:

    Wilcoxon

    For >2 samples:Kruskal-Wallis

    Sheirer-Ray-Hare

    Parametric Tests

    Students t tests for

    2 samples; ANOVA

    for 2 samples; posthoc tests for

    multiple comparison

    of means

    YES

    Alternatives to Hypothesis

    testing exist

    Due to

    shortcomings of

    inferential stats

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    94/106

    There are problems in the conventional hypothesis testing:

    http://www.youtube.com/watch?v=ez4DgdurRPg

    http://www.youtube.com/watch?v=ez4DgdurRPghttp://www.youtube.com/watch?v=ez4DgdurRPg
  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    95/106

    YouTube - Bayes' Formula

    http://www.youtube.com/watch?v=pPTLK5hF

    GnQ&feature=channel

    By bionicturtledotcom

    http://www.youtube.com/watch?v=pPTLK5hFGnQ&feature=channelhttp://www.youtube.com/watch?v=pPTLK5hFGnQ&feature=channelhttp://www.youtube.com/watch?v=pPTLK5hFGnQ&feature=channelhttp://www.youtube.com/watch?v=pPTLK5hFGnQ&feature=channelhttp://www.youtube.com/watch?v=pPTLK5hFGnQ&feature=channel
  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    96/106

    YouTube - Bayes' Theorem -

    Part 2http://www.youtube.com/watch?v=bcA

    LcVmLva8&feature=related

    By westofvideo

    A Simple Example

    http://www.youtube.com/watch?v=bcALcVmLva8&feature=relatedhttp://www.youtube.com/watch?v=bcALcVmLva8&feature=relatedhttp://www.youtube.com/watch?v=bcALcVmLva8&feature=relatedhttp://www.youtube.com/watch?v=bcALcVmLva8&feature=relatedhttp://www.youtube.com/watch?v=bcALcVmLva8&feature=relatedhttp://www.youtube.com/watch?v=bcALcVmLva8&feature=relatedhttp://www.youtube.com/watch?v=bcALcVmLva8&feature=relatedhttp://www.youtube.com/watch?v=bcALcVmLva8&feature=relatedhttp://www.youtube.com/watch?v=bcALcVmLva8&feature=related
  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    97/106

    8 Sick 2 Fine 95 Sick 895 Fine

    10 Exposed

    1000 People

    990 Non-Exposed

    What is the chance to be sick after eating scallops (i.e. exposed)?

    Probability = 8 exposed with illness/(total of 103 with illness)

    = 0.078

    1000 P l (P 1)

    A Probability DiagramBayesian Approach

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    98/106

    P=0.800

    Sick

    P=0.200

    Fine

    P=0.096

    Sick

    P=0.904

    Fine

    P=0.010 Exposed

    1000 People (P=1)

    P=0.990 Non-Exposed

    What is the chance to be sick after eating scallops (i.e. exposed)?

    P(Exposed

    Sick) =

    P(Exposed) P(Sick

    Exposed)

    P(Sick)

    =(0.010)(0.800)

    (0.010*0.800+0.990*0.096)= 0.078

    Example: Fishkills

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    99/106

    This figure illustrates how

    the natural frequency

    approach can lead tothese same inferences

    using the p(Pfiesteria)

    estimate of 0.205. From

    the figure, the likelihood

    ratio can be calculated.

    Mike Newman, et al. 2007.

    Coastal and estuarine

    ecological risk

    assessment: the need fora more formal approach

    to stressor identification.

    Hydrobiologia 577: 31-40.

    LargeFish Kill

    HighPfiesteria

    HighPfiesteria

    LargeFish Kill

    LowOxygen

    LowOxygen

    Yes0.081

    (810)

    No

    0.919(9190)

    Yes0.520

    No0.480

    Yes0.205

    No0.795

    421 Casesof Kills with Pfiesteria

    389 Cases

    of Kills without Pfiesteria

    1884 Cases of no Killswith Pfiesteria

    7306 Cases

    of no Killswithout Pfiesteria

    Yes0.081(810)

    No0.919

    (9190)

    Yes0.220 No

    0.780Yes0.095

    No0.905

    178 Casesof Kills with Low DO

    632 casesof Kills without Low DO

    873 Casesof no Kills

    with Low DO

    8317 Casesof no Kills without Low DO

    Example: Fishkills

    Credit: M.C. Newman

    22346.0arg1884

    arg421

    levelsPfiesteriahighwithkillsfishelnoofCases

    levelsPfiesteriahighwithkillsfishelofCases

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    100/106

    096.120389.0

    22346.0

    20389.0arg873

    arg178

    g

    RatioLikelihood

    ionsconcentratoxygendissolvedlowwithkillsfishelnoofCases

    ionsconcentratoxygendissolvedlowwithkillsfishelofCases

    fgff

    095.1)|(

    )|(

    DOLowKillFishp

    PfiesteriaKillFishp

    LargeFish Kill

    High

    Pfiesteria

    High

    Pfiesteria

    LargeFish Kill

    LowOxygen

    LowOxygen

    Yes0.081(810)

    No0.919(9190)

    Yes0.520

    No0.480

    Yes0.205

    No0.795

    421 Casesof Kills with Pfiesteria

    389 Casesof Kills without Pfiesteria

    1884 Cases of no Killswith Pfiesteria

    7306 Cases of no Killswithout Pfiesteria

    Yes0.081(810)

    No0.919(9190)

    Yes0.220 No

    0.780Yes0.095

    No0.905

    178 Casesof Kills with Low DO

    632 cases

    of Kills without Low DO

    873 Cases

    of no Killswith Low DO

    8317 Casesof no Kills without Low DO

    Credit: M.C. Newman

    Urbanization

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    101/106

    Fish ageFish sex

    Fish liver

    lesions

    Sediment

    concentrations-inorganics

    Fish

    mortality

    Sediment

    concentrations-organochlorines

    (DDTs, chlordane)

    Sediment

    concentrations-PAHs

    Stomach

    concentrations-

    Inorganics

    Stomach

    concentrations-

    PAHs

    Stomach

    concentrations-

    organochlorines

    Fish liver tissue

    concentrations-

    inorganics

    Fish liver tissue

    concentrations-

    PAHs

    Fish liver tissue

    concentrations-

    organochlorines

    English sole (Pleuronectes vetulus) from Puget Sound

    Marine Environmental Research 45: 47-67 (1998).

    Credit: M.C. Newman

    Software Exists for More Complex Situations

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    102/106

    Credit: M.C. Newman

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    103/106

    Credit: M.C. Newman

    Supplemental Readings

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    104/106

    pp gAven, T. & J.T. Kvaly, 2002. Implementing the Bayesian paradigm in risk analysis.

    Reliability Engineering and System Safety78: 195-201.

    Bacon, P.J., J.D. Cain & D.C. Howard, 2002. Belief network models of land manager

    decisions and land use change. Journal of Environmental Management65: 1-23.Belousek, D.W., 2004. Scientific consensus and public policy: the case of Pfiesteria.

    Journal Philosophy, Science & Law4: 1-33.

    Borsuk, M.E., 2004. Predictive assessment of fish health and fish kills in the Neuse

    River estuary using elicited expert judgment, Human and Ecological Assessment

    10: 415-434.

    Borsuk, M.E., D. Higdon, C.A. Stow & K.H. Reckhow, 2001. A Bayesian hierarchical

    model to predict benthic oxygen demand from organic matter loading in estuaries

    and coastal zones. Ecological Modelling143: 165-181.

    Garbolino, P. and F. Taroni. 2002. Evaluation of scientific evidence using Bayesian

    networks. Forensic Sci Intern.125:149-155.

    Newman, M.C. and D. Evans. 2002. Causal inference in risk assessments: Cognitive

    idols or Bayesian theory? In:Coastal and Estuarine Risk Assessment. CRC Press

    LLC, Boca Raton, FL, pp. 73-96.Newman, M.C., Zhao, Y., and J.F. Carriger. 2007. Coastal and estuarine ecological

    risk assessment: the need for a more formal approach to stressor identification.

    Hydrobiologia577: 31-40.

    Uusitalo, L. 2007. Advantages and challenges of Bayesian networks in environmental

    modeling. Ecol. Modelling203:312-318.

    Credit: M.C. Newman

  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    105/106

    YouTube - Bayes' Theorem

    Introductionhttp://www.youtube.com/watch?v=0NG

    mrwu_BkY&feature=related

    By westofvideo

    http://www.youtube.com/watch?v=0NGmrwu_BkY&feature=relatedhttp://www.youtube.com/watch?v=0NGmrwu_BkY&feature=relatedhttp://www.youtube.com/watch?v=0NGmrwu_BkY&feature=relatedhttp://www.youtube.com/watch?v=0NGmrwu_BkY&feature=relatedhttp://www.youtube.com/watch?v=0NGmrwu_BkY&feature=relatedhttp://www.youtube.com/watch?v=0NGmrwu_BkY&feature=relatedhttp://www.youtube.com/watch?v=0NGmrwu_BkY&feature=related
  • 8/11/2019 AoE RPG Workshop - Basic Statistics for Research

    106/106