Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.
![Page 1: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/1.jpg)
Day 2 Session 1
Basic Statistics
Cathy Mulhall
South East Public Health Observatory
Spring 2009
![Page 2: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/2.jpg)
Overview
• Types of data
• Summarising data
• The Normal distribution
• Confidence intervals
• Hypothesis testing
• P-values
![Page 3: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/3.jpg)
Types of Data
Numerical (Quantitative): counted or measured
• Discrete
• Continuous
Categorical (Qualitative): characterises a quality
• Nominal
• Ordered
![Page 4: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/4.jpg)
Numerical data
Discrete
Integers (whole numbers)
Examples
• Number of people
• Number of teeth
Continuous
Any value on a scale
Examples
• Height
• Weight
![Page 5: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/5.jpg)
Categorical data
Nominal
No natural order
Examples
• Gender
• Ethnic group
Ordered
Have a natural order
Examples
• Socio-economic group
• Cancer staging (I – IV)
![Page 6: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/6.jpg)
Which types of data are the following?
• Screening test result – categorical nominal
• Parity (no. of children) – numerical discrete
• Pain scale – categorical ordered
• Age at last birthday – numerical discrete
• Exact age – numerical continuous
• Alive at 6 months? – categorical nominal
• Number of bed days in hospital – numerical discrete
![Page 7: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/7.jpg)
Summarising numerical data
1. Location (central tendency)
Mean, median, mode
2. Spread (variation)
Range, percentiles, standard deviation
![Page 8: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/8.jpg)
Location
• Mean – sum of all obs / number of obs
• Median – value that divides the distribution in two: for an odd number of observations, the middle observation; for an even number of observations, the mean of the central pair
• Mode – value that occurs most frequently
![Page 9: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/9.jpg)
Mean and median?
a) 3, 4, 5, 6, 7: mean = 25/5 = 5, median = 5
b) 9, 10, 20, 21: mean = 60/4 = 15, median = 15
c) 1, 2, 3, 4, 990: mean = 1000/5 = 200, median = 3
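The three data sets above can be checked with a short Python sketch. It uses only the standard library `statistics` module; the dictionary name `datasets` is chosen here for illustration and is not part of the original slides.

```python
from statistics import mean, median

# The three example data sets from the slide
datasets = {
    "a": [3, 4, 5, 6, 7],
    "b": [9, 10, 20, 21],
    "c": [1, 2, 3, 4, 990],
}

# Compute (mean, median) for each data set
summary = {name: (mean(values), median(values)) for name, values in datasets.items()}

for name, (m, med) in summary.items():
    print(f"{name}) mean = {m}, median = {med}")
```

Data set c) shows why the median is often preferred for skewed data: the single extreme value 990 pulls the mean up to 200 while the median stays at 3.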
![Page 10: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/10.jpg)
Variation
• Range = highest value – lowest value
• Interquartile range = upper quartile – lower quartile (i.e. 3rd quartile – 1st quartile)
• Percentile – value below which a given proportion of the data lies
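As a rough illustration of these measures, the sketch below computes the range and interquartile range for a small, made-up data set using Python's standard library. Quartile conventions differ between software packages; the `method="inclusive"` option is one common choice and is an assumption here, not something specified in the slides.

```python
from statistics import quantiles

data = [3, 4, 5, 6, 7]  # illustrative values only

# Range = highest value - lowest value
data_range = max(data) - min(data)

# Quartiles: n=4 splits the data into four parts and returns the three cut points
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# Interquartile range = upper quartile - lower quartile
iqr = q3 - q1

print(f"range = {data_range}, quartiles = {q1}, {q2}, {q3}, IQR = {iqr}")
```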
![Page 11: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/11.jpg)
Variance
• Step 1: Calculate ‘Deviations’ = the difference between each observation and the mean of the data
• Step 2: Square these Deviations
• Step 3: Average the Squared Deviations
• … this is the Variance
• (Strictly, divide by n-1, not n)
![Page 12: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/12.jpg)
Standard Deviation
• Step 4: Take the square root of the Variance (this returns the statistic to the same units as the data)
… this is the Standard Deviation
SD measures the amount of variability in the population
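Steps 1 to 4 can be followed one at a time in a short Python sketch. The data values are illustrative, and the result is compared with the standard library's `variance` and `stdev`, which also divide by n-1.

```python
from math import sqrt
from statistics import stdev, variance

data = [3, 4, 5, 6, 7]  # illustrative values only
n = len(data)
mean_value = sum(data) / n

# Step 1: deviations of each observation from the mean
deviations = [x - mean_value for x in data]

# Step 2: square the deviations
squared_deviations = [d ** 2 for d in deviations]

# Step 3: average the squared deviations (dividing by n-1, not n)
sample_variance = sum(squared_deviations) / (n - 1)

# Step 4: square root returns the statistic to the units of the data
sd = sqrt(sample_variance)

print(f"variance = {sample_variance}, SD = {sd:.3f}")
```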
![Page 13: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/13.jpg)
Summarising categorical data
• Percentages and rates
• Covered in Day 3 – Introduction to Analysis session
![Page 14: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/14.jpg)
Normal distribution
• Symmetric
• Bell shaped
• Standard Normal Distribution
Mean = 0 SD = 1
• Represents the distribution of values that would be observed if the whole population were studied
![Page 15: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/15.jpg)
Normal distribution
Mean, Median, Mode
![Page 16: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/16.jpg)
Normal Distribution, changes in mean
[Figure: Normal distribution probability density curves with means μ = 0, μ = 0.7 and μ = 2, each with σ = 1]
![Page 17: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/17.jpg)
Normal Distribution, changes in SD
[Figure: Normal distribution probability density curves with standard deviations σ = 1, σ = 0.6 and σ = 0.3, each with μ = 0]
![Page 18: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/18.jpg)
Normal distribution
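• Defined by a mathematical formula (the probability density function)
• Published tables list the area under the Standard Normal curve
• Standard Normal scores are known as Z scores
• Tables are used to calculate the area between two points
• About 95% of the distribution lies within ±2 SD of the mean (more precisely, ±1.96 SD)
• This range is known as the ‘reference range’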
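Instead of published tables, the area between two points under the Standard Normal curve can be computed directly. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `area_between` is made up for this example.

```python
from statistics import NormalDist

# Standard Normal distribution: mean 0, SD 1
standard_normal = NormalDist(mu=0, sigma=1)

def area_between(lower, upper):
    """Area under the Standard Normal curve between two Z scores."""
    return standard_normal.cdf(upper) - standard_normal.cdf(lower)

within_2_sd = area_between(-2, 2)           # about 0.954
within_1_96_sd = area_between(-1.96, 1.96)  # about 0.950

print(f"within 2 SD: {within_2_sd:.4f}, within 1.96 SD: {within_1_96_sd:.4f}")
```

This is why 1.96 appears in the 95% confidence interval formula, while "2 SD" is the convenient rule of thumb.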
![Page 19: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/19.jpg)
Normal distribution
![Page 20: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/20.jpg)
Importance of N distribution
• Many biological variables are Normally distributed or can be made so by transformation
• Many statistical tests require data to be Normally distributed
• If data are skewed, they may need to be transformed
• Common transformations: 1/X, log(X), sqrt(X)
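A small sketch of the idea, using made-up values: the raw data below are strongly right-skewed (the mean is far above the median), and a log transformation makes them symmetric. Base-10 logs are used here for readability; that choice is an assumption, not something stated in the slides.

```python
import math
from statistics import mean, median

# Illustrative right-skewed data
skewed = [1, 10, 100, 1000, 10000]

# Log transformation
logged = [math.log10(x) for x in skewed]

print(f"raw:    mean = {mean(skewed)}, median = {median(skewed)}")
print(f"logged: mean = {mean(logged)}, median = {median(logged)}")
```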
![Page 21: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/21.jpg)
Symmetric and Skewed Data
![Page 22: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/22.jpg)
Population Samples
Bias
• Deviation from true result
• Minimised by random sampling
Random Error
• In any random sample there will be sampling variation
• Reduced by increasing the sample size
![Page 23: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/23.jpg)
Sampling Variability
Sampling variability is addressed with:
• Hypothesis tests
• Confidence intervals
![Page 24: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/24.jpg)
Standard Error
• Standard error measures the amount of variability in the sample estimate
• It indicates how closely the population mean or proportion is likely to be to the sample estimate
![Page 25: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/25.jpg)
Standard Error
Mean, $\bar{x}$: $\mathrm{SE}(\bar{x}) = \dfrac{SD}{\sqrt{n}}$
Proportion, $p$: $\mathrm{SE}(p) = \sqrt{\dfrac{p(1-p)}{n}}$
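The two standard error formulae can be written as small Python functions. The function names and the example inputs are hypothetical, chosen only to show the arithmetic.

```python
from math import sqrt

def se_mean(sd, n):
    """Standard error of a sample mean: SD / sqrt(n)."""
    return sd / sqrt(n)

def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p(1-p)/n)."""
    return sqrt(p * (1 - p) / n)

# Hypothetical inputs: SD of 10 from 100 observations; proportion 0.5 from 100 observations
print(se_mean(sd=10, n=100))
print(se_proportion(p=0.5, n=100))
```

In both formulae n sits under a square root in the denominator, so quadrupling the sample size halves the standard error.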
![Page 26: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/26.jpg)
Confidence Intervals
• Based on the Normal distribution, 95% of sample estimates will be within 1.96 SEs of the true value
• For 95% of samples this interval will contain the true population value
• For any one sample there is a 95% chance that the interval contains the true value
Mean: $\bar{x} \pm 1.96 \times \mathrm{SE}(\bar{x})$
Proportion: $p \pm 1.96 \times \mathrm{SE}(p)$
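A minimal sketch of the 95% confidence interval calculation, assuming the Normal approximation described above. The function names and input values are hypothetical and are not taken from the course data.

```python
from math import sqrt

def ci95_mean(mean, sd, n):
    """95% CI for a mean: mean +/- 1.96 * SD / sqrt(n)."""
    se = sd / sqrt(n)
    return mean - 1.96 * se, mean + 1.96 * se

def ci95_proportion(p, n):
    """95% CI for a proportion: p +/- 1.96 * sqrt(p(1-p)/n)."""
    se = sqrt(p * (1 - p) / n)
    return p - 1.96 * se, p + 1.96 * se

# Hypothetical sample: mean 50, SD 10, n = 100
lower, upper = ci95_mean(mean=50, sd=10, n=100)
print(f"95% CI for the mean: {lower:.2f} to {upper:.2f}")

# Hypothetical sample: 50% of 100 people
p_lower, p_upper = ci95_proportion(p=0.5, n=100)
print(f"95% CI for the proportion: {p_lower:.3f} to {p_upper:.3f}")
```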
![Page 27: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/27.jpg)
Confidence Intervals
• 5% risk (or 1 in 20 chance) that the true value lies outside the 95% interval
• Tells us how imprecise our estimate is
• Provides a range of values within which the true (population) value is likely to lie
Narrow 95% CI → precise estimate
Wide 95% CI → imprecise estimate
![Page 28: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/28.jpg)
[Figure: Self-reported smoking status in women (%), by ethnic group, with 95% confidence intervals (England, 2004)]
![Page 29: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/29.jpg)
Interpretation of confidence intervals
• What can we say about the true smoking prevalence for the general population?
• For which ethnic groups is the prevalence of smoking significantly different from 25%?
• Is the prevalence of smoking significantly different between the Black Caribbean and Black African populations?
• Is the prevalence of smoking significantly different between the Pakistani and Bangladeshi populations?
![Page 30: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/30.jpg)
Interpretation of confidence intervals
• 95% confident that the true smoking prevalence for the general population is between 22.5% and 24.5%
• For the Black African, Indian, Pakistani, Bangladeshi and Chinese groups, the prevalence of smoking is significantly different from 25%
• The prevalence of smoking is significantly different between the Black Caribbean and Black African groups
• Cannot be sure that the prevalence of smoking is significantly different between the Pakistani and Bangladeshi populations
![Page 31: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/31.jpg)
Interpretation of confidence intervals
• Non-overlapping intervals are indicative of real differences
• Overlapping intervals need to be considered with caution
• Need to be careful about using confidence intervals as a means of testing.
• The smaller the sample size, the wider the confidence interval
![Page 32: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/32.jpg)
Hypothesis Tests
• Assess strength of evidence for an association
• Test statistic calculated using the population value, sample estimate and standard error
• Null hypothesis: no true difference between groups in the population from which the samples arose
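One common form of test statistic built from these three ingredients is the z statistic: the distance between the sample estimate and the population (null) value, measured in standard errors. The slides do not give a specific formula, so the sketch below is an assumption about the intended form, with hypothetical names and inputs.

```python
def z_statistic(sample_estimate, null_value, standard_error):
    """Number of standard errors between the sample estimate and the null value."""
    return (sample_estimate - null_value) / standard_error

# Hypothetical example: sample mean 52, null value 50, standard error 1
z = z_statistic(sample_estimate=52, null_value=50, standard_error=1)
print(f"z = {z}")
```

A z statistic beyond about ±1.96 falls outside the central 95% of the Standard Normal distribution.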
![Page 33: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/33.jpg)
Hypothesis Tests
• If the null hypothesis is true, what are the chances of getting a difference as big as (or bigger than) that observed?
• Uses the population value, sample estimate and standard error
• Null hypothesis: no true difference between groups in the population from which the samples arose
![Page 34: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/34.jpg)
Illustration of acceptance regions
Principles of Testing
[Figure: Normal probability density curve centred on μ0, with a central "accept null hypothesis" region and a "reject null hypothesis" region in each tail]
![Page 35: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/35.jpg)
P-values
• Probability of obtaining a difference as large as (or larger than) that observed, if there is really no difference in the population from which the samples came, i.e. if the null hypothesis is true
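As a sketch of this definition, the function below converts a z statistic into a two-sided P-value using the Standard Normal distribution. The two-sided, Normal-based form is an assumption made for illustration; the function name is hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability, under the null hypothesis, of a z statistic at least as extreme as the one observed."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

p_borderline = two_sided_p_value(1.96)  # about 0.05
p_no_difference = two_sided_p_value(0)  # 1.0: the estimate equals the null value

print(f"z = 1.96 gives p = {p_borderline:.3f}; z = 0 gives p = {p_no_difference:.1f}")
```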
![Page 36: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/36.jpg)
P-values
Small p-value (p<0.05)
• Unlikely that the sample arose from a population where the null hypothesis is true
• Evidence for a real difference in the population
Large p-value (p>0.05)
• The sample is consistent with a population where the null hypothesis is true
• No evidence to reject the null hypothesis
![Page 37: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/37.jpg)
Interpretation of P-values
Source: Essential Medical Statistics, by Betty R. Kirkwood and Jonathan A. C. Sterne
![Page 38: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/38.jpg)
Quiz
A person was defined as hypertensive if their diastolic blood pressure was > 90 mmHg & their systolic was > 140 mmHg. The variable ‘hypertensive’ is:
a) Paired continuous
b) Nominal categorical
c) Skewed
d) Continuous
![Page 39: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/39.jpg)
What conclusion can be drawn from this figure?
a) The mean is less than the standard deviation
b) The mean is higher than the median
c) There are fewer observations below the mean than above it
d) The mean is approximately equal to the median
![Page 40: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/40.jpg)
Based on a sample of 153 newborns, the 95% CI for the pop mean birth weight was between 3181 and 3319 grams:
a) 95% of the individual birth weights are between 3181 & 3319 grams
b) The true mean for the 153 newborns is probably between 3181 & 3319 grams
c) The mean of the population from which the 153 newborns came is between 3181 & 3319 grams
d) There is a 95% chance that the true mean of the population from which the 153 newborns came is included in the range 3181 - 3319 grams
![Page 41: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/41.jpg)
Useful Resource
http://www.apho.org.uk/apho/techbrief.htm
![Page 42: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/42.jpg)
Finding out more
www.healthknowledge.org.uk
![Page 43: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/43.jpg)
Conclusions
• Cover some basic statistical concepts
• Gain insight into what they mean
• Gain confidence in understanding basic statistics
![Page 44: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/44.jpg)
Basic Statistics Exercise
• Exercise 1 – calculate some summary statistics for the class size data in the spreadsheet
• Exercise 2 – using the CI template provided, calculate the 95% CI for the mean class size from Exercise 1
![Page 45: Day 2 Session 1 Basic Statistics Cathy Mulhall South East Public Health Observatory Spring 2009.](https://reader035.fdocuments.net/reader035/viewer/2022070410/56649ef05503460f94c00a30/html5/thumbnails/45.jpg)
Basic Statistics Exercise
• To download the file, go to http://www.sepho.org.uk and search on “intelligence training”, then Day 2, or go to http://www.sepho.org.uk/viewResource.aspx?id=12272
• Useful Excel functions: AVERAGE, MEDIAN, MODE, QUARTILE, PERCENTILE, VAR, STDEV