Harnessing the Power of Data - marphtc.pitt.edu · Source: Basic Concepts and Methodology for the...
Objectives for Module 4 – Inferential Statistics
• Explain the difference between type 1 error and type 2 error
• Describe the difference between parametric and nonparametric test statistics
• Write a null and alternative hypothesis
Terminology (Review)
• Population
– Complete group of individuals or things that research aims to describe
• Parameters
– Numerical values that give information about an entire population
• Sample
– Subset of a population
• Random Sample
– Subset of a population in which every member of the population has
an equal chance of being selected
• Statistics
– Numerical values that give information about a sample
Refresher: Module 3
Descriptive Analysis
• Percentage
• Measures of Central Tendency: Mean, Median, Mode
• Measures of Spread: Interquartile (IQ) Range, Variance, Standard Deviation
Review:
Two Broad Areas of Statistics
Descriptive Statistics – Module 3
• Central tendency
• Variability
• Percentages
Inferential Statistics – Today’s topic
• Hypothesis testing
• Confidence intervals
• Model building/selection
Inferential Statistics
Terminology:
– Margin of Error
– Statistical Significance
• Discover property or general pattern about a large group
- by studying a smaller group of people
• Not possible to study the whole population so study a sample
- make prediction or statements related to findings
Why Would You Use Inferential Statistics?
• To compare groups
• To test hypotheses
• To make predictions
• To make a judgment whether a difference between groups
is dependable or might have happened due to chance
• To infer from the sample statistic what the population
parameter might be
The Normal (Gaussian) Distribution
-Bell Curve-
Mean = Median = Mode
Source: Basic Concepts and Methodology for the Health Sciences
For a distribution that is perfectly normally distributed,
the mean is equal to the median, as well as the mode.
The Normal (Gaussian) Distribution
Standard Deviations
Source: Basic Concepts and Methodology for the Health Sciences
Often expressed in terms of standard deviation around the mean
• 68% of values within one standard deviation of the mean
• 95% of values within two standard deviations of mean
• 99.7% of values within three standard deviations of mean
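The three percentages above can be checked numerically. The sketch below uses `NormalDist` from Python's standard library; the helper name `share_within` is our own and does not come from the source material.

```python
from statistics import NormalDist


def share_within(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    # Prints the share of values within 1, 2, and 3 standard deviations
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
```

The slide's figures (68%, 95%, 99.7%) are rounded versions of these values.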
Why is the Normal Distribution Important?
Source: Basic Concepts and Methodology for the Health Sciences
Most inferential statistics are based on the assumption that the variable
we are measuring is normally distributed
• If measures in the whole population are normally distributed
• Then our inferences are accurate
Figure: Frequency distributions of a variable, with mean, median, and mode marked
A: Normal Distribution (mean, median, and mode coincide)
B: Skewed to the Right
C: Skewed to the Left
Source: http://dx.doi.org/10.1136/emj.17.4.274
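To see how skew pulls the mean, median, and mode apart, here is a minimal sketch using Python's standard library. The two small data sets are invented for illustration and are not from the source.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: most values are small, a few are large
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

# Mirror the values to get a left-skewed version of the same shape
left_skewed = [16 - value for value in right_skewed]

for label, data in (("right", right_skewed), ("left", left_skewed)):
    print(f"skewed to the {label}: "
          f"mode={mode(data)}, median={median(data)}, mean={mean(data)}")
```

In the right-skewed sample the mean is pulled above the median by the long upper tail; in the left-skewed sample it is pulled below.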
Comparing Two or More Groups
Hypothesis Testing
Much of statistics, especially in medicine and public health, is used to compare two or
more groups and to determine whether the groups differ from one another.
Using Inferential Statistics for Hypothesis Testing
• Researchers use inferential statistics to make objective decisions about the outcome of their study
• Scientific hypothesis = what the researcher believes will be the
outcome of the study
• Null hypothesis = what can actually be tested by the statistical methods.
• Inferential statistics use the null hypothesis to test the validity of
a scientific hypothesis
Hypothesis Testing Example
• Scientific hypothesis:
Birthweight is different for babies of white mothers compared to black mothers
• Null hypothesis:
Birthweight for babies of white mothers =
Birthweight for babies of black mothers
Hypothesis Testing
General Framework
• Specify null & alternative hypotheses
• Specify test statistic
• State rejection rule (RR)
• Compute test statistic and compare to RR
• State conclusion
Hypothesis Testing
Specifying Hypotheses
• H0: “null” or no effect hypothesis
• HA: research or alternative hypothesis
Note: Only H0 (null) is tested.
NULL Hypothesis
BirthweightW = BirthweightB
ALTERNATIVE Hypothesis
BirthweightW ≠ BirthweightB
Null Hypothesis
Opposite of the “question” the researcher wishes to answer
• There will be no difference among the groups of study subjects
• We are trying to disprove (“reject”) our null hypothesis
• “If these samples came from the same population with regard to the outcome,
how likely is the obtained result?”
• Any observed differences in the dependent variable (outcome) must be due
to sampling error (chance)
• The independent (predictor) variable does NOT make a difference
I am what is
the default, the status quo.
I am already accepted; I can only be rejected.
The burden of proof is on the alternative.
I am the null hypothesis.
EXAMPLE: Null Hypothesis
H0: Red (50%) = Black (50%)
Same chances of landing on red as on black
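The red/black null hypothesis can be tested with an exact binomial test. The sketch below is illustrative only: the spin counts (61 reds in 100 spins) are invented, the function name is our own, and the 50/50 split is taken from the slide rather than from a real wheel.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Exact two-sided p-value: total probability, under the null hypothesis,
    of every outcome that is no more likely than the one observed."""
    pmf = [comb(trials, k) * p_null**k * (1 - p_null)**(trials - k)
           for k in range(trials + 1)]
    observed = pmf[successes]
    # Small tolerance so that equally likely outcomes are not lost to rounding
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))


# Hypothetical data: 61 reds in 100 spins
p_value = binomial_two_sided_p(61, 100)
print(f"p-value = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

With these made-up counts the p-value falls below 0.05, so the null hypothesis of equal chances would be rejected at that level.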
State the Alternative Hypothesis
• HA : treatment level means not all equal
• At least one mean is different from all others
• Does not say which is different (if there are multiple groups)
• Does not say the direction of the difference (which is higher or lower)
…posits a relationship between variables and therefore is not a null hypothesis
This is what the researcher is expected to prove (HA)
EXAMPLE: Alternative Hypothesis (HA)
“Children taught by individual instruction will exhibit less mastery of
mathematical concepts than those taught by group instruction”
Set the Alpha
• α = probability of Type I Error
• Set a priori (before study begins)
• Typical value α = 0.05, establishing a 95% confidence level
• If the study were conducted 100 times and the null hypothesis were true, the decision to reject
the null hypothesis (and accept the alternative hypothesis) would be wrong about 5 times out of 100
due to chance alone
• In our birthweight example,
we would make a Type I error if we INCORRECTLY reject the null hypothesis
that birthweights are the same and say that there is a difference in birthweights
between babies of black and white mothers
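The meaning of α can be illustrated by simulation. In the sketch below, both groups are drawn from the same population, so the null hypothesis is true by construction and every rejection is a Type I error. The setup (a two-sample z-test with known standard deviation 1, 30 subjects per group, 2000 repetitions) is an assumption made for illustration, not the source's method.

```python
import random
from statistics import NormalDist


def type_i_error_rate(trials: int = 2000, n: int = 30,
                      alpha: float = 0.05, seed: int = 1) -> float:
    """Share of simulated studies that reject a true null hypothesis."""
    rng = random.Random(seed)
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    standard_error = (2 / n) ** 0.5  # SE of a difference in means when sd = 1
    rejections = 0
    for _ in range(trials):
        # Both samples come from the same population: H0 is true
        group_a = [rng.gauss(0, 1) for _ in range(n)]
        group_b = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(group_a) / n - sum(group_b) / n) / standard_error
        if abs(z) > z_critical:
            rejections += 1
    return rejections / trials


print(f"observed Type I error rate: {type_i_error_rate():.3f}")
```

The observed rate should land close to the chosen α of 0.05, with some run-to-run variation if the seed is changed.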
Errors in Statistical Inference
Type I - α
• Researcher rejects a null hypothesis
when it is actually true
• False positive
• Considered more serious error in hypothesis testing
Type II – β (beta)
• Researcher accepts a null hypothesis that is actually false
• We “fail to reject” the null hypothesis even though
the alternative hypothesis is correct
• False negative
• Often occurs when sample is too small
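The link between small samples and Type II errors can be made concrete. The sketch below computes β for a two-sided, two-sample z-test with equal group sizes and known standard deviation. The standardized effect size of 0.5 and the sample sizes are hypothetical, and the function name is our own.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(n_per_group: int, effect_size: float, alpha: float = 0.05) -> float:
    """Beta: probability of failing to reject H0 when the true difference
    between group means is effect_size standard deviations."""
    standard_normal = NormalDist()
    z_critical = standard_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the z statistic under the alternative hypothesis
    shift = effect_size * sqrt(n_per_group / 2)
    power = (standard_normal.cdf(shift - z_critical)
             + standard_normal.cdf(-shift - z_critical))
    return 1 - power


for n in (10, 50, 200):
    print(f"n per group = {n:3d}: beta = {type_ii_error(n, 0.5):.3f}")
```

For a fixed effect size, β shrinks as the sample grows, which is why undersized studies so often fail to reject a false null hypothesis.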
Errors in Statistical Inference – Type II
Source: Basic Concepts and Methodology for the Health Sciences
Test Statistics
• Standardized value that is calculated from sample data during a hypothesis test
• Used to determine whether to reject the null hypothesis
• Compares your data with what is expected under the null hypothesis.
• Used to calculate the p-value
Source: Basic Concepts and Methodology for the Health Sciences
Questions to Consider When Selecting Test Statistics
• Which variables (types of measurement) will help answer research question?
• Which is the dependent (outcome) variable and what type of variable is it?
• Which are the independent (explanatory) variables, how many are there and
what data types are they?
• Are relationships or differences between means of interest?
• Are there repeated measurements of the same variable for each subject?
Source: Basic Concepts and Methodology for the Health Sciences
Common Test Statistics
Different hypothesis tests use different test statistics based on the probability model
assumed in the null hypothesis.
Common tests and their test statistics include:
Hypothesis test    Test statistic
Z-test             Z-statistic
t-test             t-statistic
ANOVA              F-statistic
Chi-square test    Chi-square statistic
Source: Basic Concepts and Methodology for the Health Sciences
Birthweight Example
Difference between two means
(mean birthweight for babies of black mothers compared to white mothers)
                                Bwt_b       Bwt_w
Mean                            2719.7      3102.7
Variance                        407917.1    529818.2
Observations                    26          96
Hypothesized Mean Difference    0
df                              44
t Stat                          -2.63
P(T<=t) two-tail                0.01
t Critical two-tail             2.02
Source: Basic Concepts and Methodology for the Health Sciences
• H0: no difference between the means (Hypothesized Mean Difference = 0)
• Sample sizes are different (26 vs. 96)
• Means look different (2719.7 vs. 3102.7)
• T-test statistic (t Stat = -2.63)
• P-value is <0.05 (0.01)
• We reject the null hypothesis
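The t statistic in the birthweight output can be recomputed from the summary figures alone. The sketch below assumes an unequal-variance (Welch) two-sample t-test; that is our inference from the reported df of 44, not something the source states. The function name is our own.

```python
from math import sqrt


def welch_t(mean1: float, var1: float, n1: int,
            mean2: float, var2: float, n2: int) -> tuple[float, float]:
    """Two-sample t statistic and approximate degrees of freedom,
    without assuming the two groups share a variance."""
    v1 = var1 / n1  # squared standard error of the first mean
    v2 = var2 / n2  # squared standard error of the second mean
    t_stat = (mean1 - mean2) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df


# Summary values taken from the birthweight table (Bwt_b, then Bwt_w)
t_stat, df = welch_t(2719.7, 407917.1, 26, 3102.7, 529818.2, 96)
t_critical = 2.02  # "t Critical two-tail" from the same table

print(f"t = {t_stat:.2f}, df = {df:.0f}")
print("reject H0" if abs(t_stat) > t_critical else "fail to reject H0")
```

Under that assumption the calculation gives t of about -2.63 with roughly 44 degrees of freedom, matching the table. Because |t| exceeds the critical value of 2.02, the null hypothesis is rejected.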
Make a Decision and Interpret the Results
• Use the test statistic to find the p-value
o T test, F test, Z test, etc.
• Make a decision using the p-value
o P>0.05 = Fail to Reject the Null Hypothesis
o P<0.05 = Reject the Null Hypothesis
• If the null hypothesis is rejected, accept the alternative (or research) hypothesis
Birthweight Example: Interpretation
• We reject the null hypothesis and accept the alternative hypothesis.
• There is a statistically significant difference (p<0.05) in birthweight between
babies born to black mothers and babies born to white mothers in our sample
• If we believe that our sample accurately represents the population,
we can generalize these results to the larger population
Summary: Steps in Hypothesis Testing
• State null hypotheses
• State alternative (or research) hypotheses
• Select/set alpha
• Specify/compute the test statistic
• Make a decision and interpret the results
Specify/Compute the Test Statistic
Determine the appropriate test statistic for your data
• Parametric
• Nonparametric
Parametric Test Statistics
• Variable is assumed to be normally distributed in the overall population
• Based on the estimation of population parameters
o Requires measurement on at least an interval scale
o Involves certain assumptions about variables being studied
• More powerful and more flexible
Parametric Test Statistics Uses
Source: http://www.statstutor.ac.uk/resources/uploaded/tutorsquickguidetostatistics.pdf
Nonparametric Test Statistics
• Not based on the estimation of a population parameter
• Used when the distribution is skewed (not normal)
• Used when the variable is measured on a nominal or ordinal scale
• Less powerful than corresponding parametric tests
• Less likely to reject the null hypothesis when it is false
• Often require you to modify the hypotheses
o Example: most nonparametric tests about the population center are tests about the median instead of the mean
o The test does not answer the same question as the corresponding parametric procedure
Common Nonparametric Statistics
• Chi-square: used when data are at the nominal level
o Determine difference between groups
• Fisher’s exact probability
o Robust and used with small samples
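As a rough illustration of the chi-square test on nominal data, the sketch below computes the statistic for a 2x2 table. The counts are invented for illustration and the function name is our own. The critical value 3.841 is the standard chi-square cutoff for 1 degree of freedom at α = 0.05.

```python
def chi_square_statistic(table: list[list[int]]) -> float:
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows = two groups, columns = outcome present / absent
observed_counts = [[30, 70],
                   [15, 85]]

chi2 = chi_square_statistic(observed_counts)
critical_value = 3.841  # chi-square, 1 degree of freedom, alpha = 0.05
print(f"chi-square = {chi2:.2f}")
print("reject H0" if chi2 > critical_value else "fail to reject H0")
```

When expected cell counts are small, Fisher's exact probability is generally preferred over this approximation.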
Parametric Tests & Nonparametric Alternatives
Parametric test                     Alternative Nonparametric test
1-sample Z-test, 1-sample t-test    1-sample sign test
1-sample Z-test, 1-sample t-test    1-sample Wilcoxon test
2-sample t-test                     Mann-Whitney test
One-way ANOVA                       Kruskal-Wallis test
One-way ANOVA                       Mood's Median test
Two-way ANOVA                       Friedman test
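To give a feel for one of these alternatives, here is a minimal sketch of the Mann-Whitney test, the nonparametric counterpart of the 2-sample t-test. It uses a normal approximation with no tie or continuity correction, which is only rough for samples this small. The data and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney(x: list[float], y: list[float]) -> tuple[float, float]:
    """U statistic for sample x and an approximate two-sided p-value."""
    # U counts the pairs in which the x value exceeds the y value (ties count half)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2                      # expected U under H0
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # sd of U under H0, assuming no ties
    z = (u - mean_u) / sd_u
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p_value


# Hypothetical birthweights in grams for two groups of eight babies
group_1 = [2100, 2300, 2450, 2500, 2600, 2750, 2800, 2900]
group_2 = [2700, 2950, 3000, 3100, 3200, 3300, 3400, 3600]

u_stat, p_value = mann_whitney(group_1, group_2)
print(f"U = {u_stat}, approximate p-value = {p_value:.4f}")
```

Because the test works on the ordering of the values rather than the values themselves, it does not require the normality assumption of the t-test.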
Activity #1: Does smoking during pregnancy increase the risk of low birthweight?
COMPARE:
Babies with Low Birthweights
• Proportion of babies born to women who smoked
during pregnancy
• Proportion of babies born to women who did not
smoke during pregnancy
Activity #1
Write the Null & Alternative Hypotheses
• Null Hypothesis (Ho)
• Alternative Hypothesis (Ha)
Activity #1
Assess the Evidence
• Researchers sampled 400 women who had smoked during their pregnancy
• They recorded the birth weight of the newborns
• Women who smoked had babies with lower birthweights than women in the
general population
• The p-value was 0.016.
What does this mean?
Activity #1
Interpret the Results
• P-value of the test is 0.016
• Very unlikely that we would observe these results if smoking did not increase
the risk of low birthweight (if H0 is true)
• Data provide enough evidence to reject the null hypothesis
• Proportion of low birthweight babies born to mothers who smoked during
their pregnancy is higher than overall proportion of low birthweight babies in
the population
Harnessing the Power of Data
Module 1 – Data Sources
Module 2 – Types of Data
Module 3 – Descriptive Statistics
Module 4 – Inferential Statistics
Module 5 – Epidemiologic Concepts
Module 6 – Interpreting Data
Module 7 – Presenting Data
Module 8 – What Software to Use
Module 9 – Summary with Q & A Session
You have just completed module 4 of the Data Analysis course. Please be sure
to complete all 9 modules in order to receive Continuing Education Credits.
Please contact me with your questions.
Jeanine Buchanich, PhD, MEd
Research Associate Professor – Biostatistics
Thanks for joining us!
This project is supported by the Health Resources and Services Administration (HRSA) of the U.S. Department of Health and Human Services (HHS) under grant number UB6HP27882 "Regional Public Health Training Center Program" for $3,420,000. This information or content and conclusions are those of the author and should not be construed as the official position or policy of, nor should any endorsements be inferred by HRSA, HHS, or the U.S. Government.
www.marphtc.pitt.edu