Harnessing the Power of Data - marphtc.pitt.edu · Source: Basic Concepts and Methodology for the...

Harnessing the Power of Data

Module 4 – Inferential Statistics

www.marphtc.pitt.edu

Objectives for Module 4 – Inferential Statistics

• Explain the difference between type 1 error and type 2 error

• Describe the difference between parametric and nonparametric test statistics

• Write a null and alternative hypothesis

Terminology (Review)

• Population

– Complete group of individuals or things that research aims to describe

• Parameters

– Numerical values that give information about an entire population

• Sample

– Subset of a population

• Random Sample

– Subset of a population in which every member of the population has an equal chance of being selected

• Statistics

– Numerical values that give information about a sample

Relationship between Population & Sample

[Figure: a sample drawn from a population. Parameters describe the population; statistics describe the sample.]

Refresher: Module 3 – Descriptive Analysis

• Percentage

• Measures of Central Tendency: Mean, Median, Mode

• Measures of Spread: Interquartile (IQ) Range, Variance, Standard Deviation

Review: Two Broad Areas of Statistics

Descriptive Statistics – Module 3

• Central tendency

• Variability

• Percentages

Inferential Statistics – Today’s topic

• Hypothesis testing

• Confidence intervals

• Model building/selection

Inferential Statistics

Image Source: iStock

Terminology:

– Margin of Error

– Statistical Significance

• Discover a property or general pattern about a large group by studying a smaller group of people

• It is usually not possible to study the whole population, so we study a sample and make predictions or statements based on the findings

Why Would You Use Inferential Statistics?

• To compare groups

• To test hypotheses

• To make predictions

• To make a judgment whether a difference between groups

is dependable or might have happened due to chance

• To infer from the sample statistic what the population

parameter might be


The Normal (Gaussian) Distribution

-Bell Curve-

Mean = Median = Mode

Source: Basic Concepts and Methodology for the Health Sciences

For a distribution that is perfectly normally distributed,

the mean is equal to the median, as well as the mode.

The Normal (Gaussian) Distribution

Standard Deviations

Source: Basic Concepts and Methodology for the Health Sciences

Often expressed in terms of standard deviations around the mean

• About 68% of values fall within one standard deviation of the mean

• About 95% of values fall within two standard deviations of the mean

• About 99.7% of values fall within three standard deviations of the mean
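The three percentages above can be checked directly from the standard normal distribution. The following is a minimal sketch using only Python's standard library; the function name `normal_prob_within` is a hypothetical helper, not something from the module.

```python
import math

def normal_prob_within(k: float) -> float:
    """Probability that a normally distributed value lies within
    k standard deviations of its mean, computed with the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Prints roughly 68.27%, 95.45%, and 99.73%
    print(f"within {k} SD of the mean: {normal_prob_within(k):.2%}")
```

The rounded figures on the slide (68%, 95%, 99.7%) are these values to the nearest convenient digit.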

Why is the Normal Distribution Important?


Most inferential statistics are based on the assumption that the variable we are measuring is normally distributed

• When measures in the whole population are normally distributed, inferences based on these tests are accurate

Normal and Skewed Distributions

[Figure: three frequency curves (x-axis: Variable, y-axis: Frequency), each marked with the Mean, Median, and Mode. A: Normal Distribution, where the mean, median, and mode coincide. B: Skewed to the Right. C: Skewed to the Left.]

Source: http://dx.doi.org/10.1136/emj.17.4.274
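How the mean, median, and mode line up in symmetric versus skewed data can be seen with a handful of numbers. The two small data sets below are made up for illustration (they are not from the module), and `summarize` is a hypothetical helper.

```python
import statistics

# Illustrative, made-up data sets (not from the module)
symmetric = [1, 2, 3, 3, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 4, 5, 20]   # one large value forms a long right tail

def summarize(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (statistics.mean(values),
            statistics.median(values),
            statistics.mode(values))

for name, data in (("symmetric", symmetric), ("right-skewed", right_skewed)):
    mean, median, mode = summarize(data)
    print(f"{name}: mean={mean:.2f}, median={median}, mode={mode}")
```

In the symmetric set all three measures are equal; in the right-skewed set the long tail pulls the mean above the median, which in turn sits above the mode.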

Comparing Two or More Groups

Hypothesis Testing


Much of statistics, especially in medicine and public health, is used to compare two or more groups and to determine whether the groups differ from one another.

Using Inferential Statistics for Hypothesis Testing

• Researchers use inferential statistics to make objective decisions about the outcome of their study

• Scientific hypothesis = what the researcher believes will be the

outcome of the study

• Null hypothesis = what can actually be tested by the statistical methods.

• Inferential statistics use the null hypothesis to test the validity of

a scientific hypothesis

Hypothesis Testing Example

• Scientific hypothesis:

Birthweight is different for babies of white mothers compared to babies of black mothers

• Null hypothesis:

Birthweight for babies of white mothers =

Birthweight for babies of black mothers

Hypothesis Testing

General Framework


• Specify null & alternative hypotheses

• Specify test statistic

• State rejection rule (RR)

• Compute test statistic and compare to RR

• State conclusion

Hypothesis Testing

Specifying Hypotheses

• H0: “null” or no effect hypothesis

• HA: research or alternative hypothesis

Note: Only H0 (null) is tested.

NULL Hypothesis

H0: BirthweightW = BirthweightB

ALTERNATIVE Hypothesis

HA: BirthweightW ≠ BirthweightB

Null Hypothesis

Opposite of the “question" the researcher wishes to answer

• There will be no difference among the groups of study subjects

• We are trying to disprove (“reject”) our null hypothesis

• "If these samples came from the same population with regard to the outcome,

how likely is the obtained result?”

• Any observed differences in the dependent variable (outcome) must be due

to sampling error (chance)

• The independent (predictor) variable does NOT make a difference

I am what is the default, the status quo.

I am already accepted; I can only be rejected.

The burden of proof is on the alternative.

I am the null hypothesis.

EXAMPLE: Null Hypothesis

Red 50% = Black 50%

Same chances of landing on red as on black; same chances of landing on black as on red

Image Source: Flickr

State the Alternative Hypothesis

• HA : treatment level means not all equal

• At least one mean is different from all others

• Does not say which is different (if there are multiple groups)

• Does not say the direction of the difference (which is higher or lower)

• Posits a relationship between variables and therefore is not a null hypothesis

This is what the researcher expects the evidence to support (HA)

EXAMPLE: Alternative Hypothesis (HA)

“Children taught by individual instruction will exhibit less mastery of

mathematical concepts than those taught by group instruction”

Set the Alpha

• α = probability of Type I Error

• Set a priori (before study begins)

• Typical value α = 0.05, establishing a 95% confidence level

• If the null hypothesis is true and the study is conducted 100 times, the decision to reject the null hypothesis (and accept the alternative hypothesis) would be wrong about 5 times out of 100 due to chance alone

• In our birth weight example,

we would make a Type I error if we INCORRECTLY reject the null hypothesis

that birthweights are the same and say that there is a difference in birthweights

among babies of black and white mothers
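One way to see what α = 0.05 means is to simulate many studies in which the null hypothesis really is true and count how often it gets rejected. The sketch below is illustrative only: the function names, the 30 subjects per group, the known standard deviation of 1, and the use of a two-sample z-test are all assumptions chosen to keep the example short, not part of the module.

```python
import math
import random

def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_type1_rate(trials=2000, n=30, alpha=0.05, seed=0):
    """Fraction of simulated studies that reject a TRUE null hypothesis.

    Both groups are drawn from the same normal population (mean 0, SD 1),
    so every rejection is a Type I error.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        group_a = [rng.gauss(0, 1) for _ in range(n)]
        group_b = [rng.gauss(0, 1) for _ in range(n)]
        # z-test for a difference in means with known SD = 1 in each group
        z = (sum(group_a) / n - sum(group_b) / n) / math.sqrt(2 / n)
        if two_sided_p_from_z(z) < alpha:
            rejections += 1
    return rejections / trials

rate = simulate_type1_rate()
print(f"Share of true-null studies rejected: {rate:.3f}")
```

The printed share should land close to 0.05, which is the long-run Type I error rate that setting α = 0.05 accepts.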

Errors in Statistical Inference

Image Source: Flickr

Type I - α

• Researcher rejects a null hypothesis

when it is actually true

• False positive

• Often considered the more serious error in hypothesis testing

Type II – β (beta)

• Researcher retains a null hypothesis that is actually false

• We “fail to reject” the null hypothesis even though

the alternative hypothesis is correct

• False negative

• Often occurs when sample is too small
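The link between sample size and Type II error can be made concrete by computing power (1 − β) for a few sample sizes. This is a rough sketch under stated assumptions: a two-sided two-sample z-test at α = 0.05, equal group sizes, and a true difference of half a standard deviation. The function names and the chosen effect size are hypothetical.

```python
import math

Z_CRIT = 1.959964  # two-sided critical value of the standard normal for alpha = 0.05

def normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power_two_sample_z(effect_size: float, n_per_group: int) -> float:
    """Approximate power of a two-sided two-sample z-test.

    effect_size is the true difference in means, in standard deviation units.
    """
    shift = effect_size * math.sqrt(n_per_group / 2)
    return normal_cdf(shift - Z_CRIT) + normal_cdf(-shift - Z_CRIT)

for n in (10, 30, 64, 100):
    power = power_two_sample_z(0.5, n)
    print(f"n per group = {n:3d}: power = {power:.2f}, beta = {1 - power:.2f}")
```

With only 10 subjects per group, most studies would miss a real difference of this size; β shrinks as the groups get larger.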

Errors in Statistical Inference – Type II



Errors in Hypothesis Testing

Test Statistics

• Standardized value that is calculated from sample data during a hypothesis test

• Used to determine whether to reject the null hypothesis

• Compares your data with what is expected under the null hypothesis

• Used to calculate the p-value

Questions to Consider When Selecting Test Statistics

• Which variables (types of measurement) will help answer the research question?

• Which is the dependent (outcome) variable and what type of variable is it?

• Which are the independent (explanatory) variables, how many are there, and what data types are they?

• Are relationships or differences between means of interest?

• Are there repeated measurements of the same variable for each subject?


Common Test Statistics

Different hypothesis tests use different test statistics based on the probability model assumed in the null hypothesis.

Common tests and their test statistics include:

Hypothesis test      Test statistic
Z-test               Z-statistic
t-test               t-statistic
ANOVA                F-statistic
Chi-square test      Chi-square statistic

Birthweight Example

Difference between two means

(mean birthweight for babies of black mothers compared to white mothers)

                               Bwt_b       Bwt_w
Mean                           2719.7      3102.7
Variance                       407917.1    529818.2
Observations                   26          96
Hypothesized Mean Difference   0
df                             44
t Stat                         -2.63
P(T<=t) two-tail               0.01
t Critical two-tail            2.02

Source: Basic Concepts and Methodology for the Health Sciences


Birthweight Example: Reading the Output

• H0: no difference between the means

• Sample sizes are different (26 and 96 observations)

• Means look different (2719.7 and 3102.7)

• T-test statistic (t Stat = -2.63)

• P-value is <0.05 (two-tail p = 0.01)

• We reject the null hypothesis
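The t Stat and df in the birthweight output can be reproduced from the summary rows alone. The sketch below assumes the output came from a two-sample t-test that does not assume equal variances (Welch's test); the source does not say which variant was used, but that formula is consistent with the reported df of 44. The function name `welch_t` is hypothetical.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """t statistic and approximate degrees of freedom for a two-sample
    t-test with unequal variances, computed from summary statistics."""
    se1_sq = var1 / n1   # squared standard error of the first mean
    se2_sq = var2 / n2   # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(se1_sq + se2_sq)
    df = (se1_sq + se2_sq) ** 2 / (
        se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
    )
    return t, df

# Summary values taken from the birthweight table (Bwt_b, then Bwt_w)
t_stat, df = welch_t(2719.7, 407917.1, 26, 3102.7, 529818.2, 96)

T_CRITICAL = 2.02  # "t Critical two-tail" as reported in the table
reject_null = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.2f}, df = {df:.1f}, reject H0: {reject_null}")
```

Because the absolute value of t exceeds the two-tail critical value, the rejection rule leads to the same conclusion as the p-value: reject the null hypothesis.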

Make a Decision and Interpret the Results

• Use the test statistic to find the p-value

o T test, F test, Z test, etc.

• Make a decision using the p-value

o P>0.05 = Fail to Reject the Null Hypothesis

o P<0.05 = Reject the Null Hypothesis

• If the null hypothesis is rejected, accept the alternative (or research) hypothesis
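The decision rule can be written as a small function. This is only a sketch: the function name `decide` is hypothetical, and treating a p-value exactly equal to alpha as a rejection is an assumption, since the slide covers only P>0.05 and P<0.05.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the hypothesis-test decision for a p-value and a preset alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    # Assumption: a p-value exactly equal to alpha counts as a rejection.
    return "reject H0" if p_value <= alpha else "fail to reject H0"

print(decide(0.01))   # p-value from the birthweight example
print(decide(0.30))   # a hypothetical non-significant p-value
```

Alpha is passed in as a parameter because it is meant to be chosen before the study begins, not after the p-value is known.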


Birthweight Example: Interpretation

• We reject the null hypothesis and accept the alternative hypothesis.

• There is a statistically significant difference (p<0.05) in birthweight between babies born to black mothers and babies born to white mothers in our sample

• If we believe that our sample accurately represents the population,

we can generalize these results to the larger population


Summary: Steps in Hypothesis Testing

• State null hypotheses

• State alternative (or research) hypotheses

• Select/set alpha

• Specify/compute the test statistic

• Make a decision and interpret the results


Specify/Compute the Test Statistic

Determine the appropriate test statistic for your data

• Parametric

• Nonparametric

Parametric Test Statistics

• Variable is normally distributed in the overall population

• Based on the estimation of population parameters

o Requires measurement on at least an interval scale

o Involves certain assumptions about variables being studied

• More powerful and more flexible

Common Parametric Statistical Tests

Parametric Test Statistics Uses

Source: http://www.statstutor.ac.uk/resources/uploaded/tutorsquickguidetostatistics.pdf

Nonparametric Test Statistics

• Not based on the estimation of a population parameter

• Used when the distribution is skewed (not normal)

• Used when the variable is measured on a nominal or ordinal scale

• Less powerful than corresponding parametric tests

• Less likely to reject the null hypothesis when it is false

• Often require you to modify the hypotheses

Example

• Most nonparametric tests about the population center are tests about the median instead of the mean

• The test does not answer the same question as the corresponding parametric procedure

Image Source: iStock

Common Nonparametric Statistics


• Chi-square: used when data are at the nominal level

o Determines differences between groups

• Fisher’s exact probability

o Robust and used with small samples

Parametric Tests & Nonparametric Alternatives

Parametric test                      Alternative nonparametric test
1-sample Z-test, 1-sample t-test     1-sample sign test
1-sample Z-test, 1-sample t-test     1-sample Wilcoxon test
2-sample t-test                      Mann-Whitney test
One-way ANOVA                        Kruskal-Wallis test
One-way ANOVA                        Mood's Median test
Two-way ANOVA                        Friedman test

Activity #1: Does smoking during pregnancy increase the risk of low birthweight?

COMPARE: Babies with Low Birthweights

• Proportion of babies born to women who smoked during pregnancy

• Proportion of babies born to women who did not smoke during pregnancy

Activity #1: Write the Null & Alternative Hypotheses

• Null Hypothesis (Ho)

• Alternative Hypothesis (Ha)


Activity #1: Assess the Evidence

• Researchers sampled 400 women who had smoked during their pregnancy

• They recorded the birth weight of the newborns

• Women who smoked had babies with lower birthweights than women in the general population

• The p-value was 0.016.

What does this mean?

Image Source: Unsplash

Activity #1: Interpret the Results

• P-value of the test is 0.016

• Very unlikely that we will observe these results if smoking does not increase the risk of low birthweight (if H0 is true)

• Data provide enough evidence to reject the null hypothesis

• Proportion of low birthweight babies born to mothers who smoked during their pregnancy is higher than overall proportion of low birthweight babies in the population

Image Source: Unsplash
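A comparison like the one in Activity #1 could be run as a one-sample test of a proportion. The sketch below is a guess at the kind of test involved, not the researchers' actual method. Only the sample size of 400 comes from the activity; the count of low-birthweight babies (44) and the population proportion (0.08) are hypothetical placeholders, so the resulting p-value is not expected to match the reported 0.016. The function name is also hypothetical.

```python
import math

def one_sided_proportion_z_test(successes: int, n: int, p0: float):
    """z statistic and upper-tail p-value for H0: p = p0 versus HA: p > p0,
    using the normal approximation to the binomial."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    p_value = 0.5 * math.erfc(z / math.sqrt(2))   # area to the right of z
    return z, p_value

# HYPOTHETICAL inputs: 44 low-birthweight babies and p0 = 0.08 are made up.
# Only n = 400 is taken from the activity.
z, p = one_sided_proportion_z_test(successes=44, n=400, p0=0.08)
decision = "reject H0" if p < 0.05 else "fail to reject H0"

print(f"z = {z:.2f}, one-sided p = {p:.3f}, decision: {decision}")
```

The test is one-sided because the activity asks whether smoking increases the risk, which is a directional alternative hypothesis.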

Harnessing the Power of Data

Module 1 – Data Sources

Module 2 – Types of Data

Module 3 – Descriptive Statistics

Module 4 – Inferential Statistics

Module 5 – Epidemiologic Concepts

Module 6 – Interpreting Data

Module 7 – Presenting Data

Module 8 – What Software to Use

Module 9 – Summary with Q & A Session

You have just completed Module 4 of the Data Analysis course. Please be sure to complete all 9 modules in order to receive Continuing Education Credits.

Please contact me with your questions.

Jeanine Buchanich, PhD, MEd

Research Associate Professor – Biostatistics

[email protected]

Thanks for joining us!

This project is supported by the Health Resources and Services Administration (HRSA) of the U.S. Department of Health and Human Services (HHS) under grant number UB6HP27882 "Regional Public Health Training Center Program" for $3,420,000. This information or content and conclusions are those of the author and should not be construed as the official position or policy of, nor should any endorsements be inferred by HRSA, HHS, or the U.S. Government.

www.marphtc.pitt.edu
