Introduction to Biostatistics for Clinical and Translational Researchers


Introduction to Biostatistics for Clinical and Translational Researchers. KUMC Departments of Biostatistics & Internal Medicine University of Kansas Cancer Center FRONTIERS: The Heartland Institute of Clinical and Translational Research. Course Information. Jo A. Wick, PhD - PowerPoint PPT Presentation

Transcript of Introduction to Biostatistics for Clinical and Translational Researchers

Page 1: Introduction to Biostatistics for Clinical and Translational Researchers

Introduction to Biostatistics for Clinical and Translational

Researchers

KUMC Departments of Biostatistics & Internal MedicineUniversity of Kansas Cancer Center

FRONTIERS: The Heartland Institute of Clinical and Translational Research

Page 2: Introduction to Biostatistics for Clinical and Translational Researchers

Course Information
Jo A. Wick, PhD
Office Location: 5028 Robinson
Email: [email protected]

Lectures are recorded and posted at http://biostatistics.kumc.edu under ‘Events & Lectures’

Page 3: Introduction to Biostatistics for Clinical and Translational Researchers

Objectives
Understand the role of statistics in the scientific process and how it is a core component of evidence-based medicine
Understand features, strengths and limitations of descriptive, observational and experimental studies
Distinguish between association and causation
Understand roles of chance, bias and confounding in the evaluation of research

Page 4: Introduction to Biostatistics for Clinical and Translational Researchers

Course Calendar
July 5: Introduction to Statistics: Core Concepts
July 12: Quality of Evidence: Considerations for Design of Experiments and Evaluation of Literature
July 19: Hypothesis Testing & Application of Concepts to Common Clinical Research Questions
July 26: (Cont.) Hypothesis Testing & Application of Concepts to Common Clinical Research Questions

Page 5: Introduction to Biostatistics for Clinical and Translational Researchers

“No amount of experimentation can ever prove me right; a single experiment can prove me wrong.”

Albert Einstein (1879-1955)

Page 6: Introduction to Biostatistics for Clinical and Translational Researchers

Vocabulary

Page 7: Introduction to Biostatistics for Clinical and Translational Researchers

Basic Concepts
Statistics is a collection of procedures and principles for gathering data and analyzing information to help people make decisions when faced with uncertainty.

In research, we observe something about the real world. Then we must infer details about the phenomenon that produced what we observed.

A fundamental problem is that, very often, more than one phenomenon can give rise to the observations at hand!

Page 8: Introduction to Biostatistics for Clinical and Translational Researchers

Example: Infertility

Suppose you are concerned about the difficulties some couples have in conceiving a child.
It is thought that women exposed to a particular toxin in their workplace have greater difficulty becoming pregnant compared to women who are not exposed to the toxin.

You conduct a study of such women, recording the time it takes to conceive.

Page 9: Introduction to Biostatistics for Clinical and Translational Researchers

Example: Infertility
Of course, there is natural variability in time-to-pregnancy attributable to many causes aside from the toxin.

Nevertheless, suppose you finally determine that those females with the greatest exposure to the toxin had the most difficulty getting pregnant.

Page 10: Introduction to Biostatistics for Clinical and Translational Researchers

Example: Infertility
But what if there is a variable you did not consider that could be the cause?
No study can consider every possibility.

Page 11: Introduction to Biostatistics for Clinical and Translational Researchers

Example: Infertility
It turns out that women who smoke while they are pregnant reduce the chance their daughters will be able to conceive because the toxins involved in smoking affect the eggs in the female fetus.

If you didn’t record whether or not the females had mothers who smoked when they were pregnant, you may draw the wrong conclusion about the industrial toxin.

[Diagram: Fertility is influenced by Natural Variability, Smoking Behaviors of Mother, and Environmental Toxins]

Page 12: Introduction to Biostatistics for Clinical and Translational Researchers

Example: Infertility

Exposed to Toxin: majority exposed to smoke in womb; prolonged time-to-conceive found
Unexposed to Toxin: majority unexposed to smoke in womb; time-to-conceive measured
Type I Error!
Lurking (Confounding) Variable → Bias

Page 13: Introduction to Biostatistics for Clinical and Translational Researchers

Example: Infertility

Exposed to Toxin: some smoking exposure; an insignificant change in time-to-conceive found
Unexposed to Toxin: some smoking exposure; time-to-conceive measured
Type II Error!
Lurking (Confounding) Variable → “Noise”

Page 14: Introduction to Biostatistics for Clinical and Translational Researchers

The Role of Statistics
The conclusions (inferences) we draw always come with some amount of uncertainty due to these unobserved/unanticipated issues.
We must quantify that uncertainty in order to know how “good” our conclusions are.
This is the role that statistics plays in the scientific process.
• P-values (significance levels)
• Level of confidence
• Standard errors of estimates
• Confidence intervals
• Proper interpretation (association versus causation)

Page 15: Introduction to Biostatistics for Clinical and Translational Researchers

The Role of Statistics

Scientists use statistical inference to help model the uncertainty inherent in their investigations.

[Figure: a sample x1, x2, x3, ..., xn (reality) is drawn from a population and summarized by a histogram (observation); the population model (imagination) is unknown. Goal: statistical inference, with uncertainty measured by probability.]

Page 16: Introduction to Biostatistics for Clinical and Translational Researchers

Evidence-based Medicine

Evidence-based practice in medicine involves
• gathering evidence in the form of scientific data.
• applying the scientific method to inform clinical practice, establishment or development of new therapies, devices, programs or policies aimed at improving health.

Page 17: Introduction to Biostatistics for Clinical and Translational Researchers

Types of Evidence

Scientific evidence: “empirical evidence, gathered in accordance to the scientific method, which serves to support or counter a scientific theory or hypothesis”
• Type I: descriptive, epidemiological
• Type II: intervention-based
• Type III: intervention- and context-based

Page 18: Introduction to Biostatistics for Clinical and Translational Researchers

Evidence-based Medicine
Evidence-based practice results in a high likelihood of successful patient outcomes and more efficient use of health care resources.

Page 19: Introduction to Biostatistics for Clinical and Translational Researchers

The Scientific Method

[Diagram: cycle linking Revise, Experiment, and Observe]

Page 20: Introduction to Biostatistics for Clinical and Translational Researchers

Clinical Evaluation

[Diagram: cycle linking Revise Design & Hypothesis, Run Experiment, and Evidence (Data)]

Page 21: Introduction to Biostatistics for Clinical and Translational Researchers

Types of Studies
Purpose of research
1) To explore
2) To describe or classify
3) To establish relationships
4) To establish causality
Strategies for accomplishing these purposes:
1) Naturalistic observation
2) Case study
3) Survey
4) Quasi-experiment
5) Experiment
[Diagram labels alongside the list of strategies: Ambiguity, Control]

Page 22: Introduction to Biostatistics for Clinical and Translational Researchers

Generating Evidence

Studies
• Descriptive Studies
  • Populations
  • Individuals: Case Reports, Case Series, Cross Sectional
• Analytic Studies
  • Observational: Case Control, Cohort
  • Experimental: RCT
[Arrow label: Complexity and Confidence]

Page 23: Introduction to Biostatistics for Clinical and Translational Researchers

Observation versus Experiment
A designed experiment involves the investigator assigning (preferably randomly) some or all conditions to subjects.

An observational study includes conditions that are observed, not assigned.

Page 24: Introduction to Biostatistics for Clinical and Translational Researchers

Example: Heart Study
Question: How does serum total cholesterol vary by age, gender, education, and use of blood pressure medication? Does smoking affect any of the associations?
Recruit n = 3000 subjects over two years
Take blood samples and have subjects answer a CVD risk factor survey
Outcome: Serum total cholesterol
Factors: BP meds (observed, not assigned)
Confounders?

Page 25: Introduction to Biostatistics for Clinical and Translational Researchers

Example: Diabetes
Question: Will a new treatment help overweight people with diabetes lose weight?
N = 40 obese adults with Type II (non-insulin dependent) diabetes (20 female/20 male)
Randomized, double-blind, placebo-controlled study of treatment versus placebo
Outcome: Weight loss
Factor: Treatment versus placebo

Page 26: Introduction to Biostatistics for Clinical and Translational Researchers

How to Talk to a Statistician?
“It’s all Greek to me . . .”

Καλημέρα

Page 27: Introduction to Biostatistics for Clinical and Translational Researchers

Why Do I Need a Statistician?
Planning a study
Proposal writing
Data analysis and interpretation
Presentation and manuscript development

Page 28: Introduction to Biostatistics for Clinical and Translational Researchers

When Should I Seek a Statistician’s Help?

Literature interpretation
Defining the research questions
Deciding on data collection instruments
Determining appropriate study size

Page 29: Introduction to Biostatistics for Clinical and Translational Researchers

What Does the Statistician Need to Know?

General idea of the research
• Specific Aims and hypotheses would be ideal
What has been done before
• Literature review!
Outcomes under consideration
Study population
Drug/Intervention/Device
Rationale for the study
Budget constraints

Page 30: Introduction to Biostatistics for Clinical and Translational Researchers

“No amount of experimentation can ever prove me right; a single experiment can prove me wrong.”

Albert Einstein (1879-1955)

Page 31: Introduction to Biostatistics for Clinical and Translational Researchers

Vocabulary
Hypotheses: a statement of the research question that sets forth the appropriate statistical evaluation
• Null hypothesis “H0”: statement of no differences or association between variables
• Alternative hypothesis “H1”: statement of differences or association between variables

Page 32: Introduction to Biostatistics for Clinical and Translational Researchers

Disproving the Null
If someone claims that all swans are white, confirmatory evidence (in the form of lots of white swans) cannot prove the assertion to be true.

Contradictory evidence (in the form of a single black swan) makes it clear the claim is invalid.

Page 33: Introduction to Biostatistics for Clinical and Translational Researchers

The Scientific Method
[Flowchart: Observation → Hypothesis → Experiment → Results → either “Evidence supports H” or “Evidence inconsistent with H” → Revise H]

Page 34: Introduction to Biostatistics for Clinical and Translational Researchers

Hypothesis Testing
By hypothesizing that the mean response of a population is 26.3, I am saying that I expect the mean of a sample drawn from that population to be ‘close to’ 26.3:
[Figure: curve of P(x) against x, with x running from 24.5 to 28.0]

Page 35: Introduction to Biostatistics for Clinical and Translational Researchers

Hypothesis Testing
What if, in collecting data to test my hypothesis, I observe a sample mean of 26?
What conclusion might I draw?
[Figure: curve of P(x) against x, with x running from 24.5 to 28.0]

Page 36: Introduction to Biostatistics for Clinical and Translational Researchers

Hypothesis Testing
What if, in collecting data to test my hypothesis, I observe a sample mean of 27.5?
What conclusion might I draw?
[Figure: curve of P(x) against x, with x running from 24.5 to 28.0]

Page 37: Introduction to Biostatistics for Clinical and Translational Researchers

Hypothesis Testing
What if, in collecting data to test my hypothesis, I observe a sample mean of 30?
What conclusion might I draw?
[Figure: curve of P(x) against x, with x running from 24.5 to 28.0; a “?” marks the observed value beyond the plotted range]

Page 38: Introduction to Biostatistics for Clinical and Translational Researchers

Hypothesis Testing
If the observed sample mean seems odd or unlikely under the assumption that H0 is true, then we reject H0 in favor of H1.

We typically use the p-value as a measure of the strength of evidence against H0.

Page 39: Introduction to Biostatistics for Clinical and Translational Researchers

What is a P-value?
[Figure: curve of P(x) against x, with x running from 24.5 to 28.0, labeled “Null distribution”, “Observed sample mean”, and “p-value”]
A p-value is the area under the curve for values of the sample mean more extreme than what we observed in the sample we actually gathered.
If H1 states that the mean is greater than 26.3, the p-value is as shown.
If H1 states that the mean is different from 26.3, the p-value is twice the area shown, accounting for the area in both tails.
If H1 states that the mean is less than 26.3, the p-value is the area to the left of the observed sample mean.
A p-value is the probability of getting a sample mean as favorable or more favorable to H1 than what was observed, assuming H0 is true.
The tail of the distribution it is in is determined by H1.
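A minimal sketch of the three cases, assuming the sample mean is approximately normal under H0. The standard error of 0.5 and the observed mean of 27.5 are illustrative assumptions (the slides give no standard error), and `p_value` is a hypothetical helper name.

```python
from statistics import NormalDist

def p_value(observed_mean, null_mean, standard_error, alternative):
    """P-value for a sample mean under a normal null distribution."""
    null = NormalDist(mu=null_mean, sigma=standard_error)
    lower_tail = null.cdf(observed_mean)   # area to the left of the observation
    upper_tail = 1.0 - lower_tail          # area to the right of the observation
    if alternative == "greater":           # H1: mean > null_mean
        return upper_tail
    if alternative == "less":              # H1: mean < null_mean
        return lower_tail
    if alternative == "two-sided":         # H1: mean != null_mean
        return 2.0 * min(lower_tail, upper_tail)
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

# Assumed values: H0 mean 26.3 (from the slides), standard error 0.5 (hypothetical)
for alt in ("greater", "two-sided", "less"):
    print(alt, round(p_value(27.5, 26.3, 0.5, alt), 4))
```

The same observed mean gives a small p-value when H1 points toward it and a large one when H1 points the other way, which is why H1 must be fixed before looking at the data.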

Page 40: Introduction to Biostatistics for Clinical and Translational Researchers

Vocabulary
One-tailed hypothesis: outcome is expected in a single direction (e.g., administration of experimental drug will result in a decrease in systolic BP)

Two-tailed hypothesis: the direction of the effect is unknown (e.g., experimental therapy will result in a different response rate than that of current standard of care)

Page 41: Introduction to Biostatistics for Clinical and Translational Researchers

Vocabulary
Type I Error (α): a true H0 is incorrectly rejected
• “An innocent man is proven GUILTY in a court of law”
• Commonly accepted rate is α = 0.05
Type II Error (β): failing to reject a false H0
• “A guilty man is proven NOT GUILTY in a court of law”
• Commonly accepted rate is β = 0.2
Power (1 – β): correctly rejecting a false H0
• “Justice has been served”
• Commonly accepted rate is 1 – β = 0.8

Page 42: Introduction to Biostatistics for Clinical and Translational Researchers

Decisions
                 Truth: H1         Truth: H0
Conclusion: H1   Correct: Power    Type I Error
Conclusion: H0   Type II Error     Correct

Page 43: Introduction to Biostatistics for Clinical and Translational Researchers

Statistical Power
Primary factors that influence the power of your study:
• Effect size: as the magnitude of the difference you wish to find increases, the power of your study will increase
• Variability of the outcome measure: as the variability of your outcome decreases, the power of your study will increase
• Sample size: as the size of your sample increases, the power of your study will increase
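The direction of each primary factor can be illustrated with a sketch for a one-sided z-test of a mean, assuming the outcome's standard deviation is known. The function name and every number below are hypothetical, chosen only to show the trend.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(effect, sd, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mean = mu0 against H1: mean > mu0.

    effect is the true difference (mean - mu0); sd is assumed known.
    """
    z = NormalDist()                     # standard normal distribution
    critical = z.inv_cdf(1.0 - alpha)    # rejection cutoff on the z scale
    shift = effect * sqrt(n) / sd        # how far H1 moves the test statistic
    return 1.0 - z.cdf(critical - shift)

baseline = power_one_sided_z(effect=1.0, sd=3.0, n=30)
bigger_effect = power_one_sided_z(effect=2.0, sd=3.0, n=30)
less_variable = power_one_sided_z(effect=1.0, sd=2.0, n=30)
bigger_sample = power_one_sided_z(effect=1.0, sd=3.0, n=60)

for label, value in [("baseline", baseline), ("bigger effect", bigger_effect),
                     ("less variable", less_variable), ("bigger sample", bigger_sample)]:
    print(f"{label:>14}: {value:.2f}")
```

Each change raises power relative to the baseline; with no true effect, the "power" reduces to the Type I error rate α.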

Page 44: Introduction to Biostatistics for Clinical and Translational Researchers

Statistical Power
Secondary factors that influence the power of your study:
• Dropouts
• Nuisance variation
• Confounding variables
• Multiple hypotheses
• Post-hoc hypotheses

Page 45: Introduction to Biostatistics for Clinical and Translational Researchers

Hypothesis Testing
We will cover these concepts more fully when we discuss Hypothesis Testing and Quality of Evidence

Page 46: Introduction to Biostatistics for Clinical and Translational Researchers

Descriptive Statistics

Page 47: Introduction to Biostatistics for Clinical and Translational Researchers

Field of Statistics

Statistics
• Descriptive Statistics: methods for processing, summarizing, presenting and describing data
• Experimental Design: techniques for planning and conducting experiments
• Inferential Statistics: evaluation of the information generated by an experiment or through observation

Page 48: Introduction to Biostatistics for Clinical and Translational Researchers

Field of Statistics

Statistics
• Descriptive: Graphical, Numerical
• Inferential: Estimation, Hypothesis Testing
• Experimental Design

Page 49: Introduction to Biostatistics for Clinical and Translational Researchers

Field of Statistics
Descriptive statistics
• Summarizing and describing the data
• Uses numerical and graphical summaries to characterize sample data
Inferential statistics
• Uses sample data to make conclusions about a broader range of individuals (a population) than just those who are observed (a sample)

The principal way to guarantee that the sample

population sample

Page 50: Introduction to Biostatistics for Clinical and Translational Researchers

Field of Statistics
Experimental Design
• Formulation of hypotheses
• Determination of experimental conditions, measurements, and any extraneous conditions to be controlled
• Specification of the number of subjects required and the population from which they will be sampled
• Specification of the procedure for assigning subjects to experimental conditions
• Determination of the statistical analysis that will be performed

Page 51: Introduction to Biostatistics for Clinical and Translational Researchers

Descriptive Statistics
Descriptive statistics is one branch of the field of Statistics in which we use numerical and graphical summaries to describe a data set or distribution of observations.
Statistics
• Descriptive: Graphs, Statistics
• Inferential: Hypothesis Testing, Interval Estimates

Page 52: Introduction to Biostatistics for Clinical and Translational Researchers

Types of Data
All data contains information.
It is important to recognize that the hierarchy implied in the level of measurement of a variable has an impact on (1) how we describe the variable data and (2) what statistical methods we use to analyze it.

Page 53: Introduction to Biostatistics for Clinical and Translational Researchers

Levels of Measurement
Nominal: difference
Ordinal: difference, order
Interval: difference, order, equivalence of intervals
Ratio: difference, order, equivalence of intervals, absolute zero
Nominal and ordinal: discrete, qualitative
Interval and ratio: continuous, quantitative

Page 54: Introduction to Biostatistics for Clinical and Translational Researchers

Types of Data

[Diagram: NOMINAL → ORDINAL → INTERVAL → RATIO; information increases]

Page 55: Introduction to Biostatistics for Clinical and Translational Researchers

Ratio Data
Ratio measurements provide the most information about an outcome.
Different values imply difference in outcomes.
• 6 is different from 7.
Order is implied.
• 6 is smaller than 7.

Page 56: Introduction to Biostatistics for Clinical and Translational Researchers

Ratio Data
Intervals are equivalent.
• The difference between 6 and 7 is the same as the difference between 101 and 102.
Zero indicates a lack of what is being measured.
• If item A weighs 0 ounces, it weighs nothing.

Page 57: Introduction to Biostatistics for Clinical and Translational Researchers

Ratio Data
Ratio measurements provide the most information about an outcome.
Can make statements like: “Person A (t = 10 minutes) took twice as long to complete a task as Person B (t = 5 minutes).”
• This is the only type of measurement where statements of this nature can be made.
Examples: age, birth weight, follow-up time, time to complete a task, dose

Page 58: Introduction to Biostatistics for Clinical and Translational Researchers

Interval Data

Interval measurements are one step down on the “information” scale from ratio measurements.
Difference and order are implied and intervals are equivalent.
BUT, zero no longer implies an absence of the outcome.
• What is the interpretation of 0 °C? 0 K?
• The Celsius and Fahrenheit scales of temperature are interval measurements; Kelvin is a ratio measurement.

Page 59: Introduction to Biostatistics for Clinical and Translational Researchers

Interval Data

Interval measurements are one step down on the “information” scale from ratio measurements.
You can tell what is better, and by how much, but ratios don’t make sense due to the lack of a ‘starting point’ on the scale.
• 60 °F is greater than 30 °F, but not twice as hot, since 0 °F doesn’t represent an absence of heat.
Examples: temperature, dates
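A short sketch of why the ratio is not meaningful on an interval scale: the same two temperatures give different ratios in Fahrenheit, Celsius and Kelvin, and only the Kelvin ratio (a ratio scale) has a physical interpretation. The helper names are hypothetical; the conversion formulas are the standard ones.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0

def fahrenheit_to_kelvin(temp_f):
    """Convert degrees Fahrenheit to kelvins (ratio scale with a true zero)."""
    return fahrenheit_to_celsius(temp_f) + 273.15

warm_f, cool_f = 60.0, 30.0

# The ratio depends entirely on where the scale puts its arbitrary zero.
fahrenheit_ratio = warm_f / cool_f
celsius_ratio = fahrenheit_to_celsius(warm_f) / fahrenheit_to_celsius(cool_f)
kelvin_ratio = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)

print(f"Fahrenheit ratio: {fahrenheit_ratio:.2f}")
print(f"Celsius ratio:    {celsius_ratio:.2f}")
print(f"Kelvin ratio:     {kelvin_ratio:.2f}")
```

The Fahrenheit ratio is 2 and the Celsius ratio is even negative, while on the Kelvin scale 60 °F is only about 6% hotter than 30 °F.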

Page 60: Introduction to Biostatistics for Clinical and Translational Researchers

Ordinal Data
Ordinal measurements are one step down on the “information” scale from interval measurements.
Difference and order are implied.
BUT, intervals are no longer equivalent.
• For instance, the difference in performance between the 1st and 2nd ranked teams in basketball isn’t necessarily equivalent to the difference between the 2nd and 3rd ranked teams.
• The ranking only implies that 1st is better than 2nd, 2nd is better than 3rd, and so on . . . but it doesn’t try to quantify the ‘betterness’ itself.

Page 61: Introduction to Biostatistics for Clinical and Translational Researchers

Ordinal Data

Ordinal measurements are one step down on the “information” scale from interval measurements.
Examples: Highest level of education achieved, tumor grading, survey questions (e.g., Likert-scale quality of life)

Page 62: Introduction to Biostatistics for Clinical and Translational Researchers

Nominal Data

Nominal measurements collect the least amount of information about the outcome.
Only difference is implied.
Observations are classified into mutually exclusive categories.
Examples: Gender, ID numbers, pass/fail response

Page 63: Introduction to Biostatistics for Clinical and Translational Researchers

Levels of Measurement
It is important to recognize that the hierarchy implied in the level of measurement of a variable has an impact on (1) how we describe the variable data and (2) what statistical methods we use to analyze it.

The levels are in increasing order of mathematical structure—meaning that more mathematical operations and relations are defined—and the higher levels are required in order to define some statistics.

Page 64: Introduction to Biostatistics for Clinical and Translational Researchers

Levels of Measurement
At the lower levels, assumptions tend to be less restrictive and the appropriate data analysis techniques tend to be less sensitive.

In general, it is desirable to have a higher level of measurement.

A summary of the appropriate statistical summaries and mathematical relations or operations is given in the next table.

Page 65: Introduction to Biostatistics for Clinical and Translational Researchers

Levels of Measurement
Level      Statistical Summary                        Mathematical Relation/Operation
Nominal    Mode                                       one-to-one transformations
Ordinal    Median                                     monotonic transformations
Interval   Mean, Standard Deviation                   positive linear transformations
Ratio      Geometric Mean, Coefficient of Variation   multiplication by c ≠ 0
We must know where an outcome falls on the measurement scale: this determines not only how we describe the data (descriptive statistics) but also how we analyze it (inferential statistics).

Page 66: Introduction to Biostatistics for Clinical and Translational Researchers

Using Graphs to Describe Data
Nominal and ordinal measurements are discrete and qualitative, even if they are represented numerically.
• Rank: 1, 2, 3
• Gender: male = 1, female = 0
We typically use frequencies, percentages, and proportions to describe how the data is distributed among the levels of a qualitative variable.
Bar and pie charts are even more useful.

Page 67: Introduction to Biostatistics for Clinical and Translational Researchers

Example: Myopia
A survey of n = 479 children found that those who had slept with a nightlight or in a fully lit room before the age of 2 had a higher incidence of nearsightedness later in childhood.
             No Myopia    Myopia       High Myopia   Total
Darkness     155 (90%)    15 (9%)      2 (1%)        172 (100%)
Nightlight   153 (66%)    72 (31%)     7 (3%)        232 (100%)
Full Light   34 (45%)     36 (48%)     5 (7%)        75 (100%)
Total        342 (71%)    123 (26%)    14 (3%)       479 (100%)

Page 68: Introduction to Biostatistics for Clinical and Translational Researchers

Example: Myopia

[Figure: bar chart for Darkness, Nightlight and Full Light on a 0 to 100 percent scale, with bars for High, Some and None (myopia)]

Page 69: Introduction to Biostatistics for Clinical and Translational Researchers

Example: Myopia
As the amount of sleep time light increases, the incidence of myopia increases.
This study does not prove that sleeping with the light causes myopia in more children.
There may be some confounding factor that isn’t measured or considered, possibly genetics.
• Children whose parents have myopia are more likely to suffer from it themselves.
• It’s also possible that those parents are more likely to provide light while their children are sleeping.

Page 70: Introduction to Biostatistics for Clinical and Translational Researchers

Example: Nausea
How many subjects experienced drug-related nausea?
[Figure: bar chart of the number of subjects (0 to 12) with Nausea and No Nausea at 0 mg, 10 mg, 20 mg and 50 mg]
Dose    Nausea   No Nausea
0 mg    0        9
10 mg   1        10
20 mg   3        10
50 mg   3        11

Page 71: Introduction to Biostatistics for Clinical and Translational Researchers

Example: Nausea
With unequal sample sizes across doses, it is more meaningful to use percent rather than frequency.
[Figure: bar chart of the percent of subjects (0% to 100%) with Nausea and No Nausea at 0 mg, 10 mg, 20 mg and 50 mg]
Dose    Nausea    No Nausea
0 mg    0 (0%)    9 (100%)
10 mg   1 (9%)    10 (91%)
20 mg   3 (23%)   10 (77%)
50 mg   3 (21%)   11 (79%)
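The percentages follow directly from the counts. A small sketch of the conversion, using the counts from the slide; the variable and function names are hypothetical.

```python
# (nausea, no nausea) counts per dose group, taken from the slide
counts = {
    "0 mg": (0, 9),
    "10 mg": (1, 10),
    "20 mg": (3, 10),
    "50 mg": (3, 11),
}

def nausea_percent(nausea, no_nausea):
    """Percent of a dose group reporting nausea."""
    group_size = nausea + no_nausea
    return 100.0 * nausea / group_size

# Round to whole percents, as on the slide
percents = {dose: round(nausea_percent(*pair)) for dose, pair in counts.items()}

for dose, pct in percents.items():
    print(f"{dose:>5}: {pct}% with nausea (n = {sum(counts[dose])})")
```

The 20 mg and 50 mg groups both have 3 cases, but the group sizes (13 and 14) differ, so the percentages differ slightly.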

Page 72: Introduction to Biostatistics for Clinical and Translational Researchers

Bar & Pie Charts
Race               Percent
Caucasian          30
African American   20
Hispanic           17
Asian American     13
Native American    13
Other              7
[Figures: pie chart and bar chart of Ethnicity; the bar chart's percent axis runs from 0 to 30]

Page 73: Introduction to Biostatistics for Clinical and Translational Researchers

Using Graphs to Describe Data
Interval and Ratio variables are continuous and quantitative and can be graphically and numerically represented with more sophisticated mathematical techniques.
• Height
• Survival Time
We typically use means, standard deviations, medians, and ranges to describe how the variables tend to behave.
Histograms and boxplots are even more useful.

Page 74: Introduction to Biostatistics for Clinical and Translational Researchers

Example: Time-to-death
Suppose that we record the variable x = time-to-death of n = 100 patients in a study.
[Figure: histogram of Time; x axis from 0 to 15, Frequency axis from 0 to 40]

Page 75: Introduction to Biostatistics for Clinical and Translational Researchers

Example: Time-to-death
We can quickly observe several characteristics of the data from the histogram:
• For most subjects, death occurred between 0 and 5 months
• For a few subjects, death occurred past 15 months
From this picture, we may wish to identify the distinguishing characteristics of the individuals with unusually long times.

Page 76: Introduction to Biostatistics for Clinical and Translational Researchers

Example: Weight
Suppose we record the weight in pounds of n = 100 subjects in a study.
[Figure: boxplot of x showing Q1, Q2 and Q3, the IQR, the limits Q1 − 1.5 IQR and Q3 + 1.5 IQR, and outliers (*) beyond those limits]

Page 77: Introduction to Biostatistics for Clinical and Translational Researchers

Example: Tooth Growth
Boxplots represent the same information, but are more useful for comparing characteristics between several data sets.
Right: distributions of tooth growth for two supplements and three dose levels

Page 78: Introduction to Biostatistics for Clinical and Translational Researchers

Using Numbers to Describe Data
Nominal and ordinal measurements are discrete and qualitative, even if they are represented numerically.
• Rank: 1, 2, 3
• Gender: male = 1, female = 0
Interval and Ratio variables are continuous and quantitative and can be graphically and numerically represented with more sophisticated mathematical techniques.
• Height
• Survival Time

Page 79: Introduction to Biostatistics for Clinical and Translational Researchers

Using Numbers to Describe Data
Nominal and ordinal measurements are qualitative, even if they are represented numerically.
• We typically describe qualitative data using frequencies and percentages in tables.
Measures of central tendency and variability don’t make as much sense with categorical data, though the mode can be reported.

Page 80: Introduction to Biostatistics for Clinical and Translational Researchers

Describing Data
Interval and ratio measurements are quantitative.
When dealing with a quantitative measurement, we typically describe three aspects of its distribution.
• Central tendency: a single value around which data tends to fall.
• Variability: a value that represents how scattered the data is around that central value; large values are indicative of high scatter.
• We also want to describe the shape of the distribution of the sample data values.

Page 81: Introduction to Biostatistics for Clinical and Translational Researchers

Central Tendency

location

Mean: arithmetic average of data
Median: approximate middle of data
Mode: most frequently occurring value

Page 82: Introduction to Biostatistics for Clinical and Translational Researchers

Central Tendency
Mode, Mo
• The most frequently occurring value in the data set.
• May not exist or may not be uniquely defined.
• It is the only measure of central tendency that can be used with nominal variables, but it is also meaningful for quantitative variables that are inherently discrete (e.g., performance of a task).
• Its sampling stability is very low (i.e., it varies greatly from sample to sample).
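A brief sketch of the mode for nominal and discrete data, including a case where it is not uniquely defined. The data values are made up for illustration; `statistics.multimode` (Python 3.8+) returns every value tied for the highest count.

```python
from statistics import multimode

# Nominal variable: the mode is the only usable measure of central tendency
blood_types = ["A", "O", "A", "B", "O", "A"]       # hypothetical observations
blood_type_modes = multimode(blood_types)

# Discrete quantitative variable with two equally common values (bimodal)
tasks_completed = [1, 2, 2, 3, 3, 4]               # hypothetical observations
task_modes = multimode(tasks_completed)

# Every value occurs once: all values tie, so no single mode stands out
all_distinct = multimode([5, 7, 9])

print(blood_type_modes, task_modes, all_distinct)
```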

Page 83: Introduction to Biostatistics for Clinical and Translational Researchers

Central Tendency: Mode
[Figure: histogram of x; x axis from 0 to 15, Density axis from 0.00 to 0.20, with the mode (Mo) marked]

Page 84: Introduction to Biostatistics for Clinical and Translational Researchers

Central Tendency: Mode

[Figure: bar chart for Males and Females on a 0 to 16 scale, with the mode (Mo) marked]

Page 85: Introduction to Biostatistics for Clinical and Translational Researchers

Central Tendency
Median, M
• The middle value (Q2, the 50th percentile) of the variable.
• It is appropriate for ordinal measures and for skewed interval or ratio measures because it isn’t affected by extreme values.
• It’s unaffected (robust to outliers) because it takes into account only the relative ordering and number of observations, not the magnitude of the observations themselves.
• It has low sampling stability.

Page 86: Introduction to Biostatistics for Clinical and Translational Researchers

Example: Median
Suppose we have a set of observations:
1 2 2 4
The median for this set is M = 2.
Now suppose we accidentally mismeasured the last observation:
1 2 2 9
The median for this new set is still M = 2.

Page 87: Introduction to Biostatistics for Clinical and Translational Researchers

Central Tendency: Median
[Figure: histogram of x; x axis from 0 to 15, Density axis from 0.00 to 0.20, with the mode (Mo) and median (M) marked]

Page 88: Introduction to Biostatistics for Clinical and Translational Researchers

Central Tendency
Mean, x̄
• The arithmetic average of the variable x.
• It is the preferred measure for interval or ratio variables with relatively symmetric observations.
• It has good sampling stability (i.e., it varies the least from sample to sample), implying that it is better suited for making inferences about population parameters.
• It is affected by extreme values because it takes into account the magnitude of every observation.
• It can be thought of as the center of gravity of the variable’s distribution.

Page 89: Introduction to Biostatistics for Clinical and Translational Researchers

Example: Mean
Suppose we have a set of observations:
1 2 2 4
The median for this set is M = 2, the mean is x̄ = 2.25.
Now suppose we accidentally mismeasured the last observation:
1 2 2 9
The median for this new set is still M = 2, but the new mean is x̄ = 3.5.
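The example can be checked with Python's standard `statistics` module; this is only a quick verification of the numbers on the slide.

```python
from statistics import mean, median

original = [1, 2, 2, 4]
mismeasured = [1, 2, 2, 9]   # last observation recorded incorrectly

original_mean, original_median = mean(original), median(original)
new_mean, new_median = mean(mismeasured), median(mismeasured)

# The median ignores the size of the extreme value; the mean does not.
print(f"original:    mean = {original_mean}, median = {original_median}")
print(f"mismeasured: mean = {new_mean}, median = {new_median}")
```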

Page 90: Introduction to Biostatistics for Clinical and Translational Researchers

Central Tendency: Mean
[Figure: histogram of x; x axis from 0 to 15, Density axis from 0.00 to 0.20, with the mode (Mo), median (M) and mean (x̄) marked]

Page 91: Introduction to Biostatistics for Clinical and Translational Researchers

Variability

spread

Range: difference between min and max values
Standard deviation: measures the spread of data about the mean, measured in the same units as the data

Page 92: Introduction to Biostatistics for Clinical and Translational Researchers

Variability
Measures of variability depict how similar observations of a variable tend to be.
Variability of a nominal or ordinal variable is rarely summarized numerically.
The more familiar measures of variability are mathematical, requiring measurement to be of the interval or ratio scale.

Page 93: Introduction to Biostatistics for Clinical and Translational Researchers

Variability
Range, R
• The distance from the minimum to the maximum observation.
• Easy to calculate.
• Influenced by extreme values (outliers).
  1 2 3 4 10: R = 10 - 1 = 9
  1 2 3 4 100: R = 100 - 1 = 99

Page 94: Introduction to Biostatistics for Clinical and Translational Researchers

Variability
Interquartile Range, IQR
• The distance from the 1st quartile (25th percentile) to the 3rd quartile (75th percentile), Q3 - Q1.
• Unlike the range, IQR is not influenced by extreme values.
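As an illustration, the quartiles, the IQR and the conventional 1.5 × IQR limits used to flag outliers in a boxplot can be computed as below. The weights are made-up values, and the "inclusive" quartile method is only one of several conventions, so other software may report slightly different quartiles.

```python
from statistics import quantiles

# Hypothetical weights in pounds; the last value is deliberately extreme
weights = [120, 135, 140, 150, 155, 160, 170, 175, 260]

# n=4 splits the data into quarters and returns [Q1, Q2, Q3]
q1, q2, q3 = quantiles(weights, n=4, method="inclusive")
iqr = q3 - q1

# Observations beyond these limits are conventionally drawn as outliers
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = [w for w in weights if w < lower_limit or w > upper_limit]

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
print(f"limits = ({lower_limit}, {upper_limit}), outliers = {outliers}")
```

Replacing 260 with an even larger value would change the range but leave the quartiles and the IQR unchanged.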

Page 95: Introduction to Biostatistics for Clinical and Translational Researchers

Variability: IQR

[Figure: boxplot of x showing Q1, Q2 and Q3, the IQR, the limits Q1 − 1.5 IQR and Q3 + 1.5 IQR, and outliers (*) beyond those limits]

Page 96: Introduction to Biostatistics for Clinical and Translational Researchers

Variability
Standard deviation, s
• Represents the average spread of the data around the mean.
• Expressed in the same units as the data.
• “Average deviation” from the mean.

Page 97: Introduction to Biostatistics for Clinical and Translational Researchers

Variability
Variance, s²
• The standard deviation squared.
• “Average squared deviation” from the mean.
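A short sketch computing the range, variance and standard deviation for the two data sets used in the range example (1 2 3 4 10 and 1 2 3 4 100). Note that the sample variance in Python's `statistics` module divides the summed squared deviations by n − 1 rather than n, so "average squared deviation" is a close description, not an exact one.

```python
from statistics import stdev, variance

def describe_spread(values):
    """Return (range, sample variance, sample standard deviation)."""
    data_range = max(values) - min(values)
    return data_range, variance(values), stdev(values)

mild = [1, 2, 3, 4, 10]
extreme = [1, 2, 3, 4, 100]

mild_range, mild_var, mild_sd = describe_spread(mild)
extreme_range, extreme_var, extreme_sd = describe_spread(extreme)

print(f"mild:    R = {mild_range}, s^2 = {mild_var:.1f}, s = {mild_sd:.2f}")
print(f"extreme: R = {extreme_range}, s^2 = {extreme_var:.1f}, s = {extreme_sd:.2f}")
```

A single extreme observation inflates all three measures, and the standard deviation is always the square root of the variance.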

Page 98: Introduction to Biostatistics for Clinical and Translational Researchers

Shape


Page 99: Introduction to Biostatistics for Clinical and Translational Researchers

Distribution Shapes

Page 100: Introduction to Biostatistics for Clinical and Translational Researchers

Summary
Basic Concepts
• Definition and role of statistics
• Vocabulary lesson
• Brief introduction to Hypothesis Testing
• Brief introduction to Design concepts
Descriptive Statistics
• Levels of Measurement
• Graphical summaries
• Numerical summaries

Next time: Study Design Considerations and Quality of Evidence