Research Study Design and Statistical Methods for Cardiology Nathan D. Wong, PhD, FACC Professor and...
-
date post
20-Dec-2015 -
Research Study Design and Statistical Methods for
Cardiology
Nathan D. Wong, PhD, FACC
Professor and Director
Heart Disease Prevention Program
Division of Cardiology
University of California, Irvine
Why are papers rejected for publication? (The Top 11 Reasons)
1. The study did not address an important scientific issue
2. The study was not original
3. The study did not actually test the authors’ hypothesis
4. A different type of study should have been done
5. Practical difficulties led the authors to compromise on the original study protocol (e.g., recruitment, procedures)
Greenhalgh T, BMJ 1997; 315: 243-6
Reasons 6-11 for Paper Rejection
6. The sample size was too small
7. The study was uncontrolled or inadequately controlled
8. The statistical analysis was incorrect or inappropriate
9. The authors drew unjustified conclusions from the data
10. There is a significant conflict of interest among authors
11. The paper is so badly written that it is incomprehensible
Outline
• Elements of Designing a Research Protocol
• Selecting a Study Design – Which is best for answering your question?
• Selection and Classification of Study Variables (e.g., predictors and outcomes)
• Sample size and power considerations
• Choice of statistical procedures for different study designs
Nine Key Elements of a Research Study Protocol
• Background
• Hypotheses
• Clinical Relevance
• Specific Aims / Objectives
• Methodology
• Power / Sample Size
• Measures and Outcomes
• Data Management
• Statistical Methodology
(UCI-SOM Dean’s Scientific Review Committee: http://www.rgs.uci.edu/ora/rp/hrpp/deansscientificreview.htm)
Background
• A brief review of the problem to be studied and of related studies that generated the rationale and the central idea of the proposed study. Several pertinent references should be provided.
Was the study original?
• Few studies break entirely new ground
• Many studies add to the evidence base of earlier studies which may have had other or more limitations
• Meta-analyses depend on literature containing multiple studies addressing a question in a similar manner
Features Distinguishing New vs. Previous Studies
• Is the study in question bigger in sample size, or with longer-follow-up (e.g., adding to meta-analyses of previous studies)?
• Is methodology more rigorous (e.g., having addressed criticisms of previous ones)?
• Is the population studied different from that of previous studies (ages, gender, ethnic groups)?
• Does the new study address a clinical issue of sufficient importance so it is politically desirable even if not scientifically necessary?
Greenhalgh T, BMJ 1997; 315: 305-8
Hypotheses
• The problem/s stated in the Background may generate a primary hypothesis and possibly one or two secondary hypotheses.
• A hypothesis is often stated in the null – e.g., "No difference between treatments A and B" is anticipated, or "No association between X and Y exists".
• Alternatively, it can be stated according to what one expects e.g., “A will be more effective than B in reducing levels or symptoms of C", or “X will be associated with Y".
Clinical Relevance
• In the case of clinical studies, the potential value in the understanding, diagnosis, or management of a clinical condition or pathological state should be stated.
Specific Aims / Objectives
• This states what the study is intended to study or demonstrate and generally includes mention of predictor and outcome (or endpoint) variables.
• For example: "The primary aim of the study is to examine whether treatment A is more effective than treatment B in reducing levels of C", or "in finding out whether X is associated with Y", etc.
• There may be several specific aims in a given study. The methods of study should address each of them.
Elements of a Formulated Question
• Patient or Population: Who is the question about? (e.g., pts with diabetes mellitus)
• Intervention or Exposure: What is being done or what is happening to the patient/population? (e.g., tight control)
• Outcome(s): How does the intervention affect the patient/population (mortality, CHD incidence)
• Comparison(s): What could be done instead of the intervention? (e.g., standard management)
Methodology
• Methodology should test the hypothesis and address the specific aims using procedures consistent with sound scientific study design, including:
– the size and nature of the subjects studied
– recruitment, screening, and enrollment procedures
– inclusion and exclusion criteria
– treatment schedules and follow-up procedures, if applicable. A chart of the studies to be performed at each visit and the time of each visit and test is needed.
Study Population Issues
• How were the subjects recruited? Is there potential recruitment bias (e.g., from taking respondents of advertisements), or is survey done in a random (e.g., random digit-dialing) or consecutive sample?
• Who was included? Many trials exclude those who have co-morbidities, do not speak English, or take other medications—may provide scientifically clean results, but may not be representative of disease in question.
Study Population (cont.)
• Who was excluded? Study may exclude those with more severe forms of disease, therefore limiting generalizability
• Were subjects studied in “real-life” circumstances? Are the consenting process (describing the benefits/risks), access to study staff, equipment available, etc. similar to those in an ordinary practice situation?
Power / Sample Size
• A power/sample size analysis should include an estimate of minimum effect or difference expected at a given level of power when the sample size is fixed, or a projection of the number of subjects needed to achieve a clinically important difference in what is being examined in the hypotheses and the specific aims.
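As a rough illustration of the projection described above, the sketch below estimates the number of subjects needed per group to compare two event proportions with a two-sided test, using the common normal-approximation formula. The function name and the planning values (25% vs. 20% event rates, alpha=0.05, 80% power) are assumptions chosen for illustration and are not taken from these slides.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided comparison of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    effect = p_control - p_treated                 # smallest difference worth detecting
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical planning values: 25% event rate on control, 20% on treatment
n_required = n_per_group(0.25, 0.20)
n_required_90 = n_per_group(0.25, 0.20, power=0.90)
print(n_required, n_required_90)
```

Raising the power or shrinking the difference to be detected increases the required sample size, which is why the minimum clinically important difference should be fixed before enrollment begins.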
Measures and Outcomes
• Measures include both independent (predictor) and dependent (outcome) variables.
• Outcomes include what the investigator is trying to predict, e.g., new or recurrent onset of a disease state, survival, or lowering of cholesterol as a result of a drug.
• The independent or predictor variables should always include treatment status (e.g., active vs. placebo) in the case of a clinical trial, or primary variables of interest (such as age, gender, levels of X at baseline) for other studies. In either case, there will often be possible confounders or covariates to adjust for in the analysis of the results.
• The measures and outcomes should reasonably be expected to answer the proposed question and to yield knowledge of sufficient importance.
Data Management
• Data Management includes how data is captured for analysis and the tools that will be utilized while capturing the data. This includes:
– Case report forms for clinical trials
– Surveys, questionnaires, or interview instruments
– Computerized spreadsheets or entry forms
– Methods for data entry, error checking, and maintenance of study databases
Statistical Methods of Analysis
• Statistical analysis includes a description of the statistical tests planned to examine the results obtained, e.g.,
– Student’s t-test will be used to compare levels of A and B between treatment and placebo groups
– Multiple logistic regression analysis will be used to examine an independent treatment effect on the likelihood of recurrent disease.
Hierarchy of Evidence (for making decisions about clinical interventions or proving causation)
1. Systematic reviews and meta-analyses
2. Randomized controlled trials with definitive and clinically significant effects
3. Randomized controlled trials with non-definitive results
4. Cohort studies
5. Case-control studies
6. Cross-sectional surveys
7. Case reports
Features Affecting Strength and Generalizability of Study
• sample size
• selection of comparison group (control or placebo)
• selection of study sample (is it representative of population the study results are intended to apply to?)
• length of time of follow-up
• outcome assessed (e.g., hard vs. soft or surrogate endpoint)
• measurement and ability to control for potential confounders
Case Reports and Series
• Provides “anecdotal” evidence about a treatment or adverse reaction
• Often with significant detail not available in other study designs
• May generate hypotheses, help in designing a clinical trial.
• Several reports forming a “case series” can help establish efficacy of a drug or, through adverse reports, cause its demise (example: cerivastatin and fatal cases of rhabdomyolysis).
Observational Studies
• Cross-sectional, prospective, and case-control studies seldom can identify two groups of subjects (exposed vs. unexposed or cases vs. controls) that are similar (e.g., in demographic or other risk factors).
• Much of the controlling for baseline and/or follow-up differences in subject characteristics occurs in the analysis stage (e.g., multivariable analysis as in Framingham)
Observational Studies (cont.)
• While statistical procedures may be done correctly, have we considered all possible confounders?
• Some covariates may not have been measured as accurately as possible, and more often, may not be even known or measured.
Observational, cross-sectional
• Examines association between two factors (e.g, an exposure and a disease state) assessed at a single point in time, or when temporal relation is unknown
• Example: Prevalence of a known condition, association of risk factors with prevalent disease.
• Conclusions: Associations found may suggest hypotheses to be further tested, but are far from conclusive in proving causation
Cross-Sectional Studies and Surveys
• Examples: NHANES III, CHIS (telephone), chart-review studies
• Surveys should use a representative, ideally randomly chosen sample (rather than a small sample of approached subjects who actually agree to be surveyed).
• Data collected cannot assume any directionality in exposure / disease.
• Can statistically adjust for confounders, but difficult to establish the temporal nature of exposure and disease.
Prevalence of CHD by the Metabolic Syndrome and Diabetes in the NHANES Population Age 50+
[Figure: bar chart of CHD prevalence by metabolic syndrome (MS) and diabetes (DM) status]
Group          % of Population    CHD Prevalence
No MS/No DM    54.2%              8.7%
MS/No DM       28.7%              13.9%
DM/No MS       2.3%               7.5%
DM/MS          14.8%              19.2%
Alexander CM et al. Diabetes 2003;52:1210-1214.
Odds of CVD Stratified by CRP Levels in U.S. Persons (Malik and Wong et al., Diabetes Care, 2005)
[Figure: bar chart of odds ratios (y-axis, 0 to 6) for persons with no disease, metabolic syndrome, or diabetes, each split by low vs. high CRP]
– *p<.05, **p<.01, **** p<.0001 compared to no disease, low CRP
– CRP categories: >3 mg/L (High) and <3 mg/L (Low)
– Age, gender, and risk-factor adjusted logistic regression (n=6497)
Metabolic Syndrome Independently Associated with Inducible Ischemia from SPECT
(Wong ND et al., Diabetes Care 2005; 28: 1445-50)
Predictor                        OR       95% CI       P value
Log coronary calcium (per SD)    4.11     2.60-6.51    <0.001
Chest Pain Symp                  2.94     1.69-5.09    <0.001
1-2 MetS risk factors            2.99     0.70-12.8    0.14
3 MetS risk factors              4.80     1.01-22.9    0.049
4-5 MetS risk factors            10.93    2.09-57.2    0.005
Diabetes                         4.55     0.98-21.1    0.053
*Estimates adjusted for age, gender, cholesterol and smoking. Odds of ischemia for metabolic abnormalities (yes vs. no) (separate model): 1.98 (1.20-3.98), p=0.008
Prospective (Cohort) Studies
• Cohort studies begin with identification of a population, assessment of exposure (e.g., lipid or BP levels)
• Follow-up to the occurrence of outcomes (CHD events)-- temporal sequence to events is known
Cohort Studies (cont.)
• Difficult to ascertain effect of exposure because of many differences between exposed and unexposed groups (confounding factors).
• Statistical adjustment for known risk factor differences can help, but unknown factors that may differ between exposed and unexposed groups will never be adjusted for.
Duration of Follow-up
• Is the planned follow-up reasonable and practical for the study question and sample size utilized?
– effect of a new painkiller on degree of pain relief may only require 48 hours
– effect of a cholesterol medication on mortality may require 5 years
Prospective cohort studies
• Examples:
– Framingham Heart Study
– Cardiovascular Health Study (CHS)
– Multiethnic Study of Atherosclerosis (MESA)
– Nurses Health Study
• Advantages:
– large sample size
– ability to follow persons from healthy to diseased states
– temporal relation between risk factor measures and development of disease
Prospective Studies (cont.)
• Disadvantages:
– expensive due to large sample size often needed to accrue enough events
– many years to development of disease
– possible attrition
– causal inference not definitive, as it is difficult to consider all potential confounders
Prospective Cohort Example: Framingham Heart Study
• Longest running epidemiologic study
• Began with 5209 persons aged 30-62 at baseline in 1948, studied biennially to date (most are deceased now)
• Risk factors measured at each examination, some began later (e.g., HDL-C around 1970) or done only at certain exams (echocardiography, CRP)
• Event ascertainment/adjudication involves panel of 3 physicians reviewing medical records
Low HDL-C Levels Increase CHD Risk Even When Total-C Is Normal (Framingham)
[Figure: 3-D bar chart of 14-y incidence rates (%) for CHD by HDL-C (mg/dL: < 40, 40–49, 50–59, 60) and Total-C (mg/dL: < 200, 200–229, 230–259, 260); individual bar values are not recoverable from this transcript]
Risk of CHD by HDL-C and Total-C levels; aged 48–83 y
Castelli WP et al. JAMA 1986;256:2835–2838
4-Year Progression To Hypertension: The Framingham Heart Study
[Figure: bar chart of patients (%) progressing to hypertension by baseline blood pressure category]
Optimal (<120/80 mm Hg): 5%
Normal (130/85 mm Hg): 18%
High-Normal (130-139/85-89 mm Hg): 37%
Participants age 36 and older
Vasan, et al. Lancet 2001;358:1682-86
CHD, CVD, and Total Mortality: US Men and Women Ages 30-74
(age, gender, and risk-factor adjusted Cox regression) NHANES II Follow-Up (n=6255)
(Malik and Wong, et al., Circulation 2004; 110: 1245-1250)
[Figure: bar chart (y-axis, 0 to 7) of CHD mortality, CVD mortality, and total mortality for five groups: None, MetS, Diabetes, CVD, CVD+Diabetes]
* p<.05, ** p<.01, **** p<.0001 compared to none
CV Event-Free 8-year Survival Using Combined hs-CRP and LDL-C Measurements (n=27,939)
[Figure: curves of probability of event-free survival (y-axis, 0.96 to 1.00) by years of follow-up (x-axis, 0 to 8) for four groups: Low CRP-low LDL, Low CRP-high LDL, High CRP-low LDL, High CRP-high LDL]
Median LDL 124 mg/dl; Median CRP 1.5 mg/l
Ridker et al, N Engl J Med. 2002;347:1157-1165.
Case-control Studies
• Most frequent type of epidemiologic study; can be carried out in a shorter time and requires a smaller sample size, so is less expensive
• Only practical approach for identifying risk factors for rare diseases (where follow-up of a large sample for occurrence of the condition would be impractical)
• Selection of appropriately matched control group (e.g., hospital vs. healthy community controls) and consideration of possible confounders crucial
• Relies on historical information to obtain exposure status (and information on confounders)
Case-Control Studies (cont.)
• Cannot determine for sure whether exposure preceded development of disease
• Also difficult to identify all differences between cases and controls that can be statistically adjusted for
Example of case-control study: Folate and B6 intake and risk of MI (Tavani et al. Eur J Clin Nutr 2004)
• Cases were 507 patients with a first episode of nonfatal AMI, and controls were 478 patients admitted to hospital for acute conditions
• Information was collected by interviewer-administered questionnaires
• Compared to patients in the lowest tertile of intake, the ORs for those in the highest tertile were 0.56 (95% CI 0.35-0.88) for folate and 0.34 (95% CI 0.19-0.60) for vitamin B6.
• Author conclusion: A high intake of folates, vitamin B6 and their combination is inversely associated with AMI risk
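To make the odds ratios and confidence intervals reported above more concrete, here is a minimal sketch of a crude (unadjusted) odds ratio with a 95% CI computed on the log scale (Woolf's method). The counts are hypothetical and are not the Tavani et al. data; the published estimates were covariate-adjusted, which this sketch does not attempt.

```python
from math import exp, log, sqrt


def odds_ratio_ci(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls, z=1.96):
    """Crude odds ratio and approximate 95% CI from a 2x2 case-control table."""
    odds_ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Standard error of log(OR): square root of the summed reciprocal cell counts
    se_log_or = sqrt(1 / exposed_cases + 1 / unexposed_cases
                     + 1 / exposed_controls + 1 / unexposed_controls)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 40 of 100 cases and 60 of 100 controls had high intake
or_est, or_low, or_high = odds_ratio_ci(40, 60, 60, 40)
print(f"OR = {or_est:.2f} (95% CI {or_low:.2f}-{or_high:.2f})")
```

An interval lying entirely below 1.0, as here, is what an "inverse association" looks like numerically; adjustment for confounders would normally be done with logistic regression.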
Potential sources of bias and error in case control studies
• Information on the potential risk factor or confounding variables may not be available from records or subjects’ memories
• Cases may search for a cause of their disease and be more likely to report an exposure than controls (recall bias)
• Uncertainty as to whether agent caused disease or whether occurrence of the disease caused the person to be exposed to the agent
• Difficulty in assembling a case group representative of all cases, and/or assembling an appropriate control group
Prospective, observational: nested case-control
• In this design, one takes incident cases (e.g., incident CVD) and a matched set of controls to examine the association of a risk factor measured sometime before development of the outcome of interest
• Less costly than a true prospective design where all subjects are included in analysis; may not provide equivalent estimates
Prospective study of CRP and risk of future CVD events among apparently healthy
women (Ridker et al., Circulation 1998) – a nested case control study
• 122 female pts who suffered a first CVD event and 244 age and smoking-matched controls free of CVD
• Logistic regression estimated relative risks and 95% CI’s, adjusted for BMI, diabetes, HTN, hypercholesterolemia, exercise, family hx, and trt
• Those who developed CVD events had higher baseline CRP than controls; those in the highest quartile of CRP had a 4.8-fold (4.1 adjusted) increased risk of any vascular event. For MI or stroke, RR=7.3 (5.5 adjusted)
hs-CRP Adds to Predictive Value of TC:HDL Ratio in Determining Risk of First MI
[Figure: 3-D bar chart of relative risk (0.0 to 5.0) by hs-CRP (Low, Medium, High) and Total Cholesterol:HDL Ratio (Low, Medium, High)]
Ridker et al, Circulation. 1998;97:2007–2011.
Examples where observational studies have taken us down the wrong path……
• Meta-analyses of observational studies have shown a 50% lower risk of CHD among estrogen users vs. non-users (who may have had many unknown differences that were not adjusted for), but recent randomized trials (HERS, WHI) show no benefit
• Numerous prospective studies show a 25-50% lower risk of CHD among those taking vitamin E and other antioxidants vs. those not taking them, but recent randomized trials (e.g., HOPE, HPS) show no benefit.
Randomized Clinical Trial
• Considered the gold standard in proving causation– e.g., by “reducing” putative risk factor of interest
• Randomization “equalizes” known and unknown confounders/covariates so that results can be attributed to treatment with reasonable confidence
• Inclusion and exclusion criteria can often be strict (to maximize success of trial) and may require screening numerous patients for each patient randomized
Randomized Clinical Trials (2)
• Expensive and labor intensive; attrition from loss to follow-up or poor compliance can jeopardize results, esp. if the number lost exceeds the outcome difference between groups
• Conditions are highly controlled and may not reflect clinical practice or the real world
• Funding source of study and commercial interests of investigators can raise questions about conclusions of study
Randomized Controlled Trials (3)
• Randomized controlled trial eliminates systematic bias (in theory) by allocating treatments among participants in a random fashion
• The allocation process eliminates selection bias in group characteristics (check comparability of baseline characteristics such as age, gender, severity of disease and covariate risk factors)
RCT’s (4)
• Need to check for any biases in treatments or care provided between the groups (performance bias)
• Need to check for differences in follow-up and withdrawals between the groups– large differences in loss to follow-up can compromise validity of trial (exclusion bias)
• Need to check for any differences in how the outcomes were ascertained between the groups (detection bias)
Advantages of RCT’s
– Allows rigorous evaluation of a single intervention in a well-defined population
– Prospective design (events occur after the intervention)
– Presumably eradicates bias by comparing two identical groups (but see below)
– Allows for meta-analysis
Disadvantages of RCT’s
• Expensive and time-consuming
• Often performed on too few patients, or undertaken for too short a period
• Often funded by large research bodies or pharmaceutical companies which dictate the research agenda
• Often involves many inclusion and exclusion criteria to recruit those who will respond to intervention, thus limiting generalizability to a more general patient population.
Completeness of Follow-up
–Conclusions of a study can be in jeopardy if the number of subjects lost to follow-up (whose outcomes are unknown) is larger than the difference in outcomes between groups.
–Ignoring those withdrawals will often bias results in favor of the intervention, so it is standard to analyze results on an “intention-to-treat” basis, including all who were originally randomized.
Follow-up (cont.)
– Patient withdrawal may be caused by:
• Incorrect entry of patient into a trial
• Suspected adverse reaction to a drug (although many drug AE’s are similar to placebo AE’s)
• Loss of patient motivation
• Withdrawal by clinician for clinical reasons
• Loss to follow-up
• Death
Non-randomized Controlled Trials
• Treatment intervention may be applied in one group of patients (hospitalized), and “control” intervention in a separate group of patients from another source (outpatient clinic)
• May be done when randomization is unethical or inappropriate (e.g., trial examining exposure to cigarette smoking)
• Need to check for any self-selection biases—are there any baseline differences between the two groups that could invalidate the effects of the intervention? (e.g., treated group could have more severe confounding risk factors)
Statistics and Statistical Procedures for Cross-Sectional and Case-Control Designs
– When both independent and dependent variables are continuous: Pearson correlation or linear/polynomial regression
– When dependent variable is continuous and independent variables are categorical (with or without continuous or categorical covariates): Analysis of variance (Analysis of covariance with covariates).
Analysis for Cross-Sectional and Case Control Designs (cont.)
– When both independent and dependent variables are categorical: Chi-square test of proportions; prevalence odds ratio for likelihood of factor Y in those with vs. w/o factor X.
– When outcome is binary (e.g., survival) and explanatory variables are categorical and/or continuous:
• Student’s t-test or Chi-square for initial analysis
• Logistic regression (multiple logistic regression for covariate adjustment)
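As a sketch of the Chi-square test of proportions mentioned above, the snippet below computes the Pearson chi-square statistic for a 2x2 table (without continuity correction) and its p-value on 1 degree of freedom. The counts are hypothetical; the p-value uses the identity that a 1-df chi-square is a squared standard normal, so only the standard library is needed.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and p-value for a 2x2 table.

    Rows are factor X present/absent; columns are factor Y present/absent.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical survey: 30 of 100 with factor X have factor Y, vs. 15 of 100 without X
chi2, p_value = chi_square_2x2(30, 70, 15, 85)
prevalence_odds_ratio = (30 * 85) / (70 * 15)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}, POR = {prevalence_odds_ratio:.2f}")
```

With small expected cell counts, Fisher's exact test is generally preferred to this approximation.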
Wong et al. JACC 2003; 41: 1547-53.
Malik and Wong et al., Diabetes Care 2005; 28: 690-3
Likelihood of CVD by Metabolic Syndrome, Diabetes, and CRP Levels
Statistical Procedures for Prospective Cohort Studies
• When outcome is continuous: Linear and/or polynomial regression
• When outcome is binary: Relative risk (RR) for incidence of disease in those with vs. without risk factor of interest, adjusted for covariates and considering follow-up time to event, using Cox proportional hazards regression: $HR(t, z_i) = HR_0(t)\,\exp(\alpha' z_i)$
• If follow-up time is not known, use logistic regression: $p(Y=1 \mid r_1, r_2, \ldots, r_n) = 1 / (1 + \exp[-a - b_1 r_1 - \cdots - b_n r_n])$
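The logistic regression formula above can be evaluated directly once the coefficients are known. The sketch below does this for two hypothetical subjects who differ only in one binary risk factor; the intercept, coefficients, and covariate values are made up for illustration and do not come from any study cited in these slides.

```python
from math import exp


def logistic_probability(intercept, coefficients, values):
    """p(Y=1 | r1..rn) = 1 / (1 + exp(-(a + b1*r1 + ... + bn*rn)))."""
    linear_predictor = intercept + sum(b * r for b, r in zip(coefficients, values))
    return 1 / (1 + exp(-linear_predictor))


# Hypothetical fitted model: a = -3.0, b1 = 0.7 (risk factor yes/no), b2 = 0.03 (age, years)
a = -3.0
b = [0.7, 0.03]

p_with = logistic_probability(a, b, [1, 60])     # 60-year-old with the risk factor
p_without = logistic_probability(a, b, [0, 60])  # 60-year-old without it

# Exponentiating a coefficient gives the adjusted odds ratio for that variable
odds_ratio = exp(b[0])
odds_with = p_with / (1 - p_with)
odds_without = p_without / (1 - p_without)
print(f"p_with = {p_with:.3f}, p_without = {p_without:.3f}, OR = {odds_ratio:.2f}")
```

The ratio of the two subjects' odds equals exp(b1) regardless of age, which is why logistic regression coefficients are reported as adjusted odds ratios.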
Total Mortality Rates in U.S. Adults, Age 30-75, with Metabolic Syndrome (MetS), With and Without Diabetes Mellitus and Pre-Existing CVD
NHANES II: 1976-80 Follow-up Study**
[Figure: bar chart of deaths/1000 person years (y-axis, 0.0 to 50.0) for CHD mortality, CVD mortality, and total mortality in six groups: No MetS or DM, MetS w/o DM, MetS w/DM, DM only, Prior CVD, Prior CVD and DM; individual bar values are not recoverable from this transcript]
** Average of 13 years of follow-up.
Source: Malik and Wong et al., Circulation 2004;110:1245-50.
Malik and Wong et al., Circulation 2004; 110: 1239-44
Statistics and Statistical Procedures for Randomized Clinical Trials
Relative risk (RR) of binary event occurring in intervention vs. control group:
– when follow-up time is known and varies, use Cox PH regression, where $RR = e^{\beta}$ for the trt var.
– when follow-up time is uniform or unknown, use logistic regression
Statistics and Statistical Procedures for Randomized Clinical Trials (cont.)
For continuously measured outcomes (e.g., changes in blood pressure):
• Pre-post differences in a single group examined by paired t-test
• Treatment vs. control differences examined by Student’s t-test (ANCOVA used when adjusting for covariates)
• Repeated measures ANOVA / ANCOVA used for multiple measures across a treatment period and covariates
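For readers who want to see what the two t-tests above compute, the sketch below calculates a paired t statistic (pre vs. post in one group) and a pooled-variance Student's t statistic (change in treatment vs. change in control). All blood pressure values are hypothetical; obtaining p-values would additionally require the t distribution (for example from a statistics package), which is left out here.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(before, after):
    """Paired t statistic for within-subject change (df = n - 1)."""
    diffs = [post - pre for pre, post in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def student_t(group1, group2):
    """Two-sample Student's t statistic with pooled variance (df = n1 + n2 - 2)."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))


# Hypothetical systolic blood pressures (mm Hg) in five treated subjects
before = [150, 142, 160, 155, 148]
after = [140, 138, 150, 149, 141]
t_paired = paired_t(before, after)

# Hypothetical changes (post minus pre) in treated vs. control subjects
treated_change = [-10, -4, -10, -6, -7]
control_change = [-2, 1, -3, 0, -1]
t_two_sample = student_t(treated_change, control_change)
print(f"paired t = {t_paired:.2f}, two-sample t = {t_two_sample:.2f}")
```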
LaRosa et al., N Engl J Med 2005; 352
Questions to ask regarding study results
• How large is the treatment effect (or likelihood of outcome)?
– Relative risk reduction (may obscure comparative absolute risks)
– Absolute risk reduction: is this clinically significant?
• How precise is the treatment effect (or likelihood of outcome)?
– What are the confidence intervals?
– Do they exclude the null value? (e.g., is the result statistically significant, by magnitude of Chi-square or F-value)
MRC/BHF Heart Protection Study (HPS): Eligibility
• Age 40–80 years
• Increased risk of CHD death due to prior disease
– Myocardial infarction or other coronary heart disease
– Occlusive disease of noncoronary arteries
– Diabetes mellitus or treated hypertension
• Total cholesterol > 3.5 mmol/L (> 135 mg/dL)
• Statin or vitamins not considered clearly indicated or contraindicated by patient’s own doctors
Heart Protection Study Group. Lancet. 2002;360:7-22.
HPS: First Major Coronary Event
[Figure: forest plot, x-axis 0.4 to 1.4; values below 1.0 = Statin Better, above 1.0 = Placebo Better]
Type of Major Vascular Event    Statin-Allocated (n = 10269)    Placebo-Allocated (n = 10267)    Relative risk (95% CI)
Coronary events
  Nonfatal MI                   357 (3.5%)                      574 (5.6%)
  Coronary death                587 (5.7%)                      707 (6.9%)
  Subtotal: MCE                 898 (8.7%)                      1212 (11.8%)                     0.73 (0.67-0.79), P < 0.0001
Revascularizations
  Coronary                      513 (5.0%)                      725 (7.1%)
  Noncoronary                   450 (4.4%)                      532 (5.2%)
  Subtotal: any RV              939 (9.1%)                      1205 (11.7%)                     0.76 (0.70-0.83), P < 0.0001
Any MVE                         2033 (19.8%)                    2585 (25.2%)                     0.76 (0.72-0.81), P < 0.0001
Heart Protection Study Collaborative Group. Lancet. 2002;360:7-22.
These results from the Heart Protection Study are frequently presented as a relative risk reduction of 24% (or relative risk of 0.76), but the absolute risk reduction associated with the simvastatin treatment was only about 5.5%.
Relative vs. Absolute Risk: The Example from The Women’s Health Initiative
• Those randomized to estrogen/progestin, compared to placebo, had statistically significant increased risks:
– Breast cancer 26% (8/10,000 person years)
– Total coronary heart disease 29% (7/10,000 person years)
– Stroke 41% (8/10,000 person years)
– Pulmonary embolism 2.1 X (8/10,000 person years)
– Protective for colorectal cancer (37% lower) and hip fracture (34% lower); no effect on endometrial cancer or total mortality
Examining Magnitude of Effect: HPS Study Example of Vascular Event Reduction
                           Event Yes    Event No
Simvastatin / Treatment    a = 2042     b = 8227
Placebo / Control          c = 2606     d = 7661
Control event rate (CER) = c/(c+d) = 2606/10267 = 0.254
Experimental event rate (EER) = a/(a+b) = 2042/10269 = 0.199
Relative Risk (RR) = EER/CER = 0.199/0.254 = 0.78
Relative Risk Reduction (RRR) = (CER-EER)/CER = (0.254-0.199)/0.254 = 0.22
Absolute Risk Reduction (ARR) = CER-EER = 0.254 - 0.199 = 0.055, or 5.5%
Number Needed to Treat (NNT) = 1/ARR = 1/0.055 = 18.2 (or 55 events prevented per 1000 treated)
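The arithmetic in this example can be checked with a few lines of code. The sketch below recomputes each measure from the four cell counts given in the table; the function and variable names are illustrative only.

```python
def risk_measures(a, b, c, d):
    """Effect measures from a 2x2 trial table.

    a, b = treated with / without event; c, d = control with / without event.
    """
    eer = a / (a + b)   # experimental event rate
    cer = c / (c + d)   # control event rate
    arr = cer - eer     # absolute risk reduction
    return {
        "EER": eer,
        "CER": cer,
        "RR": eer / cer,       # relative risk
        "RRR": arr / cer,      # relative risk reduction
        "ARR": arr,
        "NNT": 1 / arr,        # number needed to treat
    }


# Cell counts from the HPS vascular event table in this slide
hps = risk_measures(a=2042, b=8227, c=2606, d=7661)
events_prevented_per_1000 = 1000 * hps["ARR"]
for name, value in hps.items():
    print(f"{name} = {value:.3f}")
```

Working from unrounded event rates avoids the small discrepancies that appear when each step is rounded by hand.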
SUMMARY: Statistics and Statistical Procedures
• Cross-sectional: Pearson correlation, Chi-square test of proportions- prevalence odds ratio for likelihood of factor Y in those with vs. w/o X
• Case-control: Odds ratio for likelihood of exposure in diseased vs. non-diseased-- Chi-square test of proportions / logistic regression
• Prospective: Relative risk (RR) for incidence of disease in those with vs. without risk factor of interest, adjusted for covariates and considering follow-up time to event--Cox PH regression. Correlations and linear/ transformed regression used for continuous outcomes.
SUMMARY: Statistics and Statistical Procedures (continued)
• Randomized clinical trial: Relative risk (RR) of event occurring in intervention vs. control group - Cox PH regression– For continuously measured outcomes, such as
pre-post changes in risk factors (lipids, blood pressure, etc.) initial treatment vs. control differences examined by Student’s T-test, repeated measures ANOVA / ANCOVA used for multiple measures across a treatment period and covariates
Data Collection / Management
• Always have a clear plan on how to collect data-- design and pilot questionnaires, case report forms.
• The medical record should only serve as source documentation to back up what you have coded on your forms
Data Collection / Management (cont.)
• Use acceptable error checking data entry screens or spreadsheet software (e.g., EXCEL) that is convertible into a statistical package (SAS highly recommended and avail via UCI site license)
• Carefully design the structure of your database (e.g., one subject per record, study variables in columns, numeric coding of all variables) so it is easily convertible for statistical analysis
Critical Appraisal
1. Why was the study done, and what clinical question is being asked? (a brief background, review of the literature, and aim / hypothesis should be stated)
2. What type of study was done? (experiment, clinical trial, observational cohort or cross-sectional study, or survey)
Critical Appraisal (cont.)
3. Was the design appropriate for the research?
• Clinical trial preferred to test efficacy of treatments (e.g., HPS simvastatin trial)
• Cross-sectional study preferred for testing validity of diagnostic/screening tests or risk factor associations (e.g., NHANES III)
• Longitudinal cohort study preferred for prognostic studies (e.g., Framingham)
• Case-control study best to examine effects of a given agent in relation to occurrence of an illness, esp. rare illnesses (e.g., cancer)
Questions to Ask Regarding Study Design and Performance
• Was assignment of patients to treatments randomized?
• Were all patients who entered the trial accounted for?
• Was follow-up sufficiently long and complete?
• Were patients analyzed in the groups to which they were randomized (intent to treat)?
• Were patients, health workers, and study personnel “blinded” to treatment assignment?
Questions to Ask Regarding Study Design and Performance (cont.)
• Were groups similar (or study sample representative of population) at start of the trial? (selection bias)
• Aside from experimental intervention, were the groups treated equally? (performance bias)
• Were objective and unbiased outcome criteria used? (detection bias)
Questions to Ask Regarding Statistical Analysis
• Was there sufficient power/sample size?
• Was the choice of statistical analysis appropriate?
• Was the choice (and coding/classification) of outcome and treatment variables appropriate?
• Is there an adequate description of magnitude and precision of effect?
• Was there adjustment for potential confounders?
Will the results help me in caring for my patients?
For a study evaluating therapy:
– Can the results be applied to my patient care? (was the study or meta-analysis large enough with adequate precision?)
– Were all clinically important treatment outcomes considered? (were secondary outcomes and adverse events assessed?)
– Are the likely treatment benefits worth the potential harms and costs? (does the absolute benefit outweigh the risk of adverse events and cost of therapy?)
Will the results help me in caring for my patients (cont.)?
For a study evaluating prognosis:
– Were the study patients similar to my own? (demographically representative, stage of disease)
– Will the results lead directly to selecting or avoiding therapy? (useful to know clinical course of pts.)
– Are the results useful for reassuring or counseling patients? (a valid, precise result of a good prognosis is useful in this case)
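Measures of Precision of Effect
• The p-value is the measure most commonly reported with a result. It is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true, and it is compared against the chosen alpha (Type I error) level. A low p-value means the result is unlikely to be due to chance alone; it does not by itself describe the size of the effect.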
• A t-statistic, Chi-square, or r-square value gives the relative magnitude of a relation.
• An F-statistic (or multiple r-square) identifies the magnitude of the variance in the dependent variable explained by the treatment or explanatory variable(s)
• A Wald or Likelihood Ratio Chi-square statistic is frequently used in logistic or Cox regression survival analysis.
• The higher the magnitude of the above test statistics (and the lower the corresponding p-value), the stronger the evidence for a relationship between the explanatory variable(s) and the outcome of interest.
Precision of Effect: The Confidence Interval
• The estimate of where the true value of a result lies is expressed as a 95% confidence interval, a range constructed so that it contains the true relative risk or odds ratio in 95% of repeated studies. A result is significant at 2-tailed alpha=0.05 when the interval excludes the null value (e.g., RR=1.0 is excluded)
• 95% confidence intervals are the estimate (e.g., RR) ± 1.96 × SE; for ratio measures such as RR or OR the interval is usually calculated on the log scale. Since SE is SD / sqrt(N), confidence intervals are narrowest (precision greatest) in larger studies.
• 95% CI of the ARR is ARR ± 1.96 × sqrt[CER × (1 − CER) / (# of control patients) + EER × (1 − EER) / (# of experimental patients)]
• 95% CI for NNT = 1 / [95% CI for ARR]
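The interval formulas on this slide can be sketched in Python as follows. The event counts are invented for illustration, the function names are hypothetical, and the relative risk interval uses the standard log-scale approximation, which is an assumption beyond what the slide states.

```python
import math

def arr_confidence_interval(control_events, control_n, exp_events, exp_n, z=1.96):
    """Absolute risk reduction (CER - EER) with its 95% confidence interval."""
    cer = control_events / control_n  # control event rate
    eer = exp_events / exp_n          # experimental event rate
    arr = cer - eer
    se = math.sqrt(cer * (1 - cer) / control_n + eer * (1 - eer) / exp_n)
    return arr, arr - z * se, arr + z * se

def nnt_from_arr(arr, lower, upper):
    """NNT and its interval as reciprocals of the ARR and its limits."""
    if lower <= 0 <= upper:
        # The reciprocal is not a usable interval when the ARR CI includes zero.
        raise ValueError("ARR interval includes zero; NNT interval is undefined")
    return 1 / arr, 1 / upper, 1 / lower

def rr_confidence_interval(control_events, control_n, exp_events, exp_n, z=1.96):
    """Relative risk (EER / CER) with a 95% CI computed on the log scale."""
    rr = (exp_events / exp_n) / (control_events / control_n)
    se_log = math.sqrt(
        1 / exp_events - 1 / exp_n + 1 / control_events - 1 / control_n
    )
    return rr, math.exp(math.log(rr) - z * se_log), math.exp(math.log(rr) + z * se_log)

# Hypothetical trial: 100/1000 events on control, 70/1000 on treatment.
arr, arr_lo, arr_hi = arr_confidence_interval(100, 1000, 70, 1000)
nnt, nnt_lo, nnt_hi = nnt_from_arr(arr, arr_lo, arr_hi)
rr, rr_lo, rr_hi = rr_confidence_interval(100, 1000, 70, 1000)
print(f"ARR {arr:.3f} (95% CI {arr_lo:.3f} to {arr_hi:.3f})")
print(f"NNT {nnt:.0f} (95% CI {nnt_lo:.0f} to {nnt_hi:.0f})")
print(f"RR {rr:.2f} (95% CI {rr_lo:.2f} to {rr_hi:.2f})")
```

Because the NNT limits are reciprocals of the ARR limits, the upper ARR limit gives the lower NNT limit and vice versa.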
Where to Go for Help
• Epidemiology and statistics books
• Statistical Consulting Center
• Dean’s Scientific Review Committee - considers appropriateness of research design, procedures, statistical considerations for UCI-COM investigator initiated studies
Sample Size Considerations
• What level of difference between the two groups constitutes a clinically significant effect that one wishes to detect? (e.g., difference in mean SBP response, difference in treatment vs. control incidence rates of CHD, or relative risk; for a continuous outcome, the mean and SD must be known)
Guidelines for Sample Size / Power Determination
• Necessary for any research grant application
• Need to estimate what “control group” rate of disease or outcome is
• Need to state what is minimum difference (effect size) you want to detect that is clinically significant--e.g., difference in rates, or risk ratio
• Either power can be estimated for a fixed sample size at fixed alpha (usually 0.05 two-tailed) for different effect sizes, OR sample size can be estimated for a given power (usually 0.80) for different effect sizes
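A minimal sketch of the second approach (sample size for a given power) is shown below for a comparison of two event rates. It assumes the common normal-approximation formula with unpooled variances; the control and treated rates are hypothetical, and a dedicated power program may give slightly different numbers.

```python
import math
from statistics import NormalDist

def n_per_group_two_proportions(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect p_control vs. p_treated."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = norm.inv_cdf(power)           # value corresponding to 1 - beta
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    n = (z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2
    return math.ceil(n)

# Hypothetical control rate of 10% and a treated rate of 7%.
for target_power in (0.80, 0.90):
    n = n_per_group_two_proportions(0.10, 0.07, power=target_power)
    print(f"power {target_power:.2f}: about {n} subjects per group")
```

Raising the target power, or shrinking the difference to be detected, increases the required sample size.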
Statistical significance and power
• Statistical significance is based on the Type I or alpha error
– the probability of rejecting the null hypothesis when it was true (saying there is a relationship when there isn’t one)
– usually we accept being wrong <5% of the time, or alpha=0.05
– setting alpha depends on how important it is that we not make a mistake in our conclusion
• The Type II or beta error is the probability of accepting the null when it was false
– saying there is no relationship when there is one
– power is 1-beta, and 80% or 90% (beta error of 20% or 10%) is conventional
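The meaning of alpha can be illustrated with a small simulation: when the null hypothesis is true, a test at alpha=0.05 should reject in about 5% of studies. The sketch below assumes two groups drawn from the same normal distribution with a known SD of 1 and uses a z-test; the group size, number of trials, and seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist

def simulate_type1_rate(n_per_group=30, n_trials=2000, alpha=0.05, seed=1):
    """Fraction of null-true trials in which a two-sided z-test rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = math.sqrt(2 / n_per_group)  # SE of the difference in means when SD = 1
    rejections = 0
    for _ in range(n_trials):
        # Both groups come from the same distribution, so the null is true.
        mean_a = sum(rng.gauss(0, 1) for _ in range(n_per_group)) / n_per_group
        mean_b = sum(rng.gauss(0, 1) for _ in range(n_per_group)) / n_per_group
        if abs(mean_a - mean_b) / se > z_crit:
            rejections += 1
    return rejections / n_trials

rate = simulate_type1_rate()
print(f"False positive rate under the null: {rate:.3f}")
```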
Power of a Test
• Power of a test is the probability of detecting a true result or difference (rejecting the null hypothesis of no difference when it is false), also 1-beta
• Beta error is the probability of accepting a false null hypothesis (e.g., saying there is no difference or relationship when there is one).
• For instance, suppose the null hypothesis is Mean group A = Mean group B. If A really is different from B, beta error is the likelihood of concluding there is no difference (accepting a false null hypothesis). Ideally this should be no more than 0.20, so that power (1-beta) is at least 0.80.
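For the two-group example above, power can be approximated as follows. This is a sketch under stated assumptions: equal group sizes, a common known SD, and a normal approximation that ignores the very small chance of rejecting in the wrong direction. The 5 mmHg difference and SD of 10 are hypothetical.

```python
import math
from statistics import NormalDist

def power_two_means(delta, sd, n_per_group, alpha=0.05):
    """Approximate power to detect a true mean difference delta between groups."""
    norm = NormalDist()
    se = sd * math.sqrt(2 / n_per_group)   # SE of the difference in means
    z_crit = norm.inv_cdf(1 - alpha / 2)   # two-tailed critical value
    return norm.cdf(abs(delta) / se - z_crit)

# Hypothetical: true difference of 5 mmHg, SD of 10 mmHg.
for n in (20, 64, 100):
    power = power_two_means(delta=5, sd=10, n_per_group=n)
    print(f"n = {n:3d} per group: power = {power:.2f}, beta = {1 - power:.2f}")
```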
Fallacies in Presenting Results: Statistically vs. Clinically Significant?
• Having a large sample size can virtually assure statistically significant results even if the correlation, odds ratio, or relative risk is low
• Conversely, an insufficient sample size can hide (not significant) clinically important differences (higher beta error or concluding no difference when there is one)
• Statistical significance is directly related to sample size and magnitude of difference, and inversely related to the variance of the measure
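The dependence of significance on sample size can be seen by holding a clinically trivial difference fixed and varying only N. The sketch below assumes a two-sided z-test with a known common SD; the 1 mmHg difference and SD of 15 are made-up values.

```python
import math
from statistics import NormalDist

def two_sided_p(diff, sd, n_per_group):
    """Two-sided p-value for an observed mean difference (z-test, known SD)."""
    se = sd * math.sqrt(2 / n_per_group)
    z = abs(diff) / se
    return 2 * (1 - NormalDist().cdf(z))

# The same 1 mmHg difference, tested at increasing sample sizes.
for n in (100, 1000, 10000):
    p = two_sided_p(diff=1.0, sd=15.0, n_per_group=n)
    print(f"n = {n:5d} per group: p = {p:.4g}")
```

The effect size never changes in this example; only the standard error shrinks as N grows.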
Variable Classification
• What is your outcome (Y) (dependent variable) of interest?
– Categorical (binary, 3 or more categories) examples: survival, CHD incidence, achievement of BP control (yes vs. no)
– Continuous: change in blood pressure
• What is the main explanatory or independent variable (X) of interest?
– Categorical (binary, 3 or more categories) examples: treatment status (active vs. placebo), JNC-7 blood pressure category (normal, pre-HTN, Stage 1 HTN, Stage 2 HTN)
– Continuous: baseline systolic / diastolic blood pressure
Covariates / Confounders
• The relationship between X and Y may be partially or completely due to one or more covariates (C1, C2, C3, etc.) if these covariates are related to both X and Y
• A comparison of baseline treatment group differences in all possible known covariates is often done and presented
• Covariates / confounders normally equalized between groups only in randomized clinical trial designs
Analyzing Effects of Confounders
• The effect of confounders can be assessed by:
– Stratifying your analysis by levels of these variables (e.g., examine relationship of X and Y separately among levels of covariates C)
– Adjusting for covariates in a multivariable analysis
– Considering interaction terms to test whether effect of one factor (e.g., treatment) on outcome varies by level of another factor (e.g., gender)
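Stratification can be illustrated with a Mantel-Haenszel summary odds ratio, one common way of pooling stratum-specific estimates; the slide does not name a specific method, so this choice is an assumption. The counts are invented so that the crude odds ratio suggests an association that disappears within each level of the confounder.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table: a, b = exposed cases / non-cases;
    c, d = unexposed cases / non-cases."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Summary odds ratio pooled across strata of a confounder."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical counts in two levels of a confounder C.
strata = [
    (10, 90, 40, 360),   # C absent
    (120, 280, 30, 70),  # C present
]

# Collapse the strata to get the crude (unadjusted) table.
crude_table = [sum(values) for values in zip(*strata)]
crude = odds_ratio(*crude_table)
adjusted = mantel_haenszel_or(strata)

print(f"Crude OR: {crude:.2f}")
for table in strata:
    print(f"Stratum OR: {odds_ratio(*table):.2f}")
print(f"Mantel-Haenszel OR: {adjusted:.2f}")
```

If the stratum-specific odds ratios differed from each other, that would point to an interaction (effect modification) and a single pooled estimate would be less appropriate.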
Fallacies in Presenting Results: Statistically vs. Clinically Significant?
• Having a large sample size can virtually assure statistically significant results, but often with a very low effect size or relative risk
• Conversely, an insufficient sample size can hide (not significant) clinically important differences where the effect size or relative risk may be large.
• Statistical significance is directly related to sample size and magnitude of effect or difference, and inversely related to the variance of the measure.
Assessing Accuracy of a Test
                            TRUE DISEASE STATUS / TREATMENT DIFFERENCE
TEST RESULT                 DISEASED / YES    NONDISEASED / NO    TOTAL
POSITIVE / reject null      a                 b                   a+b
NEGATIVE / accept null      c                 d                   c+d
TOTAL                       a+c               b+d                 a+b+c+d
SENSITIVITY = a / (a+c) SPECIFICITY = d / (b+d)
Pos. Pred. Value = a / (a+b) Neg. Pred. Value = d/(c+d)
False positive error (alpha, Type I) = b / (b+d)
False negative error (beta, Type II) = c/ (a+c)