Review of main points from last week Medical costs escalating largely due to new technology This is...
-
Upload
gordon-chapman -
Category
Documents
-
view
216 -
download
1
Transcript of Review of main points from last week Medical costs escalating largely due to new technology This is...
Review of main points from last week
Medical costs escalating largely due to new technology This is an ethical/social problem with major conseq.
Many new technologies provide only marginal benefits cost-effectiveness frequently not evaluated
FDA – is it “safe and effective”CMS – is it “necessary and reasonable”
Consider genetic testing and “personalized” medicine as example of a new technology needing evaluation
What benefits does/will it provide?At what cost? Are there potential harms?What ethical/legal/social issues does it raise?
Some concepts needed to understand this technology
What does DNA “do”? (genes, proteins)What is its structure? What is DNA sequence?What is a SNP?How is DNA passed from parents to offspring?What are mutations, genetic variants?How can they be associated with traits, diseases,
disease risks, sensitivity to particular drugs?Examples of tests offered “direct-to-consumer”
Today’s subject: gene-disease risk associations & GWAS
Understand how GWAS studies have been done in order to better evaluate disease risk predictions from companies like 23andMe
Understand strengths and limitations of GWAS
Go over some basic ideas in statistics needed to evaluate GWAS (and other apps. in engineering!)
Think about how technical complexity affects your ability to evaluate utility of this (and by example, other) new technologies
Genome-wide association studies (GWAS) = source of data for SNP associations with particular diseases
Basic idea – search for chr. regions (SNPs) with diff. allele frequencies in cases vs controls
If such SNPs found, it could be that: the SNP allele causes (or contributes to) the disease the SNP allele is close enough on a chr to disease-causing
mutation that they have been inherited together inmost people since mut’n. arose (founder effect)
the SNP allele and the disease both occur at higherfreq. in some ethnic group but not for genetic reasonse.g. malaria and skin pigment variants
Last possibility would be false positive result for GWAS!
So GWAS studies first go to great lengths to select genetically homogeneous cases and controls and exclude genetically heterogeneous individuals
How can you do this?
Use multi-dimensional scaling – a data visualization tool to group similar objects in complex data sets
Idea – imagine n-dim. space where each axis represents a SNP locus, and AA=0, Aa=.5, aa=1 along axis represent each individual as point in this space genetic dist. btw. people = Euclidian dist. btw. their pts.
Hard to “see” data in n-dimensional space when n is large So make 2-d plot of individuals so that dist. betw.pairs in 2-d “best” reflects Euclidean dist. in n-dimen.(imagine moving each point in 2-d map randomly to minimize discrepancy btw. 2-d and n-d distances, summed over all other individuals, then repeating for each individualuntil map positions converge)
Genetically closelyrelated people clusterin such a map Eliminate all outliersfrom GWAS studypopulation
Implication
Any positive GWAS findings are initially only “true” for a particular homogeneous group (e.g. CEU = N. Europeans) and must be retested in other populations before they can be accepted generally
Next problem – if genotype (pattern of alleles at some locus, e.g. AA vs Aa vs aa) frequencies differ betw. cases and controls, how much do they have to differ to be statistically significant?
Basic idea in statistics – see if data are reasonably likely given “null” hypothesis (H0) that groups (e.g., cases and controls) do not differ (in genotype frequencies)
If groups are not really different, you could pool the data and calculate mean and st. dev. for the pool, then ask if you randomly chose 2 groups (of the size of the cases and controls) from this one population, how likely would the means of the 2 groups differ by as much as you observe. A “t” test gives you this probability. If it is very low, you may have reason to reject the null hypothesis.
The chi sq test is very like the “t” test Chi sq = S(Exp-Obs)2/Exp It’s probability distribution is known for randomly selected groups from a single population. If p(chi sq) < small # a, e.g. a =.05, you might want to conclude the groups are different
Traditionally, and completely arbitrarily, a = 0.05 is often taken as a cut-off. This means that if the groups are really not different, you’ll make a mistake and call them different 5% of the time. You pick the cut-off for whatever error rate you feel appropriate
Complication – if one tests for association with 20 (or n), independent things, expect ~1 to have p(chi sq) <.05 (< 1/n) even when no assoc. exists (false positive, FP).
Testing for assoc with any of ~106 genes, one needs much stricter criterion than a=.05 in order to avoid lots of FP’s
Simplest correction – Bonferroni: divide a by n = # of SNPs tested; e.g. require p(chi sq.) < 0.05/106 ~10-8
in order that probability of any FP be < .05
Example chi sq calculation
hypothetical #'s with each genotype
aa aA AA sum dis. cases 45 510 1445 2000 controls 120 960 1920 3000
totals 165 1470 3365 5000
If H0 true, can pool groups for best est. of probabilities p(aa) = 165/5000; p(aA)=1470/5000, p(AA)=3365/5000
Then expected # aa among dis. cases = p(aa)*2000 = 66 Expected # of aA among dis. cases = p(aA)*2000 = 588 Compute remaining expected #’s same way or from totals -> Expected # aa aA AA sum dis. cases 66 588 1346 2000 Controls 99 882 2019 3000 totals 165 1470 3365 5000
Chi sq = S(exp-obs)2/exp = (66-45)2/66 + … = 40.52 p(chi sq, 2df) = 1.59x10-9 (from table, or web) < a = 10-8
so H0 (no association) rejected, assoc. is likely
For confirmation, repeat study in independent groups
Next problem, not really interested in p(data|hypothesis) want p(hypothesis|data)
E.g., you observe freq of some SNP alllele is higher in disease group vs controls, you want to know p(dis.|genotype) not p(genotype|disease)
Bayesian statistics allows you to infer p(disease|genotype) from p(genotype|disease)
Basic Idea: 2 ways to calculate p of disease and genotype AAp(D|AA)p(AA) = p(AA|D)p(D)-> p(D|AA) = p(AA|D) p(D) / p(AA)have to know p(D), p(AA), and p(AA|D) to get p(D|AA)
Relative risk might be measured as p(D|AA)/p(D) but frequently expressed in terms of “odds ratio”
Odds = p(event)/[1-p(event)] e.g. “2:1” if p(event)=.67
Odds ratio = odds(D|AA)/odds(D) (assume A is hi risk allele)= {p(D|AA)/[1-p(D/AA)]} / {p(D)/[1-p(D)]}
note odds ratio > relative risk since [1-p(D)]/[1-p(D/AA)] >1
Look at data in GWAS paper, Nature 447:661 (2007)
appreciate the magnitude, expense, complexity – and limitations
~100 authors, 106 SNPs testedin each of 17,000 samples (@ $1000)
could study have been done if each test cost $1?
Note most disease risks measured by OR only ~1.2-2
Do you understand most of the columns in this table?raw
p(chi sq) ORsdis
Note many SNPs in region are associated with disease
Example of hit region
Summary of all hits, all diseases, all chromosomes
Ln10(p)
Limitations
Do most SNP associations identify causative mutations? No, because there are many SNPs in each region – they can’t all be causative
If not causative, why the association? Likely explanation – causative mutation arose sometime, not very long ago, on some chromosome in “founder” individual; he/she passed on the mutation to offspring along with adjacent chromosomal regions. Recombination between causative mutation and these regions has not yet occurred on most chromosomes bearing mutation, so SNPs near mut’n in founder remain associated in offspring = linkage disequilibrium (LD)
Implications – associated SNPs reflect fairly recent mutations, therefore may be restricted to particular ethnic groups (not enough time to spread throughout the world by migration, interbreeding); in other groups the same SNPs may be unassociated with disease; hard to find very old mutations causing disease (no LD)
hits provide locational clues to causative mutations; the latter could provide leads for new rx, reduce imprecision in risk assessments
most SNP associations now confirmed in independent disease group studies (see 23andMe white paper on “vetting” disease associations)
Note most relative risks (odds ratios) are small, < 2-fold
Does this make most results practically insignificant? Odds ratios are much smaller than expected from estimates of heritability from family studies
Example: height said to be 80% inherited but max combined effect of all associated SNPs only ~5%
How is “% heritability” estimated?
Old way: for height, plot children’s height vs mean height of their parents; if children with tall parents tendto be tall, height could be genetic
Fraction of variance explained by parents height= 1 - S[hc-(ahp+b)]2 / S(hc-<hc>)2
hchildren
<hparents>
find (least sq.) best fit line: ahp+b
comp. variance from best fit line to variance from global mean line
<hc>
More mathematically,
Does child-parent height correl. prove height is genetic? No – it may confound environment and gene effects (tall parents may eat better and provide better diet)
Clever way to tease out genetic from environmental effects within families: use SNP genotypes to measure genetic relatedness between siblings and plot height differences betw. sib. pairs vs. genetic relatedness
Genetic relatedness = % genes that are identical in siblings due to inheritance from the same grandparent (e.g. they both get their mothers maternal (or paternal) alleles vs one gets the maternal and the other the paternal allele); call this % identity by descent, IBD)
Plot height difference between sibs vs % IBD
hdiff
55 50 45 (% IBD)
Now variance from red line / variance from blue lineprovides estimate of effect explained by genes,controlled for environment (sib pairs expectedto share environments to same extent, unaffected by their % IBD)
Can generalize to disease incidence …(don’t worry about details)
Find least sq. best fit line:
0 .5 1hh Hh HH genotype
disease 1
no disease 0 Fractionexplained by H
= 1 – (var from red line/ var from blue)
to say what fx of disease incidence is “explained” by SNPs
“Missing heritability” = big embarrassment for GWASPossible explanations:
family studies overestimate heritability by confounding environmental effects
disease caused by changes in gene expression not detectable by SNPs (epistasis)
disease caused by very old mutation (assoc. lost due to genetic recombination over time)
disease caused by rare alleles (SNPs analyzed chosen to have minor allele frequencies > 5%)
Some want to push on w/ GWAS, testing rarer SNPs or sequencing genomes to look for alleles assoc. w/diseaseReward may be in understanding how particular genes
contribute to diseases, not in utility of risk prediction
Next problem: how to combine risks from unlinked SNPs 23andMe multiples relative risks (see 23andMe white paper). This assumes effects are independent, i.e. no gene interactions. Is this accurate?
Counter example: gene that raises expression of fetal hgb decreases severity of sickle cell disease => some genes interact “non-linearly”
How could one verify if predicted dis. risks are accurate?
1. Prospective studies – think about feasibility: how many subjects needed, how much time, etc.
2. Compare different companies’ risk predictions
Venter et al compared risk predictions of Navigenics and 23andMe for 13 diseases for 5 individuals
Results: Qualitative discrepancies for half of people in half of tests
Explanation: Companies used different sets of SNPs. Does this restore confidence in clinical validity of test?
Smallness of effects limit clinical utility - most effects comparable to risks conferred by positive family history
But for some diseases, where dis. mutation identified, predicted risk increase can be large
e.g. CF ~100%, though severity can varyBRCA1 – some mutations elevate life-time risk
from ~8% to 80% (> 20x risk for early onset)
Next problem – when relative risk inc. is large, is theresomething one can do about it?
Will come back to this for BRCA in unit on screeningfor breast cancer
At what point, if any, should FDA regulation be required?
Should it depend on magnitude of relative risk,absolute risk?
On whether test results are likely to result in life-altering action?
surgery drug treatment life-long screening abortion
Different measures of test validity and utility:
Scientific validity – does it detect the SNPs it says it does, with what error rate?
Clinical validity – does it produce valid diagnoses?
Clinical utility – is the information useful in a medicalsetting?
How do tests for BRCA mutation, CF carrier status,warfarin sensitivity rate by these criteria?
Main points
basic idea of GWAS – diff allele freq. in dis. vs cont. grps many DNA regions found to affect chance of getting
several common diseases
most effects small, possibly limited to spec. ethnic groups
provide leads for finding causative genes -> understanding disease mechanism
possible applications in drug therapy (“personalized medicine”)
Homework: look over GWAS paper, try to get big ideas, don’t
worry about unintelligible jargon
divide and conquer papers/topics (pick one):
Math Exercise on odds ratios and chi sq (2 items) Venter on comparing 23andMe and Navigenics results
what are ethics of his conclusions? NYT - on behavorial effect of DTC genetic testing NEJM – on risk prediction from GWAS (2 items) 2 views of utility of warfarin genetic test (pick one)
Am Coll Cardiol. - it reduces hosp. Ann Int. Med. – it is not cost-eff.