Computational research for medical discovery at Boston College Biology
Gabor T. Marth
Boston College, Department of Biology
[email protected]
clavius.bc.edu/marthlab
We study genetic variations because…
… they underlie phenotypic differences
… cause heritable diseases and determine responses to drugs
… allow tracking ancestral human history
Our current projects investigate three essential aspects of genetic variations…
• how to discover inherited genetic polymorphisms that lead to disease?
• how to model human polymorphism structure to inform medical research?
• how to select the best genetic markers for clinical case-control association studies?
Inherited (germ line) polymorphisms are important because they can predispose to disease.
The most common types of human polymorphism are single-nucleotide polymorphisms (SNPs) and short insertions-deletions (INDELs).
1. We build computer tools for variation discovery…
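$$
P(\mathrm{SNP}) =
\frac{\displaystyle \sum_{\text{all variable } S}
      \frac{P(S_1 \mid R_1)}{P_{\mathrm{Prior}}(S_1)} \cdots
      \frac{P(S_N \mid R_N)}{P_{\mathrm{Prior}}(S_N)} \,
      P_{\mathrm{Prior}}(S_1, \ldots, S_N)}
     {\displaystyle \sum_{S_{i_1} \in [A,C,G,T]} \cdots \sum_{S_{i_N} \in [A,C,G,T]}
      \frac{P(S_{i_1} \mid R_1)}{P_{\mathrm{Prior}}(S_{i_1})} \cdots
      \frac{P(S_{i_N} \mid R_N)}{P_{\mathrm{Prior}}(S_{i_N})} \,
      P_{\mathrm{Prior}}(S_{i_1}, \ldots, S_{i_N})}
$$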
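The Bayesian calculation behind P(SNP) can be illustrated with a small sketch. This is not the PolyBayes implementation. It assumes a uniform per-base prior of 0.25, a simple error model in which a read's called base is wrong with probability `error_prob`, and a joint prior that spreads a polymorphism rate `p_poly` evenly over all non-identical base combinations. The function names and the default `p_poly` value are hypothetical. The brute-force enumeration over 4^N combinations is only practical for a handful of reads.

```python
from itertools import product

BASES = "ACGT"


def base_posterior(called, error_prob):
    """P(S | R) for one read: the called base gets 1 - e, the others share e."""
    return {b: (1.0 - error_prob) if b == called else error_prob / 3.0
            for b in BASES}


def joint_prior(combo, p_poly):
    """Assumed joint prior: p_poly spread over variable combinations,
    1 - p_poly spread over the four monomorphic ones."""
    if len(set(combo)) == 1:
        return (1.0 - p_poly) / 4
    return p_poly / (4 ** len(combo) - 4)


def snp_probability(reads, p_poly=0.003, base_prior=0.25):
    """reads: list of (called_base, error_prob), one per aligned sequence."""
    posteriors = [base_posterior(base, err) for base, err in reads]
    variable = total = 0.0
    for combo in product(BASES, repeat=len(reads)):
        term = joint_prior(combo, p_poly)
        for post, s in zip(posteriors, combo):
            term *= post[s] / base_prior
        total += term              # denominator: every base combination
        if len(set(combo)) > 1:
            variable += term       # numerator: combinations that differ
    return variable / total


consistent = [("C", 0.001)] * 4
mixed = [("C", 0.001), ("C", 0.001), ("T", 0.001), ("T", 0.001)]
print(snp_probability(consistent), snp_probability(mixed))
```

With high-quality reads that all agree, the probability stays near zero; two confident C reads against two confident T reads push it close to one.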
We have developed a computer package, PolyBayes©, for accurate discovery of DNA polymorphisms in clonal sequences.
Marth et al. Nature Genetics 1999
… we are currently expanding our polymorphism detection capabilities.
[Figure panels: Homozygous T; Homozygous C; Heterozygous C/T]
• for automated detection of somatic single base pair mutations in diploid samples
• to make the software available for genome centers with high-performance systems and small Biology labs with desktop computers
• to incorporate our new knowledge of human variation structure into the detection algorithms
2. We measure genome-wide distributions of DNA polymorphism data…
1. marker density (MD): distribution of the number of SNPs in pairs of sequences
[Figure: MD histogram; horizontal axis 0 to 10]
2. allele frequency spectrum (AFS): distribution of SNPs according to allele frequency in a set of samples
[Figure: AFS histogram; horizontal axis 1 to 10, labeled from “rare” to “common”]
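As a rough illustration of the two distributions, the sketch below computes them from a small set of aligned haplotypes written as strings of 0 (one allele) and 1 (the other). The input encoding, the function names, and the choice to tabulate MD over all pairs within a single region are assumptions made for the example, not the procedure used in the work summarized here.

```python
from collections import Counter
from itertools import combinations


def allele_frequency_spectrum(haplotypes):
    """Fraction of SNPs at each minor allele count (1, 2, ...)."""
    n = len(haplotypes)
    counts = Counter()
    for column in zip(*haplotypes):
        ones = column.count("1")
        minor = min(ones, n - ones)
        if minor > 0:                      # skip sites with no variation
            counts[minor] += 1
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}


def marker_density(haplotypes):
    """Fraction of sequence pairs that differ at 0, 1, 2, ... sites."""
    counts = Counter(
        sum(a != b for a, b in zip(h1, h2))
        for h1, h2 in combinations(haplotypes, 2)
    )
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}


sample = ["1100", "1010", "0000", "0001"]
print(allele_frequency_spectrum(sample))
print(marker_density(sample))
```

In this toy sample three of the four SNPs are singletons ("rare") and one is carried by two of the four sequences.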
… we build models of these distributions under competing scenarios of human demographic history…
[Figure: competing models of demographic history (stationary, expansion, collapse, bottleneck) drawn on a time axis from past to present, each with its predicted MD (simulation) and AFS (direct form) histogram]
… and determine the best-fitting models.
[Figure: AFS histograms by minor allele count (1 to 10). European data: genetic bottleneck. African data: modest but uninterrupted expansion.]
Marth et al. PNAS 2003; Genetics 2004
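One simple way to compare candidate models against an observed AFS is a multinomial log-likelihood, sketched below. The stationary expectation uses the standard neutral result that, for n chromosomes, the number of SNPs at minor allele count i is proportional to 1/i + 1/(n - i). The `rare_excess` spectrum is an invented placeholder rather than a fitted model, and the function names are hypothetical; the cited papers describe the actual fitting procedure.

```python
import math


def stationary_folded_afs(n):
    """Expected AFS by minor allele count under constant population size."""
    weights = []
    for i in range(1, n // 2 + 1):
        w = 1.0 / i
        if i != n - i:                 # fold in the complementary class
            w += 1.0 / (n - i)
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(observed_counts, expected_fractions):
    """Multinomial log-likelihood, up to a model-independent constant."""
    return sum(c * math.log(p)
               for c, p in zip(observed_counts, expected_fractions) if c > 0)


def best_fitting_model(observed_counts, models):
    """Name of the model whose spectrum gives the highest likelihood."""
    return max(models,
               key=lambda name: log_likelihood(observed_counts, models[name]))


models = {
    "stationary": stationary_folded_afs(20),
    # Invented spectrum with an excess of rare alleles, for illustration only.
    "rare_excess": [0.5, 0.2, 0.1, 0.05, 0.05, 0.03, 0.03, 0.02, 0.01, 0.01],
}
observed = [round(1000 * p) for p in models["stationary"]]
print(best_fitting_model(observed, models))
```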
3. The HapMap project aims to map out human polymorphism structure to aid gene mapping…
However, the variation structure observed in the reference DNA samples genotyped by the HapMap project…
… often does not match the structure in another set of samples such as clinical samples used to find disease genes and disease-causing genetic variants
… we generate “quasi-samples” with computational means to study sample-to-sample variability…
Instead of genotyping additional sets of (clinical) samples with costly experimentation, and comparing the variation structure of these consecutive sets directly…
… we generate additional samples with computational means, based on our population genetic models of demographic history, using the coalescent process.
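A minimal sketch of coalescent sampling is shown below. It assumes the simplest case, a constant-size population with infinite-sites mutation, rather than the fitted demographic models described here. The function name and the parameter `theta` (the scaled mutation rate for the region, which must be positive) are choices made for the example.

```python
import random


def simulate_haplotypes(n, theta, rng):
    """Simulate n haplotypes (strings of 0/1) under the standard coalescent."""
    lineages = [frozenset([i]) for i in range(n)]   # each lineage = its samples
    sites = []          # one entry per SNP: the set of samples carrying it
    while len(lineages) > 1:
        k = len(lineages)
        # Time to the next coalescence among k lineages.
        t = rng.expovariate(k * (k - 1) / 2.0)
        # Mutations fall on each lineage as a Poisson process of rate theta/2.
        for lineage in lineages:
            elapsed = rng.expovariate(theta / 2.0)
            while elapsed < t:
                sites.append(lineage)
                elapsed += rng.expovariate(theta / 2.0)
        # Merge two randomly chosen lineages.
        a, b = rng.sample(range(k), 2)
        merged = lineages[a] | lineages[b]
        lineages = [l for idx, l in enumerate(lineages) if idx not in (a, b)]
        lineages.append(merged)
    return ["".join("1" if s in site else "0" for site in sites)
            for s in range(n)]


rng = random.Random(1)
quasi_sample = simulate_haplotypes(10, 5.0, rng)
print(len(quasi_sample), "haplotypes,", len(quasi_sample[0]), "SNPs")
```

Calling the function repeatedly with the same parameters yields consecutive computational samples whose variation structure can be compared with one another.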
… and to optimize tag SNP (marker) selection for clinical association studies.
1. select markers (tag SNPs) with standard methods
2. generate computational samples for this genome region
3. test the performance of markers across consecutive sets of computational samples
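The marker selection and testing steps can be sketched as follows. The example assumes a greedy r²-threshold tagging rule as the "standard method" and measures performance as the fraction of polymorphic SNPs in a second sample that some tag captures at the same threshold. The function names, the 0.8 threshold, and the toy haplotypes are illustrative assumptions.

```python
def columns(haplotypes):
    """Turn haplotype strings into per-site lists of 0/1 alleles."""
    return [[int(c) for c in col] for col in zip(*haplotypes)]


def r_squared(x, y):
    """Squared allelic correlation between two sites (0 if either is fixed)."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    if px in (0.0, 1.0) or py in (0.0, 1.0):
        return 0.0
    pxy = sum(a & b for a, b in zip(x, y)) / n
    d = pxy - px * py
    return d * d / (px * (1 - px) * py * (1 - py))


def select_tags(haplotypes, threshold=0.8):
    """Greedy tagging: repeatedly pick the SNP that captures the most others."""
    cols = columns(haplotypes)
    untagged = {i for i, c in enumerate(cols) if 0 < sum(c) < len(c)}
    tags = []
    while untagged:
        captured = {i: {j for j in untagged
                        if r_squared(cols[i], cols[j]) >= threshold}
                    for i in sorted(untagged)}
        best = max(captured, key=lambda i: len(captured[i]))
        tags.append(best)
        untagged -= captured[best]
    return tags


def tag_coverage(tags, haplotypes, threshold=0.8):
    """Fraction of polymorphic SNPs in this sample captured by the tags."""
    cols = columns(haplotypes)
    poly = [j for j, c in enumerate(cols) if 0 < sum(c) < len(c)]
    if not poly:
        return 1.0
    hits = sum(1 for j in poly
               if any(r_squared(cols[t], cols[j]) >= threshold for t in tags))
    return hits / len(poly)


reference = ["110", "110", "001", "000"]   # toy reference sample
other = ["110", "100", "010", "001"]       # toy second sample
tags = select_tags(reference)
print(tags, tag_coverage(tags, reference), tag_coverage(tags, other))
```

In the toy data the first two SNPs are perfectly correlated in the reference sample, so one tag covers both; in the second sample that correlation is gone and coverage drops, which is the kind of sample-to-sample variability the test in step 3 is meant to expose.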
We are developing projects to expand…
• from single-nucleotide DNA changes to developing computer tools for the detection of other types of genomic and epigenetic changes (e.g. in cancer)
• to developing visualization and statistical tools for the integration of diverse genetic and epigenetic data (image from Nature Reviews Genetics)
• to using the fruits of the HapMap project, dense SNPs, Linkage Disequilibrium, and haplotype markers to help predict individual responses to drugs, including adverse drug reactions
Detecting SNPs in medical re-sequencing data, short insertions / deletions
• detection in new data types produced by the latest ultra-high-throughput sequencing technologies (e.g. 454 Life Sciences sequencing machines) that will be used for individual medical re-sequencing
• reliable detection of INDELs and microsatellite polymorphisms, both in clonal and in diploid sequence data, e.g. to detect repeat instabilities
Using SNP array data intelligently to detect chromosomal aberrations
Speicher & Carter, NRG 2005
Software development for other genetic and epigenetic data (focus on data confidence)
copy number detection
chromatin structure
Sproul, NRG 2005
methylation profile
Laird, NRC 2005
Integrate genetic and epigenetic data from varied sources to find “common themes” during cancer development
chromosome rearrangements
chromatin structure
gene expression profile
copy number changes
methylation profile
repeat expansions
Using new haplotype resources to connect genotype and clinical outcome in pharmaco-genetic systems
• the HapMap was designed as a tool to detect high-frequency (common) phenotypic (e.g. disease-causing) alleles
• important drug-metabolizing enzymes are relatively few in number, well studied, and at known genome locations; many associated phenotypes are well described
• many functional alleles are known, and of high frequency (common)
• multi-SNP alleles are highly predictive of metabolic phenotype
• clinical phenotype (adverse drug reaction) less predictable
• ideal candidate for applying haplotype resources
Multi-marker haplotypes as accurate markers for ADRs?
functional allele (known metabolic polymorphism)
genetic marker (haplotype) in genome regions of drug metabolizing enzyme (DME) genes
molecular phenotype (drug concentration measured in blood plasma)
clinical endpoint (adverse drug reaction)
computational prediction based on haplotype structure
Resources
• specifics of enzyme-drug interactions
• LD and haplotype structure in the HapMap reference samples, based on high-density SNP map
• functional alleles
• existing DME P genotyping chips
Evolutionary / PopGen questions
• mutation age?
• mutations single-origin or recurrent?
• geographic origin of mutations?
• analysis based on complete local variation structure and haplotype background of functional mutations
• specifics of the selection process that led to specific functional alleles?
Proposed steps of analysis
• haplotypes vs. metabolic phenotype?
• complete polymorphic structure?
• ethnicity?
• additional functional SNPs?
• haplotypes vs. functional alleles?
haplotype block?
functional allele (genotype)
metabolic phenotype
clinical phenotype (ADR)
haplotype
• haplotypes vs. ADR phenotype?
Funding sources / plans
• polymorphism discovery + medical re-sequencing data analysis: 5-year NIH R01 research grant awarded
• pop-gen modeling + haplotype analysis + marker selection system: NIH R01 application pending
• informatics tools for genomic and epigenetic changes in cancer: need a postdoc to establish project (startup or NIH R21 or private funding)
• haplotypes in Pharmacogenomics: need a postdoc to establish project (startup or NIH R21 or private funding)