Understanding mechanisms underlying human gene expression variation with RNA sequencing
Uploaded by joseph-pickrell
Goals:
• Long-term: understand the precise mechanisms by which genetic variation in humans influences gene expression
• Inform genome-wide association studies (which have identified hundreds of non-coding regions associated with disease)
• This study: identify the transcribed and polyadenylated RNAs present in a model cell type, and identify genetic variants that influence the expression of these RNAs
Lymphoblastoid cell lines: a model system for understanding the genetics of gene regulation
• This talk: 69 cell lines derived from white blood cells from Nigerian individuals
• Genotypes: > 4M single nucleotide polymorphisms (SNPs) from the HapMap Project
• Genomic data: mRNA expression, DNA methylation, histone marks, etc.
RNA-Seq 1: Isolate poly-A RNA, convert to cDNA
RNA-Seq 2: Generate short sequencing reads from the cDNA library (using Illumina GA2)
RNA-Seq 3: Map short reads back to the genome; the count of reads per gene measures expression
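A minimal sketch of the counting in step 3, assuming reads are already aligned and summarized by their 0-based start positions per chromosome, and that each gene is a single half-open interval. The function name `count_reads_per_gene` and the input layout are hypothetical; real pipelines usually count against exon models and handle reads that span splice junctions, which this sketch ignores.

```python
from bisect import bisect_left


def count_reads_per_gene(read_starts, genes):
    """Count mapped reads whose start position falls inside each gene.

    read_starts: dict mapping chromosome -> list of 0-based read start positions
    genes: list of (gene_name, chromosome, start, end), half-open [start, end)
    Returns a dict mapping gene_name -> read count.
    """
    # Sort positions once per chromosome so each gene needs two binary searches.
    sorted_starts = {chrom: sorted(pos) for chrom, pos in read_starts.items()}
    counts = {}
    for name, chrom, start, end in genes:
        positions = sorted_starts.get(chrom, [])
        # Number of reads with start <= position < end.
        counts[name] = bisect_left(positions, end) - bisect_left(positions, start)
    return counts
```

With toy reads `{"chr1": [100, 150, 900, 2500], "chr2": [40]}` and genes spanning chr1:0-1000, chr1:2000-3000 and chr2:0-30, the counts are 3, 1 and 0.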
Sequence RNA from each line (1.2 billion sequencing reads in total) and average expression levels across lines
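The slides do not say how counts were normalized before averaging. As a hedged illustration only, this sketch scales each line's counts to counts per million (CPM), so that lines sequenced to different depths are comparable, and then takes the per-gene mean. CPM and the function names are assumptions, not the study's method.

```python
def counts_per_million(counts):
    """Scale one cell line's gene counts so that they sum to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}


def average_expression(per_line_counts):
    """Average CPM-normalized expression of each gene across cell lines.

    per_line_counts: dict cell_line -> dict gene -> read count
    (assumes every line reports the same set of genes).
    """
    normalized = [counts_per_million(c) for c in per_line_counts.values()]
    genes = normalized[0].keys()
    return {g: sum(line[g] for line in normalized) / len(normalized) for g in genes}
```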
RNA-Seq identifies new exons
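One simple way to flag candidate unannotated exons, sketched under assumptions: given per-base read coverage along a region, find runs with at least `min_depth` reads over at least `min_length` bases, and keep the runs that overlap no annotated exon. The thresholds and function names are hypothetical, and the sketch neither applies a conservation filter nor uses splice-junction reads.

```python
def coverage_islands(coverage, min_depth, min_length):
    """Return half-open [start, end) runs where coverage >= min_depth."""
    islands = []
    start = None
    for i, depth in enumerate(coverage):
        if depth >= min_depth and start is None:
            start = i
        elif depth < min_depth and start is not None:
            if i - start >= min_length:
                islands.append((start, i))
            start = None
    # Close a run that extends to the end of the region.
    if start is not None and len(coverage) - start >= min_length:
        islands.append((start, len(coverage)))
    return islands


def unannotated_islands(coverage, annotated_exons, min_depth=5, min_length=30):
    """Keep covered islands that overlap no annotated exon (half-open intervals)."""
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    return [isl for isl in coverage_islands(coverage, min_depth, min_length)
            if not any(overlaps(isl, exon) for exon in annotated_exons)]
```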
RNA-Seq identifies new polyadenylation sites
Revisiting gene annotations in these cells
• ~4,000 unannotated, conserved exons
• 115 of which appear to be protein-coding
• Unannotated exons have lower expression and are more tissue specific than annotated exons
• ~400 polyadenylation sites more than 50 bases from a known site
• Conclusion: extensive use of unannotated UTRs.
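The "more than 50 bases from a known site" criterion can be sketched as a nearest-neighbor check. This assumes observed and known polyadenylation sites are positions on the same chromosome and strand; the function name and input format are hypothetical.

```python
from bisect import bisect_left


def novel_polya_sites(observed, known, min_distance=50):
    """Return observed sites lying more than min_distance bases from every known site.

    observed, known: lists of positions on the same chromosome and strand.
    """
    known_sorted = sorted(known)
    novel = []
    for pos in observed:
        i = bisect_left(known_sorted, pos)
        # The nearest known site is at index i - 1 or index i of the sorted list.
        neighbors = known_sorted[max(i - 1, 0):i + 1]
        nearest = min((abs(pos - k) for k in neighbors), default=None)
        if nearest is None or nearest > min_distance:
            novel.append(pos)
    return novel
```

For toy inputs, `novel_polya_sites([1020, 1051, 4960, 8000], [1000, 5000])` keeps 1051 and 8000.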
Lymphoblastoid cell lines: a model system for understanding the genetics of gene regulation
• Genotypes: > 4M single nucleotide polymorphisms (SNPs) from the HapMap Project
• Expression: from RNA-Seq
(Natural) genetic variation potentially affects many levels of gene regulation
1. Transcription initiation: chromatin accessibility, TF binding
2. mRNA processing: splicing, polyadenylation, capping, export
3. mRNA degradation: microRNA regulation, nonsense-mediated decay (NMD)
4. Translation, etc.: tRNA abundances, protein localization, protein degradation
• Use genotypes to identify associations between genetic variation and expression (expression quantitative trait loci, eQTLs)
• ~1,000 eQTLs at an FDR of 10%
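A minimal sketch of a single-SNP association test, assuming genotypes coded as allele counts (0, 1, 2) and one normalized expression value per individual. It fits a least-squares slope and computes a permutation p-value for the correlation. This is a generic illustration; the slides do not describe the study's actual test or covariates, and the function names are hypothetical.

```python
import random
from statistics import mean


def regress_expression_on_genotype(genotypes, expression):
    """Least-squares slope and Pearson correlation of expression on genotype dosage.

    genotypes: allele counts (0, 1 or 2) per individual; must not all be equal
    expression: normalized expression of one gene in the same individuals
    """
    gx, ex = mean(genotypes), mean(expression)
    sxy = sum((g - gx) * (e - ex) for g, e in zip(genotypes, expression))
    sxx = sum((g - gx) ** 2 for g in genotypes)
    syy = sum((e - ex) ** 2 for e in expression)
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


def permutation_pvalue(genotypes, expression, n_perm=1000, seed=0):
    """Two-sided p-value: fraction of genotype shufflings with |r| >= observed |r|."""
    rng = random.Random(seed)
    _, r_obs = regress_expression_on_genotype(genotypes, expression)
    shuffled = list(genotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        _, r = regress_expression_on_genotype(shuffled, expression)
        if abs(r) >= abs(r_obs):
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Permuting genotypes keeps the expression distribution fixed and breaks only the genotype-expression pairing, so no normality assumption is needed for the p-value.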
Polymorphisms near the transcription start site (TSS) of a gene are the most likely to affect its transcription
• Combining information across all genes, we can ask where SNPs that affect expression lie
• SNPs near the TSS, and throughout the genic region, are the most likely to influence expression
Figure legend: black, bins within the genic region; blue, bins outside the genic region
See also Veyrieras et al. (2008), Stranger et al. (2007), Cheung et al. (2005)
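Asking where expression-affecting SNPs lie starts from each SNP's position relative to its gene. A descriptive sketch, assuming each gene is summarized by its TSS, transcription end site (TES) and strand: it computes a strand-aware signed offset from the TSS, flags whether the SNP falls in the genic region, and tallies offsets in fixed-width bins. The bin width and function names are hypothetical, and this tally is not the statistical model used in the study or the cited papers.

```python
def tss_relative_position(snp_pos, tss, tes, strand):
    """Signed distance of a SNP from the TSS in the gene's transcriptional direction.

    Negative offsets are upstream of the TSS. The second value is True when the
    SNP lies between the TSS and the TES, i.e. in the genic region.
    """
    if strand == "+":
        offset = snp_pos - tss
        gene_length = tes - tss
    else:
        offset = tss - snp_pos
        gene_length = tss - tes
    return offset, 0 <= offset <= gene_length


def bin_offsets(offsets, bin_width=1000):
    """Count SNPs per bin; bin k covers offsets in [k * bin_width, (k + 1) * bin_width)."""
    counts = {}
    for off in offsets:
        k = off // bin_width
        counts[k] = counts.get(k, 0) + 1
    return counts
```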
• Use genotypes to identify associations between genetic variation and splicing (splicing quantitative trait loci, sQTLs)
• ~200 sQTLs at an FDR of 10%
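Both the eQTL and sQTL counts are reported at a false discovery rate (FDR) of 10%, but the slides do not say how the FDR was estimated. One standard option is the Benjamini-Hochberg step-up procedure, sketched here as an assumption rather than as the study's method; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues, fdr=0.10):
    """Return sorted indices of tests rejected by the Benjamini-Hochberg procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * fdr:
            cutoff_rank = rank
    # Reject every test ranked at or below that cutoff.
    return sorted(order[:cutoff_rank])
```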
Where are the SNPs that affect splicing?
• Figure: odds that a SNP in a given functional annotation affects splicing (relative to SNPs at non-splice-site intronic positions)
• SNPs in splice sites (defined liberally to include positions beyond the canonical two bases) and within the exon itself are enriched for sQTLs
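An odds ratio compares the odds that a SNP is an sQTL in one annotation class with the odds in the baseline class (non-splice-site intronic positions). The slides do not say how the plotted odds were estimated; this is a hypothetical 2x2-table sketch with a Wald 95% confidence interval, and it assumes no cell of the table is zero. The example counts are toy numbers, not data from the study.

```python
import math


def odds_ratio(hits_in_class, total_in_class, hits_baseline, total_baseline):
    """Odds ratio of a SNP being an sQTL in one annotation class vs. the baseline,
    with a Wald 95% confidence interval computed on the log scale."""
    a = hits_in_class
    b = total_in_class - hits_in_class
    c = hits_baseline
    d = total_baseline - hits_baseline
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - 1.96 * se)
    hi = math.exp(math.log(or_) + 1.96 * se)
    return or_, (lo, hi)
```

For example, 20 sQTLs among 1,000 splice-site SNPs against 10 among 5,000 baseline SNPs gives an odds ratio of about 10.2.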
Conclusions
• Goal: understand the mechanisms of natural variation in gene regulation in a model system
• RNA sequencing is useful for annotating genomes and for comparing mRNA levels across individuals
• We observe extensive use of unannotated UTRs
• eQTLs are enriched near transcription start sites; sQTLs are enriched in and around canonical splice sites
• Next steps: can we identify which transcription factors or splicing factors have altered binding, leading to variation in expression?
http://dx.doi.org/10.1038/nature08872
http://eqtl.uchicago.edu