Understanding mechanisms underlying human gene expression variation with RNA sequencing


Transcript of Understanding mechanisms underlying human gene expression variation with RNA sequencing

Page 1: Understanding mechanisms underlying human gene expression variation with RNA sequencing
Page 2: Understanding mechanisms underlying human gene expression variation with RNA sequencing

Goals:

• Long-term: understand the precise mechanisms by which genetic variation in humans influences gene expression

• Inform genome-wide association studies (which have identified hundreds of non-coding regions associated with disease)

• This study: identify the transcribed and polyadenylated RNAs present in a model cell type, and identify genetic variants that influence the expression of these RNAs

Page 3: Understanding mechanisms underlying human gene expression variation with RNA sequencing

Lymphoblastoid cell lines: a model system for understanding the genetics of gene regulation

• This talk: 69 cell lines derived from white blood cells from Nigerian individuals

Genotypes: > 4M Single Nucleotide Polymorphisms (SNPs) from the HapMap Project

Genomic data: mRNA expression, DNA methylation, histone marks, etc.

Page 4: Understanding mechanisms underlying human gene expression variation with RNA sequencing

Lymphoblastoid cell lines: a model system for understanding the genetics of gene regulation

• This talk: 69 cell lines derived from white blood cells from Nigerian individuals

Genotypes: > 4M Single Nucleotide Polymorphisms (SNPs) from the HapMap Project

Page 5: Understanding mechanisms underlying human gene expression variation with RNA sequencing

[Figure: DNA and RNA with poly-A tail (AAAAAAAAAA)]

RNA-Seq 1: Isolate poly-A RNA, convert to cDNA

Page 6: Understanding mechanisms underlying human gene expression variation with RNA sequencing

[Figure: DNA and RNA with poly-A tail (AAAAAAAAAA)]

RNA-Seq 2: Generate short sequencing reads from cDNA library (using Illumina GA2)

Page 7: Understanding mechanisms underlying human gene expression variation with RNA sequencing

[Figure: DNA and RNA with poly-A tail (AAAAAAAAAA)]

RNA-Seq 3: Map short reads back to genome; count of reads/gene measures expression
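
The counting step on this slide can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the study: it assumes reads are already mapped, that each read is summarized by a single start coordinate, and that genes are non-overlapping intervals on one chromosome. The function name `count_reads_per_gene` and the toy data are hypothetical.

```python
from bisect import bisect_right


def count_reads_per_gene(genes, read_positions):
    """Count mapped reads that fall inside each gene.

    genes: dict mapping gene name -> (start, end), half-open and
           non-overlapping, all on one chromosome (simplifying assumption).
    read_positions: iterable of mapped read start coordinates.
    """
    # Sort genes by start so each read can be located with a binary search.
    ordered = sorted(genes.items(), key=lambda item: item[1][0])
    starts = [interval[0] for _, interval in ordered]
    counts = {name: 0 for name in genes}
    for pos in read_positions:
        idx = bisect_right(starts, pos) - 1  # last gene starting at or before pos
        if idx < 0:
            continue  # read lies upstream of every gene
        name, (start, end) = ordered[idx]
        if start <= pos < end:
            counts[name] += 1
    return counts


# Hypothetical toy data: two genes and eight mapped reads.
toy_genes = {"geneA": (100, 200), "geneB": (300, 500)}
toy_reads = [50, 100, 150, 199, 200, 300, 450, 600]
toy_counts = count_reads_per_gene(toy_genes, toy_reads)
```

Raw counts depend on gene length and sequencing depth, so a real analysis would normally normalize them before comparing genes or individuals; that step is left out here.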

Page 8: Understanding mechanisms underlying human gene expression variation with RNA sequencing

[Figure: DNA and RNA with poly-A tails (AAAAAAAAAA)]

Page 9: Understanding mechanisms underlying human gene expression variation with RNA sequencing

Lymphoblastoid cell lines: a model system for understanding the genetics of gene regulation

• This talk: 69 cell lines derived from white blood cells from Nigerian individuals

[Figure: RNA with poly-A tail (AAAAAAAAAA)]

Sequence RNA from each line (total 1.2 billion sequencing reads), average expression levels across lines

Page 10: Understanding mechanisms underlying human gene expression variation with RNA sequencing

RNA-Seq identifies new exons

Page 11: Understanding mechanisms underlying human gene expression variation with RNA sequencing

[Figure: DNA and RNA with poly-A tail (AAAAAAAAAA)]

Page 12: Understanding mechanisms underlying human gene expression variation with RNA sequencing

[Figure: DNA and RNA with poly-A tail (AAAAAAAAAA)]

Page 13: Understanding mechanisms underlying human gene expression variation with RNA sequencing

RNA-Seq identifies new exons

Page 14: Understanding mechanisms underlying human gene expression variation with RNA sequencing

RNA-Seq identifies new polyadenylation sites

Page 15: Understanding mechanisms underlying human gene expression variation with RNA sequencing

[Figure: DNA and RNA with poly-A tail (AAAAAAAAAA)]

Page 16: Understanding mechanisms underlying human gene expression variation with RNA sequencing

[Figure: DNA and RNA with poly-A tail (AAAAAAAAAA)]

Page 17: Understanding mechanisms underlying human gene expression variation with RNA sequencing

RNA-Seq identifies new polyadenylation sites

Page 18: Understanding mechanisms underlying human gene expression variation with RNA sequencing

Revisiting gene annotations in these cells

• ~4,000 unannotated, conserved exons

• 115 of which appear protein-coding

• Unannotated exons have lower expression and are more tissue specific than annotated exons

• ~400 polyadenylation sites over 50 bases from a known site.

• Conclusion: extensive use of unannotated UTRs.
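
The "over 50 bases from a known site" criterion above can be illustrated with a short sketch. It assumes polyadenylation sites are given as single coordinates on one chromosome and strand; the function name `novel_polya_sites`, the `min_distance` parameter, and the toy coordinates are hypothetical and not taken from the study.

```python
from bisect import bisect_left


def novel_polya_sites(observed, known, min_distance=50):
    """Return observed sites lying more than min_distance bases from every known site."""
    known_sorted = sorted(known)
    novel = []
    for site in observed:
        idx = bisect_left(known_sorted, site)
        # Only the known sites immediately left and right can be the nearest.
        neighbours = known_sorted[max(idx - 1, 0):idx + 1]
        nearest = min((abs(site - k) for k in neighbours), default=None)
        if nearest is None or nearest > min_distance:
            novel.append(site)
    return novel


# Hypothetical toy coordinates.
known_sites = [1000, 5000]
observed_sites = [1020, 1051, 4949, 5050, 200]
unannotated = novel_polya_sites(observed_sites, known_sites)
```

A site exactly 50 bases from a known site is treated as known here, matching the "over 50 bases" wording.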

Page 19: Understanding mechanisms underlying human gene expression variation with RNA sequencing

Lymphoblastoid cell lines: a model system for understanding the genetics of gene regulation

• This talk: 69 cell lines derived from white blood cells from Nigerian individuals

[Figure: RNA with poly-A tail (AAAAAAAAAA)]

Genotypes: > 4M Single Nucleotide Polymorphisms (SNPs) from the HapMap Project

Expression: from RNA-Seq

Page 20: Understanding mechanisms underlying human gene expression variation with RNA sequencing

(Natural) Genetic variation potentially affects many levels of gene regulation

DNA

1. Transcription initiation: chromatin accessibility, TF binding

2. mRNA processing: splicing, polyadenylation, capping, export

3. mRNA degradation: microRNA regulation, NMD

4. Translation, etc.: tRNA abundances, protein localization, protein degradation

Page 21: Understanding mechanisms underlying human gene expression variation with RNA sequencing

[Figure: DNA and RNA with poly-A tail (AAAAAAAAAA); allele C]

RNA-Seq 3: Map short reads back to genome; count of reads/gene measures expression

Page 22: Understanding mechanisms underlying human gene expression variation with RNA sequencing

[Figure: DNA and RNA with poly-A tails (AAAAAAAAAA); allele T]

Page 23: Understanding mechanisms underlying human gene expression variation with RNA sequencing

• Use genotypes to identify associations between genetic variation and expression (eQTLs)

• ~1000 eQTLs at an FDR of 10%
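
A toy version of this kind of analysis is sketched below. The slides do not state which association test or FDR procedure was used, so the choices here are assumptions made for illustration: a simple linear regression of expression on genotype dosage (0, 1, or 2 copies of an allele), a permutation p-value, and the Benjamini-Hochberg procedure at a 10% threshold. All function names and data are hypothetical.

```python
import random


def regression_slope(genotypes, expression):
    """Least-squares slope of expression on genotype dosage (0/1/2).

    Assumes the SNP is polymorphic in the sample (genotypes are not all equal).
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    return sxy / sxx


def permutation_pvalue(genotypes, expression, n_perm=999, seed=0):
    """Two-sided permutation p-value for the slope (assumed test, not the study's)."""
    rng = random.Random(seed)
    observed = abs(regression_slope(genotypes, expression))
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-expression link
        if abs(regression_slope(genotypes, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues, fdr=0.10):
    """Return a list of booleans marking tests significant at the given FDR."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * fdr / m:
            cutoff_rank = rank  # keep the largest rank passing its threshold
    significant = [False] * m
    for i in order[:cutoff_rank]:
        significant[i] = True
    return significant


# Hypothetical toy data: one SNP-gene pair, then five made-up p-values.
toy_genotypes = [0, 1, 2, 0, 1, 2]
toy_expression = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
toy_slope = regression_slope(toy_genotypes, toy_expression)
toy_p = permutation_pvalue(toy_genotypes, toy_expression)
toy_calls = benjamini_hochberg([0.01, 0.04, 0.03, 0.5, 0.9], fdr=0.10)
```

In a genome-wide setting the test would be repeated for each gene against nearby SNPs, and the FDR step applied to the resulting p-values.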

Page 24: Understanding mechanisms underlying human gene expression variation with RNA sequencing

Polymorphisms near the transcription start site of a gene are the most likely to affect its transcription

• Combining information across all genes, we can ask where SNPs that affect expression lie

• SNPs near the TSS and throughout the genic region are most likely to influence expression

[Figure legend] Black: bins within the genic region. Blue: bins outside the genic region.

See also Veyrieras et al. (2008), Stranger et al. (2007), Cheung et al. (2005)

Page 25: Understanding mechanisms underlying human gene expression variation with RNA sequencing

(Natural) Genetic variation potentially affects many levels of gene regulation

DNA

1. Transcription initiation: chromatin accessibility, TF binding

2. mRNA processing: splicing, polyadenylation, capping, export

3. mRNA degradation: microRNA regulation, NMD

4. Translation, etc.: tRNA abundances, protein localization, protein degradation

Page 26: Understanding mechanisms underlying human gene expression variation with RNA sequencing

• use genotypes to identify associations between genetic variation and splicing (sQTLs)

• ~200 sQTLs at an FDR of 10%

Page 27: Understanding mechanisms underlying human gene expression variation with RNA sequencing

• use genotypes to identify associations between genetic variation and splicing (sQTLs)

• ~200 sQTLs at an FDR of 10%

Page 28: Understanding mechanisms underlying human gene expression variation with RNA sequencing

Where are SNPs that affect splicing?

• Figure: odds that a SNP in a given functional annotation impacts splicing (relative to SNPs in non-splice-site intronic positions)

• SNPs in splice sites (defined liberally to include sites beyond the canonical two bases) and within the exon itself are enriched for sQTLs
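
The quantity plotted in the figure is an odds ratio against a baseline annotation. A minimal sketch of that arithmetic follows; the function name `odds_ratio` and all counts are made up for illustration and are not results from the study.

```python
def odds_ratio(hits, misses, base_hits, base_misses):
    """Odds of being an sQTL in one annotation relative to a baseline annotation.

    hits / misses: SNPs in the annotation that are / are not sQTLs.
    base_hits / base_misses: the same counts for the baseline
    (here, non-splice-site intronic SNPs).
    Assumes misses and base_hits are non-zero.
    """
    # (hits / misses) / (base_hits / base_misses), rearranged to one division.
    return (hits * base_misses) / (misses * base_hits)


# Hypothetical counts: 20 of 100 splice-site SNPs vs 10 of 200 intronic SNPs.
toy_ratio = odds_ratio(20, 80, 10, 190)
```

A ratio above 1 means SNPs in that annotation are more likely to be sQTLs than baseline intronic SNPs, which is the sense of "enriched" used on this slide.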

Page 29: Understanding mechanisms underlying human gene expression variation with RNA sequencing

Conclusions

• Goal: understand the mechanisms of natural variation in gene regulation in a model system

• RNA sequencing is useful for annotating genomes and comparing mRNA levels across individuals

• Observe extensive usage of unannotated UTRs

• eQTLs enriched near transcription start sites, sQTLs enriched in and around canonical splice sites

• Next steps: can we identify which transcription factors/splice factors have altered binding, leading to variation in expression?

Page 30: Understanding mechanisms underlying human gene expression variation with RNA sequencing

http://dx.doi.org/10.1038/nature08872

http://eqtl.uchicago.edu