Do not reproduce without permission 1 (c) 2004 1 (c) Mark Gerstein, 2002, Yale,...

Do not reproduce without permission. Gerstein.info/talks. (c) Mark Gerstein, 2002, Yale, bioinfo.mbb.yale.edu

Slide 1. Permissions Statement
This presentation is copyright Mark Gerstein, Yale University, 2004. Feel free to use images in it with PROPER acknowledgement.

Slide 2. Pseudogenes in the ENCODE Regions: Consensus Annotation, Analysis of Transcription and Evolution
Deyou Zheng, Adam Frankish, Robert Baertsch, Philipp Kapranov, Alexandre Reymond, Siew Woh Choo, Yontao Lu, France Denoeud, Stylianos Antonarakis, Michael Snyder, Yijun Ruan, Chia-Lin Wei, Thomas Gingeras, Roderic Guigo, Jennifer Harrow, Mark Gerstein
Yale, Sanger, UCSC, GIS, AFFX, U Geneva, IMIM
A GT effort, with great thanks to MSA, VAR, TR
Talk at ENCODE 2006, 20:30-21:30

Slide 3. Pseudogenes are among the most interesting intergenic elements
- Regulatory regions, repeats, non-coding RNA, origins of replication...
Formal Properties of Pseudogenes (ΨG)
- Inheritable
- Homologous to a functioning element
- Non-functional*
  - No selection pressure, so free to accumulate mutations
    - Frameshifts & stops
    - Small indels
    - Inserted repeats (LINE/Alu)
* What does this mean? No transcription, no translation?
[Mighell et al. FEBS Letts, 2000]

Slide 4. Pseudogenes (ΨG) as Disabled Homologies
[Figure labels: Cyc gene; a pseudogene]

Slide 5. Why Study Pseudogenes?
- Important for doing accurate gene annotation
  - Abundant: > 8000 retropseudogenes in human
  - High sequence similarity with genes
  - 25% in C. elegans? [Mounsey, Genome Research, 2002]
- Interfere with the study of functional genes
  - Cross-hybridization in microarray and RT-PCR [Ruud, Int. J. Cancer 1999]
  - Some pseudogenes have regulatory roles
- ΨG are genomic fossils
  - Study the evolution of genes and genomes
  - Measure mutation/insertion rates

Slide 6. Why Study Pseudogenes?
- Cause errors in sequence databases
  - > 8000 retropseudogenes in human
  - Contamination in Ensembl
  - 25% in C. elegans? [Mounsey, Genome Research, 2002]
- "Interfere" with functional genes
  - Cross-hybridization in microarray and PCR (Cytokeratin 19, Int. J. Cancer 1999)
  - Very rarely this gives some pseudogenes regulatory roles
- ΨG are genomic fossils
  - Study the evolution of genes and genomes
  - Measure mutation/insertion rates
- In mouse, a pseudogene up-regulates gene expression of Makorin1 by binding to a transcriptional repressor or an RNA-digesting enzyme [Hirotsune et al. Nature]

Slide 7. Why Study Pseudogenes?
- Cause errors in sequence databases
  - > 8000 retropseudogenes in human
  - Contamination in Ensembl
  - 25% in C. elegans? [Mounsey, Genome Research, 2002]
- Interfere with the study of functional genes
  - Cross-hybridization in microarray and RT-PCR [Ruud, Int. J. Cancer 1999]
  - Some pseudogenes have regulatory roles
- ΨG are genomic fossils
  - Study the evolution of genes and genomes
  - Illuminate important genomic remodeling processes of duplication and retrotransposition
  - Measure mutation/insertion rates

Slide 8. Duplicated Pseudogenes
[Diagram: Original Gene, Gene Duplication, Mutations]
- Retain intron/exon structure
- E.g. globins, Hox cluster and Arabidopsis genome
- Sometimes can be transcribed

Slide 9. Retro-pseudogenes (Processed ΨG)
[Diagram: Original Gene, LINE-1 mediated retrotransposition]
- Mostly dead-on-arrival (DOA)
- Intronless, poly-A tail, direct repeats
- Target-primed reverse transcription: -TT|AAA- AACATA AAAAAA
- Other types: Numt (nuclear mitochondrial DNA)

Slide 10. Overlap of Pseudogenes by 5 Different Methods
- 4 automatic pipelines (comparing protein or transcript v. genomic DNA, filtering, application of rules) + HAVANA manual
[Figure label: GIS]

Slides 11-12. Complexities in Pseudogene Annotation
[Figure labels: HNRPA1, MTND2, MTND4, CYTB]
- Ribonucleoprotein A1 processed pseudogene
- Inserted mitochondrial sequence, resulting in 3 pseudogenes

Slide 13. Regional Distribution
- 201 pseudogenes: 77 non-processed, 124 processed
[Figure label: OR]

Slide 14. Example: Pseudogene Intersecting Transcriptional Evidence
[Figure tracks: TARs, CAGE, diTAG, ChIP-chip]

Slide 15. Intersection of Pseudogenes with Transcriptional Evidence

Slide 16. Targeted Transcription Experiments
RACE experiments
- Interrogated 160 pseudogenes (49 non-processed & 111 processed)
- In 51 cases (26 non-processed and 25 processed pseudogenes), could design distinguishing primers (>4 mismatched bp v. parent)
- The resulting data supported transcription from 14 (8 processed and 6 non-processed) of the 160 pseudogenes (9 with pseudogene-specific primers)
- These numbers might represent a conservative estimate, since a RACEfrag was assigned to its parent gene by default if it could be mapped to both a parent locus and a pseudogene locus
RACE experiments + sequencing (CAGE, PET, EST and mRNA): unambiguous evidence for pseudogene transcription
- All together, these data indicate that 38 of 201 pseudogenes are the source of novel RNA transcripts
- 5 of these had cryptic promoters (from TR analysis)

Slide 17. History of Pseudogene Preservation
[Figure legend: Absent; Present with Disablement; Present without Disablement]

Slide 18. Retrotransposition within the Last 45 MY Created Many Processed Pseudogenes

Slide 19. Sequence Decay of Pseudogenes, Approximately Neutral

Slide 20. Sequence Decay of Pseudogenes Relative to their Immediate Genomic Context

Slide 21. Scaling Issues
- 201 pseudogenes x 100 = ~20K, which agrees with previous estimates for the whole genome
- Interplay between manual annotation and automatic pipelines
- Dynamic interplay with gene annotation (can't overlap)
- Need to have a protein alignment

Slide 22. Using phastOdd value to examine neutral evolution of pseudogenes
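
The counts quoted on the Regional Distribution, Targeted Transcription and Scaling Issues slides can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the dictionary keys and function names are invented here, and the factor of 100 is taken from the slide (it presumably reflects the ENCODE regions covering roughly 1% of the genome).

```python
# Counts as quoted on the slides, stored as (non-processed, processed).
# Key names are hypothetical labels chosen for this sketch.
COUNTS = {
    "annotated": (77, 124),       # pseudogenes in the ENCODE regions
    "interrogated": (49, 111),    # pseudogenes tested by RACE
    "primer_designed": (26, 25),  # distinguishing primers could be designed
    "race_supported": (6, 8),     # transcription supported by RACE data
}
TRANSCRIBED_ALL_EVIDENCE = 38  # RACE plus CAGE, PET, EST and mRNA
SCALE_FACTOR = 100             # multiplier used on the Scaling Issues slide


def total(stage: str) -> int:
    """Sum the non-processed and processed counts for one stage."""
    return sum(COUNTS[stage])


def fraction(numerator: int, denominator: int) -> float:
    """Plain ratio, kept as a helper so the report lines stay readable."""
    return numerator / denominator


def genome_wide_estimate() -> int:
    """Naive extrapolation: annotated count times the slide's factor."""
    return total("annotated") * SCALE_FACTOR


if __name__ == "__main__":
    for stage in COUNTS:
        print(f"{stage}: {total(stage)}")
    race = fraction(total("race_supported"), total("interrogated"))
    overall = fraction(TRANSCRIBED_ALL_EVIDENCE, total("annotated"))
    print(f"RACE-supported fraction of interrogated: {race:.3f}")
    print(f"Transcribed fraction, all evidence: {overall:.3f}")
    print(f"Genome-wide estimate: {genome_wide_estimate()}")
```

The extrapolation is deliberately naive: it assumes pseudogene density in the ENCODE regions is representative of the whole genome, which the slides do not claim.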
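
The primer criterion on the Targeted Transcription slide (">4 mismatched bp v. parent") can be sketched as a simple mismatch count. This is a minimal illustration, not the procedure actually used for the RACE experiments: it assumes the pseudogene and parent primer sites are already aligned without gaps and have equal length, and the function names and example sequences are hypothetical.

```python
def count_mismatches(pseudogene_site: str, parent_site: str) -> int:
    """Count positions where two pre-aligned, equal-length sites differ."""
    if len(pseudogene_site) != len(parent_site):
        raise ValueError("sites must be pre-aligned and of equal length")
    pairs = zip(pseudogene_site.upper(), parent_site.upper())
    return sum(a != b for a, b in pairs)


def is_distinguishing(pseudogene_site: str, parent_site: str,
                      min_mismatches: int = 5) -> bool:
    """Apply the slide's rule: more than 4 mismatched bases vs. the parent.

    min_mismatches=5 encodes ">4"; treating every position equally is an
    assumption of this sketch.
    """
    return count_mismatches(pseudogene_site, parent_site) >= min_mismatches


if __name__ == "__main__":
    # Made-up 20-mers that differ at six positions.
    parent = "ACGTACGTACGTACGTACGT"
    pseudo = "ACGAACGTTCGTACCTAGGA"
    print(count_mismatches(pseudo, parent))
    print(is_distinguishing(pseudo, parent))
```

A real primer-design step would also weigh where the mismatches fall (3' end mismatches matter most for specificity), which this count ignores.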