Rhesus genome annotations Rob Norgren Department of Genetics, Cell Biology and Anatomy University of Nebraska Medical Center

Conventional Approach to GeneChip Production

• Sequence millions of ESTs

• Obtain finished genomic sequences

• Cluster redundant ESTs

• Align EST clusters with genomic sequences

• Extract the 3'-most 571 bp of each transcript: the probe selection region (PSR)

• Choose 11 to 16 probes that tile across the PSR
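The PSR extraction and probe tiling steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not Affymetrix's actual design procedure: it assumes 25-base probes (the usual length on Affymetrix 3' expression arrays) with evenly spaced start positions, and the names `extract_psr` and `tile_probes` are hypothetical. Real probe selection also screens candidates for uniqueness and hybridization behaviour.

```python
# Sketch of PSR extraction and evenly spaced probe tiling.
# Assumptions (not from the slides): 25-base probes on the transcript strand,
# evenly spaced start positions, and no uniqueness or hybridization filtering.

PSR_LENGTH = 571   # PSR length quoted on the slide
PROBE_LENGTH = 25  # assumed probe length


def extract_psr(transcript: str, psr_length: int = PSR_LENGTH) -> str:
    """Return the 3'-most psr_length bases (the whole transcript if it is shorter)."""
    if psr_length <= 0:
        raise ValueError("psr_length must be positive")
    return transcript[-psr_length:]


def tile_probes(psr: str, n_probes: int, probe_length: int = PROBE_LENGTH) -> list[str]:
    """Choose n_probes probes whose start positions are spread evenly across the PSR."""
    if not 11 <= n_probes <= 16:
        raise ValueError("the slide describes 11 to 16 probes per PSR")
    last_start = len(psr) - probe_length
    if last_start < 0:
        raise ValueError("PSR is shorter than one probe")
    # Start offsets run evenly from 0 to last_start, so the last probe ends at the 3' end.
    starts = [round(i * last_start / (n_probes - 1)) for i in range(n_probes)]
    return [psr[start:start + probe_length] for start in starts]


if __name__ == "__main__":
    toy_transcript = "ACGT" * 200  # 800 bp toy sequence, not rhesus data
    psr = extract_psr(toy_transcript)
    probes = tile_probes(psr, 16)
    print(len(psr), len(probes))
```

Even spacing is only the simplest way to "tile across" a region; it guarantees that the first probe starts at the 5' edge of the PSR and the last probe ends at its 3' edge.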

Problems with the conventional approaches for a rhesus macaque GeneChip

• Insufficient ESTs to cover most genes

• Little finished genomic sequence (in 2005)

Strategy for targeted amplification of rhesus genes

• Identify the terminal exon and flanking sequence for every human gene

• Design primers and amplify from monkey genomic DNA

• Obtain the rhesus PSR sequences

[Figure: terminal exon showing the PSR, primers F and R, and the poly(A) site]

PSR: probe selection region; F: forward primer; R: reverse primer
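A primer pair designed this way can be checked in silico by predicting the product it would amplify from a genomic sequence. The sketch below uses a hypothetical helper, `predict_amplicon`, that is not part of the original pipeline; it assumes exact primer matches on a single forward-strand sequence, whereas real primers tolerate some mismatches.

```python
from typing import Optional

# Minimal in silico PCR sketch (hypothetical helper, not the original pipeline).
# Assumptions: exact primer matches, primers written 5'->3', and the reverse
# primer annealing to the opposite strand, so its reverse complement appears
# downstream of the forward primer in the given sequence.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def predict_amplicon(genomic: str, forward: str, reverse: str) -> Optional[str]:
    """Return the predicted product, primers included, or None if no product forms."""
    genomic = genomic.upper()
    forward = forward.upper()
    f_start = genomic.find(forward)
    if f_start == -1:
        return None
    r_site = reverse_complement(reverse.upper())
    # Search for the reverse primer site only downstream of the forward primer.
    r_start = genomic.find(r_site, f_start + len(forward))
    if r_start == -1:
        return None
    return genomic[f_start:r_start + len(r_site)]
```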

Other sources for rhesus GeneChip PSRs

• Preliminary Baylor genomic sequences: in silico approach, aligning human PSRs with the preliminary rhesus genomic sequence.

• ESTs
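The slides do not say which aligner the in silico approach used. To show the idea of locating the rhesus region that best matches a human PSR, the sketch below substitutes a deliberately crude ungapped sliding-window identity scan; the function names are hypothetical, and a real pipeline would use a gapped aligner such as BLAST and search both strands.

```python
# Crude stand-in for alignment: ungapped, forward strand only (hypothetical helpers).


def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def best_ungapped_hit(human_psr: str, rhesus_genomic: str) -> tuple[int, float]:
    """Slide the human PSR along the rhesus sequence; return (offset, identity) of the best window."""
    window = len(human_psr)
    if window == 0 or window > len(rhesus_genomic):
        raise ValueError("PSR must be non-empty and no longer than the genomic sequence")
    best = (0, -1.0)
    for offset in range(len(rhesus_genomic) - window + 1):
        identity = percent_identity(human_psr, rhesus_genomic[offset:offset + window])
        if identity > best[1]:
            best = (offset, identity)
    return best
```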

Rhesus GeneChip

• Available in March 2005

• Novel design

• Whole-genome expression array - 52,024 probesets for 47,000 transcripts

• Probesets cover 17,093 well-annotated genes (16 probes/probeset)

• Probesets were designed for 1,099 well-annotated genes not present on the U133+2.0 human GeneChip.

Rhesus Genome

• Draft published in Science on April 17, 2007

• “The rhesus macaque genome assembly is a draft DNA sequence, and it contains many gaps.”

What does a “draft” rhesus genome mean?

• 26,907 protein-coding genes for humans

• 24,038 protein-coding genes for rhesus macaques

• This sounds good, but it is misleading.

• 19,450 well-annotated protein-coding genes for humans

• 8,744 well-annotated protein-coding genes for rhesus macaques

• What does “well annotated” mean?

• No “hypothetical” genes

• Only genes with “good” gene symbols; no “LOC” symbols.
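The two criteria above can be expressed as a simple filter. On the slide's own counts, about 72% of human protein-coding genes (19,450 of 26,907) pass it, but only about 36% of rhesus genes (8,744 of 24,038). In this sketch the record format and the function names are hypothetical, LOC symbols are matched as "LOC" followed by digits (the form NCBI uses for provisional symbols), and a "hypothetical" gene is detected by a plain substring check on its description.

```python
import re

# Hypothetical filter for the slide's two "well annotated" criteria.
# Assumption: NCBI provisional symbols look like "LOC" followed by digits.
LOC_SYMBOL = re.compile(r"LOC\d+")


def is_well_annotated(symbol: str, description: str) -> bool:
    """True if the gene is not 'hypothetical' and has a real (non-LOC) symbol."""
    if "hypothetical" in description.lower():
        return False
    return LOC_SYMBOL.fullmatch(symbol) is None


def count_well_annotated(genes: list[tuple[str, str]]) -> int:
    """Count (symbol, description) records that pass both criteria."""
    return sum(is_well_annotated(symbol, desc) for symbol, desc in genes)


if __name__ == "__main__":
    # Toy records, not real annotation data.
    toy = [
        ("DEFB1", "defensin beta 1"),
        ("LOC100000001", "uncharacterized protein"),
        ("GENEX", "hypothetical protein"),
    ]
    print(count_well_annotated(toy))
```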

Problems with GeneChip annotations

• Affymetrix relies on NCBI annotations; hence, many probesets are not annotated with “real” gene symbols

• Stop-gap solution: http://www.unmc.edu/rheusgenechip

• A permanent solution requires complete annotation of the rhesus genome at NCBI.

What can go wrong at the genome sequencing center?

• Large gaps

• Small gaps

• Misassemblies

• Sequencing errors
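Gaps in a draft assembly are commonly represented as runs of N. The sketch below, a hypothetical `find_gaps` helper, reports each run and labels it large or small; the 1,000 bp cutoff is an arbitrary assumption, not a definition from the talk. Misassemblies and sequencing errors leave no such marker in the sequence and cannot be found this way.

```python
import re

# Hypothetical helper: report runs of N (assembly gaps) in a draft sequence.
# Assumption: gaps of at least large_gap_min bp are "large"; the default is arbitrary.
GAP_RUN = re.compile(r"[Nn]+")


def find_gaps(assembly: str, large_gap_min: int = 1000) -> list[tuple[int, int, str]]:
    """Return (0-based start, length, 'large' or 'small') for each run of N."""
    gaps = []
    for match in GAP_RUN.finditer(assembly):
        length = match.end() - match.start()
        kind = "large" if length >= large_gap_min else "small"
        gaps.append((match.start(), length, kind))
    return gaps
```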

What can go wrong with ab initio annotations?

• Incorrect assignment of pseudogene status

• Failure to identify genes

• Incorrect gene models (some exons right, some wrong)

• Incomplete gene models
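Incorrect and incomplete gene models can be made concrete by comparing a predicted model's exons with a reference model, for example one built from the human ortholog. In this hypothetical sketch, exons are (start, end) coordinate pairs and only exact boundary matches count as correct, so an exon with one shifted boundary shows up both as wrong and as missing.

```python
# Hypothetical comparison of an ab initio gene model with a reference model.
# Exons are (start, end) genomic coordinates; only exact boundary matches count.
Exon = tuple[int, int]


def compare_gene_models(predicted: list[Exon], reference: list[Exon]) -> dict[str, list[Exon]]:
    """Split exons into correct, wrong (predicted only) and missing (reference only)."""
    pred, ref = set(predicted), set(reference)
    return {
        "correct": sorted(pred & ref),
        "wrong": sorted(pred - ref),
        "missing": sorted(ref - pred),
    }


if __name__ == "__main__":
    # Toy coordinates, not real rhesus gene models.
    ab_initio = [(100, 200), (300, 380), (500, 600)]
    ortholog_based = [(100, 200), (300, 400), (500, 600), (700, 800)]
    print(compare_gene_models(ab_initio, ortholog_based))
```

In the toy example the second exon has a shifted end and the fourth exon is never predicted, so the ab initio model is both partly incorrect and incomplete.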

Consequences of non-annotated genes

• A large number of databases depend on NCBI for their own annotations. Example: Affymetrix GeneChips

• Errors and omissions are propagated to dependent databases

• Users are frustrated when they see “LOC” symbols instead of proper gene symbols

• Users can BLAST each probeset consensus sequence, or ask their bioinformatics personnel to establish gene identity, but this wastes time and effort.

How to correct annotations

• Annotations must be acceptable to NCBI; if they are not, corrections will not propagate to dependent databases.

• Some gene annotations can be corrected by manual inspection.

• Some gene annotations can be corrected by using gene models based on human orthologs rather than ab initio predictions.

• Some gene annotations can only be corrected by additional sequencing.

• And some gene annotations require a trip to Hell...

Defensins - the gene family from Hell

• Large family of genes

• Orthologs poorly conserved - positive selection?

• Will require focused sequencing and annotation

• May require publication before NCBI annotates most of the rhesus defensins

Acknowledgements

• Jeff Kittrell

• Joel Goodsell

• Audrey Gomel

• NCRR/NIH