Genome Exploration in A-T G-C space (mk1)
Genome exploration in A-T G-C space
introducing Icarus, a DNA walking program
Jonathan Blakes
MSc Biotechnology and Computation
Department of Biosciences
Faculty of Science, Technology and Medical Studies
Problem
too much information!
EnsEMBL
UCSC
Genome Browsers
Hypothesis
Can DNA sequences be plotted in such a way that long sequences can be easily interpreted by humans without a priori knowledge?
“It seems that the simplest method of visualizing some properties of genomes is
to send a virtual walker for a genomic walk, ask "it" to talk about what it has
seen and note its observations. If our walker doesn't move with a Brownian-like
motion, it is possible to extract from its walk a lot of information.”
Stanislaw Cebrat, the principal Polish proponent of DNA walks
Assigning a cardinal direction (north, south, east or west) to each of the four nucleotide bases (A, T, G, C) and taking a step in
the corresponding direction as each base of a sequence is read will produce a ‘walk’ of the sequence in which repetitive DNA elements
appear as repetitive 2-dimensional ‘structures’.
DNA walks are plots of DNA or RNA sequences where each of the four nucleotide bases
is assigned a direction and distance, the sequence is read off one nucleotide at a time
and for each nucleotide the virtual walker takes a step in the designated direction
creating a 'walk' of the sequence that reveals elements of structure in the nucleotide
composition.
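The definition above can be sketched in a few lines of Python. This is a minimal illustration, not Icarus's own code: the function name `dna_walk`, the unit step length, and the choice of which compass direction each base receives are assumptions made for the example.

```python
# Assumed mapping for illustration: A north, T south, G east, C west,
# so that A opposes T and G opposes C (one "A-T G-C" orientation).
STEPS = {"A": (0, 1), "T": (0, -1), "G": (1, 0), "C": (-1, 0)}


def dna_walk(sequence, steps=STEPS):
    """Return the list of (x, y) positions visited, starting at the origin."""
    x, y = 0, 0
    path = [(x, y)]
    for base in sequence.upper():
        if base not in steps:
            continue  # skip N and other ambiguity codes
        dx, dy = steps[base]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


if __name__ == "__main__":
    # A short made-up sequence; a real walk would use a retrieved gene.
    print(dna_walk("AAGGTC"))
```

Plotting the returned positions as a connected line gives the picture of the walk; a tandem repeat shows up as the same shape drawn several times in succession.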
DNA walking
From Comparative Genometrics website, L'Université de Lausanne
Icarus
Live Demonstration
Could someone please suggest a mammalian gene
to walk?
Mapping
24 possible combinations of cardinal vectors: 4 rotations for each of the 3 mappings below, and 4 rotations of each of their reflections about the x or y axis.
Choosing which 3 ‘unique’ mappings of those 24 is a matter of parsimony.
A-T G-C
A-G C-T
A-C G-T
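The count of 24 can be checked by brute force. The sketch below enumerates every assignment of the four bases to the four compass directions and groups them by which bases end up opposite each other; the helper name `pairing` and the direction labels are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import permutations

DIRECTIONS = ["N", "E", "S", "W"]
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}


def pairing(mapping):
    """Name a mapping by its opposing base pairs, e.g. 'A-T G-C'."""
    by_direction = {direction: base for base, direction in mapping.items()}
    partner_of_a = by_direction[OPPOSITE[mapping["A"]]]
    # The two remaining bases necessarily oppose each other.
    rest = [base for base in "GCT" if base != partner_of_a]
    return f"A-{partner_of_a} {rest[0]}-{rest[1]}"


groups = defaultdict(list)
for perm in permutations(DIRECTIONS):
    mapping = dict(zip("ATGC", perm))
    groups[pairing(mapping)].append(mapping)

for name, members in sorted(groups.items()):
    print(name, len(members))
```

Each of the three groups holds 8 assignments (4 rotations plus 4 rotated reflections), which is why only one representative per group needs to be drawn.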
A-T G-C
A-T G-C is consistently smallest
Smaller pictures can contain more information in less space and are therefore more amenable to publication, hence Genome Exploration in A-T G-C space
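One way to compare picture sizes is to measure the bounding box of the same sequence walked under each mapping. The sketch below assumes unit steps and one arbitrary orientation per mapping; the function name `bounding_box` and the toy sequence are made up for the example, and the output says nothing about real genomes.

```python
# One assumed orientation for each of the three mappings.
MAPPINGS = {
    "A-T G-C": {"A": (0, 1), "T": (0, -1), "G": (1, 0), "C": (-1, 0)},
    "A-G C-T": {"A": (0, 1), "G": (0, -1), "C": (1, 0), "T": (-1, 0)},
    "A-C G-T": {"A": (0, 1), "C": (0, -1), "G": (1, 0), "T": (-1, 0)},
}


def bounding_box(sequence, steps):
    """Return (width, height) of the smallest box containing the walk."""
    x = y = 0
    min_x = max_x = min_y = max_y = 0
    for base in sequence.upper():
        dx, dy = steps.get(base, (0, 0))  # unknown symbols do not move
        x, y = x + dx, y + dy
        min_x, max_x = min(min_x, x), max(max_x, x)
        min_y, max_y = min(min_y, y), max(max_y, y)
    return max_x - min_x, max_y - min_y


if __name__ == "__main__":
    toy = "ATATATGCGCGCAATTGGCC"  # made-up sequence, illustration only
    for name, steps in MAPPINGS.items():
        print(name, bounding_box(toy, steps))
```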
Duplications
exons introns
A 7-fold contiguous duplication on the
Y chromosome: members of the TSPY
(Testis-specific Y-encoded proteins) family,
identified by Skaletsky et al.¹ using a
combination of a whole-chromosome
dotplot with a 2-kb window and a custom
Perl script running BLAST alignments of all
5-kb sequence segments, in 2-kb steps, of
the entire MSY (male-specific region of the Y).
In contrast, I stumbled upon this purely by
accident.
1. Skaletsky et al. Nature 2003 423.
DNA walks for phylogenetics
Imagine a 1-dimensional textual DNA sequence. The distance from the first base to the last is simply the number of bases in the sequence.
A comparison of aligned sequences on the basis of this spatial distance (a much simpler measure than the Jukes-Cantor definition of evolutionary distance) will be
unable to discriminate between them.
But for a DNA walk the spatial distance between the first and last bases is a function of:
• the nucleotide composition of a sequence and a 2D mapping
• the order of the bases, since A might oppose T and C oppose G.
Seven previously aligned, 1798-nucleotide small ribosomal subunit sequences of Candida and Saccharomyces species, as detailed in Gilfillan et al.¹, were walked, and their total Euclidean distances were used to produce a phylogeny, which was compared with Gilfillan’s.
1. Gilfillan GD, et al. Microbiology. 1998. 144: 829-838.
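A rough sketch of how walk distances could feed a distance matrix follows. The slides do not say how two walks were compared, so taking the absolute difference of their end-to-end distances is an assumption made here, as are the function names and the A-T G-C orientation.

```python
import math

# Assumed orientation of the A-T G-C mapping.
STEPS = {"A": (0, 1), "T": (0, -1), "G": (1, 0), "C": (-1, 0)}


def end_point(sequence, steps=STEPS):
    """Return the (x, y) position of the walker after the last base."""
    x = y = 0
    for base in sequence.upper():
        dx, dy = steps.get(base, (0, 0))
        x, y = x + dx, y + dy
    return x, y


def euclidean(sequence):
    """Straight-line distance from the first base to the last."""
    return math.hypot(*end_point(sequence))


def manhattan(sequence):
    """City-block distance from the first base to the last."""
    x, y = end_point(sequence)
    return abs(x) + abs(y)


def distance_matrix(sequences, measure=euclidean):
    """Pairwise |d_a - d_b| for a dict of name -> sequence (assumed rule)."""
    spans = {name: measure(seq) for name, seq in sequences.items()}
    return {a: {b: abs(spans[a] - spans[b]) for b in spans} for a in spans}
```

Passing `measure=manhattan` swaps in the other distance named in the conclusion, and running the same function with different step tables gives one matrix per mapping.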
Phylogeny algorithms
neighbour joining
Icarus’ UPGMA
Distance Matrix
Phylogeny Demonstration
Newick format
Distance Matrix
Output
Newick format string representation of a tree:
(Bovine:0.69395, (Gibbon:0.36079, (Orang:0.33636, (Gorilla:0.17147, (Chimp:0.19268, Human:0.11927):0.08386):0.06124):0.15057):0.54939, Mouse:1.21460);
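The step from a distance matrix to a Newick string can be sketched with a textbook UPGMA. Icarus's own implementation is not shown in these slides, so the function below is a generic illustration; its name, the dict-of-dicts input and the five-decimal formatting are assumptions.

```python
def upgma(names, dist):
    """Cluster with UPGMA and return a Newick string.

    names: list of leaf labels
    dist:  symmetric dict of dicts, dist[a][b] = distance between a and b
    """
    # cluster id -> (newick subtree, number of leaves, height above leaves)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # pairwise distances keyed by (smaller id, larger id)
    d = {(i, j): dist[names[i]][names[j]]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # join the closest pair of clusters
        (i, j), dij = min(d.items(), key=lambda item: item[1])
        (ti, ni, hi), (tj, nj, hj) = clusters[i], clusters[j]
        height = dij / 2
        subtree = f"({ti}:{height - hi:.5f},{tj}:{height - hj:.5f})"
        # distance to every other cluster is the size-weighted average
        new_d = {}
        for k in clusters:
            if k in (i, j):
                continue
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            new_d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        d = {pair: v for pair, v in d.items()
             if i not in pair and j not in pair}
        d.update(new_d)
        del clusters[i], clusters[j]
        clusters[next_id] = (subtree, ni + nj, height)
        next_id += 1
    tree, _, _ = next(iter(clusters.values()))
    return tree + ";"


if __name__ == "__main__":
    # Made-up distances between three hypothetical sequences.
    toy = {"a": {"a": 0, "b": 2, "c": 6},
           "b": {"a": 2, "b": 0, "c": 6},
           "c": {"a": 6, "b": 6, "c": 0}}
    print(upgma(["a", "b", "c"], toy))
```

UPGMA assumes a constant rate of change, so sibling leaves always receive equal branch lengths; neighbour joining does not make that assumption.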
Phylogenies with DNA walks
Does summing distances from 3 mappings eliminate bias and produce a better
phylogeny?
NO.
A better distance measure is needed.
Conclusion
• Icarus is a DNA-walk-based genome browser that can retrieve sequences and annotate walks using Ensembl.
• DNA walks can demonstrate the existence of duplications in DNA to the untrained eye.
• Spatial distance measures can produce phylogenies, but a better measure is needed than Manhattan or Euclidean distance.
Acknowledgements
I would like to thank:
Dr. Gary Robinson
Dr. Colin Johnson
Dr. Anthony Baines
And everyone I have met during the Biotechnology and Computation MSc.