Gene Finding
Biological Background: The Central Dogma
DNA -> (transcription) -> RNA -> (translation) -> Protein
Date posted: 21-Dec-2015
Background
*Essential Cell Biology; p.268
Non-coding regions: gene regulation
Vicinity of the TSS (transcription start site): direct interactions with the Pol II complex
Larger vicinity: indirect interactions (chromatin remodelling)
Frame Shifts
Code triplets ("codons") do not overlap.
There are 3 x 2 = 6 possible ways of reading a sequence, depending on the strand and on the relative position where reading starts.
This is not just our concern when looking for genes; it is also the cell's concern in terms of mutations:
Original: THE FAT CAT ATE THE BIG RAT
Delete C: THE FAT ATA TET HEB IGR AT
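The six reading frames and the frame-shift example above can be sketched in a few lines of Python. This is only an illustration: the helper names (`triplets`, `codons`, `six_frames`) are made up for this sketch and are not part of the lecture.

```python
# Complement table used to build the opposite strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the opposite strand, read in its own 5'->3' direction."""
    return seq.translate(COMPLEMENT)[::-1]

def triplets(seq, offset=0):
    """Split seq into non-overlapping chunks of 3, keeping any short tail."""
    return [seq[i:i + 3] for i in range(offset, len(seq), 3)]

def codons(seq, offset=0):
    """Like triplets(), but drop an incomplete chunk at the end."""
    return [t for t in triplets(seq, offset) if len(t) == 3]

def six_frames(seq):
    """Map (strand, offset) -> list of codons for all 3 x 2 reading frames."""
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[(strand, offset)] = codons(s, offset)
    return frames

# Frame shift: deleting one letter changes every triplet downstream of it.
sentence = "THEFATCATATETHEBIGRAT"
shifted = sentence[:6] + sentence[7:]   # delete the C of CAT
print(" ".join(triplets(sentence)))     # THE FAT CAT ATE THE BIG RAT
print(" ".join(triplets(shifted)))      # THE FAT ATA TET HEB IGR AT
```

A gene finder has to consider all six entries returned by `six_frames`, since a gene may lie on either strand and start at any of the three offsets.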
Gene Finding in Prokaryotes
No nucleus
Most DNA is coding (e.g. 70% in H. influenzae)
Each gene is one continuous DNA sequence (no introns)
RNA polymerases: prokaryotes have a single RNA polymerase; in eukaryotes, Pol I transcribes rRNA, Pol II mRNA, and Pol III tRNA
Detecting ORFs
Simple idea:
If no gene is encoded, STOP codons are expected at a frequency of 3 out of every 64 codons, i.e. about one STOP every 21 codons.
ORF (open reading frame): a sequence of codons with no STOP codon.
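To see why a long stop-free run is suspicious, the 3/64 figure can be turned into a probability. The sketch below assumes all 64 codons are equally likely and independent, which is the simplification behind the 3/64 estimate and not a property of real genomes; the function names are illustrative.

```python
STOP_FRACTION = 3 / 64  # 3 of the 64 codons are STOP codons

def expected_codons_between_stops():
    """Average spacing of STOP codons in random sequence (about 21 codons)."""
    return 1 / STOP_FRACTION

def prob_stop_free(n_codons):
    """Chance that n_codons consecutive random codons contain no STOP."""
    return (1 - STOP_FRACTION) ** n_codons

for n in (20, 50, 100, 300):
    print(n, prob_stop_free(n))
```

Under this model a stop-free run of 100 codons occurs by chance in well under 1% of positions, which is the rationale for a length threshold.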
Simple algorithm:
1. Scan until you find a stop codon, in all reading frames.
2. Scan back to find a start codon.
3. If it's long enough, report this ORF as a putative gene.
Cons:
Can't detect short genes
High false-positive rate (E. coli has 6500 ORFs but only 1100 genes)
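The three steps above can be sketched as follows. This is a minimal illustration with several assumptions not stated in the lecture: ATG is the only start codon, the ORF is taken from the earliest ATG after the previous stop (the longest candidate), and the `min_codons` threshold of 100 is an arbitrary default. Instead of literally scanning backwards, the code remembers the earliest start codon seen since the last stop, which gives the same result.

```python
STOPS = {"TAA", "TAG", "TGA"}
START = "ATG"  # assumption: only ATG is treated as a start codon
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def orfs_in_frame(seq, offset, min_codons):
    """Yield (start, end) for ORFs in one frame of one strand.

    start is the index of the start codon; end is exclusive and
    includes the stop codon. Length is counted without the stop.
    """
    first_start = None  # earliest ATG seen since the last stop codon
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon == START and first_start is None:
            first_start = i
        elif codon in STOPS:
            if first_start is not None and (i - first_start) // 3 >= min_codons:
                yield first_start, i + 3
            first_start = None

def find_orfs(seq, min_codons=100):
    """Report putative genes as (strand, offset, start, end) tuples.

    Coordinates on the '-' strand refer to the reverse-complemented
    sequence, not to the original one.
    """
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            for start, end in orfs_in_frame(s, offset, min_codons):
                results.append((strand, offset, start, end))
    return results

print(find_orfs("CCATGAAATTTTAGCC", min_codons=3))
```

Raising `min_codons` lowers the false-positive rate but misses short genes, which is exactly the trade-off listed under the cons.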
Coding vs. Non-coding Regions: Codon Frequencies
Codon usage in coding regions is different.
Leucine, Alanine and Tryptophan are coded by 6, 4 and 1 different codons, respectively.
In a random sequence we expect them to appear in a 6:4:1 ratio.
In proteins they appear in a 6.9:6.5:1 ratio.
Another example:
A or T appears in 90% of cases as the last letter of a codon in protein-coding regions.
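Both statistics above are easy to measure on a candidate region. The sketch below counts the Leucine, Alanine and Tryptophan codons and the fraction of codons ending in A or T for one reading frame. The function names are illustrative, and how far a measured ratio must deviate from 6:4:1 before a region is called coding is left open here.

```python
from collections import Counter

# Codons of the standard genetic code for the three amino acids discussed.
CODONS_FOR = {
    "Leu": {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"},
    "Ala": {"GCT", "GCC", "GCA", "GCG"},
    "Trp": {"TGG"},
}

def codon_counts(seq, offset=0):
    """Count complete codons in one reading frame."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))

def amino_acid_counts(seq, offset=0):
    """Occurrences of Leu, Ala and Trp codons in the given frame."""
    counts = codon_counts(seq, offset)
    return {aa: sum(counts[c] for c in cs) for aa, cs in CODONS_FOR.items()}

def third_position_at_fraction(seq, offset=0):
    """Fraction of codons in the frame whose last letter is A or T."""
    counts = codon_counts(seq, offset)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    at = sum(n for codon, n in counts.items() if codon[2] in "AT")
    return at / total

region = "CTGGCTTGGTTA"  # toy example: CTG GCT TGG TTA
print(amino_acid_counts(region))
print(third_position_at_fraction(region))
```

Comparing these numbers across the three offsets of a candidate ORF gives a simple signal for which frame, if any, looks coding.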
Using the Promoter's Signal
We are still far from perfect…
Idea: try to detect signals in the promoter regions, to help discriminate real genes among ORFs.
Prokaryotes:
~-35 from the TSS: TTGACA
~-10 from the TSS: TATAAT ("TATA box" signal)
No single promoter has the exact consensus.
Nearly all promoters have 2-3 of the conserved letters in TAxyzT.
80-90% have all 3.
In 50%, xyz = TAA.
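Because no promoter matches the consensus exactly, a scan has to tolerate mismatches. The sketch below scores each hexamer by the number of positions agreeing with the consensus and reports pairs of -35 and -10 candidates. Two parameters are assumptions of this sketch, not values from the lecture: the `min_matches` threshold of 4 and the allowed spacer of 15 to 19 bases between the two boxes.

```python
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"

def matches(window, consensus):
    """Number of positions where window agrees with the consensus."""
    return sum(a == b for a, b in zip(window, consensus))

def find_promoter_candidates(seq, min_matches=4, spacers=range(15, 20)):
    """Return (pos_35, pos_10, score_35, score_10) for candidate promoters.

    pos_35 and pos_10 are the start indices of the two hexamers.
    min_matches and spacers are illustrative assumptions.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        score_35 = matches(seq[i:i + 6], MINUS_35)
        if score_35 < min_matches:
            continue
        for gap in spacers:
            j = i + 6 + gap          # start of the putative -10 box
            if j + 6 > len(seq):
                break
            score_10 = matches(seq[j:j + 6], MINUS_10)
            if score_10 >= min_matches:
                hits.append((i, j, score_35, score_10))
    return hits

toy = "GG" + MINUS_35 + "A" * 17 + MINUS_10 + "CC"
print(find_promoter_candidates(toy))
```

Candidates found this way could then be used to rank ORFs: an ORF with a plausible promoter shortly upstream is a better putative gene than one without.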
Summary So Far
We have seen the problems in trying to find genes in a genome-wide scan of prokaryotes.
The bottom line is that the problem is not really solved, but most research in gene finding focuses on eukaryotes, where the main interest lies…
Next lecture: much more sophisticated models, to handle the much more complex situation in eukaryotes in general, and in humans in particular.