Gene Prediction in Genomic Studies Ab-initio based methods
![Page 1: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/1.jpg)
Gene Prediction in Genomic Studies: Ab-initio based methods
Angela Pena Gonzalez, Lavanya Rishishwar
![Page 2: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/2.jpg)
INTRODUCTION
What Gene Prediction means and a brief background
![Page 3: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/3.jpg)
Introduction: Gene Prediction
• Gene Prediction is the process of detecting the locations of open reading frames (ORFs) and, if the genes of interest are of eukaryotic origin, delineating the structures of introns and exons.
• The ultimate goal is to describe all the genes computationally with near 100% accuracy
![Page 4: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/4.jpg)
Introduction: ORF
• Reading Frame: A sequence of DNA/RNA that is translated into an amino acid sequence, three bases at a time, each triplet sequence coding for a single amino acid
• Every region of double-stranded DNA has six possible reading frames: three on each strand
• An Open Reading Frame (ORF) is the longest stretch of a reading frame uninterrupted by a stop codon
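The six reading frames and the ORF definition above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any of the gene predictors discussed later work: it assumes an ORF starts at ATG and ends at the first in-frame stop codon (TAA, TAG, TGA), and the function names are made up for this example.

```python
from typing import Dict, List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq: str, offset: int, min_codons: int = 1) -> List[Tuple[int, int]]:
    """Find ATG...stop ORFs in one reading frame of `seq`.

    Spans are 0-based, end-exclusive, and include the stop codon.
    `min_codons` counts codons before the stop codon.
    """
    spans = []
    start = None
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i  # first start codon opens the ORF
        elif start is not None and codon in STOP_CODONS:
            if (i - start) // 3 >= min_codons:
                spans.append((start, i + 3))
            start = None  # look for the next ORF in this frame
    return spans


def six_frame_orfs(seq: str, min_codons: int = 1) -> Dict[Tuple[str, int], List[Tuple[int, int]]]:
    """Scan three frames on each strand; keys are (strand, frame offset).

    Spans on the "-" strand are relative to the reverse complement.
    """
    seq = seq.upper()
    result = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            result[(strand, offset)] = orfs_in_frame(strand_seq, offset, min_codons)
    return result


if __name__ == "__main__":
    for frame, spans in six_frame_orfs("CCATGAAATAGGG").items():
        print(frame, spans)
```

Raising `min_codons` is a simple way to discard the many short ORFs that occur by chance.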
![Page 5: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/5.jpg)
Introduction: ORF
• Not all translations have biochemical support; some are merely derived theoretically or computationally
• In other words, each gene is an ORF but not every ORF is a gene
![Page 6: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/6.jpg)
Introduction: Gene
• Genes are the functional and physical units of heredity passed from parent to offspring.
• Genes are pieces of DNA, and most genes contain the information for making a specific protein.
![Page 7: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/7.jpg)
Introduction: Gene Models
Prokaryotic Eukaryotic
![Page 8: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/8.jpg)
Introduction: Coding vs. Noncoding
Coding region
Coding regions are the parts of DNA that give rise to a mature messenger RNA that will be translated into the specific amino acids of the protein product.
Noncoding region
Noncoding regions are the parts of DNA that do not encode protein sequences. They may or may not be transcribed into RNA.
E.g.: tRNA, rRNA, sRNA genes
![Page 9: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/9.jpg)
NECESSITY
Why do we need gene prediction algorithms?
![Page 10: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/10.jpg)
Necessity
• There has been a sharp upward trend in the number of genomes sequenced in the past decade.
![Page 11: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/11.jpg)
Necessity
[Figure: No. of Genomes in KEGG, plotted from 7/98 to 1/12 on a scale of 0 to 2000. KEGG Genome: Release Update of Jan 2012]
![Page 12: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/12.jpg)
Necessity
• There has been a sharp upward trend in the number of genomes sequenced in the past decade.
• Accurately predicting genes can significantly reduce the amount of experimental verification work, which is time-consuming, labor-intensive, and expensive to carry out
• Current state-of-the-art gene predictors have a high accuracy of ~90-99% (i.e., they are able to predict >90% of the experimentally validated genes)
![Page 13: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/13.jpg)
METHODS
How do the gene predictors make their predictions?
![Page 14: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/14.jpg)
Gene Prediction Methods
• Gene Prediction represents one of the most difficult problems in the field of pattern recognition, particularly in the case of eukaryotes
• The principal difficulties are:
o Detection of initiation site (AUG)
o Alternative start codons
o Gene overlap
o Undetected small proteins
![Page 15: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/15.jpg)
Gene Prediction Methods
ACGTACTACGTACGTACGTACGATCGATCGATCGATCGATCGACTGATCGATCGATCGATCGTACGTAGCGACTGACTGACTGATCGACTACGTAGCTGCAGTCAGTCGACTGACTGACTA
Ab-initio methods / Homology based methods
![Page 16: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/16.jpg)
Ab-initio Methods
• Predict genes based on the given sequence alone.
• Consist of two types of models:
o Markov based models
o Dynamic Programming
![Page 17: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/17.jpg)
A brief introduction of HMMs
• Hidden Markov models (HMMs) are discrete Markov processes where every state generates an observation at each time step.
• A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states.
![Page 18: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/18.jpg)
Markov Model (Discrete Markov Process)
• A discrete Markov process is a sequence of random variables q1,…,qt that take values in a discrete set S={s1,…,sN} where the Markov property holds.
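• Markov property: $P(q_t = s_j \mid q_{t-1} = s_i, q_{t-2}, \ldots, q_1) = P(q_t = s_j \mid q_{t-1} = s_i)$, i.e., the next state depends only on the current state.
• Parameters
– Initial state probabilities: πi
– State transition probabilities: aij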
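As a small illustration of how the two parameter sets are used, the sketch below computes the probability of a DNA sequence under a first-order Markov model, treating each base as a state. The numbers in `initial` (πi) and `transition` (aij) are made up for the example and are not estimated from any genome.

```python
import math

STATES = "ACGT"

# Hypothetical initial state probabilities (pi_i), for illustration only.
initial = {s: 0.25 for s in STATES}

# Hypothetical state transition probabilities (a_ij); each row sums to 1.
transition = {
    "A": {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
    "C": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "G": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "T": {"A": 0.2, "C": 0.2, "G": 0.2, "T": 0.4},
}


def log_probability(seq: str) -> float:
    """Log-probability of `seq`: log(pi) for the first base plus log(a_ij) per step."""
    logp = math.log(initial[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        # Markov property: only the previous base matters.
        logp += math.log(transition[prev][cur])
    return logp


if __name__ == "__main__":
    print(math.exp(log_probability("AAC")))
```

Working in log space avoids numerical underflow once sequences get longer than a few hundred bases.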
![Page 19: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/19.jpg)
From Markov Model to HMM
• HMMs are discrete Markov processes in which each state also emits an observation according to some probability distribution, so the model must be augmented with emission probabilities.
• Parameters
– Initial state probabilities: πi
– State transition probabilities: aij
– Emission probabilities: ei(k)
Markov Model: each state emits an observation with 100% probability
Hidden Markov Model: each state emits an observation according to a certain probability distribution
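To make the three HMM parameter sets concrete, here is a toy sketch that recovers the most likely hidden state path with the Viterbi algorithm. The two states ("N" for noncoding, "C" for coding) and all probabilities are assumptions invented for this example; real gene finders use far richer models.

```python
import math

STATES = ("N", "C")  # hypothetical hidden states: noncoding, coding

# Hypothetical parameters, for illustration only.
start_p = {"N": 0.5, "C": 0.5}                      # initial probabilities
trans_p = {"N": {"N": 0.9, "C": 0.1},               # transition probabilities
           "C": {"N": 0.1, "C": 0.9}}
emit_p = {"N": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},   # emission probabilities
          "C": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}


def viterbi(obs: str) -> str:
    """Return the most probable hidden state path for the observed bases."""
    # scores[s] = best log-probability of any path that ends in state s
    scores = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in STATES}
    backpointers = []
    for symbol in obs[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for landing in s at this position.
            best_prev = max(STATES, key=lambda p: scores[p] + math.log(trans_p[p][s]))
            new_scores[s] = (scores[best_prev]
                             + math.log(trans_p[best_prev][s])
                             + math.log(emit_p[s][symbol]))
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(STATES, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))


if __name__ == "__main__":
    print(viterbi("AAAA" + "G" * 8 + "AAAA"))
```

The observed bases are visible, while the N/C labels are the hidden states that the algorithm infers.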
![Page 20: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/20.jpg)
LET'S GET STARTED!
![Page 21: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/21.jpg)
Say goodbye to Windows and get to Linux!!
Say “Yes” when you are ready to work on Linux!!!
![Page 22: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/22.jpg)
A Quick Linux How-To Manual
• Terminal (and Kernel)!
“That’s Linux to me!” – Lava
• Basic Navigation in Terminal:
– Change to a specific directory – cd
– List the contents of the folder – ls
– Go up one folder level – cd ..
– Copy a file from one location to another – cp
– Move a file from one location to another – mv
– Rename a file (file1) to (file2) – mv file1 file2
![Page 23: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/23.jpg)
A Quick Linux How-To Manual
– Autocomplete – tab!
– Extract a file – tar -xvf [file name]
– Installing software:
• Navigate to the folder where “Makefile” is present
• Type make
• Wait for the installer to finish processing
• Programs will be stored in the same folder or in a different folder named “bin” (short for binaries)
![Page 24: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/24.jpg)
That’s all, folks. Thank you for coming!
![Page 25: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/25.jpg)
Nah, just kidding! Let’s get down to business!
![Page 26: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/26.jpg)
GeneMark
• Developed by Dr. Mark Borodovsky (from Georgia Tech!)
• Works on elegant pseudo-HMMs and HMMs
• Several versions available – prokaryotic/eukaryotic, self-training
![Page 27: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/27.jpg)
Running GeneMark
• ./gmsn.pl --prok --out [output file] [genome file]
![Page 28: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/28.jpg)
Glimmer3
• Works by creating a variable-length Markov model from a training set of genes
• Uses the model to identify all genes in a DNA sequence
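The idea of training a Markov model on known genes and then scoring new sequence with it can be sketched as follows. This is a deliberately simplified stand-in, not Glimmer3's algorithm: it uses a fixed context length (`order`) instead of a variable-length model, a uniform background, and add-one pseudocounts, and all function names are invented for the example.

```python
import math
from collections import defaultdict
from typing import Callable, Iterable

ALPHABET = "ACGT"


def train_markov_model(training_seqs: Iterable[str], order: int = 2,
                       pseudocount: float = 1.0) -> Callable[[str, str], float]:
    """Count which base follows each `order`-long context in the training genes.

    Returns prob(context, base), smoothed with pseudocounts so that
    unseen contexts fall back to a uniform distribution.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in training_seqs:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1

    def prob(context: str, base: str) -> float:
        ctx = counts.get(context, {})
        total = sum(ctx.values()) + pseudocount * len(ALPHABET)
        return (ctx.get(base, 0.0) + pseudocount) / total

    return prob


def log_odds(seq: str, prob: Callable[[str, str], float], order: int = 2) -> float:
    """Log-odds of `seq` under the trained model versus a uniform background."""
    background = 1.0 / len(ALPHABET)
    return sum(math.log(prob(seq[i - order:i], seq[i]) / background)
               for i in range(order, len(seq)))


if __name__ == "__main__":
    model = train_markov_model(["ATGATGATGATG"])  # toy training set
    print(log_odds("ATGATG", model), log_odds("CCCCCC", model))
```

A positive score means the sequence looks more like the training genes than like random sequence; a candidate ORF would be kept or rejected based on such a score.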
![Page 29: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/29.jpg)
Running Glimmer3
• It’s a 2-step process:
1. A probability model of coding sequences, called an interpolated context model (ICM), must be built:
./build-icm [model name] < [training sequences]
2. The program is run to analyze the sequences and make gene predictions:
./glimmer3 [genome] [icm_model] [output]
o Best results require the longest possible training set of genes
![Page 30: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/30.jpg)
Glimmer3 programs (if you are curious)
• long-orfs: uses an amino-acid distribution model to filter the set of ORFs
• extract: builds a training set from long, nonoverlapping ORFs
• build-icm: builds an interpolated context model from training sequences
• glimmer3: analyzes sequences and makes predictions
![Page 31: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/31.jpg)
RNA Prediction
![Page 32: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/32.jpg)
Running tRNAscan-SE
tRNAscan-SE -B -o <outputfile1> -f <outputfile2> -m <outputfile3> <inputfile>
-B : search for bacterial tRNAs. This option selects the bacterial covariance model for tRNA analysis, and loosens the search parameters for EufindtRNA to improve detection of bacterial tRNAs.
-o <file> : save final results in <file>.
-f <file> : save results and tRNA secondary structures to <file>.
-m <file> : save a statistics summary for the run. The summary contains the run options selected as well as statistics on the number of tRNAs detected at each phase of the search, search speed, and other statistics.
![Page 33: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/33.jpg)
Output using the “-o” parameter
Output using the “-f” parameter
![Page 34: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/34.jpg)
![Page 35: Gene Prediction in Genomic Studies Ab-initio based methods](https://reader035.fdocuments.net/reader035/viewer/2022062315/56816376550346895dd453b7/html5/thumbnails/35.jpg)
THANK YOU
Yes, I am serious. We are done. You are saved!