Gene, Proteins, and Genetic Code. Protein Synthesis in a Cell.


Protein Synthesis in a Cell


A protein sequence

>gi|7228451|dbj|BAA92411.1| EST AU055734(S20025) corresponds to a region …
MCSYIRYDTPKLFTHVTKTPPKNQVSNSINDVGSRRATDRSVASCSSEKSVGTMSVKNASSISFEDIEKSISNWKIPKVN
IKEIYHVDTDIHKVLTLNLQTSGYELELGSENISVTYRVYYKAMTTLAPCAKHYTPKGLTTLLQTNPNNRCTTPKTLKWD
EITLPEKWVLSQAVEPKSMDQSEVESLIETPDGDVEITFASKQKAFLQSRPSVSLDSRPRTKPQNVVYATYEDNSDEPSI
SDFDINVIELDVGFVIAIEEDEFEIDKDLLKKELRLQKNRPKMKRYFERVDEPFRLKIRELWHKEMREQRKNIFFFDWYE
SSQVRHFEEFFKGKNMMKKEQKSEAEDLTVIKKVSTEWETTSGNKSSSSQSVSPMFVPTIDPNIKLGKQKAFGPAISEEL
VSELALKLNNLKVNKNINEISDNEKYDMVNKIFKPSTLTSTTRNYYPRPTYADLQFEEMPQIQNMTYYNGKEIVEWNLDG
FTEYQIFTLCHQMIMYANACIANGNKEREAANMIVIGFSGQLKGWWNNYLNETQRQEILCAVKRDDQGRPLPDRDGNGNP
TELKEGFHMEEKDEPIQEDDQVVGTIQKYTKQKWYAEVMYRFIDGSYFQHITLIDSGADVNCIREDEILDQLVQTKREQV
VNSIYLHDNSFPKSMDLPDQKITEKRAKLQDIPHHEERLLDYREKKSRDGQDKLPMEVEQSMATNKNTKILLRAWLLST

A typical protein sequence is from a few hundred to several thousand amino acids long.
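The record above is in FASTA format: a header line starting with ">" followed by one or more sequence lines. As a minimal sketch (the function name parse_fasta and the short demo record are hypothetical, not taken from the source), a reader that joins the sequence lines and reports each sequence's length could look like this:

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between sequence lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical demo record (not the record shown above)
example = """>demo_protein hypothetical example
MCSYIRYDTPKLFTHVTKTP

PKNQVSNSIN
"""
for header, seq in parse_fasta(example):
    print(header, len(seq))  # demo_protein hypothetical example 30
```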


Protein synthesis


Genetic code

..ATT CAC AGT GGA..

ATT → I (isoleucine), CAC → H (histidine), AGT → S (serine), GGA → G (glycine)
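The example reads the DNA one codon (three bases) at a time. A small sketch of that lookup, assuming the standard genetic code (NCBI translation table 1); the names CODON_TABLE and translate are illustrative, not from the source:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids for codons TTT, TTC, TTA, TTG, TCT, ..., GGG
# in that order ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate dna codon by codon, starting at offset frame (0, 1 or 2)."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # "X" for codons containing non-ACGT letters
        for i in range(frame, len(dna) - 2, 3)
    )


print(translate("ATTCACAGTGGA"))  # IHSG
```

Building the table with itertools.product keeps the 64 entries in the conventional TCAG order, so the one-line amino acid string can be checked against a printed codon table.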


Notes on translation

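• Three reading frames (on each strand)

• The third base is often not important: many codons for the same amino acid differ only at the third position.

• Translation reads the mRNA in the 5' → 3' direction.

• Start and stop codons

• Open reading frame (ORF)

• Each gene is an ORF, but not all ORFs are genes.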
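To make "reading frame" and "ORF" concrete, here is a minimal sketch that scans the forward strand in its three reading frames and reports each stretch from an ATG to the first in-frame stop codon. The function name find_orfs and the min_codons threshold are hypothetical; the reverse strand (the other three frames) is not scanned here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=1):
    """Find ORFs on the forward strand in each of the three reading frames.

    An ORF runs from an ATG to the first in-frame stop codon (stop included).
    Returns (start, end, frame) tuples with 0-based, end-exclusive coordinates.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open an ORF at the first ATG after the last stop
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF; look for the next ATG
    return orfs


print(find_orfs("CATGTTTTAG"))  # [(1, 10, 1)]
```

Since every gene is an ORF but not every ORF is a gene, a list like this is only a set of candidates that later steps (promoters, content, similarity) must filter.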


The Central Dogma of Molecular Biology

DNA → RNA → Protein (transcription, then translation)

DNA → DNA (replication)

genotype → phenotype


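Exception: retroviruses

DNA → RNA → Protein (transcription, then translation), but the viral RNA is also copied back into DNA (reverse transcription: RNA → DNA)

DNA → DNA (replication)

genotype → phenotype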


DNA (genotype) → protein (phenotype) → biology


Genes

• One gene encodes one protein (or sometimes an RNA).

• Like a program, a gene starts with a start codon (e.g., ATG); each following group of three bases codes for one amino acid, and a stop codon (e.g., TGA) marks the end of the gene.

• Genes are dense in prokaryotes and sparse in eukaryotes.

• In the middle of a eukaryotic gene there are introns, which are spliced out (discarded as junk) after transcription; the retained parts are called exons. Locating genes and their exons is the task of gene finding.
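Once a gene finder has predicted the exon boundaries, producing the spliced sequence is just concatenation. A minimal sketch, assuming exon coordinates are already known (0-based, end-exclusive); the function name splice and the toy gene are hypothetical:

```python
def splice(pre_mrna, exons):
    """Join the exon segments of a pre-mRNA, dropping the introns between them.

    exons: (start, end) pairs, 0-based and end-exclusive, in transcript order.
    """
    for start, end in exons:
        if not 0 <= start < end <= len(pre_mrna):
            raise ValueError(f"bad exon coordinates: {(start, end)}")
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical gene: exon 1, one intron (GT...AG), exon 2
gene = "ATGGCC" + "GTAAGTTTTCAG" + "AAATGA"
print(splice(gene, [(0, 6), (18, 24)]))  # ATGGCCAAATGA
```

The hard part of gene finding is predicting those coordinates, not applying them.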


Gene related diseases

• Hemophilia: X-linked (the gene is on the X chromosome).

• Sickle-cell anemia: a single-nucleotide mutation in the first exon of the beta-globin gene (it removes a restriction-enzyme cutting site). About 1 in 12 African Americans are carriers; homozygotes have the disease.

• BRCA1 gene (chromosome 17q): responsible for about half of inherited breast cancers (inherited cases are about 10% of all breast cancers).

• Fragile X syndrome (intellectual disability): about 1 in 1250 males and 1 in 2500 females (dominant, but females are partly protected because the normal copy of the gene on their other X chromosome is partially expressed). FMR-1 gene: more than 200 trinucleotide repeats cause the disease.

• p53 gene: chromosome 17p; encodes a tumor suppressor protein.


Gene Prediction and Annotation: Prokaryotes

1. Start/stop codons (ORF)
2. Promoters
3. Content
4. Sequence similarity


Start Codon

• May miss short genes.

• It is unclear which start codon to use (there may be several in-frame ATGs).

• ORFs in different reading frames can overlap.


Promoters

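<-- upstream                                        downstream -->

5'-XXXXPPPPPPXXXXXXXXXPPPPPPXXXXGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGXXXX-3'

P = promoter boxes at -35 and -10 (the -10 box is the Pribnow box); G = gene to be transcribed; X = other bases.

Frequency of the consensus base at each position:

-10 (Pribnow box): T 77%, A 76%, T 60%, A 61%, A 56%, T 82%

-35: T 69%, T 79%, G 61%, A 56%, C 54%, A 54%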

In prokaryotes, the promoter consists of two short sequences at positions -10 and -35 upstream of the gene, that is, before the gene in the direction of transcription. The sequence at -10 is called the Pribnow box and usually consists of the six nucleotides TATAAT. The Pribnow box is essential for starting transcription in prokaryotes. The other sequence, at -35, usually consists of the six nucleotides TTGACA; its presence allows a very high transcription rate.

These rules are only approximately correct.


Scoring a 6-mer as a Pribnow box

• We need a "score function" that measures how likely it is that a given 6-mer is a Pribnow box.


An example function for evaluating Pribnow box fitness (the formula itself did not survive in this transcript; only a log(…) term remains).
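The slide's exact formula is not recoverable, so the following is only a sketch of one common choice, a log-odds (position weight matrix) score: sum over the six positions of log(p_i(base) / background). Assumptions, not from the source: the per-position consensus frequencies come from the -10 row of the table above, the remaining probability at each position is split evenly over the other three bases, and the background is uniform (0.25). All names are hypothetical.

```python
import math

# Frequency of the consensus base at each position of the -10 box (from the table above).
CONSENSUS = "TATAAT"
CONSENSUS_FREQ = [0.77, 0.76, 0.60, 0.61, 0.56, 0.82]


def build_profile(consensus, freqs):
    """Per-position base probabilities.

    Assumption: the probability not taken by the consensus base is split evenly
    among the other three bases.
    """
    profile = []
    for base, f in zip(consensus, freqs):
        probs = {b: (1 - f) / 3 for b in "ACGT"}
        probs[base] = f
        profile.append(probs)
    return profile


PROFILE = build_profile(CONSENSUS, CONSENSUS_FREQ)


def score_6mer(kmer, profile=PROFILE, background=0.25):
    """Log-odds score: sum over positions of log(p_i(base) / background). ACGT only."""
    if len(kmer) != len(profile):
        raise ValueError("k-mer length must equal the profile length")
    return sum(math.log(pos[b] / background) for pos, b in zip(profile, kmer.upper()))


def best_window(seq, profile=PROFILE):
    """Return (score, offset) of the highest-scoring window in seq."""
    k = len(profile)
    return max((score_6mer(seq[i:i + k], profile), i) for i in range(len(seq) - k + 1))


print(best_window("GGGGTATAATGGGG")[1])  # 4
```

Working in log space turns the product of per-position probabilities into a sum, and a positive score means the 6-mer looks more like a Pribnow box than like random background. Because the consensus rules are only approximate, a threshold on this score (rather than an exact match) is the usual way to call candidate boxes.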


Content I: codon bias

• A codon XYZ occurs with different frequencies in coding and non-coding regions.

• Different amino acids have different frequencies.

• Different codons for the same amino acid have different frequencies.

• In non-coding regions, the frequency of XYZ is approximately p(X)·p(Y)·p(Z).


http://www.kazusa.or.jp/codon/


Codon bias

• First, use many known genes from the organism (or from similar organisms) to train a codon frequency table, so that each codon c_i has a frequency f(c_i).

• Second, compute the background frequency bf(X) of each base, for X = A, C, G, T.

• The "significance" of a codon c = XYZ is then log( f(c) / (bf(X)·bf(Y)·bf(Z)) ), which is positive when the codon is more common in genes than the base composition alone predicts.

• A high average significance over a region is an indication of a gene.
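The recipe above can be sketched in a few lines. Here each codon is scored as log(f(c) / (bf(X)·bf(Y)·bf(Z))), so codons that are over-represented in genes get positive values and a high average over a region points to a gene. Assumptions, not from the source: codons are counted in frame 0 only, a tiny pseudocount avoids log(0) for codons never seen in training, and the training genes below are hypothetical toy data.

```python
import math
from collections import Counter


def codon_frequencies(genes):
    """Train f(c): relative frequency of each in-frame codon across known genes."""
    counts = Counter()
    for gene in genes:
        gene = gene.upper()
        counts.update(gene[i:i + 3] for i in range(0, len(gene) - 2, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def base_frequencies(seq):
    """Background frequency bf(X) of each base X in A, C, G, T (other letters ignored)."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}


def significance(codon, f, bf, pseudo=1e-6):
    """log(f(c) / (bf(X) * bf(Y) * bf(Z))); pseudo is an assumed pseudocount to avoid log(0)."""
    x, y, z = codon
    expected = bf[x] * bf[y] * bf[z]
    return math.log((f.get(codon, 0.0) + pseudo) / (expected + pseudo))


def average_significance(region, f, bf):
    """Mean codon significance over the in-frame codons of region (frame 0, ACGT only)."""
    region = region.upper()
    codons = [region[i:i + 3] for i in range(0, len(region) - 2, 3)]
    if not codons:
        raise ValueError("region must contain at least one codon")
    return sum(significance(c, f, bf) for c in codons) / len(codons)


# Hypothetical training data, for illustration only
f = codon_frequencies(["ATGGCTGCTGCTTAA", "ATGGCTGCTGCTTGA"])
bf = base_frequencies("ACGT" * 25)
print(average_significance("GCTGCTGCT", f, bf) > average_significance("CCCCCCCCC", f, bf))  # True
```

In practice the average would be computed over a sliding window, in each reading frame, with f trained on many real genes (for example from a codon usage database).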


Content II - Hidden Markov Model (HMM)
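The content of this slide is not preserved. As a hedged illustration of the idea only, here is a minimal two-state HMM ("noncoding" and "coding") decoded with the Viterbi algorithm. All parameters are hypothetical (coding regions are simply modelled as GC-rich); real HMM gene finders use many more states, for example for the three codon positions, start/stop codons and, in eukaryotes, introns.

```python
import math

STATES = ("noncoding", "coding")

# Hypothetical parameters: coding modelled as GC-rich, non-coding as AT-rich.
START_P = {"noncoding": 0.5, "coding": 0.5}
TRANS_P = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
EMIT_P = {
    "noncoding": {"A": 0.35, "T": 0.35, "C": 0.15, "G": 0.15},
    "coding": {"A": 0.15, "T": 0.15, "C": 0.35, "G": 0.35},
}


def viterbi(seq, start_p=START_P, trans_p=TRANS_P, emit_p=EMIT_P):
    """Return the most likely state path for seq (log-space Viterbi).

    All probabilities must be non-zero, since their logarithms are taken.
    """
    seq = seq.upper()
    # best[s]: log-probability of the best path ending in state s at the current position
    best = {s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        pointers, new_best = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: best[r] + math.log(trans_p[r][s]))
            pointers[s] = prev
            new_best[s] = best[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][base])
        backpointers.append(pointers)
        best = new_best
    # Trace back from the best final state
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


path = viterbi("ATATATAT" + "GCGCGCGC" + "ATATATAT")
print("".join("C" if s == "coding" else "N" for s in path))  # NNNNNNNNCCCCCCCCNNNNNNNN
```

Working with log-probabilities avoids numerical underflow on long sequences, and the transition probabilities control how readily the model switches between regions.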


Eukaryotes

• The basic idea is similar to that for prokaryotes.

• Difference:


DNA-specific transcription factors

• These are the basis of gene-regulatory networks.

• Another hot area in bioinformatics.


Splicing

• Consensus sequences have been identified that are necessary but not sufficient for splicing. In vertebrates, these sequences are as follows (the slash marks the exon-intron or intron-exon junction):

• C(or A)AG/GTA(or G)AGT: the "donor" splice site

• T(or C)nNC(or T)AG/G: the "acceptor" splice site, where (T or C)n is a run of n pyrimidines and N is any base

• A third sequence, which in yeast is TACTAAC (the branch point), is necessary within the intron.

These rules are only approximately correct.
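The consensus patterns above translate directly into regular expressions. This is a minimal sketch that reports candidate donor and acceptor junctions and yeast branch points; MIN_PYRIMIDINES (the value of n) and the toy sequence are hypothetical choices, and because the rules are only approximate, real splice-site predictors score matches rather than require them exactly.

```python
import re

# Donor site C(or A)AG/GTA(or G)AGT: the junction lies after the first three bases.
DONOR = re.compile(r"(?=[CA]AGGT[AG]AGT)")
# Acceptor site (T or C)n N C(or T)AG/G; MIN_PYRIMIDINES is a hypothetical choice for n.
MIN_PYRIMIDINES = 6
ACCEPTOR = re.compile(r"(?=([TC]{%d,}[ACGT][CT]AG)G)" % MIN_PYRIMIDINES)
# Yeast branch-point sequence inside the intron.
BRANCH_POINT = re.compile(r"(?=TACTAAC)")


def candidate_sites(seq):
    """Return sorted junction positions for donors and acceptors, and branch-point starts."""
    seq = seq.upper()
    donors = sorted({m.start() + 3 for m in DONOR.finditer(seq)})
    # Several starts inside one pyrimidine run can hit the same junction, hence the set.
    acceptors = sorted({m.start() + len(m.group(1)) for m in ACCEPTOR.finditer(seq)})
    branches = [m.start() for m in BRANCH_POINT.finditer(seq)]
    return donors, acceptors, branches


# Hypothetical exon | intron | exon
seq = "GGCAG" + "GTAAGTACTAACTTTTTTTTCCAG" + "GTC"
print(candidate_sites(seq))  # ([5], [29], [10])
```

The zero-width lookahead (?=...) lets overlapping candidates be found, which matters because consensus matches can overlap in real sequence.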
