Compartments in the eukaryotic cell


Page 1: Compartments in the eukaryotic cell
Page 2: Compartments in the eukaryotic cell
Page 3: Compartments in the eukaryotic cell
Page 4: Compartments in the eukaryotic cell

Compartments in the eukaryotic cell

Page 5: Compartments in the eukaryotic cell

Protein targeting/localization signals

Cleaved:
• Signal peptide
• Mitochondrial targeting peptide
• Chloroplast targeting peptide
• LPxTG sorting signal
• Peroxisomal targeting signal (PTS2)

Uncleaved:
• Signal anchor
• Nuclear localization signal
• ER/Golgi retention signal
• Peroxisomal targeting signal (PTS1)
• Transmembrane helices
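Several of these signals are short sequence motifs that a simple pattern scan can flag. The sketch below is a minimal illustration, not any real predictor: the regular expressions ([SAC][KRH]L at the C-terminus for PTS1, a C-terminal KDEL/HDEL for ER retention, LPxTG anywhere for the sorting signal) are simplified assumptions, and a bare motif match will produce both false positives and false negatives.

```python
import re
from typing import Dict, List

# Simplified consensus patterns (illustrative assumptions, not a curated motif set).
MOTIFS: Dict[str, str] = {
    "PTS1": r"[SAC][KRH]L$",      # C-terminal peroxisomal tripeptide, e.g. SKL
    "ER retention": r"[KH]DEL$",  # C-terminal KDEL/HDEL
    "LPxTG": r"LP.TG",            # sortase recognition motif, any position
}

def find_motifs(seq: str) -> List[str]:
    """Return the names of all simplified motifs that match the protein sequence."""
    return [name for name, pattern in MOTIFS.items() if re.search(pattern, seq.upper())]

print(find_motifs("MSDQAAVKSKL"))  # ['PTS1']
```

Signal peptides, mitochondrial and chloroplast targeting peptides have no such fixed motif, which is why they are predicted with trained models instead.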

Page 6: Compartments in the eukaryotic cell

Classical secretory pathway

Page 7: Compartments in the eukaryotic cell

The secretory signal peptide

Page 8: Compartments in the eukaryotic cell

Targeting to the ER

Page 9: Compartments in the eukaryotic cell

Eukaryotic signal peptide logo

Page 10: Compartments in the eukaryotic cell

Characteristics of signal peptides

Organism  Length (aa)  n-region                h-region                           c-region              Positions -3, -1
Euk       22           only slightly Arg-rich  short, very hydrophobic            short, no pattern     small and neutral residues
Gram-     25           Lys+Arg-rich            slightly longer, less hydrophobic  short, Ser+Ala-rich   almost exclusively Ala
Gram+     32           Lys+Arg-rich            very long, less hydrophobic        longer, Thr+Pro-rich  almost exclusively Ala
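The last column refers to von Heijne's (-3, -1) rule: the residues three positions and one position before the cleavage site tend to be small and neutral. The sketch below extracts those two residues for a given signal peptide length and checks them against an assumed set of small, neutral residues (A, G, S, T, C); that set is an illustrative choice, not the exact one used by SignalP.

```python
from typing import Tuple

# Assumed set of "small and neutral" residues (illustrative, not SignalP's definition).
SMALL_NEUTRAL = set("AGSTC")

def minus3_minus1(seq: str, sp_length: int) -> Tuple[str, str]:
    """Residues at positions -3 and -1 for a cleavage site after residue `sp_length` (1-based)."""
    if not 3 <= sp_length <= len(seq):
        raise ValueError("signal peptide length out of range")
    return seq[sp_length - 3], seq[sp_length - 1]

def follows_minus3_minus1_rule(seq: str, sp_length: int) -> bool:
    """True if both the -3 and the -1 residue are in the assumed small/neutral set."""
    r3, r1 = minus3_minus1(seq, sp_length)
    return r3 in SMALL_NEUTRAL and r1 in SMALL_NEUTRAL

# PMFA_PROMI from the training data, assuming its 22 leading "S" labels mark the signal peptide.
print(minus3_minus1("MKLSKIALAAALVFGINSVATAENETPAP", 22))  # ('A', 'A')
```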

Page 11: Compartments in the eukaryotic cell

Prokaryotic signal peptide logos

Gram-positive bacteria

Gram-negative bacteria

Page 12: Compartments in the eukaryotic cell

Positive and negative training data: secreted versus cytoplasmic and nuclear sequences

130 YGIW_ECOLI
MAKFAAVIAVMALCSAPVMAAEQGGFSGPSATQSQAGGFQGPNGSVTTVESAKSLRDDTWVTLRGNIVERISDDLYVFKD 80
ASGTINVDIDHKRWNGVTVTPKDTVEIQGEVDKDWNSVEIDVKQIRKVNP 160
SSSSSSSSSSSSSSSSSSSSCMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 80
MMMMMMMMMMMMMMMMMMM------------------------------- 160

184 PMFA_PROMI
MKLSKIALAAALVFGINSVATAENETPAPKVSSTKGEIQLKGEIVNSACGLAASSSPVIVDFSEIPTSALANLQKAGNIK 80
KDIELQDCDTTVAKTATVSYTPSVVNAVNKDLASFVSGNASGAGIGLMDAGSKAVKWNTATTPVQLINGVSKIPFVAYVQ 160
AESADAKVTPGEFQAVINFQVDYQ 240
SSSSSSSSSSSSSSSSSSSSSSCMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 80
MMMMMMMMMMMMMMMMMMM------------------------------------------------------------- 160
------------------------ 240

324 CYSB_KLEAE
MKLQQLRYIVEVVNHNLNVSSTAEGLYTSQPGISKQVRMLEDELGIQIFARSGKHLTQVTPAGQEIIRIAREVLSKVDAI 80
KSVAGEHTWPDKGSLYVATTHTQARYALPGVIKGFIERYPRVSLHMHQGSPTQIAEAVSKGNADFAIATEALHLYDDLVM 160
LPCYHWNRSIVVTPEHPLATKASVSIEELAQYPLVTYTFGFTGRSELDTAFNRAGLTPRIVFTATDADVIKTYVRLGLGV 240
GVIASMAVDPVSDPDLVKLDANGIFSHSTTKIGFRRSTFLRSYMYDFIQRFAPHLTRDVVDTAVALRSNEDIEAMFKDIK 320
LPEK 400
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 80
MMMMMMMMMMMMMMMMMMM------------------------------------------------------------- 160
-------------------------------------------------------------------------------- 240
-------------------------------------------------------------------------------- 320
---- 400

157 SBMC_ECOLI
MNYEIKQEEKRTVAGFHLVGPWEQTVKKGFEQLMMWVDSKNIVPKEWVAVYYDNPDETPAEKLRCDTVVTVPGYFTLPEN 80
SEGVILTEITGGQYAVAVARVVGDDFAKPWYQFFNSLLQDSAYEMLPKPCFEVYLNNGAEDGYWDIEMYVAVQPKHH 160
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 80
MMMMMMMMMMMMMMMMMMM---------------------------------------------------------- 160

Page 13: Compartments in the eukaryotic cell

Data partitioning for training and test

Remove highly similar sequences from the data set, i.e. sequences whose cleavage site information could reliably be transferred by alignment.

A redundancy-reduced data set can then be used for, say, five-fold cross-validation.

Ideally, the training set should contain roughly equal numbers of positive and negative examples.

Training

Test
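The partitioning steps above can be sketched as a greedy, Hobohm-style redundancy reduction followed by k-fold splitting. This is an illustration under assumptions: `prefix_identity` is a hypothetical similarity measure (ungapped identity over the first 70 residues), whereas a real pipeline would use proper alignments and a threshold chosen for the task.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple

def reduce_redundancy(seqs: Sequence[str], similar: Callable[[str, str], bool]) -> List[str]:
    """Greedy selection: keep a sequence only if it is not similar to any already kept."""
    kept: List[str] = []
    for s in seqs:
        if not any(similar(s, k) for k in kept):
            kept.append(s)
    return kept

def prefix_identity(a: str, b: str, length: int = 70) -> float:
    """Hypothetical similarity: fraction of identical residues over the first `length` positions."""
    n = min(len(a), len(b), length)
    return sum(x == y for x, y in zip(a[:n], b[:n])) / n if n else 0.0

def cross_validation_splits(items: Sequence[str], k: int = 5,
                            seed: int = 0) -> Iterator[Tuple[List[str], List[str]]]:
    """Shuffle once, deal the items into k folds, and yield (train, test) for each fold."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]
```

Every sequence ends up in exactly one test fold, so each one is predicted once by a network that never saw it during training.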

Page 14: Compartments in the eukaryotic cell

Sliding window

Sequence: MAKFAAVIAVMALCSAPVMAAEQGGFSGPSATQSQAGGFQGPNGSVTTVES ...

Window size here is 9 (example)

Window 1: MAKFAAVIA
Window 2: AKFAAVIAV
Window 3: KFAAVIAVM
Window 4: FAAVIAVMA
...
Window 10: VMALCSAPV
...

For signal peptide prediction, typically only the first 70 amino acids of the positive and negative sequences are used.
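The windowing can be sketched as below. This is a minimal illustration, not SignalP's code: the window size of 9, the 70-residue cutoff and the sparse (one-hot) encoding with 20 inputs per window position are assumptions chosen to match the example above.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def sliding_windows(seq: str, size: int = 9, max_len: int = 70) -> List[str]:
    """All contiguous windows of `size` residues taken from the first `max_len` residues."""
    seq = seq[:max_len]
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]

def one_hot(window: str) -> List[int]:
    """Sparse encoding: 20 inputs per position, one set to 1 (unknown residues give all zeros)."""
    return [1 if aa == a else 0 for aa in window for a in AMINO_ACIDS]

windows = sliding_windows("MAKFAAVIAVMALCSAPVMAAEQGGFSGPSATQSQAGGFQGPNGSVTTVES")
print(windows[0], windows[9])  # MAKFAAVIA VMALCSAPV
```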

Page 15: Compartments in the eukaryotic cell

Graphical output from SignalP

Page 16: Compartments in the eukaryotic cell

Alternative start codon “prediction”

Page 17: Compartments in the eukaryotic cell

Symmetric and asymmetric neural network window sizes

SignalP uses two different networks for signal peptide prediction:

• Cleavage site prediction network (C-score)
• Signal peptide vs. non-signal peptide discrimination network (S-score)

An asymmetric window is used for cleavage site prediction, because more of the information lies upstream of the cleavage site (see the logo).

A symmetric window is used for discrimination between signal peptide windows and mature protein windows.
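The difference between the two window types can be sketched as follows. The window sizes in the example (6 residues upstream and 2 downstream of the cleavage site, or 2 on each side of a central residue) and the "-" padding symbol are hypothetical choices for illustration, not the sizes used by SignalP.

```python
def window_around(seq: str, pos: int, upstream: int, downstream: int, pad: str = "-") -> str:
    """`upstream` residues before 0-based index `pos` plus `downstream` residues from `pos` on,
    padded with `pad` beyond either end of the sequence (asymmetric when the two differ)."""
    return "".join(seq[i] if 0 <= i < len(seq) else pad
                   for i in range(pos - upstream, pos + downstream))

def symmetric_window(seq: str, center: int, half: int, pad: str = "-") -> str:
    """Window of 2*half + 1 residues centred on 0-based index `center`."""
    return window_around(seq, center, half, half + 1, pad)

seq = "MAKFAAVIAVMALCSAPVMAAEQGGFSGPSATQSQAGGFQGPN"
# Assuming cleavage after residue 20, index 20 is the first residue of the mature protein.
print(window_around(seq, 20, 6, 2))  # SAPVMAAE
print(symmetric_window(seq, 0, 2))   # --MAK
```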

Page 18: Compartments in the eukaryotic cell

Neural network windows in SignalP

Asymmetric window (cleavage site network):
MAKFAAVIAVMALCSAPVMAAEQGGFSGPSATQSQAGGFQGPN

Symmetric window (discrimination network):
MAKFAAVIAVMALCSAPVMAAEQGGFSGPSATQSQAGGFQGPN

Cleavage

Page 19: Compartments in the eukaryotic cell

Performance calculation

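$$\text{Sensitivity} = \frac{tp}{tp + fn}$$

$$\text{Specificity} = \frac{tp}{tp + fp} \quad \text{(also called precision or positive predictive value)}$$

$$cc = \frac{tp \cdot tn - fp \cdot fn}{\sqrt{(tp + fn)(tp + fp)(tn + fp)(tn + fn)}}$$

tp: true positive
tn: true negative
fp: false positive
fn: false negative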
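The three measures translate directly into code. The sketch follows the slide's definition of specificity as tp/(tp + fp); returning 0.0 when a denominator is zero is a convention chosen here, not part of the definitions.

```python
import math

def sensitivity(tp: int, fn: int) -> float:
    """tp / (tp + fn): fraction of true positives that are found."""
    return tp / (tp + fn) if tp + fn else 0.0

def specificity(tp: int, fp: int) -> float:
    """tp / (tp + fp), as defined on the slide (i.e. precision)."""
    return tp / (tp + fp) if tp + fp else 0.0

def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 if any of the four sums is zero."""
    denom = math.sqrt((tp + fn) * (tp + fp) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical counts, for illustration only.
print(sensitivity(90, 10), specificity(90, 20), matthews_cc(90, 80, 20, 10))
```

Unlike sensitivity and specificity alone, cc uses all four counts, so it stays informative when the positive and negative sets differ in size.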

Page 20: Compartments in the eukaryotic cell

Optimization of window sizes

Optimization of window sizes for SignalP version 3.0

Page 21: Compartments in the eukaryotic cell

NN window sizes for SignalP 3.0

                Cleavage site network    Discrimination network
Organism        Window    Hidden units   Window    Hidden units
Euk             19+4      2              27        4
Gram-           11+3      2              19        3
Gram+           21+2      0              19        3

Window sizes used in the final method

An asymmetric window works best for cleavage site prediction, whereas a symmetric window works best for discrimination.

Page 22: Compartments in the eukaryotic cell

SignalP 3.0 architecture

Figure: feed-forward network with an input layer (I1, I2, I3, ...), a hidden layer (H1, H2, H3, ...) and two output units (O1, O2), connected by weights. Inputs: sequence data, sequence composition and window position.

In addition to the sequence window, the amino acid composition (of the entire sequence) and the position of the sliding window were used as inputs to the neural networks of SignalP 3.0.
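A network of this shape can be sketched as a forward pass over one input vector that concatenates the three kinds of input. Everything specific here is hypothetical: the one-hot window encoding, the composition as 20 relative frequencies, the window position scaled to [0, 1], sigmoid units, the 9-residue window, 4 hidden units and the random stand-in weights. It shows the architecture, not SignalP 3.0's actual encoding or trained weights.

```python
import math
import random
from collections import Counter
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq: str) -> List[float]:
    """Relative frequency of each of the 20 amino acids over the entire sequence."""
    counts = Counter(seq)
    n = len(seq) or 1
    return [counts[a] / n for a in AMINO_ACIDS]

def encode_window(window: str) -> List[float]:
    """Sparse (one-hot) encoding: 20 inputs per window position."""
    return [1.0 if aa == a else 0.0 for aa in window for a in AMINO_ACIDS]

def input_vector(seq: str, start: int, size: int) -> List[float]:
    """Hypothetical layout: window encoding + composition + scaled window position."""
    window = seq[start:start + size]
    return encode_window(window) + composition(seq) + [start / max(len(seq) - 1, 1)]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs: List[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """Fully connected layer of sigmoid units."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b) for row, b in zip(weights, biases)]

def random_layer(n_in: int, n_out: int, rng: random.Random) -> Tuple[List[List[float]], List[float]]:
    """Small random weights standing in for trained ones."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out

seq = "MAKFAAVIAVMALCSAPVMAAEQGGFSGPSATQSQAGGFQGPN"
x = input_vector(seq, start=0, size=9)  # 9*20 + 20 + 1 = 201 inputs
rng = random.Random(0)
w_h, b_h = random_layer(len(x), 4, rng)  # hidden units H1..H4
w_o, b_o = random_layer(4, 2, rng)       # output units O1, O2
outputs = layer(layer(x, w_h, b_h), w_o, b_o)
print(outputs)
```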

Page 23: Compartments in the eukaryotic cell

Composition of secretory vs. non-secretory proteins

Page 24: Compartments in the eukaryotic cell

Composition weights

Page 25: Compartments in the eukaryotic cell

What is new in SignalP version 3.0!

• Data set
  – From SWISS-PROT rel. 40.0
  – Highly curated
  – Cleaned for spurious residues at position -1

• Length and composition
  – Improve the performance significantly
  – Length improves both discrimination and cleavage site performance
  – Composition improves discrimination

• D-score
  – Average of the mean S-score and the max Y-score
  – Better discrimination
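Given per-position S- and Y-scores, the D-score is a simple average. The sketch takes those scores as input (computing them requires the trained networks). Averaging S over the positions before the Y-max position, i.e. over the predicted signal peptide, is an assumption about how the mean S-score is taken.

```python
from typing import List, Tuple

def d_score(s_scores: List[float], y_scores: List[float]) -> Tuple[float, float, float]:
    """Return (mean S, max Y, D) for one sequence, with D = (mean S + max Y) / 2.

    Assumes mean S is averaged over positions before the Y-max position.
    """
    if not s_scores or len(s_scores) != len(y_scores):
        raise ValueError("need equal-length, non-empty score lists")
    y_pos = max(range(len(y_scores)), key=lambda i: y_scores[i])
    y_max = y_scores[y_pos]
    region = s_scores[:y_pos] or s_scores[:1]  # fall back to position 0 if Y-max is first
    s_mean = sum(region) / len(region)
    return s_mean, y_max, (s_mean + y_max) / 2
```

Combining the two scores lets a sequence with a strong, sharply defined cleavage site and a consistently high S-score over the N-terminus score higher than one with only one of the two.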

Page 26: Compartments in the eukaryotic cell

Database annotation errors

• Some of the manually curated databases contain obvious errors that can be eliminated

• General "SIGNAL" annotation errors
  – Signal peptide includes the propeptide
  – Wrong signal peptide cleavage site
  – The secreted protein is further processed by proteases
  – Wrong start codon used
  – Signal peptide of a different class, i.e. TAT or bacteriocin (prokaryotes)

Page 27: Compartments in the eukaryotic cell

Signal peptide or propeptide

N-terminus – Signal peptide – Propeptide – Mature protein

Page 28: Compartments in the eukaryotic cell

Signal peptide or propeptide

Propeptide cleavage

Signal peptide cleavage

Page 29: Compartments in the eukaryotic cell

Isoelectric point calculations

Page 30: Compartments in the eukaryotic cell

Improvement by length and composition

Page 31: Compartments in the eukaryotic cell

Performance of three different SignalP versions

                Cleavage site (Y-score)    Discrimination (SP/non-SP)
Version         Euk    Gram-   Gram+       Euk    Gram-   Gram+
SignalP1 NN     70.2   79.3    67.9        0.97   0.88    0.96
SignalP2 NN     72.4   83.4    67.4        0.97   0.90    0.96
SignalP2 HMM    69.5   81.4    64.5        0.94   0.93    0.96
SignalP3 NN     79.0   92.5    85.0        0.98   0.95    0.98
SignalP3 HMM    75.7   90.2    81.6        0.94   0.94    0.98

The SignalP1 paper now has more than 3,300 citations, and the SignalP3 paper more than 1,200.

Page 32: Compartments in the eukaryotic cell

Exons and introns: discontinuous protein coding regions in eukaryotes

Page 33: Compartments in the eukaryotic cell
Page 34: Compartments in the eukaryotic cell

Two ways to solve the problem

Predict splice sites (GT-donor and AG-acceptor)

or

Predict coding versus non-coding (at least outside the UTRs)
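The splice site route starts from the conserved dinucleotides. The sketch lists every GT (candidate donor) and AG (candidate acceptor) position; it only checks the dinucleotides, while a real splice site network scores a whole window around each candidate.

```python
from typing import List, Tuple

def candidate_splice_sites(dna: str) -> Tuple[List[int], List[int]]:
    """0-based start positions of every GT (candidate donor) and AG (candidate acceptor)."""
    dna = dna.upper()
    donors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "AG"]
    return donors, acceptors

# A 33-nt donor window from the HUMA1ATP data; the true donor GT starts at index 16.
print(candidate_splice_sites("TACATCTTCTTTAAAGGTAAGGTTGCTCAACCA"))  # ([16, 21], [14, 19])
```

Even this short window contains two GT and two AG candidates, which shows how many false candidates a splice site predictor has to reject.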

Page 35: Compartments in the eukaryotic cell

C C T G G A C C G G G T G A

0.12 0.11 0.10

Page 36: Compartments in the eukaryotic cell

C T G G A C C G G G T G A C

0.12 0.11 0.10 0.14

Page 37: Compartments in the eukaryotic cell

T G G A C C G G G T G A C G

0.12 0.11 0.10 0.14 0.23

Page 38: Compartments in the eukaryotic cell

Splice site networks overpredict a lot

Page 39: Compartments in the eukaryotic cell

Combination of splice site and coding/non-coding networks

Page 40: Compartments in the eukaryotic cell

Combination of splice site and coding/non-coding networks
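One hypothetical way to combine the two network outputs is to accept a donor candidate only when its splice site score is high and the coding score drops from the upstream (exon) flank to the downstream (intron) flank. The thresholds, the flank length and the rule itself are illustrative assumptions, not the combination used by any particular program.

```python
from typing import List

def combined_donor_calls(splice: List[float], coding: List[float], flank: int = 20,
                         splice_cut: float = 0.5, drop_cut: float = 0.3) -> List[int]:
    """Positions whose splice score passes `splice_cut` AND whose mean coding score drops
    by at least `drop_cut` from the upstream to the downstream flank (hypothetical rule)."""
    calls = []
    for i, s in enumerate(splice):
        if s < splice_cut:
            continue
        up = coding[max(0, i - flank):i]
        down = coding[i + 1:i + 1 + flank]
        if not up or not down:
            continue  # too close to a sequence end to judge the coding transition
        if sum(up) / len(up) - sum(down) / len(down) >= drop_cut:
            calls.append(i)
    return calls
```

Requiring agreement from the coding/non-coding network removes splice site hits that sit inside uniformly coding or uniformly non-coding regions.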

Page 41: Compartments in the eukaryotic cell

1 HUMA1ATP TACATCTTCTTTAAAGGTAAGGTTGCTCAACCA
1 HUMA1ATP CCTGAAGCTCTCCAAGGTGAGATCACCCTGACG
1 HUMACCYBA CCACACCCGCCGCCAGGTAAGCCCGGCCAGCCG
1 HUMACCYBA CGAGAAGATGACCCAGGTGAGTGGCCCGCTACC
1 HUMACTGA GCGCCCCAGACACCAGGTGAGTGGATGGCGCCG
1 HUMACTGA AGAGAAGATGACTCAGGTGAGGCTCGGCCGACG
1 HUMACTGA CACCATGAAGATCAAGGTGAGTCGAGGGGTTGG
1 HUMADAG TCTTATACTATGGCAGGTAAGTCCATACAGAAG
1 HUMALPHA CGTGGCTCTGTCCAAGGTAAGTGCTGGGCTACC
1 HUMALPI CCTGGCTCTGTCCAAGGTAAGGGCTGGGCCACC
1 HUMALPPD TGTGGCTCTGTCCAAGGTAAGTGCTGGGCTACC
1 HUMAPRTA CCTGGAGTACGGGAAGGTAAGAGGGCTGGGGTG
1 HUMCAPG GAAGGCTGCCTTCAAGGTAAGGCATGGGCATTG
1 HUMCFVII GGAGTGTCCATGGCAGGTAAGGCTTCCCCTGGC
1 HUMCP21OH CACCTTGGGCTGCAAGGTGAGAGGCTGATCTCG
1 HUMCP21OHC CACCTTGGGCTGCAAGGTGAGAGGCTGATCTCG
1 HUMCS1 GTGGCAATGGCTCCAGGTAAGCGCCCCTAAAAT
1 HUMCSFGMA AATGTTTGACCTCCAGGTAAGATGCTTCTCTCT
1 HUMCSPB AAAGACTTCCTTTAAGGTAAGACTATGCACCTG
1 HUMCSFGMA AATGTTTGACCTCCAGGTAAGATGCTTCTCTCT
1 HUMCSPB AAAGACTTCCTTTAAGGTAAGACTATGCACCTG
1 HUMCYC1A GCTACGGACACCTCAGGTGAGCGCTGGGCCGGG
...
2 HUMA1ATP CCTGGGACAGTGAATCGTAAGTATGCCTTTCAC
2 HUMA1ATP AAAATGAAGACAGAAGGTGATTCCCCAACCTGA
2 HUMA1GLY2 CGCCACCCTGGACCGGGTGAGTGCCTGGGCTAG
2 HUMA1GLY2 GAGAGTACCAGACCCGGTGAGAGCCCCCATTCC
2 HUMA1GLY2 ACCGTCTCCAGATACGGTGAGGGCCAGCCCTCA
2 HUMA1GLY2 GGGCTGTCTTTCTATGGTAGGCATGCTTAGCAG
2 HUMA1GLY2 CACCGACTGGAAAAAGGTAAACGCAAGGGATTG
2 HUMACCYBA GCGCCCCAGGCACCAGGTAGGGGAGCTGGCTGG
2 HUMACCYBA CAGCCTTCCTTCCTGGGTGAGTGGAGACTGTCT
2 HUMACCYBA CACAATGAAGATCAAGGTGGGTGTCTTTCCTGC
2 HUMACTGA TCGCGTTTCTCTGCCGGTGAGCGCCCCGCCCCG
2 HUMADAG CTTCGACAAGCCCAAAGTGAGCGCGCGCGGGGG
2 HUMADAG TGTCCAGGCCTACCAGGTGGGTCCTGTGAGAAG
2 HUMADAG CGAAGTAGTAAAAGAGGTGAGGGCCTGGGCTGG
...
11 HUMCS1 AACGCAACAGAAATCCGTGAGTGGATGCCGTCT
11 HUMGHN AACACAACAGAAATCCGTGAGTGGATGCCTTCT
52 HUMHSP90B CTCTAATGCTTCTGATGTAGGTGCTCTGGTTTC
80 HUMMETIF1 ACCTCCTGCAAGAAGAGTGAGTGTGAGGCCATC
112 HUMHSP90B ATACCAGAGTATCTCAGTGAGTATCTCCTTGGC
113 HUMHST GCGGACACCCGCGACAGTGAGTGGCGCGGCCAG
113 HUMLACTA GACATCTCCTGTGACAGTGAGTAGCCCCTATAA
151 HUMKAL2 ATCGAACCAGAGGAGTGTACGCCTGGGCCAGAT
157 HUMCS1 CACCTACCAGGAGTTTGTAAGTTCTTGGGGAAT
157 HUMGHN CACCTACCAGGAGTTTGTAAGCTCTTGGGGAAT
164 HUMALPHA CAACATGGACATTGATGTGCGACCCCCGGGCCA
622 HUMCFVII CTGATCGCGGTGCTGGGTGGGTACCACTCTCCC
636 HUMADAG CCTGGAACCAGGCTGAGTGAGTGATGGGCCTGG
895 HUMAPOCIB TCCAGCAAGGATTCAGGTTGTTGAGTGCTTGGG
970 HUMALPHA CGGGCCAAGAAAGCAGGTGGAGCTGGGGCCCGG
2114 HUMAPRTA ATCGACTACATCGCAGGCGAGTGCCAGTGGCCG

Page 42: Compartments in the eukaryotic cell

Neural network weight analysis: reading frame detection

Page 43: Compartments in the eukaryotic cell

Exon-intron transition detection units