Compartments in the eukaryotic cell

Post on 21-Jan-2016


Protein targeting/localization signals

Cleaved:

• Signal peptide
• Mitochondrial targeting peptide
• Chloroplast targeting peptide
• LPxTG sorting signal
• Peroxisomal targeting signal (PTS2)

Uncleaved:

• Signal anchor
• Nuclear localization signal
• ER/Golgi retention signal
• Peroxisomal targeting signal (PTS1)
• Transmembrane helices

Classical secretory pathway

The secretory signal peptide

Targeting to the ER

Eukaryotic signal peptide logo

Characteristics of signal peptides

Group | Length | n-region | h-region | c-region | -3, -1
Euk | 22 | only slightly Arg-rich | short, very hydrophobic | short, no pattern | small and neutral residues
Gram- | 25 | Lys+Arg-rich | slightly longer, less hydrophobic | short, Ser+Ala-rich | almost exclusively Ala
Gram+ | 32 | Lys+Arg-rich | very long, less hydrophobic | longer, Thr+Pro-rich | almost exclusively Ala

Prokaryotic signal peptide logos

Gram-positive bacteria

Gram-negative bacteria

Positive and negative training data: secreted versus cytoplasmic and nuclear sequences

130 YGIW_ECOLI
MAKFAAVIAVMALCSAPVMAAEQGGFSGPSATQSQAGGFQGPNGSVTTVESAKSLRDDTWVTLRGNIVERISDDLYVFKD 80
ASGTINVDIDHKRWNGVTVTPKDTVEIQGEVDKDWNSVEIDVKQIRKVNP 160
SSSSSSSSSSSSSSSSSSSSCMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 80
MMMMMMMMMMMMMMMMMMM------------------------------- 160

184 PMFA_PROMI
MKLSKIALAAALVFGINSVATAENETPAPKVSSTKGEIQLKGEIVNSACGLAASSSPVIVDFSEIPTSALANLQKAGNIK 80
KDIELQDCDTTVAKTATVSYTPSVVNAVNKDLASFVSGNASGAGIGLMDAGSKAVKWNTATTPVQLINGVSKIPFVAYVQ 160
AESADAKVTPGEFQAVINFQVDYQ 240
SSSSSSSSSSSSSSSSSSSSSSCMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 80
MMMMMMMMMMMMMMMMMMM------------------------------------------------------------- 160
------------------------ 240

324 CYSB_KLEAE
MKLQQLRYIVEVVNHNLNVSSTAEGLYTSQPGISKQVRMLEDELGIQIFARSGKHLTQVTPAGQEIIRIAREVLSKVDAI 80
KSVAGEHTWPDKGSLYVATTHTQARYALPGVIKGFIERYPRVSLHMHQGSPTQIAEAVSKGNADFAIATEALHLYDDLVM 160
LPCYHWNRSIVVTPEHPLATKASVSIEELAQYPLVTYTFGFTGRSELDTAFNRAGLTPRIVFTATDADVIKTYVRLGLGV 240
GVIASMAVDPVSDPDLVKLDANGIFSHSTTKIGFRRSTFLRSYMYDFIQRFAPHLTRDVVDTAVALRSNEDIEAMFKDIK 320
LPEK 400
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 80
MMMMMMMMMMMMMMMMMMM------------------------------------------------------------- 160
-------------------------------------------------------------------------------- 240
-------------------------------------------------------------------------------- 320
---- 400

157 SBMC_ECOLI
MNYEIKQEEKRTVAGFHLVGPWEQTVKKGFEQLMMWVDSKNIVPKEWVAVYYDNPDETPAEKLRCDTVVTVPGYFTLPEN 80
SEGVILTEITGGQYAVAVARVVGDDFAKPWYQFFNSLLQDSAYEMLPKPCFEVYLNNGAEDGYWDIEMYVAVQPKHH 160
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 80
MMMMMMMMMMMMMMMMMMM---------------------------------------------------------- 160

Data partitioning for training and test

Remove highly similar sequences from the data set, i.e. those where cleavage site information can be reliably transferred by alignment.

A redundancy-reduced data set can then be used for, say, five-fold cross-validation.

Ideally, the training set contains equal numbers of positive and negative example sequences.

Training

Test
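The five-fold cross-validation mentioned on this slide can be sketched as follows. This is only a minimal illustration, not the partitioning actually used for SignalP: the function name `k_fold_splits`, the shuffling seed, and the round-robin assignment to folds are assumptions, and the input is assumed to be redundancy-reduced already.

```python
import random

def k_fold_splits(items, k=5, seed=0):
    """Yield (training, test) lists for k-fold cross-validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled items into k folds, round-robin.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, test

# Hypothetical sequence identifiers, used only for illustration.
names = [f"seq{i}" for i in range(10)]
splits = list(k_fold_splits(names))
for training, test in splits:
    print(len(training), "training /", len(test), "test")
```

Each sequence ends up in the test part exactly once, so every prediction that is evaluated comes from a network that did not see that sequence during training.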

Sliding window

Sequence: MAKFAAVIAVMALCSAPVMAAEQGGFSGPSATQSQAGGFQGPNGSVTTVES ...

Window size here is 9 (example)

Window 1: MAKFAAVIA
Window 2: AKFAAVIAV
Window 3: KFAAVIAVM
Window 4: FAAVIAVMA
...
Window 10: VMALCSAPV
...

For signal peptide prediction, typically the first 70 aa of the positive and negative sequences are used.
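The sliding window on this slide can be reproduced with a few lines of Python. This is a sketch for illustration; the function name `sliding_windows` is an assumption, and the truncation to 70 residues simply follows the remark above.

```python
def sliding_windows(sequence, size):
    """Return every contiguous window of the given size, left to right."""
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]

# Keep at most the first 70 residues; the example fragment is shorter than that.
seq = "MAKFAAVIAVMALCSAPVMAAEQGGFSGPSATQSQAGGFQGPNGSVTTVES"[:70]
windows = sliding_windows(seq, 9)
print(windows[0])  # MAKFAAVIA (window 1)
print(windows[9])  # VMALCSAPV (window 10)
```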

Graphical output from SignalP

Alternative start codon “prediction”

Symmetric and asymmetric neural network window sizes

SignalP uses two different networks for signal peptide prediction:

• Cleavage site prediction network (C-score)
• Signal peptide vs. non-signal peptide discrimination network (S-score)

An asymmetric window is used for cleavage site prediction, because more information is found upstream of the cleavage site (see the logo).

A symmetric window is used for discrimination between signal peptide windows and mature protein windows
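The difference between the two window types can be illustrated with a small helper. This is a sketch under assumptions: the name `window_around`, the padding character `X`, the chosen position, and the small window sizes are all hypothetical and are not the sizes used by SignalP.

```python
def window_around(sequence, center, left, right, pad="X"):
    """Return `left` residues before `center`, the residue at `center`,
    and `right` residues after it, padding beyond the sequence ends."""
    padded = pad * left + sequence + pad * right
    # sequence[center] sits at padded[center + left], so the window starts at `center`.
    return padded[center:center + left + 1 + right]

seq = "MAKFAAVIAVMALCSAPVMAAEQGGFSGPSATQSQAGGFQGPN"
pos = 20  # hypothetical position of interest (0-based)

asymmetric = window_around(seq, pos, left=6, right=2)  # more upstream context
symmetric = window_around(seq, pos, left=4, right=4)   # equal context on both sides
print(asymmetric)  # SAPVMAAEQ
print(symmetric)   # PVMAAEQGG
```

Both windows here have nine positions; only the placement relative to the position of interest differs.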

Neural network windows in SignalP

MAKFAAVIAVMALCSAPVMAAEQGGFSGPSATQSQAGGFQGPN

MAKFAAVIAVMALCSAPVMAAEQGGFSGPSATQSQAGGFQGPN

Asymmetric window

Symmetric window

Cleavage

Performance calculation

$$\text{Sensitivity} = \frac{tp}{tp + fn}$$

$$\text{Specificity} = \frac{tp}{tp + fp}$$

$$cc = \frac{tp \cdot tn - fp \cdot fn}{\sqrt{(tp + fn)(tn + fp)(tp + fp)(tn + fn)}}$$

tp: true positive
tn: true negative
fp: false positive
fn: false negative
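The three performance measures on this slide can be computed as in the sketch below. Specificity follows the slide's definition, tp / (tp + fp). The correlation coefficient is taken to be the Matthews correlation coefficient, which matches the terms visible on the slide. The function names, the example counts, and the choice to return 0.0 for a vanishing denominator are assumptions made for illustration.

```python
import math

def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tp, fp):
    # As defined on the slide: tp / (tp + fp).
    return tp / (tp + fp)

def correlation_coefficient(tp, tn, fp, fn):
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    # Assumed convention: report 0.0 when the denominator is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical counts, for illustration only.
tp, tn, fp, fn = 40, 45, 5, 10
sens = sensitivity(tp, fn)
spec = specificity(tp, fp)
cc = correlation_coefficient(tp, tn, fp, fn)
print(round(sens, 3), round(spec, 3), round(cc, 3))
```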

Optimization of window sizes

Optimization of window sizes for SignalP version 3.0

NN window sizes for SignalP 3.0

Group | Cleavage site network: window | Cleavage site network: hidden | Discrimination network: window | Discrimination network: hidden
Euk | 19+4 | 2 | 27 | 4
Gram- | 11+3 | 2 | 19 | 3
Gram+ | 21+2 | 0 | 19 | 3

Window sizes used in the final method

An asymmetric window is best for cleavage site prediction, whereas a symmetric window is best for discrimination.

SignalP 3.0 architecture

[Figure: neural network with an input layer (I1, I2, I3, ...), weights, a hidden layer (H1, H2, H3, ...), weights, and an output layer (O1, O2). Inputs: input sequence data, sequence composition, window position.]

In addition to the sequence input, the composition of the entire sequence and the position of the sliding window were used as input to the neural network of SignalP 3.0.
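One way to picture such a combined input is sketched below. Everything specific here is an assumption for illustration and not taken from SignalP 3.0: the sparse (one-hot) encoding of the window, the scaling of the window position to the range 0 to 1, the order in which the parts are concatenated, and the function names.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode_window(window):
    """Sparse (one-hot) encoding: 20 inputs per window position."""
    vec = []
    for residue in window:
        vec.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vec

def composition(sequence):
    """Relative frequency of each amino acid over the entire sequence."""
    return [sequence.count(aa) / len(sequence) for aa in AMINO_ACIDS]

def network_input(sequence, start, size):
    """Concatenate window encoding, whole-sequence composition, and position."""
    window = sequence[start:start + size]
    position = start / len(sequence)  # assumed scaling of the window position
    return encode_window(window) + composition(sequence) + [position]

seq = "MAKFAAVIAVMALCSAPVMAAEQGGFSGPSATQSQAGGFQGPN"
x = network_input(seq, start=0, size=9)
print(len(x))  # 9 * 20 window inputs + 20 composition inputs + 1 position input
```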

Composition of secretory vs. non-secretory proteins

Composition weights

What is new in SignalP version 3.0!

• Data set
  – From SWISS-PROT rel. 40.0
  – Highly curated
  – Cleaned for spurious residues at pos. -1

• Length and composition
  – Improves the performance significantly
  – Length improves both discrimination and cleavage performance
  – Composition improves discrimination

• D-score
  – Average of mean-S score and Y-max score
  – Better discrimination
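The D-score described above is a plain average of two numbers, which can be sketched as follows. The per-position score lists are hypothetical, and the region over which the S-score is averaged (the positions before the Y-score maximum) is an assumption of this sketch; the slide only states that the mean S-score and the maximal Y-score are averaged.

```python
def d_score(s_scores, y_scores):
    """Average of the mean S-score and the maximal Y-score."""
    y_max = max(y_scores)
    cut = y_scores.index(y_max)  # position of the Y-score maximum
    # Assumption: average the S-score over the positions before the maximum.
    region = s_scores[:cut] or s_scores[:1]
    s_mean = sum(region) / len(region)
    return (s_mean + y_max) / 2

# Hypothetical per-position scores, for illustration only.
s = [0.9, 0.8, 0.7, 0.2, 0.1]
y = [0.1, 0.2, 0.1, 0.6, 0.1]
d = d_score(s, y)
print(round(d, 3))
```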

Database annotation errors

• Some of the manually curated databases contain obvious errors that can be eliminated

• General "SIGNAL" errors
  – Signal peptide includes propeptide
  – Wrong signal peptide cleavage site
  – The secreted protein is processed by proteases
  – Wrong start codon used
  – Signal peptide of a different class, e.g. TAT or bacteriocin (prokaryotes)

Signal peptide or propeptide

[Figure: precursor layout from the N-terminus: signal peptide, propeptide, mature protein. Marked sites: signal peptide cleavage, propeptide cleavage.]

Isoelectric point calculations

Improvement by length and composition

Performance of three different SignalP versions

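Cleavage = cleavage site (Y-score); Discrimination = SP/non-SP.

Version | Cleavage Euk | Cleavage Gram- | Cleavage Gram+ | Discrimination Euk | Discrimination Gram- | Discrimination Gram+
SignalP1 NN | 70.2 | 79.3 | 67.9 | 0.97 | 0.88 | 0.96
SignalP2 NN | 72.4 | 83.4 | 67.4 | 0.97 | 0.90 | 0.96
SignalP2 HMM | 69.5 | 81.4 | 64.5 | 0.94 | 0.93 | 0.96
SignalP3 NN | 79.0 | 92.5 | 85.0 | 0.98 | 0.95 | 0.98
SignalP3 HMM | 75.7 | 90.2 | 81.6 | 0.94 | 0.94 | 0.98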

The SignalP1 paper now has more than 3,300 citations; the SignalP3 paper has more than 1,200.

Exons and introns: discontinuous protein coding regions in eukaryotes

Two ways to solve the problem

Predict splice sites (GT-donor and AG-acceptor)

or

Predict coding versus non-coding

(at least in non-UTRs)
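To see what a splice site predictor has to choose among, one can list every GT and AG dinucleotide in a sequence; each is only a candidate site. The sketch below does just that. The function name `dinucleotide_positions` is an assumption, and the example is one of the HUMA1ATP donor-site windows listed later in this transcript.

```python
def dinucleotide_positions(dna, motif):
    """Return the 0-based start index of every occurrence of a dinucleotide."""
    return [i for i in range(len(dna) - 1) if dna[i:i + 2] == motif]

dna = "TACATCTTCTTTAAAGGTAAGGTTGCTCAACCA"
donors = dinucleotide_positions(dna, "GT")     # candidate donor sites
acceptors = dinucleotide_positions(dna, "AG")  # candidate acceptor sites
print(donors)     # [16, 21]
print(acceptors)  # [14, 19]
```

Even this 33-nucleotide window contains two GT and two AG dinucleotides, so the dinucleotide alone cannot identify a splice site; the surrounding sequence has to be taken into account.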

C C T G G A C C G G G T G A

0.12 0.11 0.10

C T G G A C C G G G T G A C

0.12 0.11 0.10 0.14

T G G A C C G G G T G A C G

0.12 0.11 0.10 0.14 0.23

Splice site networks overpredict a lot

Combination of splice site and coding/non-coding networks


1 HUMA1ATP TACATCTTCTTTAAAGGTAAGGTTGCTCAACCA
1 HUMA1ATP CCTGAAGCTCTCCAAGGTGAGATCACCCTGACG
1 HUMACCYBA CCACACCCGCCGCCAGGTAAGCCCGGCCAGCCG
1 HUMACCYBA CGAGAAGATGACCCAGGTGAGTGGCCCGCTACC
1 HUMACTGA GCGCCCCAGACACCAGGTGAGTGGATGGCGCCG
1 HUMACTGA AGAGAAGATGACTCAGGTGAGGCTCGGCCGACG
1 HUMACTGA CACCATGAAGATCAAGGTGAGTCGAGGGGTTGG
1 HUMADAG TCTTATACTATGGCAGGTAAGTCCATACAGAAG
1 HUMALPHA CGTGGCTCTGTCCAAGGTAAGTGCTGGGCTACC
1 HUMALPI CCTGGCTCTGTCCAAGGTAAGGGCTGGGCCACC
1 HUMALPPD TGTGGCTCTGTCCAAGGTAAGTGCTGGGCTACC
1 HUMAPRTA CCTGGAGTACGGGAAGGTAAGAGGGCTGGGGTG
1 HUMCAPG GAAGGCTGCCTTCAAGGTAAGGCATGGGCATTG
1 HUMCFVII GGAGTGTCCATGGCAGGTAAGGCTTCCCCTGGC
1 HUMCP21OH CACCTTGGGCTGCAAGGTGAGAGGCTGATCTCG
1 HUMCP21OHC CACCTTGGGCTGCAAGGTGAGAGGCTGATCTCG
1 HUMCS1 GTGGCAATGGCTCCAGGTAAGCGCCCCTAAAAT
1 HUMCSFGMA AATGTTTGACCTCCAGGTAAGATGCTTCTCTCT
1 HUMCSPB AAAGACTTCCTTTAAGGTAAGACTATGCACCTG
1 HUMCSFGMA AATGTTTGACCTCCAGGTAAGATGCTTCTCTCT
1 HUMCSPB AAAGACTTCCTTTAAGGTAAGACTATGCACCTG
1 HUMCYC1A GCTACGGACACCTCAGGTGAGCGCTGGGCCGGG
...
2 HUMA1ATP CCTGGGACAGTGAATCGTAAGTATGCCTTTCAC
2 HUMA1ATP AAAATGAAGACAGAAGGTGATTCCCCAACCTGA
2 HUMA1GLY2 CGCCACCCTGGACCGGGTGAGTGCCTGGGCTAG
2 HUMA1GLY2 GAGAGTACCAGACCCGGTGAGAGCCCCCATTCC
2 HUMA1GLY2 ACCGTCTCCAGATACGGTGAGGGCCAGCCCTCA
2 HUMA1GLY2 GGGCTGTCTTTCTATGGTAGGCATGCTTAGCAG
2 HUMA1GLY2 CACCGACTGGAAAAAGGTAAACGCAAGGGATTG
2 HUMACCYBA GCGCCCCAGGCACCAGGTAGGGGAGCTGGCTGG
2 HUMACCYBA CAGCCTTCCTTCCTGGGTGAGTGGAGACTGTCT
2 HUMACCYBA CACAATGAAGATCAAGGTGGGTGTCTTTCCTGC
2 HUMACTGA TCGCGTTTCTCTGCCGGTGAGCGCCCCGCCCCG
2 HUMADAG CTTCGACAAGCCCAAAGTGAGCGCGCGCGGGGG
2 HUMADAG TGTCCAGGCCTACCAGGTGGGTCCTGTGAGAAG
2 HUMADAG CGAAGTAGTAAAAGAGGTGAGGGCCTGGGCTGG
...
11 HUMCS1 AACGCAACAGAAATCCGTGAGTGGATGCCGTCT
11 HUMGHN AACACAACAGAAATCCGTGAGTGGATGCCTTCT
52 HUMHSP90B CTCTAATGCTTCTGATGTAGGTGCTCTGGTTTC
80 HUMMETIF1 ACCTCCTGCAAGAAGAGTGAGTGTGAGGCCATC
112 HUMHSP90B ATACCAGAGTATCTCAGTGAGTATCTCCTTGGC
113 HUMHST GCGGACACCCGCGACAGTGAGTGGCGCGGCCAG
113 HUMLACTA GACATCTCCTGTGACAGTGAGTAGCCCCTATAA
151 HUMKAL2 ATCGAACCAGAGGAGTGTACGCCTGGGCCAGAT
157 HUMCS1 CACCTACCAGGAGTTTGTAAGTTCTTGGGGAAT
157 HUMGHN CACCTACCAGGAGTTTGTAAGCTCTTGGGGAAT
164 HUMALPHA CAACATGGACATTGATGTGCGACCCCCGGGCCA
622 HUMCFVII CTGATCGCGGTGCTGGGTGGGTACCACTCTCCC
636 HUMADAG CCTGGAACCAGGCTGAGTGAGTGATGGGCCTGG
895 HUMAPOCIB TCCAGCAAGGATTCAGGTTGTTGAGTGCTTGGG
970 HUMALPHA CGGGCCAAGAAAGCAGGTGGAGCTGGGGCCCGG
2114 HUMAPRTA ATCGACTACATCGCAGGCGAGTGCCAGTGGCCG

Neural network weight analysis: reading frame detection

Exon-intron transition detection units