Spatial proteomics
Combining experimental and annotation data to predict protein sub-cellular localisation.

Laurent Gatto
lg390@cam.ac.uk – @lgatt0

Computational Proteomics Unit
http://cpu.sysbiol.cam.ac.uk/

University of Cambridge

13 Jan 2015, Heidelberg

Plan

Introduction

Spatial proteomics

Previously . . .

Transfer learning

Regulations

Cell organisation

Spatial proteomics is the systematic study of protein localisations.

Image from Wikipedia http://en.wikipedia.org/wiki/Cell_(biology).

Spatial proteomics - Why?

Mis-localisation

Disruption of the targeting/trafficking process alters proper sub-cellular localisation, which in turn perturbs the cellular functions of the proteins.

- Abnormal protein localisation leading to the loss of functional effects in diseases (Laurila and Vihinen, 2009).
- Disruption of the nuclear/cytoplasmic transport (nuclear pores) has been detected in many types of carcinoma cells (Kau et al., 2004).

Multi- and re-localisation

- Differentiation: Tfe3 in mouse ESC (Betschinger et al., 2013).
- Metabolism: changes in carbon sources, elemental limitations.


Spatial proteomics - How, experimentally

[Figure: Organelle proteomics approaches (Gatto et al., 2010).
Single cell, direct observation: tagging (GFP, epitope, protein-specific antibody).
Population level: subcellular fractionation (number of fractions), then quantitative mass spectrometry.
Cataloguing: 1 fraction (pure fraction catalogue); 2 fractions, enriched and crude (subtractive proteomics, enrichment).
Relative abundance: n discrete fractions (invariant rich fraction, clustering); n continuous fractions, gradient approaches (PCP, χ²; LOPIT, PCA and PLS-DA).]

Fusion proteins and immunofluorescence

Figure: Organelle proteomics approaches (Gatto et al., 2010). Gradient approaches: Dunkley et al. (2006), Foster et al. (2006).

⇒ Explorative/discovery approaches, global localisation maps.

[Figure: experimental workflow. Cell lysis; fractionation/centrifugation (e.g. mitochondrion); quantitation/identification by mass spectrometry.]

Quantitation data and organelle markers

      Fraction1  Fraction2  ...  Fractionm  markers
p1    q1,1       q1,2       ...  q1,m       unknown
p2    q2,1       q2,2       ...  q2,m       loc1
p3    q3,1       q3,2       ...  q3,m       unknown
p4    q4,1       q4,2       ...  q4,m       loci
...   ...        ...        ...  ...        ...
pj    qj,1       qj,2       ...  qj,m       unknown
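As a minimal sketch of this layout (not the pRoloc implementation), the table can be held as a mapping from protein identifier to a quantitation profile plus a marker label. The protein names, the values and the helper names `split_by_marker` and `mean_marker_profiles` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: protein -> (profile over m = 3 fractions, marker label).
data = {
    "p1": ([0.2, 0.3, 0.5], "unknown"),
    "p2": ([0.6, 0.3, 0.1], "loc1"),
    "p3": ([0.1, 0.4, 0.5], "unknown"),
    "p4": ([0.4, 0.3, 0.3], "loc1"),
    "p5": ([0.1, 0.1, 0.8], "loc2"),
}

def split_by_marker(data):
    """Separate labelled marker proteins from proteins of unknown location."""
    labelled = {p: v for p, v in data.items() if v[1] != "unknown"}
    unlabelled = {p: v for p, v in data.items() if v[1] == "unknown"}
    return labelled, unlabelled

def mean_marker_profiles(labelled):
    """Average the marker profiles of each location, fraction by fraction."""
    grouped = defaultdict(list)
    for profile, label in labelled.values():
        grouped[label].append(profile)
    return {
        label: [sum(col) / len(col) for col in zip(*profiles)]
        for label, profiles in grouped.items()
    }

labelled, unlabelled = split_by_marker(data)
profiles = mean_marker_profiles(labelled)
```

The labelled proteins are what a classifier would train on; the unlabelled ones are the proteins whose location is to be predicted.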

Visualisation and classification

[Figure: correlation profiles across fractions 1, 2, 4, 5, 7, 8, 11 and 12 for ER, Golgi, mit/plastid, PM and vacuole. Principal component analysis plot (PC1 vs PC2) with classes ER, Golgi, mit/plastid, PM and vacuole; points shown as marker, PLS-DA or unknown.]

Figure: From Gatto et al. (2010), Arabidopsis thaliana data from Dunkley et al. (2006)


Previously . . .

- Infrastructure: MSnbase (Gatto and Lilley, 2012)
- Spatial proteomics data: pRolocdata (Gatto et al., 2014)
- Novelty detection: pRoloc::phenoDisco (Breckels et al., 2013)
- Localisation prediction and visualisation: pRoloc (Gatto et al., 2014) and pRolocGUI

MSnSet (storageMode: lockedEnvironment)
assayData: 2031 features, 8 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: n113 n114 ... n121 (8 total)
  varLabels: Fraction.information
  varMetadata: labelDescription
featureData
  featureNames: Q62261 Q9JHU4 ... Q9EQ93 (2031 total)
  fvarLabels: Uniprot.ID UniprotName ... markers (8 total)
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:
- - - Processing information - - -
Loaded on Fri Nov 7 16:49:05 2014.
Normalised to sum of intensities.
Added markers from 'mrk' marker vector. Fri Nov 7 16:49:05 2014
 MSnbase version: 1.13.16
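The processing log mentions normalisation to the sum of intensities. A minimal sketch of that step, assuming it divides each protein's intensities by their row total (the function name and the values are illustrative, this is not MSnbase code):

```python
def normalise_to_sum(intensities):
    """Scale one protein's intensities so that they sum to 1."""
    total = sum(intensities)
    if total == 0:
        raise ValueError("cannot normalise an all-zero profile")
    return [x / total for x in intensities]

raw = [120.0, 80.0, 200.0, 0.0]  # made-up intensities for four fractions
profile = normalise_to_sum(raw)  # [0.3, 0.2, 0.5, 0.0]
```

After this step every profile describes the relative distribution of a protein along the fractions, independently of its absolute abundance.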

Previously . . . spatial proteomics data in pRolocdata (Gatto et al., 2014)

- Several mouse E14TG2a embryonic stem cell datasets.
- Human embryonic kidney fibroblast cells.
- The Arabidopsis AT_CHLORO database (Ferro et al., 2010).
- Mouse organs (Foster et al., 2006).
- Arabidopsis from callus (Dunkley et al., 2006; Nikolovski et al., 2014) and roots (Groen et al., 2014).
- Drosophila embryos (Tan et al., 2009).
- Chicken DT40 lymphocyte cells (Hall et al., 2009).
- . . .
- Collected from the literature

Previously . . . novelty detection with pRoloc::phenoDisco (Breckels et al., 2013)

[Figure: two PCA plots, PC1 (58.53%). First plot classes: ER, Golgi, mitochondrion, PM, unknown. Second plot classes: Cytoskeleton, ER, Golgi, Lysosome, mitochondrion, Nucleus, Peroxisome, PM, Proteasome, Ribosome 40S, Ribosome 60S, unknown.]

Previously . . . localisation prediction and visualisation with pRoloc (Gatto et al., 2014) and pRolocGUI

[Figure: PCA plot, PC1 (58.53%) vs PC2 (29.96%). Classes: Cytoskeleton, ER, Golgi, Lysosome, mitochondrion, Nucleus, Peroxisome, PM, Proteasome, Ribosome 40S, Ribosome 60S.]

Transfer learning

What about annotation data from repositories such as GO, sequence features, signal peptide, transmembrane domains, images, protein-protein interactions, . . .

- From a user perspective: "free/cheap" vs. expensive
- Abundant (all proteins, 100s of features) vs. (experimentally) limited/targeted (1000s of proteins, 6 – 20 features)
- For localisation in system at hand: low vs. high quality
- Static vs. dynamic

number of GO features ≫ number of experimental fractions ⇒ dilution of experimental data
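One way to picture the dilution, assuming the two feature sets were simply concatenated and proteins compared with a Euclidean distance. This is an illustration only: the profiles, the number of GO terms and the variable names are made up.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Two hypothetical proteins: 8 experimental fractions, 100 binary GO features.
exp_a = [0.30, 0.25, 0.15, 0.10, 0.08, 0.06, 0.04, 0.02]
exp_b = [0.02, 0.04, 0.06, 0.08, 0.10, 0.15, 0.25, 0.30]  # very different profile
go_a = [1] * 10 + [0] * 90
go_b = [0] * 10 + [1] * 10 + [0] * 80                      # 20 GO terms differ

d_exp = squared_distance(exp_a, exp_b)
d_go = squared_distance(go_a, go_b)

# Share of the total squared distance that comes from the experimental data.
share_exp = d_exp / (d_exp + d_go)
```

Even though the two experimental profiles are almost mirror images, they account for under 2% of the combined squared distance here, which is the kind of dilution a naive concatenation would cause.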

Goal

Support/complement the primary target domain (experimental data) with auxiliary data (annotation) features without compromising the integrity of our primary data.

Updated experimental design for

- primary/experimental data

and

- auxiliary/annotation data

[Figure: workflow. Primary experimental data: cell lysis, fractionation/centrifugation (e.g. mitochondrion), quantitation/identification by mass spectrometry, visualisation. Auxiliary data: database query, extract GO CC terms, convert terms to binary, visualisation.]

Auxiliary data (rows x1 ... xn, columns GO1 ... GOA):

          GO:0016021  GO:0005789  GO:0005783  ...
O00767    1           1           1           ...
P51648    1           1           0           ...
Q2TAA5    1           1           0           ...
Q9UKV5    0           0           0           ...
...       ...         ...         ...         ...

Primary data (rows x1 ... xn):

          X113    X114   X115    X116   X117   X118    X119    X121
O00767    0.1361  0.150  0.1062  0.147  0.277  0.1429  0.0380  0.00338
P51648    0.1914  0.205  0.0566  0.165  0.237  0.0996  0.0180  0.02727
Q2TAA5    0.1297  0.201  0.0546  0.146  0.292  0.1463  0.0206  0.00902
Q9UKV5    0.0939  0.207  0.0419  0.204  0.344  0.1098  0.0000  0.00000
...       ...     ...    ...     ...    ...    ...     ...     ...

[Figure: PCA plot, PC1 (40.28%) vs PC2 (25.7%). Classes: 40S Ribosome, 60S Ribosome, Cytosol, Endoplasmic reticulum, Lysosome, Mitochondrion, Nucleus − Chromatin, Nucleus − Nucleolus, Plasma membrane, Proteasome, unknown.]

Data from mouse stem cells (E14TG2a)
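The "convert terms to binary" step can be sketched as follows. This is a minimal illustration, not the pRoloc code: the helper name `to_binary_matrix` is hypothetical, and the term sets are read off the 0/1 pattern shown on the slide, assuming the rows and columns line up as displayed.

```python
def to_binary_matrix(annotations):
    """Turn {protein: set of GO terms} into (terms, {protein: 0/1 row})."""
    terms = sorted(set().union(*annotations.values()))
    matrix = {
        protein: [1 if term in go_terms else 0 for term in terms]
        for protein, go_terms in annotations.items()
    }
    return terms, matrix

# Protein identifiers and GO CC terms as they appear on the slide.
annotations = {
    "O00767": {"GO:0016021", "GO:0005789", "GO:0005783"},
    "P51648": {"GO:0016021", "GO:0005789"},
    "Q2TAA5": {"GO:0016021", "GO:0005789"},
    "Q9UKV5": set(),
}
terms, binary = to_binary_matrix(annotations)
```

Each protein ends up with one indicator per GO term seen anywhere in the data set, so the number of columns grows with the annotation rather than with the experiment.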

We use a class-weighted kNN transfer learning algorithm to combine primary and auxiliary data, based on Wu and Dietterich (2004):

$$V(c_i)_j = \theta^{*} n^{P}_{ij} + (1 - \theta^{*}) n^{A}_{ij}$$

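Classes and weights

$$C = \{c_{i=1}, \ldots, c_{i=l}\}; \quad \Theta = \{0, 0.5, 1\}$$

Primary data

$$L_P = \begin{pmatrix} q_{1,1} & q_{1,2} & \cdots & q_{1,m} \\ q_{2,1} & q_{2,2} & \cdots & q_{2,m} \\ \vdots & \vdots & & \vdots \\ q_{j,1} & q_{j,2} & \cdots & q_{j,m} \end{pmatrix}; \quad \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_j \end{pmatrix}; \quad k_P$$

Auxiliary data

$$L_A = \begin{pmatrix} b_{1,1} & b_{1,2} & \cdots & \cdots & b_{1,n} \\ b_{2,1} & b_{2,2} & \cdots & \cdots & b_{2,n} \\ \vdots & \vdots & & & \vdots \\ b_{j,1} & b_{j,2} & \cdots & \cdots & b_{j,n} \end{pmatrix}; \quad \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_j \end{pmatrix}; \quad k_A$$

Neighbour matrices

$$N_P = \begin{array}{ccc} c_{i=1} & \cdots & c_{i=l} \\ \hline n^{P}_{1,1} & \cdots & n^{P}_{1,l} \\ n^{P}_{2,1} & \cdots & n^{P}_{2,l} \\ \vdots & & \vdots \end{array}; \quad N_A = \begin{array}{ccc} c_{i=1} & \cdots & c_{i=l} \\ \hline n^{A}_{1,1} & \cdots & n^{A}_{1,l} \\ n^{A}_{2,1} & \cdots & n^{A}_{2,l} \\ \vdots & & \vdots \end{array}$$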

Example neighbour matrix for two proteins (1 and 2) and three classes (c1, c2, c3):

$$N_P = \begin{array}{c|ccc} & c_1 & c_2 & c_3 \\ \hline p_1 & \tfrac{3}{3} & 0 & 0 \\ p_2 & \tfrac{1}{3} & \tfrac{2}{3} & 0 \\ \vdots & \vdots & \vdots & \vdots \end{array}$$
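A neighbour matrix of this kind could be computed as sketched below. The sketch assumes Euclidean distance and k = 3 (suggested by the thirds in the example, not stated on the slide); the function name and the 2-D coordinates are made up so that the two example rows are reproduced.

```python
import math
from collections import Counter

def neighbour_matrix(labelled, unlabelled, classes, k):
    """For each unlabelled point, the share of its k nearest labelled
    neighbours (Euclidean distance) that belongs to each class."""
    rows = []
    for x in unlabelled:
        nearest = sorted(labelled, key=lambda item: math.dist(x, item[0]))[:k]
        counts = Counter(label for _, label in nearest)
        rows.append([counts[c] / k for c in classes])
    return rows

# Made-up 2-D profiles for the labelled (marker) proteins.
labelled = [
    ((0.0, 0.0), "c1"), ((0.1, 0.0), "c1"), ((0.0, 0.1), "c1"),
    ((1.0, 1.0), "c2"), ((1.1, 1.0), "c2"),
    ((3.0, 3.0), "c3"),
]
unlabelled = [(0.05, 0.05), (0.8, 0.8)]  # proteins 1 and 2
N_P = neighbour_matrix(labelled, unlabelled, ["c1", "c2", "c3"], k=3)
```

Running the same function on the auxiliary features, with its own k, would give the second matrix; the two are kept separate until the weighting step.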

Weights matrix (labelled)

Each combination of class weights θ drawn from Θ is scored by its F1 on the labelled data; the best-scoring combination is θ*.

$$\begin{array}{c|ccc|c} & c_1 & c_2 & c_3 & \\ \hline \theta_1 & 0 & 0 & 0 & F1_1 \\ \theta_2 & 0 & 0 & 1 & F1_2 \\ \theta_i & \vdots & \vdots & \vdots & F1_i \\ \vdots & 1 & 1 & 0 & \vdots \\ \theta_{\Theta^l} & 1 & 1 & 1 & F1_{\Theta^l} \end{array}$$

$$\theta^{*} = \{1, 0, 1\}$$

(R package BiocParallel)
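The search over the weights matrix can be sketched as an exhaustive loop over all |Θ|^l combinations. This is a simplified illustration under stated assumptions: it uses a macro-averaged F1 on a single labelled set, whereas the slide does not say how the labelled data are split or averaged, and all function names and numbers are hypothetical.

```python
from itertools import product

def macro_f1(true, predicted, classes):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(true, predicted))
        fp = sum(t != c and p == c for t, p in zip(true, predicted))
        fn = sum(t == c and p != c for t, p in zip(true, predicted))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)

def classify(theta, n_p, n_a, classes):
    """Class with the highest weighted vote for each protein."""
    labels = []
    for row_p, row_a in zip(n_p, n_a):
        votes = [t * p + (1 - t) * a for t, p, a in zip(theta, row_p, row_a)]
        labels.append(classes[votes.index(max(votes))])
    return labels

def best_weights(n_p, n_a, true, classes, grid=(0, 0.5, 1)):
    """Try every combination of class weights and keep the best macro F1."""
    return max(
        product(grid, repeat=len(classes)),
        key=lambda theta: macro_f1(true, classify(theta, n_p, n_a, classes), classes),
    )

# Made-up neighbour matrices for four labelled proteins and two classes.
classes = ["c1", "c2"]
true = ["c1", "c1", "c2", "c2"]
n_p = [[1, 0], [2/3, 1/3], [2/3, 1/3], [0, 1]]  # primary misplaces protein 3
n_a = [[1, 0], [1, 0], [0, 1], [1/3, 2/3]]      # auxiliary places all four
theta_star = best_weights(n_p, n_a, true, classes)
```

Since every combination is scored independently of the others, the loop is easy to distribute over several workers, which is presumably why a parallel back end is mentioned.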

Class-weighted classifier (unlabelled)

$$V(c_i)_j = \theta^{*} n^{P}_{ij} + (1 - \theta^{*}) n^{A}_{ij}$$

With

$$\theta^{*} = \{1, 0, 1\}; \quad N_P = \begin{array}{c|ccc} & c_1 & c_2 & c_3 \\ \hline p_1 & \tfrac{3}{3} & 0 & 0 \\ p_2 & \tfrac{1}{3} & \tfrac{2}{3} & 0 \\ \vdots & \vdots & \vdots & \vdots \end{array}$$

the votes for proteins 1 and 2 are

$$\begin{aligned} V(c_1)_1 &= 1 \times \tfrac{3}{3} + (1 - 1) \times n^{A}_{1,1} \\ V(c_2)_1 &= 0 \times 0 + (1 - 0) \times n^{A}_{1,2} \\ V(c_3)_1 &= 1 \times 0 + (1 - 1) \times n^{A}_{1,3} \\ V(c_1)_2 &= 1 \times \tfrac{1}{3} + (1 - 1) \times n^{A}_{2,1} \\ V(c_2)_2 &= 0 \times \tfrac{2}{3} + (1 - 0) \times n^{A}_{2,2} \\ V(c_3)_2 &= 1 \times 0 + (1 - 1) \times n^{A}_{2,3} \end{aligned}$$

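Votes for all unlabelled proteins:

$$\begin{array}{c|ccc} & c_1 & c_2 & c_3 \\ \hline 1 & V(c_1)_1 & V(c_2)_1 & V(c_3)_1 \\ 2 & V(c_1)_2 & V(c_2)_2 & V(c_3)_2 \\ \vdots & \vdots & \vdots & \vdots \\ j & & & \end{array}$$

$$y_j = \operatorname{argmax}\left(V(c_i)_j\right)$$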
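The vote and argmax steps can be checked numerically with a short sketch. It takes the weights θ* = {1, 0, 1} and the primary neighbour matrix from the example, applies one weight per class as in that example, and uses auxiliary values that are purely hypothetical, since the slides leave them symbolic. Function names are illustrative.

```python
from fractions import Fraction as F

def weighted_votes(theta, n_p_row, n_a_row):
    """Per-class vote theta * nP + (1 - theta) * nA for one protein."""
    return [t * p + (1 - t) * a for t, p, a in zip(theta, n_p_row, n_a_row)]

def predict(theta, n_p, n_a, classes):
    """Assign each protein the class with the largest vote (argmax)."""
    labels = []
    for row_p, row_a in zip(n_p, n_a):
        votes = weighted_votes(theta, row_p, row_a)
        labels.append(classes[votes.index(max(votes))])
    return labels

classes = ["c1", "c2", "c3"]
theta_star = [1, 0, 1]
N_P = [[F(3, 3), 0, 0], [F(1, 3), F(2, 3), 0]]        # from the example
N_A = [[F(2, 3), F(1, 3), 0], [0, F(2, 3), F(1, 3)]]  # hypothetical values

votes_1 = weighted_votes(theta_star, N_P[0], N_A[0])  # [1, 1/3, 0]
votes_2 = weighted_votes(theta_star, N_P[1], N_A[1])  # [1/3, 2/3, 0]
labels = predict(theta_star, N_P, N_A, classes)       # ["c1", "c2"]
```

A weight of 1 means the class is decided by the primary data alone, a weight of 0 by the auxiliary data alone, which is why only the c2 votes depend on the hypothetical auxiliary values here.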

[Figure, panels A to E: F1 scores of the Combined, Primary and Auxiliary classifiers, one panel per class (40S Ribosome, 60S Ribosome, Cytosol, Endoplasmic reticulum, Lysosome, Mitochondrion, Nucleus − Chromatin, Nucleus − Nucleolus, Plasma membrane, Proteasome); PCA plot, PC1 (3.43%) vs PC2 (2.08%); PCA plot, PC1 (40.28%) vs PC2 (25.7%); overall F1 score for Combined, Primary and Auxiliary; classifier weight (0, 1/3, 2/3, 1) per class.]

Data from mouse stem cells (E14TG2a).

Why? – Dual-localisation

Proteins may be present simultaneously in several organelles (e.g. trafficking).

[Figure: PCA plot, PC1 (64.36%) vs PC2 (22.34%). Classes: ER lumen, ER membrane, Golgi, Mitochondrion, Plastid, PM, Ribosome, TGN, vacuole, unknown.]

[Figure: from Betschinger et al. (2013).]

[Figure: Mouse ESC (E14TG2a) in serum LIF. PCA plot, PC1 (50.05%) vs PC2 (24.61%), with Tfe3 highlighted. Classes: Actin cytoskeleton, Cytosol, Endosome, ER/GA, Extracellular matrix, Lysosome, Mitochondria, Nucleus − Chromatin, Nucleus − Nucleolus, Peroxisome, Plasma Membrane, Proteasome, Ribosome 40S, Ribosome 60S, unknown.]


Funding: BBSRC and PRIME-XS EU FP7

- Lisa Breckels, Computational Proteomics Unit
- Kathryn Lilley, Cambridge Centre of Proteomics
- Sean Holden, Computer Laboratory

Thank you for your attention

J Betschinger, J Nichols, S Dietmann, P D Corrin, P J Paddison, and A Smith. Exit from pluripotency is gated by intracellular redistribution of the bHLH transcription factor Tfe3. Cell, 153(2):335–47, Apr 2013. doi:10.1016/j.cell.2013.03.012.

LM Breckels, L Gatto, A Christoforou, AJ Groen, KS Lilley, and MW Trotter. The effect of organelle discovery upon sub-cellular protein localisation. J Proteomics, 88:129–40, Aug 2013.

TPJ Dunkley, S Hester, IP Shadforth, J Runions, T Weimar, SL Hanton, JL Griffin, C Bessant, F Brandizzi, C Hawes, RB Watson, P Dupree, and KS Lilley. Mapping the Arabidopsis organelle proteome. PNAS, 103(17):6518–6523, Apr 2006.

LJ Foster, CL de Hoog, Y Zhang, Y Zhang, X Xie, VK Mootha, and M Mann. A mammalian organelle map by protein correlation profiling. Cell, 125(1):187–199, Apr 2006.

L Gatto and KS Lilley. MSnbase - an R/Bioconductor package for isobaric tagged mass spectrometry data visualization, processing and quantitation. Bioinformatics, 28(2):288–9, Jan 2012.

L Gatto, JA Vizcaino, H Hermjakob, W Huber, and KS Lilley. Organelle proteomics experimental designs and analysis. Proteomics, 2010.

L Gatto, L M Breckels, S Wieczorek, T Burger, and K S Lilley. Mass-spectrometry based spatial proteomics data analysis using pRoloc and pRolocdata. Bioinformatics, Jan 2014.

TR Kau, JC Way, and PA Silver. Nuclear transport and cancer: from mechanism to intervention. Nat Rev Cancer, 4(2):106–17, Feb 2004.

K Laurila and M Vihinen. Prediction of disease-related mutations affecting protein localization. BMC Genomics, 10:122, 2009.

P Wu and TG Dietterich. Improving SVM accuracy by training on auxiliary data sources. In Proceedings of the Twenty-first International Conference on Machine Learning, ICML ’04, New York, NY, USA, 2004. ACM.