(1) survey 2011

8/2/2019 (1) survey 2011


Systematic computational prediction of protein interaction networks


2011 Phys. Biol. 8 035008

(http://iopscience.iop.org/1478-3975/8/3/035008)



IOP PUBLISHING PHYSICAL BIOLOGY

Phys. Biol. 8 (2011) 035008 (13pp) doi:10.1088/1478-3975/8/3/035008

Systematic computational prediction of protein interaction networks

J G Lees1, J K Heriche2, I Morilla3, J A Ranea1,3 and C A Orengo1

1 Research Department of Structural & Molecular Biology, University College London, London, UK

2 Cell Biology/Biophysics Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, D-69117 Heidelberg, Germany

3 Department of Molecular Biology and Biochemistry-CIBER de Enfermedades Raras, University of Malaga, Malaga, Spain

E-mail: [email protected]

Received 2 November 2010

Accepted for publication 9 February 2011

Published 13 May 2011

Online at stacks.iop.org/PhysBio/8/035008

Abstract

Determining the network of physical protein associations is an important first step in developing mechanistic evidence for elucidating biological pathways. Despite rapid advances in high-throughput experiments to determine protein interactions, the majority of associations remain unknown. Here we describe computational methods for significantly expanding protein association networks. We describe methods for integrating multiple independent sources of evidence to obtain higher-quality predictions, and we compare the major publicly available resources for experimentalists to use.

1. Introduction

New technologies in biology have given us the genomes of thousands of species, including humans. Understanding how all of these molecular parts assemble into functional pathways is a major challenge. It has been noted that an organism's complexity arises in part from the intricate and dynamic networks of protein associations. While several resources [1–5] provide experimental information on protein associations, the experimental data, although growing rapidly, are still limited (e.g. perhaps


Phys. Biol. 8 (2011) 035008 J G Lees et al

first method made use of sequence information. As detailed below, there are different ways of using sequences to make inferences about protein associations. The advantage of these methods is that sequence data have become abundant and are easily available through public databases in standardized formats. A second class of methods has followed the development of high-throughput experiments and functional annotation databases.

2.1. Genomic context methods

2.1.1. Co-occurrence profiles. Genomic context association prediction algorithms are a family of sequence-based methods used to predict associations between proteins. These techniques are based on principles derived from known evolutionary processes. For example, co-occurrence (phylogenetic) profiles are genomic context methods based on the principle that functionally related genes tend to be co-inherited as a unit, since the loss of any one gene would compromise the functioning of the others. Phylogenetic profiling algorithms look for similar patterns of presence/absence of genes across species (figure 1(a)). It is unclear exactly what type of association the predictions correspond to, although it is generally considered to be a shared biological process. More functional significance can be assigned if the patterns are seen over distant evolutionary periods or if they occur independently in multiple lineages. The original phylogenetic profiling idea [9] has been developed in many different ways, including more complex logical rules to associate genes [10], the use of domain instead of whole-protein profiles [11] and the integration of species phylogenetic information [12, 13]. Gene duplications can lead to spurious predictions, because the profiles of duplicated genes are highly similar. Some resources implement a scoring scheme whereby homologous proteins are down-weighted in accordance with their level of homology [14]. It is also important to filter out low-information-content profiles [11].

Phylogenetic profiles have been used successfully in archaea and bacteria to discover, for example, novel essential members of synthetic pathways [15, 16], environmental adaptation factors [17] and components of thiamine biosynthesis [18]. Interestingly, some studies have used anti-correlation in the pattern as a signal for predicting functional association [18]. A domain-based phylogenetic profile method, phylotuner [11], has recently been developed to improve performance in eukaryotes, where multigene families and protein domain rearrangements create challenges for these types of approaches.
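The profile comparison above can be illustrated with a minimal Python sketch. It assumes binary presence/absence profiles over the same ordered set of genomes and scores a gene pair by the hypergeometric probability of seeing at least the observed number of shared genomes by chance. This score and the `min_present` filter for low-information profiles are illustrative choices, not the scoring used by any resource cited here, and all gene names and profiles are hypothetical. Homology down-weighting and phylogenetic corrections are not included.

```python
from math import comb


def hypergeom_tail(k, n_total, n_a, n_b):
    """P(X >= k), where X is the number of genomes containing both genes if the
    n_b genomes containing gene B were a random subset of all n_total genomes
    and gene A is present in n_a of them."""
    hits = sum(comb(n_a, i) * comb(n_total - n_a, n_b - i)
               for i in range(k, min(n_a, n_b) + 1))
    return hits / comb(n_total, n_b)


def profile_pvalue(prof_a, prof_b, min_present=2):
    """Compare two binary presence/absence profiles over the same ordered genomes.

    Returns None for low-information profiles (gene present in almost no or
    almost all genomes), otherwise the hypergeometric co-occurrence p-value.
    """
    if len(prof_a) != len(prof_b):
        raise ValueError("profiles must cover the same genomes")
    n = len(prof_a)
    n_a, n_b = sum(prof_a), sum(prof_b)
    for count in (n_a, n_b):
        if count < min_present or count > n - min_present:
            return None
    shared = sum(1 for x, y in zip(prof_a, prof_b) if x and y)
    return hypergeom_tail(shared, n, n_a, n_b)


# Hypothetical profiles over ten genomes (1 = gene present, 0 = absent).
profiles = {
    "geneA": [1, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    "geneB": [1, 1, 0, 1, 0, 0, 1, 0, 1, 0],  # same pattern as geneA
    "geneC": [0, 1, 1, 0, 1, 0, 0, 1, 0, 1],  # largely complementary pattern
}
print(profile_pvalue(profiles["geneA"], profiles["geneB"]))  # 1/252, about 0.004
print(profile_pvalue(profiles["geneA"], profiles["geneC"]))  # 251/252
```

A small value indicates that two genes are gained and lost together more often than expected; an anti-correlated pair such as geneA and geneC would instead show up in the lower tail of the same distribution.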

2.1.2. Gene fusion. Evolutionary pressure can produce the fusion of separate but functionally related genes (A and B in figure 1(b)) into a single gene. In their simplest form, gene fusion prediction methods identify pairs of proteins in a genome that are homologous to proteins fused together in a different genome, and use this as supporting evidence for the functional association of the individual genes [19]. The predicted association type is again unclear, but most often corresponds to a shared biological process or consecutive steps in a pathway [19]. Proteins involved in gene fusion events detected in mammals showed a propensity to interact [20]. As with phylogenetic profiling, analogous domain-based equivalents exist [21]. These methods identify domains on two distinct protein sequences in the same genome that are found fused into a single sequence in a different genome (figure 1(b)). However, because of large and/or promiscuous domain families (e.g. kinase domains), domain fusion predictors require a scoring mechanism to prevent a large number of non-specific predictions. An example of fusion data used in conjunction with profiles is a method developed for identifying biothiol synthetic enzymes [22].
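A minimal sketch of domain-level fusion detection follows, assuming that domain assignments for every protein are already available (for example from an HMM search). The promiscuity filter, which ignores domains fused to more than `max_partners` distinct partners, is a crude, hypothetical stand-in for the scoring mechanisms mentioned above; all protein and domain names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def fused_domain_pairs(reference_genomes):
    """Unordered domain pairs found together in a single protein of any
    reference genome. reference_genomes: {genome: {protein: [domains]}}."""
    pairs = set()
    for proteins in reference_genomes.values():
        for domains in proteins.values():
            pairs.update(combinations(sorted(set(domains)), 2))
    return pairs


def fusion_links(query_genome, reference_genomes, max_partners=3):
    """Link two separate query proteins when one carries domain d1, the other
    carries d2, and (d1, d2) is fused elsewhere. Domains fused to more than
    max_partners distinct partners are treated as promiscuous and ignored."""
    fused = fused_domain_pairs(reference_genomes)
    partners = defaultdict(set)
    for d1, d2 in fused:
        partners[d1].add(d2)
        partners[d2].add(d1)
    links = set()
    for (p, doms_p), (q, doms_q) in combinations(sorted(query_genome.items()), 2):
        for d1 in set(doms_p):
            for d2 in set(doms_q):
                pair = tuple(sorted((d1, d2)))
                if (d1 != d2 and pair in fused
                        and len(partners[d1]) <= max_partners
                        and len(partners[d2]) <= max_partners):
                    links.add((p, q))
    return links


# Hypothetical domain assignments.
query = {"protA": ["domX"], "protB": ["domY"],
         "protK": ["kinase"], "protR": ["resp_reg"]}
reference = {"genome2": {
    "fusedXY": ["domX", "domY"],
    "k1": ["kinase", "SH2"], "k2": ["kinase", "SH3"],
    "k3": ["kinase", "PH"], "k4": ["kinase", "resp_reg"],
}}
print(fusion_links(query, reference))  # {('protA', 'protB')}
```

Here the kinase domain is fused to four different partners, so the protK and protR pair is suppressed unless the cut-off is relaxed.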

2.1.3. Genomic neighborhood. Owing to functional constraints, genes can be maintained close together on a chromosome over long evolutionary time periods (figure 1(d)). Genomic neighborhood prediction methods identify genes that cluster within a certain distance (in base pairs) across multiple genomes. As with other genomic context methods, artifacts can arise through shared ancestry when there has been insufficient time for genes to be reshuffled. The genomic neighborhood method should not be confused with methods such as the operon method [23], which are based on the intergenic distance in a single genome. A recent comprehensive study in prokaryotes found the genomic neighborhood method to be the best among genomic context methods [24].

Some genomic neighborhood methods require gene order to be conserved. However, since some gene rearrangement can be tolerated, not all methods enforce this constraint [25]. Some resources also consider neighboring genes in a head-to-head (divergent) orientation [14]. The genomic neighborhood approach has been used to predict the archaeal exosome [26], which was subsequently validated experimentally [27].
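The neighborhood criterion can be sketched as follows, assuming gene coordinates on a single chromosome per genome and genes already grouped into orthologous families. The 300 bp cut-off is arbitrary, gene order and strand are ignored, and no correction is made for closely related genomes, so the shared-ancestry artifact described above is not addressed.

```python
def neighborhood_support(genomes, fam_a, fam_b, max_gap=300):
    """Count the genomes in which two gene families lie within max_gap bp.

    genomes: {genome: {family: (start, end)}}, coordinates on one chromosome,
    with genes already assigned to orthologous families. Gene order and strand
    are ignored. Returns (genomes with the genes close together, genomes
    containing both genes).
    """
    both = close = 0
    for genes in genomes.values():
        if fam_a not in genes or fam_b not in genes:
            continue
        both += 1
        (s1, e1), (s2, e2) = genes[fam_a], genes[fam_b]
        gap = max(s1, s2) - min(e1, e2)  # negative when the genes overlap
        if gap <= max_gap:
            close += 1
    return close, both


# Hypothetical coordinates for two gene families in four genomes.
genomes = {
    "g1": {"famA": (100, 900), "famB": (950, 1800)},       # 50 bp apart
    "g2": {"famA": (5000, 5800), "famB": (5900, 6700)},    # 100 bp apart
    "g3": {"famA": (100, 900), "famB": (40000, 40800)},    # far apart
    "g4": {"famA": (100, 900)},                            # famB absent
}
print(neighborhood_support(genomes, "famA", "famB"))  # (2, 3)
```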

[Figure 1, panels (a)–(e)]

Figure 1. Illustration of the principles behind several of the protein association prediction methods described in the text.

2.2. Sequence-based prediction

2.2.1. Sequence co-evolution. Interactions between proteins are mediated through specific residue interfaces [28, 29]. It has been observed that physically interacting proteins have greater similarity of their phylogenetic trees than expected by chance. One explanation put forward for this is compensatory mutation, whereby a deleterious mutation of an interaction-mediating residue in one protein can be ameliorated by a compensatory mutation in the binding partner [30]. More recent analyses suggest that additional sources contribute to the co-evolutionary signal (see [31, 32]), making the type of functional linkage derived from these methods less clear-cut. Prediction methods developed around the principle of sequence co-evolution build a multiple sequence alignment for each putative interacting protein, from which distance matrices are calculated. High correlation between these distance matrices is taken as evidence of a potential physical interaction [33] (figure 1(c)). Other methods exploit evolutionary information from multiple sequence analysis by comparing the phylogenetic-tree distance matrices of pairs of gene families. These methods have been improved by integrating species tree information [34] or by implementing algorithms that efficiently deal with the multigene families of paralogues found in eukaryotes [35].
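The distance-matrix comparison can be sketched in Python with NumPy, assuming that the inter-species distance matrices for the two protein families have already been computed from their multiple sequence alignments over the same species, listed in the same order. The function name `tree_similarity` is hypothetical, and the refinements cited above (species-tree correction, handling of paralogues) are not included.

```python
import numpy as np


def tree_similarity(dist_a, dist_b):
    """Pearson correlation between the upper triangles of two inter-species
    distance matrices whose rows/columns follow the same species order."""
    a = np.asarray(dist_a, dtype=float)
    b = np.asarray(dist_b, dtype=float)
    if a.shape != b.shape or a.shape[0] != a.shape[1]:
        raise ValueError("need two square matrices of the same size")
    iu = np.triu_indices_from(a, k=1)  # each species pair counted once
    return float(np.corrcoef(a[iu], b[iu])[0, 1])


# Hypothetical distances between orthologues in four species.
d_a = np.array([[0, 1, 4, 5],
                [1, 0, 4, 5],
                [4, 4, 0, 2],
                [5, 5, 2, 0]], dtype=float)
d_b = 2.0 * d_a                      # same tree shape, faster evolving
d_c = np.array([[0, 5, 1, 4],
                [5, 0, 4, 1],
                [1, 4, 0, 5],
                [4, 1, 5, 0]], dtype=float)
print(tree_similarity(d_a, d_b))     # close to 1.0
print(tree_similarity(d_a, d_c))     # negative: unrelated tree shapes
```

Because the correlation ignores overall scale, two families evolving at different rates but along the same tree shape still score highly.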

2.2.2. Commonly occurring domain pairs. A large portion of interactions are mediated through domain–domain associations, and these interactions are conserved across species [36], with certain domain pairs recurring in multiple protein interactions [37, 38]. One use of this domain association network is that knowledge of the underlying domains mediating protein interactions can help predict novel protein interactions. These methods are potentially powerful, since domains can be assigned reliably and quickly to any genome using HMMER3 [39, 40]. Several approaches for predicting domain interactions have been developed, including over-representation methods [41] and random forest methods [42]. Multiple methods are available from the DIMA resource [43]. UniDomInt merges the predictions of nine different domain interaction prediction methods to provide a meta-database of more reliable associations [44]. High-scoring domain pairs between proteins have frequently been used to make predictions as part of an integration strategy (for example [45]).
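One simple way to turn known interactions into domain-pair scores is sketched below: each domain pair is scored by the fraction of protein pairs carrying it that are known to interact, and a new protein pair inherits the score of its best domain pair. This is an illustrative over-representation-style score, not the statistic of [41] or the random forest of [42]; the domain assignments and interactions are hypothetical.

```python
from collections import Counter
from itertools import combinations, product


def domain_pair_scores(domains, interactions):
    """For every domain pair (d1, d2), the fraction of protein pairs carrying
    d1 on one protein and d2 on the other that are known to interact.

    domains: {protein: set of domains}; interactions: set of frozenset({p, q}).
    """
    seen, hits = Counter(), Counter()
    for p, q in combinations(sorted(domains), 2):
        interacting = frozenset((p, q)) in interactions
        for dp in {tuple(sorted(x)) for x in product(domains[p], domains[q])}:
            seen[dp] += 1
            hits[dp] += interacting  # True counts as 1
    return {dp: hits[dp] / seen[dp] for dp in seen}


def score_protein_pair(doms_p, doms_q, scores):
    """Score a protein pair by its best-scoring domain pair (0 if none known)."""
    return max((scores.get(tuple(sorted(x)), 0.0)
                for x in product(doms_p, doms_q)), default=0.0)


# Hypothetical training data: domain assignments and known interactions.
domains = {"p1": {"SH3"}, "p2": {"PRM"}, "p3": {"SH3"},
           "p4": {"PRM"}, "p5": {"Kin"}}
interactions = {frozenset(pair) for pair in [("p1", "p2"), ("p3", "p4"), ("p1", "p4")]}
scores = domain_pair_scores(domains, interactions)
print(scores[("PRM", "SH3")])                          # 0.75
print(score_protein_pair({"SH3"}, {"PRM"}, scores))    # 0.75
```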

2.2.3. Simple sequence features. A variety of prediction methods have been developed to predict protein–protein interactions from intrinsic features of sequences, such as 3-mers of neighboring residues [46]. The high accuracy achieved by these methods has recently been called into question and may be an artifact of the data sets used to train and validate the methods [47].
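A minimal sketch of 3-mer sequence features follows, assuming plain frequencies of overlapping 3-mers over the 20 standard amino acids. Published methods may encode sequences differently (for example with reduced amino-acid alphabets), and the sum-and-product encoding of a protein pair is a hypothetical, order-independent choice. Any classifier built on such features needs the careful evaluation discussed in [47].

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_vector(seq, alphabet=AMINO_ACIDS, k=3):
    """Frequencies of overlapping k-mers; k-mers with other letters are skipped."""
    index = {"".join(t): i for i, t in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def pair_features(seq_a, seq_b, **kwargs):
    """Order-independent representation of a protein pair: element-wise sum and
    product of the two k-mer frequency vectors."""
    a = kmer_vector(seq_a, **kwargs)
    b = kmer_vector(seq_b, **kwargs)
    return [x + y for x, y in zip(a, b)] + [x * y for x, y in zip(a, b)]


print(len(kmer_vector("MKTAYIAKQR")))             # 8000 (20 ** 3)
print(kmer_vector("AABB", alphabet="AB", k=2))    # frequencies of AA, AB, BA, BB
```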

2.3. Homology-based methods

2.3.1. Inheriting protein interactions from sequence. Many known protein interactions are conserved across species (such conserved interactions are called interologs) [48], although the general level of protein interaction conservation remains unclear [49]. Algorithms have been developed that transfer such interactions, via homology, from an experimentally confirmed interaction in one genome to the query genome [48, 50, 51]. In their simplest form, these methods make orthology assignments (by reciprocal best hit or using an orthology database) and transfer interactions where possible (figure 1(e)). More complex methods make use of heterogeneous features, including domain combination, subcellular localization and tissue specificity, to try to give increased confidence to the interolog assignment [52]. Interolog-based methods are especially powerful for organisms with few experimentally determined interactions, allowing a substantial number of high-confidence protein interaction predictions. In terms of uptake and use by experimentalists, interolog-based methods are among the most successfully applied protein association prediction methods, with over 80 publications associated with the I2D [53] method alone.
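The simplest interolog transfer can be sketched as follows, assuming all-against-all similarity scores (for example bit scores from a sequence search) between the query genome A and a source genome B with known interactions. Orthologues are called by reciprocal best hit, one of the options mentioned above, and each source interaction is copied to the corresponding pair of query proteins; no additional confidence features are used, and all identifiers and scores are hypothetical.

```python
def best_hits(scores):
    """scores: {query: {target: bit_score}} -> {query: best-scoring target}."""
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def reciprocal_best_hits(scores_ab, scores_ba):
    """Orthologue pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab, best_ba = best_hits(scores_ab), best_hits(scores_ba)
    return {a: b for a, b in best_ab.items() if best_ba.get(b) == a}


def transfer_interologs(source_interactions, orthologs):
    """Map interactions between proteins of the source genome B onto the query
    genome A. orthologs: {protein in A: protein in B}."""
    b_to_a = {}
    for a, b in orthologs.items():
        b_to_a.setdefault(b, []).append(a)
    predicted = set()
    for p, q in source_interactions:
        for a1 in b_to_a.get(p, []):
            for a2 in b_to_a.get(q, []):
                if a1 != a2:
                    predicted.add(tuple(sorted((a1, a2))))
    return predicted


# Hypothetical similarity search scores between genomes A and B.
scores_ab = {"a1": {"b1": 250.0, "b2": 40.0}, "a2": {"b2": 180.0},
             "a3": {"b3": 90.0, "b1": 95.0}}
scores_ba = {"b1": {"a1": 245.0, "a3": 80.0}, "b2": {"a2": 175.0},
             "b3": {"a3": 88.0}}
orthologs = reciprocal_best_hits(scores_ab, scores_ba)
print(orthologs)                                   # {'a1': 'b1', 'a2': 'b2'}
known_in_b = {("b1", "b2"), ("b1", "b3")}
print(transfer_interologs(known_in_b, orthologs))  # {('a1', 'a2')}
```

Protein a3 gets no orthologue because its best hit, b1, prefers a1, so the b1 and b3 interaction cannot be transferred.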

2.3.2. Inheriting protein interactions from structure. Structural complexes can also be used to transfer interactions from a known interacting pair to proteins with similar structures. These methods can provide insight into the physical details of the interaction and are likely to become more important in the near future as more structures become available [54]. Although the number of complexes of known 3D structure is relatively small, it is possible to expand this set by considering homologous proteins. An early example is the InterPreTS method [55] which, given a known 3D complex structure and homologous sequences for each interacting protein, ranks interactions between homologues of the same species. Another method, Struct2Net [56], threads sequences onto structures and computes scores from the interfacial energy for the sequence pair. The iWRAP method allows interactions to be inherited in cases of low sequence identity by focusing on the interface residues in the threading process [57]. Recently, methods have been developed that add an extra layer of evidence for the predicted interaction by considering the evolutionary conservation of binding-site residues via structural alignments [58, 59]. The IBIS server predicts interaction partners and binding sites for a given protein using experimentally observed or homology-inferred complexes. IBIS checks several features to ensure the biological relevance of the inferred complex. For example, binding-site residues are assessed for evolutionary conservation using a set of non-redundant homologous proteins. As another check, IBIS makes use of PISA [60] validation, which considers the physicochemical properties of the protein interaction interface. Structural data have been used for inheriting interactions on a genome-wide scale, using structural alignment scores to generate kernels for a support vector machine (SVM) [61].

2.4. Exploiting experimental data

2.4.1. Microarray profiles. Microarrays were one of the first genome-wide experimental methods developed [8]. Compendia of microarray experiments across various experimental conditions have been assembled. It has been demonstrated in yeast that genes with high co-expression, defined by Pearson correlation across the different conditions, were more likely to be physically interacting than randomly chosen pairs [62], although the strength of this signal varies between organisms [63]. Large, information-rich data sets with 6000 microarray experiments have been assembled [64]. The development of statistical processing tools to find signals in such large data sets, using a subset of conditions, has broadened the applicability of the method to predicting co-complex membership in Homo sapiens [65]. There are other approaches for using microarray data in the context of an integration strategy for protein association prediction. One method is to look for genes whose co-expression is conserved across multiple genomes [66]. Another is to use expression in the same subcellular compartments or tissue types [45] as supporting evidence of interaction. Clearly, such pieces of evidence are very weak on their own, but they can provide useful supporting information when used in combination with other data.
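Co-expression scoring with the Pearson correlation can be sketched in plain Python as below. The 0.8 cut-off and the expression values are hypothetical; real compendia would also need normalization, handling of missing values and, as noted above, possibly the selection of informative subsets of conditions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def coexpressed_pairs(expression, threshold=0.8):
    """Gene pairs whose profiles across conditions correlate at or above threshold."""
    pairs = {}
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if r >= threshold:
            pairs[(g1, g2)] = r
    return pairs


# Hypothetical expression levels for three genes over four conditions.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.1, 6.0, 8.2],
    "g3": [4.0, 1.0, 3.0, 2.0],
}
print(coexpressed_pairs(expression))  # only ('g1', 'g2') passes the cut-off
```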

2.4.2. Other experimental screens. Other types of experimental screens can also be assembled and processed into similarity profiles. For example, phenotypic vectors from high-throughput loss-of-function experiments can be clustered to give sets of functionally related proteins, which are often used as a basis for testing physical interactions. Similarities between trajectories of subcellular localization have also been used to generate hypotheses about physical interactions of uncharacterized proteins [67]. Other data, such as in vivo genomic binding maps, provide information on the positional targeting of chromatin components that can be used to predict the network of interactions in chromatin assembly [68]. As more high-throughput experimental data sets become available, their general usefulness for prediction will increase.

2.5. Literature-derived associations

2.5.1. Text mining. Only a portion of experimentally detected interactions are captured by the interaction database resources [1–5, 69, 70]. Information on other experimentally detected interactions is available from PubMed and other online resources [71–73]. Text mining is a very powerful method, whether for expanding interactomes automatically [14] or for speeding up the curation of certain interaction databases [2, 3]. Protein associations can be obtained by searching for statistically significant co-occurrences between gene names [74]. In its simplest form, the principle behind such methods is that the more frequently two genes occur in the same sentence, paragraph, abstract or article, the more likely they are to be functionally associated. Another common way to generate networks is by natural language processing of abstracts, treating gene names as nodes and the verbs linking them as edges. Restricting the verbs of association to those such as 'binds', 'interacts', etc provides physical interaction networks. A major uncertainty associated with text mining results lies in mapping the gene names in the text to the corresponding entities in the sequence databases. In more recent developments, protein interactions have been extracted from the literature using kernel methods [75].
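The co-occurrence principle can be sketched with a hypergeometric test at the level of abstracts, assuming that gene names have already been recognised and mapped to unique identifiers (the hard step noted above). The abstract-level unit, the `min_shared` filter and the documents are illustrative assumptions, not the procedure of [74].

```python
from collections import Counter
from itertools import combinations
from math import comb


def cooccurrence_pvalues(documents, min_shared=2):
    """Hypergeometric P(X >= k) that two genes are co-mentioned in k or more of
    the documents by chance, for pairs co-mentioned at least min_shared times.

    documents: list of sets of gene identifiers found in each abstract; mapping
    names in the text to database entries is assumed to be done already.
    """
    n_docs = len(documents)
    single = Counter(g for doc in documents for g in doc)
    shared = Counter(p for doc in documents for p in combinations(sorted(doc), 2))
    result = {}
    for (a, b), k in shared.items():
        if k < min_shared:
            continue
        n_a, n_b = single[a], single[b]
        tail = sum(comb(n_a, i) * comb(n_docs - n_a, n_b - i)
                   for i in range(k, min(n_a, n_b) + 1))
        result[(a, b)] = tail / comb(n_docs, n_b)
    return result


# Hypothetical abstracts after gene-name recognition.
docs = [{"geneA", "geneB"}, {"geneA", "geneB"}, {"geneA", "geneB", "geneC"},
        {"geneC"}, {"geneC"}, {"geneA", "geneC"}]
print(cooccurrence_pvalues(docs))  # {('geneA', 'geneB'): 0.2, ('geneA', 'geneC'): 1.0}
```

Note that geneA and geneC are co-mentioned twice yet receive a p-value of 1.0: each appears in four of the six abstracts, so an overlap of at least two is unavoidable. Raw co-occurrence counts alone would miss this.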

2.5.2. Functional semantic similarity. The Gene Ontology (GO) [76] is a controlled vocabulary used to describe various attributes of genes, including their functions. Terms that describe the functions are stored as nodes in a directed acyclic graph, with specific terms having more general terms as parents. For example, 'apoptotic chromosome condensation' and 'mitotic chromosome condensation' both share the parent term 'chromosome condensation'. Primary annotations are derived from the literature through manual curation efforts. Each annotation carries an evidence code, and these range from the most reliable manually curated annotations to automated electronic annotations. Various methods have sought to derive networks of functional associations between proteins using their associated GO terms (see [77] for a recent review). Problems in using the GO graph directly arise from issues such as the variation of term specificity within the graph. A common solution is to use information-content-based measures such as the Resnik score [78]. The final choice of evidence codes, similarity measures and gene transference methods needs to be made on a case-by-case basis [77].
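The information-content approach can be sketched as follows. Each term's information content is computed as IC(t) = -log p(t), where p(t) is the fraction of annotated genes annotated with t or one of its descendants, and the Resnik similarity of two terms is the IC of their most informative common ancestor. The toy DAG below only loosely follows the GO example above (its edges are not the real GO structure), the annotations are hypothetical, evidence codes are ignored, and taking the maximum over term pairs is only one of several ways to summarise gene-level similarity.

```python
from math import log


def ancestors(term, parents):
    """The term plus all of its ancestors in a DAG given as {term: [parents]}."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def information_content(annotations, parents):
    """IC(t) = -log p(t) = log(n / count(t)), where count(t) is the number of the
    n annotated genes annotated with t or any of its descendants."""
    counts = {}
    for terms in annotations.values():
        closure = set().union(*(ancestors(t, parents) for t in terms))
        for t in closure:
            counts[t] = counts.get(t, 0) + 1
    n = len(annotations)
    return {t: log(n / c) for t, c in counts.items()}


def resnik(t1, t2, parents, ic):
    """IC of the most informative common ancestor of two terms."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max((ic.get(t, 0.0) for t in common), default=0.0)


def gene_similarity(g1, g2, annotations, parents, ic):
    """Maximum Resnik score over all pairs of the two genes' terms."""
    return max(resnik(a, b, parents, ic)
               for a in annotations[g1] for b in annotations[g2])


# Toy DAG loosely based on the example above (not the real GO structure).
parents = {
    "apoptotic chromosome condensation": ["chromosome condensation"],
    "mitotic chromosome condensation": ["chromosome condensation"],
    "chromosome condensation": ["biological_process"],
    "DNA repair": ["biological_process"],
}
annotations = {
    "g1": {"apoptotic chromosome condensation"},
    "g2": {"mitotic chromosome condensation"},
    "g3": {"DNA repair"},
    "g4": {"DNA repair"},
}
ic = information_content(annotations, parents)
print(gene_similarity("g1", "g2", annotations, parents, ic))  # log(2), about 0.693
print(gene_similarity("g1", "g3", annotations, parents, ic))  # 0.0: only the root is shared
```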

3. Integrating prediction methods

Each source data set, whether experimental or not, has biases and errors. However, given the potential number of interactions (and provided appropriate confidence cut-offs are used), it is unlikely that two independent prediction methods will give rise to the same false positive prediction. In general, we could expect the prediction power (accuracy and coverage) to increase in proportion to the number of independent approaches supporting the association. The simplest approaches exploit this principle by combining prediction methods through joint observation, where a greater number of independent methods predicting the association corresponds to a higher prediction accuracy [23]. Other tests have shown that integrating multiple predictions using more advanced methods can improve the prediction power [14, 79, 80] by combining and reinforcing observations. A wide variety of integration methods are available, including Fisher's, Bayesian, logistic regression and kernel methods. Some methods (e.g. Bayesian and logistic regression) provide confidence estimates for their outputs, which may be useful in certain scenarios.

3.1. Simple integration

Each of the protein association prediction methods described above yields scores that correlate with the likelihood of functional association. However, it can be difficult to combine these scores directly, since the scores for each method can differ both in scale and in the predicted type of biological association. To help overcome this problem, output scores from individual prediction methods need to be transformed into confidence measures using a set of known true positives. The Prolinks prediction pipeline [81] simply chooses the maximum score from all the individual methods as the gene association score. Other methods use a formula to combine the scores from each method, optionally after weighting each method according to its general performance [82].
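The calibrate-then-combine idea can be sketched as below, under stated assumptions. Each method's raw scores are mapped to the fraction of known true positives in the same score bin of a benchmark set (quantile bins are an illustrative choice). Confidences are then combined either by taking the maximum, as described for Prolinks, or by a weighted average; the weighted average is a hypothetical example of a combining formula, not the one used in [82], and the benchmark data are hypothetical.

```python
from bisect import bisect_right


def calibrate(scores, labels, n_bins=4):
    """Turn one method's raw scores into confidences: the fraction of known true
    positives among benchmark pairs in the same score bin (bins are score
    quantiles of the benchmark set). labels: 1 = true positive, 0 = otherwise."""
    ordered = sorted(scores)
    edges = [ordered[len(ordered) * i // n_bins] for i in range(1, n_bins)]
    hits, totals = [0] * n_bins, [0] * n_bins
    for s, y in zip(scores, labels):
        b = bisect_right(edges, s)
        totals[b] += 1
        hits[b] += y
    precision = [h / t if t else 0.0 for h, t in zip(hits, totals)]
    return lambda s: precision[bisect_right(edges, s)]


def combine_max(confidences):
    """Keep the best single piece of evidence (the rule described for Prolinks)."""
    return max(confidences)


def combine_weighted(confidences, weights):
    """Weighted average, e.g. weighting each method by its benchmark performance."""
    return sum(w * c for w, c in zip(weights, confidences)) / sum(weights)


# Hypothetical benchmark for one method: raw scores and gold-standard labels.
raw = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
gold = [0, 0, 0, 1, 0, 1, 1, 1]
to_confidence = calibrate(raw, gold)
print(to_confidence(0.75), to_confidence(0.35))   # 1.0 0.5
print(combine_max([0.2, 0.9, 0.5]))               # 0.9
print(combine_weighted([1.0, 0.0], [3, 1]))       # 0.75
```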

3.2. Bayesian integration

Bayesian integration is the most widely used strategy for integrating protein association predictions [14, 80]. It has several features that make it suitable for data integration of this type (table 1). Each individual data channel is implicitly weighted according to its reliability, and hence it is easy to interpret the probability relationships for each channel. Crucially, this method can accommodate missing data, which typically cause problems for supervised learning methods. Naïve Bayesian integration presumes that the different data channels are statistically independent of one another, and failure to remove or merge redundant data sources can lead to over-prediction. Bayesian integrators have been used in many successful applications [83], allowing multiple types of data (including numerical and categorical) to be integrated. It may not always be advantageous to add increasing numbers of data sources: one study has shown that choosing a small number of the best available features can improve performance, and that adding further input data types does not give further improvements [84].
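A minimal naive Bayes integration sketch follows, assuming that each evidence channel has been discretised into bins and that gold-standard positive and negative pairs are available for estimating per-bin likelihood ratios. The pseudocount, prior odds and data are hypothetical. Multiplying the ratios is only justified under the independence assumption discussed above, and a channel with no observation simply contributes a ratio of 1, which is how this scheme accommodates missing data.

```python
def likelihood_ratios(pos_values, neg_values, pseudocount=1.0):
    """Per-bin likelihood ratio P(bin | interacting) / P(bin | non-interacting)
    for one evidence channel whose values have already been binned."""
    bins = set(pos_values) | set(neg_values)
    n_pos = len(pos_values) + pseudocount * len(bins)
    n_neg = len(neg_values) + pseudocount * len(bins)
    return {b: ((pos_values.count(b) + pseudocount) / n_pos)
               / ((neg_values.count(b) + pseudocount) / n_neg)
            for b in bins}


def posterior_probability(evidence, lr_tables, prior_odds):
    """Naive Bayes: posterior odds = prior odds x product of the channels'
    likelihood ratios. Missing evidence (None) contributes a ratio of 1."""
    odds = prior_odds
    for channel, value in evidence.items():
        if value is not None:
            odds *= lr_tables[channel].get(value, 1.0)
    return odds / (1.0 + odds)


# Hypothetical gold-standard observations for two channels.
lr = {
    "coexpression": likelihood_ratios(["high", "high", "high", "low"],
                                      ["high"] + ["low"] * 7),
    "fusion": likelihood_ratios(["yes", "no", "no", "no"], ["no"] * 8),
}
pair_evidence = {"coexpression": "high", "fusion": None}  # fusion not observable
print(posterior_probability(pair_evidence, lr, prior_odds=0.01))  # about 0.032
```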

3.3. Fisher's method

Fisher's method is one of the general non-Bayesian methods, and has been successfully used to integrate protein interaction predictions from diverse methods. Fisher's algorithm provides a (Pareto optimal) solution to the problem of combining independent tests [85]. The method is highly flexible and is able to deal with low overlap between source data sets. Some recent studies have successfully applied Fisher's method to protein association prediction [86]. Fisher's method does not require training or supervision on experimental gold standard data sets of protein interactions. Hence, if only genomic context methods are used, Fisher's predictions can be considered independent of the public repositories of protein interactions. A weighted version of Fisher's method makes it possible to optimize the contribution of each data source.
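Fisher's method combines k independent p-values through the statistic X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. Because the degrees of freedom are even, the tail probability has a closed form, so the sketch below needs only the standard library. It assumes each method already reports a p-value computed under its own null model; the input values are hypothetical, and the weighted variant mentioned above is not shown.

```python
from math import exp, log


def fisher_combined(pvalues):
    """Fisher's method for k independent tests.

    Statistic X = -2 * sum(ln p_i), which under the joint null hypothesis follows
    a chi-square distribution with 2k degrees of freedom. For even degrees of
    freedom the upper tail has a closed form:
        P(X > x) = exp(-x / 2) * sum_{i=0}^{k-1} (x / 2)^i / i!
    Returns (X, combined p-value).
    """
    if not pvalues or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    x = -2.0 * sum(log(p) for p in pvalues)
    half = x / 2.0
    term = total = 1.0
    for i in range(1, len(pvalues)):
        term *= half / i          # builds (x/2)^i / i! incrementally
        total += term
    return x, exp(-half) * total


# Hypothetical p-values from three independent genomic context methods.
stat, p_combined = fisher_combined([0.04, 0.10, 0.20])
print(round(stat, 2), round(p_combined, 4))  # 14.26 0.0268
```

With a single input the method returns that p-value unchanged, and three individually modest p-values combine into a considerably smaller one.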

Table 1. Methods commonly used for integrating multiple protein association prediction methods. A 1 or 0 denotes the presence or absence of a particular desirable property. Columns (integration method, example references): NB, naïve Bayes [80, 90]; Fisher, Fisher's method [85, 91]; SVM [92, 93]; GK+SVM, graph kernel + SVM [46, 88]; RF, random forest [94, 95].

Advantageous property | NB | Fisher | SVM | GK+SVM | RF
Copes well with missing values | 1 | 1 | 0 | 0 | 1
Importance of input features can be readily obtained | 1 | 0 | 0 | 0 | 1
Copes well with high-dimensional data | 0 | 0 | 1 | 1 | 1
Complex relationships between input variables can be learned | 0 | 0 | 1 | 1 | 1
Probability estimate readily obtained from output | 1 | 0 | 0 | 0 | 0
No parameter optimization required | 0 | 1 | 0 | 0 | 0
No requirement for independence between input data | 0 | 0 | 1 | 1 | 1
No training data required | 0 | 1 | 0 | 0 | 0

3.4. Kernel methods

As these methods have gained in popularity in recent years, we give here a brief summary of the kernel properties that make them attractive for data integration. For more details, we refer the interested reader to [87]. By definition, a kernel is a function that gives the dot product between two vectors in some multidimensional space (called the feature space). A kernel matrix (often abbreviated to kernel) contains the evaluation of the kernel function for all pairs of data points under consideration. A kernel can be viewed as a matrix of similarities between data points, and different kernels capture different notions of similarity, as they correspond to embedding the data in different feature spaces.

The first property of interest is that any symmetric matrix with non-negative eigenvalues (i.e. any positive semidefinite matrix) is a valid kernel matrix. This means we can test whether a similarity matrix is a valid kernel without knowing the feature space in which the kernel function operates. This makes kernel methods applicable not only to real-valued vectors but to any data (e.g. sequences, graphs) for which we can define such a similarity measure. The second property of interest for data integration is that certain mathematical combinations of kernels (e.g. linear combinations with non-negative weights) produce a valid kernel. So far, most data integration approaches using kernels for predicting protein interactions have been used in a classification framework with support vector machines [46, 88]. This leads to the requirement of a negative data set, which can be problematic to generate (section 5.1). Alternatively, kernels can be used for link prediction in a semi-supervised setting [89].
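The two kernel properties above can be checked numerically with NumPy: positive semidefiniteness via the eigenvalues of a symmetric matrix, and closure under non-negative weighted sums. The sketch also lifts a protein-level kernel to protein pairs with the symmetric tensor-product construction k(a,c)k(b,d) + k(a,d)k(b,c), one common choice for pair-input problems that is not necessarily the kernel of [46, 88]. Feature vectors and weights are hypothetical.

```python
import numpy as np


def is_valid_kernel(K, tol=1e-10):
    """True if K is symmetric with no eigenvalue below -tol (positive semidefinite)."""
    K = np.asarray(K, dtype=float)
    return bool(np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol)


def combine_kernels(kernels, weights):
    """Non-negative weighted sum of kernel matrices, which is again a valid kernel."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    return sum(w * np.asarray(K, dtype=float) for w, K in zip(weights, kernels))


def pairwise_kernel(K, pair1, pair2):
    """Similarity of protein pairs (a, b) and (c, d) built from a protein-level
    kernel K: k(a,c)k(b,d) + k(a,d)k(b,c), symmetric in the order within a pair."""
    (a, b), (c, d) = pair1, pair2
    return K[a, c] * K[b, d] + K[a, d] * K[b, c]


# Hypothetical feature vectors for three proteins.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
K_linear = X @ X.T                                    # dot products in feature space
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_rbf = np.exp(-sq_dist)                              # Gaussian (RBF) kernel
K_sum = combine_kernels([K_linear, K_rbf], [0.5, 2.0])
print(is_valid_kernel(K_sum))                         # True
print(is_valid_kernel([[1.0, 2.0], [2.0, 1.0]]))      # False: eigenvalues 3 and -1
print(pairwise_kernel(K_linear, (0, 1), (1, 0)))      # 1.0
```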

3.5. Random forest classifiers

Decision trees are supervised classification algorithms that use tree-like graphs to make predictions. In its simplest form, a decision tree makes multiple binary tests in a tree structure, such that a given input vector of attributes is propagated through the tree, with internal nodes testing an attribute's value and terminal nodes giving a classification. Random forests are ensemble classifiers made up of many individual decision trees [96]. Random forests provide an efficient means of increasing performance and are less prone to over-fitting than individual decision trees. Each individual decision tree is generated from a random sample of the training data drawn with replacement (and, at each node, only a random subset of the input attributes is considered for the split). The final output of the random forest classifier is the majority vote of its individual decision trees. The random forest classifier has been shown to be consistently amongst the best methods on a wide range of protein association tasks, including protein interaction and co-complex membership prediction [92, 94, 97]. Unlike most supervised learning methods, random forests make it possible to obtain a measure of the importance of each input channel to the overall performance [97]. They can also deal with large and sparse input vectors, as seen in their use for predicting interactions from protein domain content [42]. Random forests have also been used to integrate structural information with more typical protein association data types [56].

3.6. Logistic regression

Logistic regression has been used to integrate data to provide output predictions [92, 97–99], but has been shown to be outperformed by random forests [92, 97]. Another approach found good performance for logistic regression after subdividing the input data into different natural groupings [99].

3.7. Random walks on a graph

Some authors have used matrices derived from random walks on graphs to prioritize genes. A random walk on a graph describes the sequence of steps taken by a walker who moves from one node to a randomly selected adjacent node, with a probability proportional to the weight of the edge connecting the two nodes. A random walk is a type of Markov chain from which different measures of similarity between the nodes of the graph can be computed. If the Markov chain is regular, it gives rise to a valid kernel [100]. Random walks have been used in two ways for data integration. In the first, each data set is considered separately as a graph, from which a random-walk-based similarity is derived and used to rank the genes. A rank aggregation method is then used for the data integration step. Although this approach has mainly been used to predict disease genes [101, 102], it could be applicable to protein–protein interaction prediction, or at least to the prediction of functional relationships. The second integration approach consists of merging the different source data sets into one graph, from which a random-walk-based measure of similarity is derived and used for ranking genes. Again, this has been used to identify disease genes [103] but could be used for other types of predictions.
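Both integration strategies can be sketched with a random walk with restart, one common random-walk similarity for ranking genes relative to known seed genes. The restart probability, the merging of networks by summing adjacency matrices and the mean-rank aggregation used for the first strategy are illustrative assumptions rather than the methods of [101–103]; the networks are hypothetical.

```python
import numpy as np


def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that follows edges with
    probability proportional to their weight and, at each step, jumps back to a
    seed node with probability `restart`."""
    A = np.asarray(adj, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0        # isolated nodes keep an all-zero column
    W = A / col_sums                     # column-normalised transition matrix
    p0 = np.zeros(len(A))
    p0[list(seeds)] = 1.0 / len(seeds)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def mean_rank(score_vectors):
    """First strategy: rank genes within each network (0 = best), then average.
    A stable sort keeps tied genes in index order."""
    ranks = [np.argsort(np.argsort(-np.asarray(s, dtype=float), kind="stable"))
             for s in score_vectors]
    return np.mean(ranks, axis=0)


# Two hypothetical networks over four genes; gene 0 is the known seed.
net1 = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
net2 = np.array([[0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0]])

scores1 = random_walk_with_restart(net1, [0])
scores2 = random_walk_with_restart(net2, [0])
print(mean_rank([scores1, scores2]))               # strategy 1: aggregate ranks
print(random_walk_with_restart(net1 + net2, [0]))  # strategy 2: merged graph
```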



Figure 2. Examples of networks generated before (left) and after (right) spectral clustering. In the left network, green balls represent baits in the Mitocheck experiments [67]; in the right network, colors represent different complexes/clusters.

4. Exploiting the network structure

The prediction methods described above produce pairwise protein interaction data sets that can be used to construct protein–protein interaction graphs (also called protein networks in systems biology), which are natural data structures for modeling relationships between proteins. Several methods use the graph structure of the experimentally determined protein interaction network itself as the primary data source for inferring complex membership. The underlying assumption of these methods is that proteins in a complex are more densely connected to each other than to the rest of the graph. Over the years, various clustering algorithms have been applied to this problem (for a review see [104]). Most of these are heuristics and come with free parameters that can be hard to tune. The Markov clustering algorithm (MCL) appears to be one of the best methods currently available for clustering protein interaction graphs [105, 106]. MCL simulates random walks on the graph and iteratively prunes the weaker edges. An exact analysis of a random walk on a graph leads to spectral clustering algorithms, which have recently been applied to the protein complex prediction problem (figure 2) [67, 107, 108], although an early application of spectral clustering to complex detection was described in [109]. Prior to these methods, spectral decomposition of matrices derived from the interaction graph had also been used to find complexes [110, 111]. Although the structure and properties of an entire proteome graph remain controversial [112], topological properties have been used to guide protein interaction predictions [113].
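A minimal dense-matrix sketch of MCL with NumPy follows. The expansion and inflation parameters, the added self-loops and the threshold used to read clusters off the converged matrix are illustrative defaults; production implementations work with sparse matrices and prune small entries, so this version is only suitable for small graphs. The example graph is hypothetical.

```python
import numpy as np


def markov_cluster(adj, expansion=2, inflation=2.0, self_loops=1.0,
                   max_iter=100, tol=1e-9, keep=1e-6):
    """Markov clustering (MCL) of an undirected graph given as an adjacency matrix.

    Alternates expansion (matrix power: flow spreads along random walks) with
    inflation (element-wise power followed by column renormalisation: strong
    flows are strengthened and weak ones fade) until the matrix stops changing.
    """
    M = np.asarray(adj, dtype=float) + self_loops * np.eye(len(adj))
    M = M / M.sum(axis=0)                      # column-stochastic
    for _ in range(max_iter):
        previous = M
        M = np.linalg.matrix_power(M, expansion)
        M = M ** inflation
        M = M / M.sum(axis=0)
        if np.abs(M - previous).max() < tol:
            break
    # Rows that retain mass are attractors; each attracts a cluster of columns.
    clusters = {frozenset(np.flatnonzero(row > keep).tolist()) for row in M}
    clusters.discard(frozenset())
    return clusters


# Hypothetical interaction graph: two 4-cliques joined by the single edge 3-4.
adj = np.zeros((8, 8))
for group in ([0, 1, 2, 3], [4, 5, 6, 7]):
    for i in group:
        for j in group:
            if i != j:
                adj[i, j] = 1.0
adj[3, 4] = adj[4, 3] = 1.0
print(markov_cluster(adj))
```

The single bridging edge carries little flow after a few inflation steps, so the two densely connected groups separate, in line with the assumption that complexes are denser than their surroundings.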

5. Benchmarking

5.1. Gold standards

In order to validate the prediction methods described above, agold standard reference set is required. Known 3D structures

formally provide direct evidence of physical interaction

although care needs to be taken to determine the biological

unit and ignore irrelevant crystal contacts. The resource

for this was initially the protein quaternary structure (PQS)

resource [114] although this has been replaced by PISA [60].

For evaluating physical interactions in yeast, the most widely used gold standard data set was initially the curated MIPS protein interaction data set [115]. However, this data set was later shown to be highly unrepresentative, with over half of the interactions coming from ribosomal proteins [116]. Recently, more up-to-date gold standard databases have been generated [5, 117]. Also, despite misgivings about the quality of yeast two-hybrid (Y2H) data sets, work has shown that commonly used Y2H data sets are of similar quality to other experimental interaction data and even to curated data sets [7]. Such Y2H data sets can be processed further to give higher-quality data sets potentially suitable for benchmarking [118]. Certain integration methods, such as SVMs, additionally require a negative gold standard data set (i.e. a set of protein pairs known not to interact). A common approach for generating negative data sets is to select random pairs of proteins from the genome.

However, this is not an optimal solution and can lead to various problems, such as the prediction method learning the pattern of missing values, which causes over-prediction of associations [95]. Also, unless care is taken, the negative data set network can have a different structure from the positive data set, leading to overestimates of performance for certain algorithms [47].

Recently, carefully curated true negative data sets have been assembled from the literature [119]. They may help with the over-prediction problem in the future, although they are likely to contain biases (e.g. toward well-studied proteins). Other tools provide negatives based on functional dissimilarity, subcellular location, non-interacting domain pairs [120, 121] and shortest path lengths [122].
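The structural mismatch between positive and negative networks can be reduced by sampling negatives so that every protein appears in the negative set exactly as often as in the positive set. The sketch below does this with a configuration-model style shuffle of protein "stubs"; it is a minimal illustration of that assumption, in the spirit of the concern raised in [47], and not a reproduction of any specific published procedure. The function name is hypothetical, and sampled pairs are still only presumed to be non-interacting.

```python
import random
from collections import Counter


def balanced_negatives(positives, max_tries=1000, seed=0):
    """Sample negative pairs whose per-protein degrees match the positive set.

    positives: list of (protein_a, protein_b) known interacting pairs.
    Returns a list of sorted (protein_a, protein_b) pairs, one per positive pair.
    """
    rng = random.Random(seed)
    known = {frozenset(pair) for pair in positives}
    degree = Counter(protein for pair in positives for protein in pair)
    # One "stub" per interaction end, so each protein keeps its positive-set degree.
    stubs = [protein for protein, d in degree.items() for _ in range(d)]
    for _ in range(max_tries):
        rng.shuffle(stubs)
        pairs = [frozenset(stubs[i:i + 2]) for i in range(0, len(stubs), 2)]
        # Reject self-pairs, known positives and duplicated pairs.
        valid = all(len(p) == 2 and p not in known for p in pairs)
        if valid and len(set(pairs)) == len(pairs):
            return [tuple(sorted(p)) for p in pairs]
    raise RuntimeError("no valid degree-matched negative set found")


positives = [("A", "B"), ("C", "D"), ("E", "F"), ("G", "H")]
print(balanced_negatives(positives))
```

Compared with uniformly random pairs, this prevents hub proteins from appearing almost exclusively in the positive class, which a classifier could otherwise exploit. The rejection loop can fail on strongly hub-dominated networks, which is why it is bounded by `max_tries`.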

With regard to validation of functional associations, many different resources have been used, including KEGG [123], GO [76] and Panther [124]. KEGG can be considered a high-quality resource, with 1500 genomes annotated. KEGG marks some organisms, such as human, as manually curated, and others as automatically annotated from the curated genomes. STRING [14] benchmarks its predictors using KEGG to provide an interpretable output score. One advantage of using GO is that the ontology can be used to define semantic similarities between proteins; thus, all pairs of proteins whose similarity exceeds a chosen threshold can be included in the benchmark.
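As an illustration of a GO-based benchmark, the sketch below propagates each protein's annotations to all ancestor terms and scores protein pairs with a simple union-intersection (Jaccard) measure, keeping pairs at or above a threshold. The measure, the threshold and the toy ontology (term names such as "kinase" rather than real GO identifiers) are assumptions for illustration; many other semantic similarity measures exist [77, 78].

```python
from itertools import combinations


def with_ancestors(term, parents):
    """Return the term together with all its ancestors (parents maps child -> parents)."""
    seen, stack = {term}, [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_set(terms, parents):
    """Union of the ancestor-propagated annotation sets of one protein."""
    result = set()
    for term in terms:
        result |= with_ancestors(term, parents)
    return result


def jaccard_similarity(terms_a, terms_b, parents):
    """Shared propagated terms divided by all propagated terms of the two proteins."""
    a, b = term_set(terms_a, parents), term_set(terms_b, parents)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def benchmark_pairs(annotations, parents, threshold):
    """Protein pairs whose similarity reaches the threshold (positive benchmark pairs)."""
    return [(p, q) for p, q in combinations(sorted(annotations), 2)
            if jaccard_similarity(annotations[p], annotations[q], parents) >= threshold]


# Hypothetical toy ontology and annotations (not real GO terms).
parents = {"kinase": ["catalytic"], "phosphatase": ["catalytic"],
           "catalytic": ["root"], "transport": ["root"]}
annotations = {"P1": {"kinase"}, "P2": {"kinase"},
               "P3": {"phosphatase"}, "P4": {"transport"}}
print(benchmark_pairs(annotations, parents, threshold=0.5))
```

Raising the threshold trades benchmark size for specificity: at 0.5 the two catalytic proteins P1 and P3 count as related, while at 0.75 only the identically annotated P1 and P2 remain.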

5.2. Data set bias

Many of the prediction methods face the problem of bias in the available data. For example, supervised methods are hampered by the lack of true negative data sets. More subtly, biological research is mostly focused on disease-related and well-characterized genes. As a consequence, a small number of genes and their products contribute a disproportionate amount of (possibly irrelevant [125, 126]) information, while little is available for most of the genome. Genome-wide experiments (for example [127]) should help alleviate this problem. Several large-scale Y2H data sets are available (for example [128]), although these are not free of experimental biases of their own. For example, classic Y2H requires the interacting proteins to be translocated to the nucleus, and it does not perform well in all cases, including membrane-associated proteins and transient interactions [129].

5.3. The importance of independent benchmarks

A major problem in benchmarking protein association prediction methods is circularity between the data used as input to the methods and the test set. This circularity can be quite subtle, and papers do not always take sufficient care to eliminate it. For example, once knowledge enters one realm (e.g. protein interaction databases), it can quickly be integrated into a secondary data set (e.g. Reactome). Even the results of genomic context methods and microarray data sets are now partly incorporated into GO.

This problem goes beyond benchmarking, since the lack of an independent test set precludes accurate optimization of prediction methods, leading to over-fitting. Although benchmarking independence can be improved through careful filtering of data sets, the only safe option is experimental validation of the predictions. However, this is expensive and often allows only a small number of targets to be validated, with low statistical significance (for example [80]). An alternative is to implement a rollback benchmark, in which source training data sets are rolled back to a given date and the test data are taken from after this date.

In practice this approach suffers from social bias, in that biologists are not testing the predictions but rather interactions involving well-characterized, disease-related genes. Also, circularity is still not completely removed by a rollback benchmark, since today's text-mining associations and interologs are a source of tomorrow's curated database entries and protein interaction experiments, respectively. In the future, a CASP-style benchmark would be a good first step toward providing real performance measures for the many prediction methods available.
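A rollback benchmark amounts to a temporal split of the data. The sketch below assumes that each interaction record carries the date it was first reported (a hypothetical field); pairs first seen before the cutoff become training data and pairs first seen afterwards become test data. The optional `other_sources` argument is a hypothetical hook for dropping test pairs that already appear in other inputs (e.g. text-mining associations); this reduces, but does not eliminate, circularity.

```python
from datetime import date


def rollback_split(records, cutoff, other_sources=frozenset()):
    """Split interaction records into pre-cutoff training and post-cutoff test pairs.

    records: iterable of (protein_a, protein_b, first_reported) with datetime.date values.
    other_sources: unordered pairs (frozensets) already present in other input data.
    """
    first_seen = {}
    for a, b, reported in records:
        pair = frozenset((a, b))
        # Keep the earliest report of each unordered pair, so a pair known before
        # the cutoff can never leak into the test set via a later re-report.
        if pair not in first_seen or reported < first_seen[pair]:
            first_seen[pair] = reported
    train = {pair for pair, d in first_seen.items() if d < cutoff}
    test = {pair for pair, d in first_seen.items()
            if d >= cutoff and pair not in other_sources}
    return train, test


records = [("A", "B", date(2005, 1, 1)), ("B", "A", date(2009, 1, 1)),
           ("C", "D", date(2009, 6, 1)), ("E", "F", date(2010, 1, 1))]
train, test = rollback_split(records, date(2008, 1, 1),
                             other_sources={frozenset(("E", "F"))})
print(sorted(sorted(p) for p in train), sorted(sorted(p) for p in test))
```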

5.4. Real-world performance measure

The expected number of interactions in an organism [130] is much smaller than the total number of possible interactions, so interacting pairs are vastly outnumbered by non-interacting ones and true positives (TPs) are found very infrequently relative to false positives (FPs). As an example, suppose that for an organism TPs constitute only 0.1% of all possible protein pairs; then a predictor with a reported 1% false discovery rate on a balanced test set of TPs and true negatives (TNs) would still produce about ten false predictions for every TP in its real-world application. The imbalance of TPs to TNs should therefore be considered an important factor when assessing the usefulness of a prediction and the size and type of validation screen required to obtain a useful number of TP experimental validations.
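The arithmetic behind this example can be made explicit. Assuming the predictor's false positive rate and sensitivity carry over unchanged from the balanced test set, the number of FPs scales with the number of negatives, so the real-world ratio of FPs to TPs is the balanced ratio FDR/(1 − FDR) multiplied by the number of negatives per positive. The function name below is illustrative.

```python
def real_world_fp_per_tp(balanced_fdr, prevalence):
    """Expected false positives per true positive when a predictor is applied genome-wide.

    balanced_fdr: false discovery rate measured on a 1:1 positive/negative test set.
    prevalence: fraction of all candidate pairs that truly interact.
    """
    # On a balanced set FP/TP = FPR/sensitivity = FDR/(1 - FDR).
    balanced_ratio = balanced_fdr / (1.0 - balanced_fdr)
    # Genome-wide there are (1 - prevalence)/prevalence negatives per positive,
    # and the FP count grows in proportion to the number of negatives.
    negatives_per_positive = (1.0 - prevalence) / prevalence
    return balanced_ratio * negatives_per_positive


ratio = real_world_fp_per_tp(balanced_fdr=0.01, prevalence=0.001)
precision = 1.0 / (1.0 + ratio)
print(f"{ratio:.1f} FPs per TP, real-world precision {precision:.1%}")
```

With the numbers from the text this gives about 10.1 false predictions per true one, i.e. a real-world precision of roughly 9%, even though the balanced benchmark suggested 99%.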

6. Existing resources

6.1. Online resources

A quick survey of resources hosting known and predicted interaction data is daunting (e.g. http://ppi.fli-leibniz.de/jcb_ppi_databases.html). The most widely used of these is STRING, which combines information from multiple sources and includes predictions from genomic context (gene neighborhood, domain fusion, phylogenetic profiles), high-throughput experiments (co-expression) and previous knowledge (text mining, known protein interactions). The majority of the associations in STRING come from its text-mining and inherited interactions [14]. STRING v8.3 provides information for 2.5 million sequences in 630 organisms and is updated regularly. Another regularly updated resource with an easy-to-use interface is GeneMania [132], which contains both known and predicted protein associations. An alternative integration strategy is used by the online resource FuncNet (http://funcnet.eu/), which uses a weighted Fisher's approach to integrate, online, eight independent prediction methods hosted at different geographical locations throughout Europe.
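For readers unfamiliar with Fisher's method, the sketch below shows the classic unweighted version for combining independent p-values: the statistic X = −2 Σ ln p_i follows a χ² distribution with 2k degrees of freedom under the null hypothesis, and that survival function has a closed form for even degrees of freedom. FuncNet's weighted variant is not reproduced here, and the independence of the inputs is an assumption that correlated prediction methods may violate.

```python
import math


def fisher_combined_pvalue(p_values):
    """Combine k independent p-values with the (unweighted) Fisher method."""
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # Chi-squared survival function with 2k degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = statistic / 2.0
    term, series = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        series += term
    return math.exp(-half) * series


print(fisher_combined_pvalue([0.05, 0.05]))
```

Two methods that each give p = 0.05 combine to roughly p = 0.017, illustrating how weak but independent lines of evidence reinforce each other.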

Many prediction methods have been shown to be powerful enough for experimentalists to use as part of their standard experimental screens (e.g. table 2). Despite this, even for well-studied organisms such as human, large portions of the interactome are missing. As an example of the utility of the integration methods above, we constructed a network using only those genes with no known physical interactions (after merging eight public databases). Even with these very poorly characterized Ensembl genes we were able to construct substantial networks (figure 3). Extreme examples such as this suggest that much could be gained from experimentalists sampling more of the genome using established prediction methods as a guide.
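The filtering used to build such a network of uncharacterized genes can be sketched as follows. The edge format (a single evidence channel and a STRING-style 0 to 1000 score per edge), the channel names and the function name are simplifying assumptions; the score cut-off of 500 and the removal of the database channel follow the figure 3 caption.

```python
def uncharacterized_network(predicted_edges, known_physical,
                            min_score=500, excluded_channels=("database",)):
    """Keep confident predicted edges between genes with no known physical interaction.

    predicted_edges: iterable of (gene_a, gene_b, score, channel) tuples.
    known_physical: iterable of (gene_a, gene_b) pairs merged from interaction databases.
    """
    # Any gene with at least one known physical interaction counts as characterized.
    characterized = {gene for pair in known_physical for gene in pair}
    return [(a, b, score, channel)
            for a, b, score, channel in predicted_edges
            if score >= min_score
            and channel not in excluded_channels
            and a not in characterized and b not in characterized]


edges = [("g1", "g2", 700, "textmining"), ("g2", "g3", 400, "coexpression"),
         ("g3", "g4", 900, "database"), ("g1", "g5", 800, "phylogenetic"),
         ("g6", "g7", 950, "fusion")]
print(uncharacterized_network(edges, known_physical=[("g5", "x1")]))
```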

Figure 3. Example networks predicted from FuncNet CODA, FuncNet Hippo and STRING (score filtered at 500) with the database channel removed. The network has been filtered to remove any genes with known physical interactions in any of IntAct, MINT, MIPS, STRING, BioGRID, DIP, HPRD and Reactome. Example subnetworks made up predominantly of phylogenetic profile, CODA or text-mining associations (from left to right) are shown.

Table 2. Example online protein association prediction resources.

Online resource | URL/reference | Comments
IBIS | http://www.ncbi.nlm.nih.gov/Structure/ibis/ [59] | Predicts interactions and binding residues
FuncNet | http://funcnet.eu/ [86] | Integrates eight data sources using Fisher's method
PPI E.Coli | http://sunserver.cdfd.org.in:8080/protease/PPI/ [93] | Example of an SVM-based integration resource
BCI | http://amdec-bioinfo.cu genome.org/html/BCellInteractome.html [131] | Cell-type specific predictions
I2D | http://ophid.utoronto.ca/ophidv2.201/index.jsp [53] | Interologs for expanding protein interaction networks
GeneMania | http://genemania.org/search.jsf [132] | High coverage of available known associations
STRING | http://string-db.org/ [14] | Largest number of genomes covered

6.2. Context-specific resources

Experiments are most often designed to focus on a specific pathway or biological process. Resources such as STRING provide the union of many available protein interactions. However, for reasons such as differential expression, any given cell will express only a subset of all the protein interactions found in an organism. In view of this, certain resources have been developed that apply contextual information to give interactomes specific to a cell type. One example of such a resource is the B-cell interactome [133], which predicts B-cell-specific protein associations. The data in the B-cell interactome are tailored toward B-cell-specific interactions by including only those proteins expressed in B-cells and by using B-cell-relevant microarray data sets as inputs to the Bayesian integrator. These B-cell-specific networks have been used to extend our knowledge of B-cell biology [131]. Other available resources (POINTILLIST [85]) can be readily tailored with data sources specific to the system of interest [91].
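The expression-based filtering step described for the B-cell interactome can be illustrated with a short sketch. The expression values, the threshold and the function name are hypothetical, and a real pipeline would also feed context-specific data into the integration step rather than only filtering nodes.

```python
def context_specific_edges(edges, expression, threshold):
    """Keep only associations whose two proteins are expressed in the cell type.

    edges: iterable of (protein_a, protein_b) associations.
    expression: mapping protein -> expression level in the cell type of interest;
                proteins missing from the mapping are treated as not expressed.
    """
    expressed = {p for p, level in expression.items() if level >= threshold}
    return [(a, b) for a, b in edges if a in expressed and b in expressed]


edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
expression = {"A": 5.0, "B": 3.2, "C": 0.1, "D": 7.0}  # "E" was not measured
print(context_specific_edges(edges, expression, threshold=1.0))
```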

6.3. PPI prediction pipelines

Many large-scale experimental projects have been carried out [128, 134–138]. Such projects are costly and time consuming, so a strategy for effective prioritization of protein pairs is desirable. A recent study [45] that trialed various approaches for this task showed that a protein interaction prediction method using a naive Bayes integration of several of the methods described above (expression data, GO, interologs, domain interactions) gave the largest improvement in efficiency. Even though this method had a high false discovery rate (92%), it still gave large reductions in cost (more than 50-fold at 50% coverage) compared with not using the predicted protein interactions.
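Naive Bayes integration combines evidence sources by multiplying likelihood ratios, which assumes that the sources are conditionally independent. A minimal sketch for ranking candidate pairs follows; the prior and the per-source likelihood ratios (each of which would in practice be estimated as P(evidence | interacting) / P(evidence | non-interacting) from gold standard sets) are made-up illustrative values, not figures from [45].

```python
def posterior_probability(prior, likelihood_ratios):
    """Naive Bayes: posterior odds = prior odds * product of likelihood ratios."""
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)


def rank_candidates(candidates, prior):
    """Sort candidate pairs by posterior probability of interaction, highest first."""
    scored = [(pair, posterior_probability(prior, lrs))
              for pair, lrs in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical likelihood ratios, e.g. from co-expression, GO, interologs, domains.
candidates = {("A", "B"): [10.0, 20.0, 5.0], ("C", "D"): [2.0, 1.5], ("E", "F"): [0.5]}
for pair, p in rank_candidates(candidates, prior=0.001):
    print(pair, round(p, 4))
```

Even with a low prior, agreement between several sources can lift a pair to around 50% posterior probability, which is what makes such scores useful for deciding which pairs to screen first.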



7. Conclusion

The genomic context methods provide a fascinating field of study at the juncture of evolutionary theory and modern computational biology. Despite the relatively short time these methods have been available, they have proven very useful in guiding experiments, and there is great potential for greater uptake of these methods by experimentalists. Over the coming years we can expect to see improvements in the prediction methods, particularly the genomic context methods, which will benefit from targeted genome sequencing efforts such as the GEBA project [139]. Such projects are expected to provide improved sampling, fill in major phylogenetic gaps and provide wider evolutionary distances. There is a growing list of examples in the literature where these methods have been used successfully when combined by statistical integration methods. An example is the application of the FuncNet protocol, which combines prediction data using Fisher integration, to human mitotic spindle proteins within the ENFIN [140] network for systems biology; this increased prediction accuracy from 35% to 76%. Given the many prediction methods available, greater coordination between computational groups is likely to lead to reduced redundancy, improved resources and ultimately greater usage of protein interaction predictions by experimentalists.

Acknowledgments

This work was funded in part by the European Commission via the Sixth Framework Program Network of Excellence ENFIN (contract number LSHG-CT-2005-518254). JGL and JKH acknowledge funding from ENFIN. JAR acknowledges funding from SAF2009-09839 and the Ramón y Cajal program (RYC-2007-01649; Ministerio de Ciencia e Innovación, Spain). CIBERER is an initiative of the ISCIII.

References

[1] Kerrien S et al 2007 IntAct: open source resource for molecular interaction data Nucleic Acids Res. 35 D561–5

[2] Chatr-aryamontri A, Ceol A, Palazzi L M, Nardelli G, Schneider M V, Castagnoli L and Cesareni G 2007 MINT: the Molecular INTeraction database Nucleic Acids Res. 35 D572–4

[3] Xenarios I, Rice D W, Salwinski L, Baron M K, Marcotte E M and Eisenberg D 2000 DIP: the database of interacting proteins Nucleic Acids Res. 28 289–91

[4] Keshava Prasad T S et al 2009 Human protein reference database: 2009 update Nucleic Acids Res. 37 D767–72

[5] Ruepp A et al 2008 CORUM: the comprehensive resource of mammalian protein complexes Nucleic Acids Res. 36 D646–50

[6] Stumpf M P, Thorne T, de Silva E, Stewart R, An H J, Lappe M and Wiuf C 2008 Estimating the size of the human interactome Proc. Natl Acad. Sci. USA 105 6959–64

[7] Venkatesan K et al 2009 An empirical framework for binary interactome mapping Nat. Methods 6 83–90

[8] Suthram S, Sittler T and Ideker T 2005 The Plasmodium protein network diverges from those of other eukaryotes Nature 438 108–12

[9] Pellegrini M, Marcotte E M, Thompson M J, Eisenberg D and Yeates T O 1999 Assigning protein functions by comparative genome analysis: protein phylogenetic profiles Proc. Natl Acad. Sci. USA 96 4285–8

[10] Bowers P M, Cokus S J, Eisenberg D and Yeates T O 2004 Use of logic relationships to decipher protein network organization Science 306 2246–9

[11] Ranea J A, Yeats C, Grant A and Orengo C A 2007 Predicting protein function with hierarchical phylogenetic profiles: the Gene3D phylo-tuner method applied to eukaryotic genomes PLoS Comput. Biol. 3 e237

[12] Barker D and Pagel M 2005 Predicting functional gene links from phylogenetic-statistical analyses of whole genomes PLoS Comput. Biol. 1 e3

[13] Zhou Y, Wang R, Li L, Xia X and Sun Z 2006 Inferring functional linkages between proteins from evolutionary scenarios J. Mol. Biol. 359 1150–9

[14] Jensen L J et al 2009 STRING 8: a global view on proteins and their functional interactions in 630 organisms Nucleic Acids Res. 37 D412–6

[15] Luttgen H et al 2000 Biosynthesis of terpenoids: YchB protein of Escherichia coli phosphorylates the 2-hydroxy group of 4-diphosphocytidyl-2C-methyl-D-erythritol Proc. Natl Acad. Sci. USA 97 1062–7

[16] Carlson B A, Xu X M, Kryukov G V, Rao M, Berry M J, Gladyshev V N and Hatfield D L 2004 Identification and characterization of phosphoseryl-tRNA[Ser]Sec kinase Proc. Natl Acad. Sci. USA 101 12848–53

[17] Forterre P 2002 A hot story from comparative genomics: reverse gyrase is the only hyperthermophile-specific protein Trends Genet. 18 236–7

[18] Morett E, Korbel J O, Rajan E, Saab-Rincon G, Olvera L, Olvera M, Schmidt S, Snel B and Bork P 2003 Systematic discovery of analogous enzymes in thiamin biosynthesis Nat. Biotechnol. 21 790–5

[19] Marcotte E M, Pellegrini M, Ng H L, Rice D W, Yeates T O and Eisenberg D 1999 Detecting protein function and protein–protein interactions from genome sequences Science 285 751–3

[20] Zhang Z et al 2006 Genome-wide analysis of mammalian DNA segment fusion/fission J. Theor. Biol. 240 200–8

[21] Reid A J, Ranea J A, Clegg A B and Orengo C A 2010 CODA: accurate detection of functional associations between proteins in eukaryotic genomes using domain fusion PLoS ONE 5 e10908

[22] Gaballa A, Newton G L, Antelmann H, Parsonage D, Upton H, Rawat M, Claiborne A, Fahey R C and Helmann J D 2010 Biosynthesis and functions of bacillithiol, a major low-molecular-weight thiol in Bacilli Proc. Natl Acad. Sci. USA 107 6482–6

[23] Strong M, Mallick P, Pellegrini M, Thompson M J and Eisenberg D 2003 Inference of protein function and protein linkages in Mycobacterium tuberculosis based on prokaryotic genome organization: a combined computational approach Genome Biol. 4 R59

[24] Ferrer L, Dale J M and Karp P D 2010 A systematic study of genome context methods: calibration, normalization and combination BMC Bioinformatics 11 493

[25] Itoh T, Takemoto K, Mori H and Gojobori T 1999 Evolutionary instability of operon structures disclosed by sequence comparisons of complete microbial genomes Mol. Biol. Evol. 16 332–46

[26] Koonin E V, Wolf Y I and Aravind L 2001 Prediction of the archaeal exosome and its connections with the proteasome and the translation and transcription machineries by a comparative-genomic approach Genome Res. 11 240–52

[27] Evguenieva-Hackenberg E, Walter P, Hochleitner E, Lottspeich F and Klug G 2003 An exosome-like complex in Sulfolobus solfataricus EMBO Rep. 4 889–93



[28] Tuncbag N, Gursoy A, Guney E, Nussinov R and Keskin O 2008 Architectures and functional coverage of protein–protein interfaces J. Mol. Biol. 381 785–802

[29] Tuncbag N, Kar G, Keskin O, Gursoy A and Nussinov R 2009 A survey of available tools and web servers for analysis of protein–protein interactions and interfaces Brief. Bioinform. 10 217–32

[30] Pazos F, Helmer-Citterich M, Ausiello G and Valencia A 1997 Correlated mutations contain information about protein–protein interaction J. Mol. Biol. 271 511–23

[31] Juan D, Pazos F and Valencia A 2008 Co-evolution and co-adaptation in protein networks FEBS Lett. 582 1225–30

[32] Kann M G, Shoemaker B A, Panchenko A R and Przytycka T M 2009 Correlated evolution of interacting proteins: looking behind the mirrortree J. Mol. Biol. 385 91–8

[33] Pazos F and Valencia A 2001 Similarity of phylogenetic trees as indicator of protein–protein interaction Protein Eng. 14 609–14

[34] Pazos F, Ranea J A, Juan D and Sternberg M J 2005 Assessing protein co-evolution in the context of the tree of life assists in the prediction of the interactome J. Mol. Biol. 352 1002–15

[35] Izarzugaza J M, Juan D, Pons C, Ranea J A, Valencia A and Pazos F 2006 TSEMA: interactive prediction of protein pairings between interacting families Nucleic Acids Res. 34 W315–9

[36] Itzhaki Z, Akiva E, Altuvia Y and Margalit H 2006 Evolutionary conservation of domain–domain interactions Genome Biol. 7 R125

[37] Finn R D et al 2008 The Pfam protein families database Nucleic Acids Res. 36 D281–8

[38] Stein A, Panjkovich A and Aloy P 2009 3did Update: domain–domain and peptide-mediated interactions of known 3D structure Nucleic Acids Res. 37 D300–4

[39] Eddy S R 2009 A new generation of homology search tools based on probabilistic inference Genome Inform. 23 205–11

[40] Lees J, Yeats C, Redfern O, Clegg A and Orengo C 2010 Gene3D: merging structure and function for a thousand genomes Nucleic Acids Res. 38 D296–300

[41] Kim W K, Park J and Suh J K 2002 Large scale statistical prediction of protein–protein interaction by potentially interacting domain (PID) pair Genome Inform. 13 42–50

[42] Chen X W and Liu M 2005 Prediction of protein–protein interactions using random decision forest framework Bioinformatics 21 4394–400

[43] Luo Q, Pagel P, Vilne B and Frishman D 2011 DIMA 3.0: domain interaction map Nucleic Acids Res. 39 D724–9

[44] Bjorkholm P and Sonnhammer E L 2009 Comparative analysis and unification of domain–domain interaction networks Bioinformatics 25 3020–5

[45] Schwartz A S, Yu J, Gardenour K R, Finley R L Jr and Ideker T 2009 Cost-effective strategies for completing the interactome Nat. Methods 6 55–61

[46] Ben-Hur A and Noble W S 2005 Kernel methods for predicting protein–protein interactions Bioinformatics 21 (Suppl. 1) i38–46

[47] Yu J, Guo M, Needham C J, Huang Y, Cai L and Westhead D R 2010 Simple sequence-based kernels do not predict protein–protein interactions Bioinformatics 26 2610–4

[48] Matthews L R, Vaglio P, Reboul J, Ge H, Davis B P, Garrels J, Vincent S and Vidal M 2001 Identification of potential interaction networks using sequence-based searches for conserved protein–protein interactions or interologs Genome Res. 11 2120–6

[49] Mika S and Rost B 2006 Protein–protein interactions more conserved within species than across species PLoS Comput. Biol. 2 e79

[50] Persico M, Ceol A, Gavrila C, Hoffmann R, Florio A and Cesareni G 2005 HomoMINT: an inferred human network based on orthology mapping of protein interactions discovered in model organisms BMC Bioinformatics 6 (Suppl. 4) S21

[51] Kemmer D et al 2005 Ulysses: an application for the projection of molecular interactions across species Genome Biol. 6 R106

[52] Huang T W, Lin C Y and Kao C Y 2007 Reconstruction of human protein interolog network using evolutionary conserved network BMC Bioinformatics 8 152

[53] Brown K R and Jurisica I 2007 Unequal evolutionary conservation of human protein interactions in interologous networks Genome Biol. 8 R95

[54] Ezkurdia I, Bartoli L, Fariselli P, Casadio R, Valencia A and Tress M L 2009 Progress and challenges in predicting protein–protein interaction sites Brief. Bioinform. 10 233–46

[55] Aloy P and Russell R B 2003 InterPreTS: protein interaction prediction through tertiary structure Bioinformatics 19 161–2

[56] Singh R, Park D, Xu J, Hosur R and Berger B 2010 Struct2Net: a web service to predict protein–protein interactions using a structure-based approach Nucleic Acids Res. 38 (Suppl.) W508–15

[57] Hosur R, Xu J, Bienkowska J and Berger B 2011 iWRAP: an interface threading approach with application to prediction of cancer-related protein–protein interactions J. Mol. Biol. 405 1295–310

[58] Zhang Q C, Petrey D, Norel R and Honig B H 2010 Protein interface conservation across structure space Proc. Natl Acad. Sci. USA 107 10896–901

[59] Shoemaker B A, Zhang D, Thangudu R R, Tyagi M, Fong J H, Marchler-Bauer A, Bryant S H, Madej T and Panchenko A R 2010 Inferred biomolecular interaction server: a web server to analyze and predict protein interacting partners and binding sites Nucleic Acids Res. 38 D518–24

[60] Krissinel E and Henrick K 2007 Inference of macromolecular assemblies from crystalline state J. Mol. Biol. 372 774–97

[61] Hue M, Riffle M, Vert J P and Noble W S 2010 Large-scale prediction of protein–protein interactions from structures BMC Bioinformatics 11 144

[62] Grigoriev A 2001 A relationship between gene expression and protein interactions on the proteome scale: analysis of the bacteriophage T7 and the yeast Saccharomyces cerevisiae Nucleic Acids Res. 29 3513–9

[63] Bhardwaj N and Lu H 2005 Correlation between gene expression profiles and protein–protein interactions within and across genomes Bioinformatics 21 2730–8

[64] Lukk M, Kapushesky M, Nikkila J, Parkinson H, Goncalves A, Huber W, Ukkonen E and Brazma A 2010 A global map of human gene expression Nat. Biotechnol. 28 322–4

[65] Adler P, Kolde R, Kull M, Tkachenko A, Peterson H, Reimand J and Vilo J 2009 Mining for coexpression across hundreds of datasets using novel rank aggregation and visualization methods Genome Biol. 10 R139

[66] Stuart J M, Segal E, Koller D and Kim S K 2003 A gene-coexpression network for global discovery of conserved genetic modules Science 302 249–55

[67] Hutchins J R et al 2010 Systematic analysis of human protein complexes identifies chromosome segregation proteins Science 328 593–9

[68] van Steensel B, Braunschweig U, Filion G J, Chen M, van Bemmel J G and Ideker T 2010 Bayesian network analysis of targeting interactions in chromatin Genome Res. 20 190–200



[69] Schaefer C F, Anthony K, Krupa S, Buchoff J, Day M, Hannay T and Buetow K H 2009 PID: the pathway interaction database Nucleic Acids Res. 37 D674–9

[70] Matthews L et al 2009 Reactome knowledgebase of human biological pathways and processes Nucleic Acids Res. 37 D619–22

[71] Cherry J M et al 1998 SGD: Saccharomyces genome database Nucleic Acids Res. 26 73–9

[72] Amberger J, Bocchini C A, Scott A F and Hamosh A 2009 McKusick's online Mendelian inheritance in man (OMIM) Nucleic Acids Res. 37 D793–6

[73] Tweedie S et al 2009 FlyBase: enhancing Drosophila Gene Ontology annotations Nucleic Acids Res. 37 D555–9

[74] Blaschke C, Hoffmann R, Oliveros J C and Valencia A 2001 Extracting information automatically from biological literature Comp. Funct. Genomics 2 310–3

[75] Tikk D, Thomas P, Palaga P, Hakenberg J and Leser U 2010 A comprehensive benchmark of kernel methods to extract protein–protein interactions from literature PLoS Comput. Biol. 6 e1000837

[76] Ashburner M et al 2000 Gene ontology: tool for the unification of biology. The Gene Ontology Consortium Nat. Genet. 25 25–9

[77] Pesquita C, Faria D, Falcão A O, Lord P and Couto F M 2009 Semantic similarity in biomedical ontologies PLoS Comput. Biol. 5 e1000443

[78] Lord P W, Stevens R D, Brass A and Goble C A 2003 Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation Bioinformatics 19 1275–83

[79] Scott M S and Barton G J 2007 Probabilistic prediction and ranking of human protein–protein interactions BMC Bioinformatics 8 239

[80] Jansen R et al 2003 A Bayesian networks approach for predicting protein–protein interactions from genomic data Science 302 449–53

[81] Bowers P M, Pellegrini M, Thompson M J, Fierro J, Yeates T O and Eisenberg D 2004 Prolinks: a database of protein functional linkages derived from coevolution Genome Biol. 5 R35

[82] Sun J, Sun Y, Ding G, Liu Q, Wang C, He Y, Shi T, Li Y and Zhao Z 2007 InPrePPI: an integrated evaluation method based on genomic context for predicting protein–protein interactions in prokaryotic genomes BMC Bioinformatics 8 414

[83] Wilkinson D J 2007 Bayesian methods in bioinformatics and computational systems biology Brief. Bioinform. 8 109–16

[84] Lu L J, Xia Y, Paccanaro A, Yu H and Gerstein M 2005 Assessing the limits of genomic data integration for predicting protein networks Genome Res. 15 945–53

[85] Hwang D et al 2005 A data integration methodology for systems biology Proc. Natl Acad. Sci. USA 102 17296–301

[86] Ranea J A, Morilla I, Lees J G, Reid A J, Yeats C, Clegg A B, Sánchez-Jiménez F and Orengo C 2010 Finding the dark matter in human and yeast protein network prediction and modelling PLoS Comput. Biol. 6 e1000945

[87] Shawe-Taylor J and Cristianini N (eds) 2004 Kernel Methods for Pattern Analysis (Cambridge: Cambridge University Press)

[88] Qiu J and Noble W S 2008 Predicting co-complexed protein pairs from heterogeneous data PLoS Comput. Biol. 4 e1000054

[89] Zhou D and Schölkopf B 2004 A regularization framework for learning from graph data ICML Workshop on Statistical Relational Learning

[90] Xia K, Dong D and Han J D 2006 IntNetDB v1.0: an integrated protein–protein interaction network database generated by a probabilistic model BMC Bioinformatics 7 508

[91] Hwang D et al 2005 A data integration methodology for systems biology: experimental verification Proc. Natl Acad. Sci. USA 102 17302–7

[92] Qi Y, Bar-Joseph Z and Klein-Seetharaman J 2006 Evaluation of different biological data and computational classification methods for use in protein interaction prediction Proteins 63 490–500

[93] Yellaboina S, Goyal K and Mande S C 2007 Inferring genome-wide functional linkages in E. coli by combining improved genome context methods: comparison with high-throughput experimental data Genome Res. 17 527–35

[94] Qi Y, Klein-Seetharaman J and Bar-Joseph Z 2005 Random forest similarity for protein–protein interaction prediction from multiple sources Pac. Symp. Biocomput. 10 531–42

[95] Mohamed T P, Carbonell J G and Ganapathiraju M K 2010 Active learning for human protein–protein interaction prediction BMC Bioinformatics 11 (Suppl. 1) S57

[96] Geurts P, Irrthum A and Wehenkel L 2009 Supervised learning with decision tree-based methods in computational and systems biology Mol. Biosyst. 5 1593–605

[97] Lin N, Wu B, Jansen R, Gerstein M and Zhao H 2004 Information assessment on predicting protein–protein interactions BMC Bioinformatics 5 154

[98] Sprinzak E, Altuvia Y and Margalit H 2006 Characterization and prediction of protein–protein interactions within and between complexes Proc. Natl Acad. Sci. USA 103 14718–23

[99] Qi Y, Klein-Seetharaman J and Bar-Joseph Z 2007 A mixture of feature experts approach for protein–protein interaction prediction BMC Bioinformatics 8 (Suppl. 10) S6

[100] Fouss F, Francoisse K, Yen L, Pirotte A and Saerens M 2006 An experimental investigation of graph kernels on a collaborative recommendation task Proc. 6th Int. Conf. on Data Mining pp 863–8

[101] Köhler S, Bauer S, Horn D and Robinson P N 2008 Walking the interactome for prioritization of candidate disease genes Am. J. Human Genet. 82 949–58

[102] Li Y and Patra J C 2010 Integration of multiple data sources to prioritize candidate genes using discounted rating system BMC Bioinformatics 11 (Suppl. 1) S20

[103] Li Y and Patra J C 2010 Genome-wide inferring gene–phenotype relationship by walking on the heterogeneous network Bioinformatics 26 1219–24

[104] Li X, Wu M, Kwoh C K and Ng S K 2010 Computational approaches for detecting protein complexes from protein interaction networks: a survey BMC Genomics 11 (Suppl. 1) S3

[105] Brohée S and van Helden J 2006 Evaluation of clustering algorithms for protein–protein interaction networks BMC Bioinformatics 7 488

[106] Vlasblom J and Wodak S J 2009 Markov clustering versus affinity propagation for the partitioning of protein interaction graphs BMC Bioinformatics 10 99

[107] Inoue K, Li W and Kurata H 2010 Diffusion model based spectral clustering for protein–protein interaction networks PLoS ONE 5 e12623

[108] Qin G and Gao L 2010 Spectral clustering for detecting protein complexes in protein–protein interaction (PPI) networks Math. Comput. Modell. 52 2066–74

[109] Ding C, He X, Meraz R F and Holbrook S R 2004 A unified representation of multiprotein complex data for modeling interaction networks Proteins 57 99–108

[110] Bu D et al 2003 Topological structure analysis of the protein–protein interaction network in budding yeast Nucleic Acids Res. 31 2443–50

[111] Sen T Z, Kloczkowski A and Jernigan R L 2006 Functional clustering of yeast proteins from the protein–protein interaction network BMC Bioinformatics 7 355

[112] Lima-Mendez G and van Helden J 2009 The powerful law of the power law and other myths in network biology Mol. Biosyst. 5 1482–93

[113] Gomez S M and Rzhetsky A 2002 Towards the prediction of complete protein–protein interaction networks Pac. Symp. Biocomput. 7 413–24

[114] Henrick K and Thornton J M 1998 PQS: a protein quaternary structure file server Trends Biochem. Sci. 23 358–61

[115] Güldener U, Münsterkötter M, Oesterheld M, Pagel P, Ruepp A, Mewes H W and Stümpflen V 2006 MPact: the MIPS protein interaction resource on yeast Nucleic Acids Res. 34 D436–41

[116] Hart G T, Lee I and Marcotte E M 2007 A high-accuracy consensus map of yeast protein complexes reveals modular nature of gene essentiality BMC Bioinformatics 8 236

[117] Pu S, Wong J, Turner B, Cho E and Wodak S J 2009 Up-to-date catalogues of yeast protein complexes Nucleic Acids Res. 37 825–31

[118] Yu H et al 2008 High-quality binary protein interaction map of the yeast interactome network Science 322 104–10

[119] Smialowski P et al 2010 The Negatome database: a reference set of non-interacting protein pairs Nucleic Acids Res. 38 D540–4

[120] Browne F, Wang H, Zheng H and Azuaje F 2009 GRIP: a web-based system for constructing gold standard datasets for protein–protein interaction prediction Source Code Biol. Med. 4 2

[121] Chen X W, Jeong J C and Dermyer P 2010 KUPS: constructing datasets of interacting and non-interacting protein pairs with associated attributions Nucleic Acids Res. 39 D750–4

[122] Sharan R, Suthram S, Kelley R M, Kuhn T, McCuine S, Uetz P, Sittler T, Karp R M and Ideker T 2005 Conserved patterns of protein interaction in multiple species Proc. Natl Acad. Sci. USA 102 1974–9

[123] Kanehisa M et al 2008 KEGG for linking genomes to life and the environment Nucleic Acids Res. 36 D480–4

[124] Thomas P D, Campbell M J, Kejariwal A, Mi H, Karlak B, Daverman R, Diemer K, Muruganujan A and Narechania A 2003 PANTHER: a library of protein families and subfamilies indexed by function Genome Res. 13 2129–41

[125] Ioannidis J P 2007 Why most published research findings are false: author's reply to Goodman and Greenland PLoS Med. 4 e215

[126] Pfeiffer T, Rand D G and Dreber A 2009 Decision-making in research tasks with sequential testing PLoS ONE 4 e4607

[127] Neumann B et al 2010 Phenotypic profiling of the human genome by time-lapse microscopy reveals cell division genes Nature 464 721–7

[128] Uetz P et al 2000 A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae Nature 403 623–7

[129] Russell R B and Aloy P 2008 Targeting and tinkering with interaction networks Nat. Chem. Biol. 4 666–73

[130] Hart G T, Ramani A K and Marcotte E M 2006 How complete are current yeast and human protein-interaction networks? Genome Biol. 7 120

[131] Lefebvre C et al 2010 A human B-cell interactome identifies MYB and FOXM1 as master regulators of proliferation in germinal centers Mol. Syst. Biol. 6 377

[132] Warde-Farley D et al 2010 The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function Nucleic Acids Res. 38 W214–20

[133] Lefebvre C, Lim W K, Basso K, Dalla-Favera R and Califano A 2007 A context-specific network of protein–DNA and protein–protein interactions reveals new regulatory motifs in human B cells Lect. Notes Bioinformatics (LNCS) 4532 42–56

[134] Krogan N J et al 2006 Global landscape of protein complexes in the yeast Saccharomyces cerevisiae Nature 440 637–43

[135] Gavin A C et al 2006 Proteome survey reveals modularity of the yeast cell machinery Nature 440 631–6

[136] Li S et al 2004 A map of the interactome network of the metazoan C. elegans Science 303 540–3

[137] Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M and Sakaki Y 2001 A comprehensive two-hybrid analysis to explore the yeast protein interactome Proc. Natl Acad. Sci. USA 98 4569–74

[138] Giot L et al 2003 A protein interaction map of Drosophila melanogaster Science 302 1727–36

[139] Wu D et al 2009 A phylogeny-driven genomic encyclopaedia of bacteria and archaea Nature 462 1056–60

[140] Kahlem P and Birney E 2007 ENFIN: a network to enhance integrative systems biology Ann. New York Acad. Sci. 1115 23–31
