a. Results on the Protein Docking Benchmark show improved performance of the potential compared
with reduced and detailed scoring functions. Furthermore, we show that it
performs well on antibody-antigen
complexes, with most predictions clustering around the Complementarity
Determining Regions of antibodies without any manual intervention.
b. The authors present a
method in which all direct and indirect interactions are first weighted using a
topological weight (FS-Weight), which estimates the strength of functional
association. Interactions with low weight are removed from the network, while
level-2 interactions with high weight are introduced into the interaction
network. Existing clustering algorithms can then be applied to this modified
network. We have also proposed a novel algorithm that searches for cliques in
the modified network and merges cliques to form clusters using a “partial
clique merging” method. Experiments show that our complex-finding
algorithm performs very well on interaction networks modified in this way. Because only the original
protein interaction network is used, our approach would be most useful for
protein complex prediction, especially for prediction of novel protein complexes.
c. MCODE is one of the first, and therefore influential, computational methods
for identifying complexes in PPI networks. The MCODE algorithm works
in two main steps, vertex weighting and complex prediction,
with an optional third stage for post-processing.
d. This paper proposes a technique for filtering the PPIN that uses the structural
interface data of protein pairs for complex prediction.
A simultaneous protein interaction network (SPIN) is introduced to specify
mutually exclusive interactions (MEIs) arising from overlapping interfaces and to eliminate
competition from MEIs during the detection of protein complexes.
After the SPINs are constructed, a naive clustering algorithm is applied to them
for protein complex prediction. The results show that the proposed method outperforms
the simple PPIN-based method in removing false positive proteins from
predicted complexes. This suggests that discounting the competition between
MEIs can be effective for improving prediction accuracy in general
computational approaches involving protein interactions.
e. In this article, we propose
a novel unsupervised approach that does not rely on knowledge of existing
complexes. Our method probabilistically calculates the affinity between two
proteins, where the affinity is evaluated by a co-complexed score, or C2S
for short. In particular, our method uses the log-likelihood ratio of two
proteins being co-complexed to being drawn randomly, and we then determine
protein complexes by applying a hierarchical clustering algorithm to the C2S scores.
f. Based on these insights, our method (MCL-CA) couples core-attachment based
refinement steps with MCL to refine the clusters that MCL produces. We also evaluated the
effectiveness of our method on two different datasets and compared the quality
of our resulting complexes with those produced by MCL. The results show that our
approach significantly improves the accuracy of predicted complexes when
matched with known complexes. As a result, MCL-CA is able to cover
a larger number of known complexes than MCL. We also compared this method with
two very recently proposed methods, CORE and COACH, which also emphasize the
core-attachment structure, and we discuss several instances to show that our
predicted complexes clearly adhere to the core-attachment structure.
g. This method introduces a combinatorial approach for prediction of protein
complexes focusing not only on determining member proteins in complexes but
also on the PPI organization of the complexes. Our method analyses complex
candidates predicted by the existing methods. It searches for optimal
combinations of domain-domain interactions in the candidates based on an
assumption that the proteins in a candidate can form a true protein complex if
each of the domains is used by a single protein interaction. This optimization
problem was mathematically formulated and solved using binary integer linear
programming. Using publicly available sets of yeast protein-protein
interactions and domain-domain interactions, we succeeded in extracting
protein complex candidates with an accuracy that is twice the average accuracy
of the existing methods MCL, MCODE, and clustering coefficient. Although
configuring the parameters of each algorithm resulted in slightly improved
precision, our method showed better precision for most values of the parameters.
h. This paper proposes a
more appropriate protein complex prediction method, CFA, that is based on
the connectivity number of subgraphs. We evaluate the results of CFA using several
protein networks on reference protein complexes in two benchmark data sets
(MIPS and Aloy), containing 1142 and 61 known complexes respectively. We
compare CFA to some existing protein complex prediction methods (CMC, MCL, PCP
and RNSC) in terms of recall and precision. CFA predicts more complexes correctly at a
competitive level of precision.
i. This is a fast algorithm
for filtering docked conformations with good surface
complementarity and ranking them based on their clustering
properties. The energy filters select complex structures with the
lowest desolvation and electrostatic energies. Clustering helps to smooth
out the local minima and to select the ones with the broadest energy wells, a
property associated with the free energy at the binding site. The robustness of
the method was tested on sets of 2000 docked conformations generated for 48
pairs of interacting proteins.
j. The molecular complex detection algorithm (MCODE) works in three stages: vertex
weighting, complex prediction and an optional post-processing step. Node weights are based
on the core clustering coefficient. Using this coefficient instead of the
standard clustering coefficient increases the weights of heavily
interconnected graph regions while giving small weights to the less connected
vertices, which are abundant in scale-free protein interaction networks. After computing the weights, the
algorithm traverses the weighted graph in a greedy fashion to
isolate densely connected regions. The post-processing step refines the predicted complexes and adds proteins
based on connectivity rules. The MCODE
method is widely used in the analysis of large-scale interaction networks. It is
available as a plug-in for
the Cytoscape network visualization software.
k. For predicting protein complexes in an interaction network, an algorithm called RRW was previously proposed,
which repeatedly enlarges the current cluster of proteins according to the
stationary vector of a random walk with restarts from the cluster, whose
proteins are equally weighted. In the cluster expansion, all the proteins
within the cluster have equal influence on the choice of the protein newly added
to the cluster. Here we extend the RRW algorithm by introducing a random
walk with restarts from a cluster of proteins, each of which is weighted by the
sum of the strengths of the supporting evidence for the direct physical interactions
involving the protein. The resulting algorithm is called NWE (Node-Weighted
Expansion of clusters of proteins). The interaction data for the networks are
obtained from the WI-PHI database.
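The difference between the two restart schemes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the restart probability of 0.7, and the toy weights are hypothetical, and every node is assumed to have at least one neighbour.

```python
def uniform_restart(cluster):
    """RRW-style restart vector: every cluster member is equally weighted."""
    return {v: 1.0 / len(cluster) for v in cluster}


def node_weighted_restart(weights, cluster):
    """NWE-style restart vector: each member is weighted by the sum of the
    weights of its interactions (assumed here to be its incident edge weights)."""
    strength = {v: sum(weights[v].values()) for v in cluster}
    total = sum(strength.values())
    return {v: s / total for v, s in strength.items()}


def random_walk_with_restart(weights, restart, restart_prob=0.7,
                             iterations=500, tol=1e-12):
    """Approximate the stationary vector by power iteration.

    weights: dict node -> dict neighbour -> edge weight (symmetric).
    restart: dict node -> restart probability (sums to 1).
    restart_prob is a hypothetical value chosen for illustration.
    """
    nodes = list(weights)
    p = {v: restart.get(v, 0.0) for v in nodes}
    for _ in range(iterations):
        # With probability restart_prob the walker jumps back to the cluster.
        nxt = {v: restart_prob * restart.get(v, 0.0) for v in nodes}
        # Otherwise it moves to a neighbour in proportion to the edge weight.
        for u in nodes:
            total = sum(weights[u].values())
            for v, w in weights[u].items():
                nxt[v] += (1.0 - restart_prob) * p[u] * w / total
        change = max(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy weighted network (hypothetical values).
toy = {
    "a": {"b": 0.9},
    "b": {"a": 0.9, "c": 0.5},
    "c": {"b": 0.5, "d": 0.1},
    "d": {"c": 0.1},
}
stationary = random_walk_with_restart(toy, node_weighted_restart(toy, {"a", "b"}))
```

A cluster-expansion step would then pick the highest-scoring protein outside the cluster from `stationary`. The selection and stopping rules are not described in the text above and are left out here.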
l. Molecular Complex
Detection (MCODE): MCODE uses an agglomerative method that
works in the following stages: protein (vertex) weighting, complex extraction
and an optional post-processing of complexes.
Markov Clustering (MCL) is a fast, highly scalable graph clustering method. Besides clustering protein
sequences, MCL has proved effective for clustering large PPI networks due to its
scalability. MCL works by simulating random walks (called a flow) to extract
dense regions from the network. To simulate the flow, MCL iteratively
manipulates the adjacency matrix of the network using two operators, expansion
and inflation, that control the spread and thickness of the flow, respectively.
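The expansion and inflation operators can be illustrated with a small pure-Python sketch. It is a toy version for dense matrices, assuming added self-loops, an inflation exponent of 2, and a simple way of reading clusters from the converged matrix. The function names and defaults are illustrative and are not taken from the MCL software.

```python
def normalize_columns(matrix):
    """Rescale every column so that it sums to 1 (column-stochastic)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation=2.0, iterations=50, tol=1e-9):
    """Toy Markov Clustering on a dense adjacency matrix (list of lists)."""
    n = len(adjacency)
    # Add self-loops, then turn the matrix into transition probabilities.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        # Expansion: squaring the matrix lets the flow spread along longer paths.
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: element-wise powers strengthen strong flows, weaken weak ones.
        inflated = normalize_columns([[x ** inflation for x in row]
                                      for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Every row that still carries flow defines one cluster of column indices.
    clusters = {frozenset(j for j in range(n) if row[j] > 1e-6)
                for row in m if max(row) > 1e-6}
    return sorted(sorted(c) for c in clusters)


# Two separate triangles: nodes 0-2 and nodes 3-5.
triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(triangles))  # [[0, 1, 2], [3, 4, 5]]
```

Raising the inflation exponent generally yields more, smaller clusters, which is why it is described as controlling the thickness of the flow.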
L2. Clustering based on merging
Maximal Cliques (CMC) – CMC works by repeated merging of
maximal cliques extracted from the PPI network. CMC includes reliability scores
for PPIs and improves on earlier clique-merging methods, such as CFinder and the Local
Clique Merging Algorithm (LCMA), which work only on unscored networks. CMC begins
by enumerating all maximal cliques in the protein interaction network using
the fast search-space pruning-based Cliques algorithm. Cliques
are ranked in non-increasing order of their weighted densities. CMC then
iteratively combines highly overlapping cliques depending on the extent of
Clustering with Overlapping Neighborhood Expansion (ClusterONE) – ClusterONE works similarly to
MCODE, by seeding and greedy neighborhood expansion. ClusterONE first
identifies seed proteins and greedily expands them into groups V based on a
cohesiveness measure f(V). At each step, new proteins are included into V until
f(V) does not increase. V is then denoted a locally cohesive group. Highly
overlapping groups are merged to produce candidate complexes. Since this step
allows for overlapping complexes, ClusterONE improves on the performance of MCODE.
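The seeding and greedy expansion step can be sketched as follows. The sketch assumes the commonly cited form of the cohesiveness measure, f(V) = w_in / (w_in + w_bound + p|V|), where w_in is the total weight of edges inside V, w_bound the total weight of edges leaving V, and p a penalty term. It only adds proteins, whereas the full method has further steps such as merging overlapping groups. All names and the toy network are illustrative.

```python
def cohesiveness(weights, group, penalty=0.0):
    """f(V) = w_in / (w_in + w_bound + penalty * |V|), an assumed form."""
    w_in = 0.0
    w_bound = 0.0
    for u in group:
        for v, w in weights[u].items():
            if v in group:
                w_in += w / 2.0  # each internal edge is visited from both ends
            else:
                w_bound += w
    denom = w_in + w_bound + penalty * len(group)
    return w_in / denom if denom > 0 else 0.0


def grow_from_seed(weights, seed, penalty=0.0):
    """Greedily add the neighbour that most increases f(V); stop when none does."""
    group = {seed}
    best = cohesiveness(weights, group, penalty)
    while True:
        frontier = {v for u in group for v in weights[u] if v not in group}
        candidates = [(cohesiveness(weights, group | {v}, penalty), v)
                      for v in sorted(frontier)]
        if not candidates:
            break
        score, node = max(candidates)
        if score <= best:
            break
        group.add(node)
        best = score
    return group, best


# Toy network: a triangle a-b-c attached to a hub d with three other partners.
toy = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 1.0},
    "d": {"c": 1.0, "e": 1.0, "f": 1.0, "g": 1.0},
    "e": {"d": 1.0},
    "f": {"d": 1.0},
    "g": {"d": 1.0},
}
group, score = grow_from_seed(toy, "a")
print(sorted(group), score)  # ['a', 'b', 'c'] 0.75
```

In this toy network the expansion stops at the triangle because adding the hub d would lower f(V) from 3/4 to 4/7.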
m. The method involves
finding the articulation points and
biconnected components using backtracking and branch-and-bound algorithms.
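As an illustration of what an articulation point is, the sketch below uses the standard depth-first-search low-link technique rather than the backtracking and branch-and-bound procedure mentioned above, whose details are not given here. It assumes an undirected simple graph stored as an adjacency dictionary. The function name is illustrative, and biconnected components are not extracted.

```python
def articulation_points(graph):
    """Return the set of articulation points of an undirected simple graph.

    graph: dict node -> iterable of neighbours (each edge listed in both directions).
    Uses recursive DFS with discovery times and low-link values.
    """
    disc = {}      # discovery time of each node
    low = {}       # lowest discovery time reachable from the node's DFS subtree
    points = set()
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for v in graph[u]:
            if v == parent:
                continue
            if v in disc:
                # Back edge: u can reach an earlier node.
                low[u] = min(low[u], disc[v])
            else:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # v's subtree cannot bypass u, so removing u disconnects it.
                if parent is not None and low[v] >= disc[u]:
                    points.add(u)
        # A DFS root is an articulation point only if it has 2+ DFS children.
        if parent is None and children > 1:
            points.add(u)

    for node in graph:
        if node not in disc:
            dfs(node, None)
    return points


# Triangle a-b-c with a tail c-d-e: removing c or d disconnects the graph.
toy = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
print(sorted(articulation_points(toy)))  # ['c', 'd']
```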
n. Predicting the PPI
network and understanding its function involves several methods, as follows:
n1. Ensemble clustering combines
multiple algorithms (such as MCL,
CMC, ClusterONE and HACO) using majority-voting based scoring. Incorporating
complementary information into the analysis of PPI data helps to overcome noise in the data,
which in turn helps in predicting the structure of protein complexes.
n2. Methods based on network clustering combined
with biological insights.
CORE, CACHET, COACH and MCL-CAw look for clusters that adhere to the core-attachment organization, noted originally in yeast
complexes. Large-scale pull-down of yeast complexes using TAP-MS revealed
that proteins within complexes
are organized as two distinct sets.
Incorporating functional information
Proteins within a complex generally share the same or similar functions.
Combining the functional annotations available for proteins with the topology of PPI
networks therefore helps to improve complex prediction. RNSC, Protein Complex
Prediction (PCP) and Dense neighborhood Extraction using Connectivity make use of functional
annotations from Gene Ontology to predict complexes.
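1 Functional similarity weight
The CD-distance between two
proteins u and v is given by
$D(u, v) = \dfrac{|N_u \,\Delta\, N_v|}{|N_u \cup N_v| + |N_u \cap N_v|}$,
where $N_x$ is the set containing protein x and its interaction partners, and $\Delta$ denotes the symmetric difference of two sets.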
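A short Python sketch of the CD-distance, assuming that the neighbour set of a protein includes the protein itself and that the network is stored as an adjacency dictionary (the function name and toy network are illustrative):

```python
def cd_distance(graph, u, v):
    """Czekanowski-Dice distance between proteins u and v.

    graph: dict protein -> iterable of interaction partners.
    Each neighbour set is assumed to contain the protein itself.
    """
    n_u = set(graph[u]) | {u}
    n_v = set(graph[v]) | {v}
    symmetric_difference = n_u ^ n_v
    return len(symmetric_difference) / (len(n_u | n_v) + len(n_u & n_v))


toy = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
print(cd_distance(toy, "a", "b"))  # 0.0: identical neighbourhoods
print(cd_distance(toy, "a", "d"))  # 0.6: only c is shared
```

A distance of 0 means the two proteins have exactly the same interaction partners, and the distance grows towards 1 as the neighbourhoods diverge.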
b. Functional similarity weighted
The $\chi^2$ statistic of function j for protein i
is computed by
2 a. Majority: We consider all neighboring proteins and sum up the
number of times each annotation occurs for each protein, as described in Schwikowski et al. (2000).
b. Neighborhood- For each
protein, the score of a particular function is given by the value of the
The functional flow algorithm generalizes the principle of ‘guilt by association’ to
clusters of proteins that may or may not interact with each other physically.
3 a. Protein-protein
interaction data are represented using a graphical method called a functional linkage graph, in
which an edge (link) between two nodes (proteins) indicates that they might share the same function.
b. PROPAGATION OF PROBABILITIES
Since the label probability of a protein depends on its neighbors which
depend in turn on their neighbors, we would like a rigorous method of
increasing our estimate of the label probability of a protein if our estimate
of its unlabeled neighbors’ label probability increases. The Markov random
field inference problem that corresponds to this is the computation of the
marginal label probabilities of the unlabeled (hidden) nodes given some labelled
4 All the
known proteins can be classified into one of the two categories according to
their function. Therefore, an interaction between two known proteins can be
classified into one of the three groups: (1, 1), (1, 0) and (0, 0).
$Z(\theta)$ is called the partition function in the general theory of MRF.
5. a. global optimization principle: a score or energy is associated to any
given assignment (configuration) of functions for the whole set of unclassified
proteins. The score is lower in configurations that maximize the presence of
the same functional annotation in interacting proteins.
b. self-consistent test: concerns the presence of errors in the topology
of the protein network. It is known that protein-protein interaction data
obtained from two-hybrid experiments contain a number of false positives and
negatives, and these could in principle appreciably alter the quality of
predictions by providing a spurious connectivity to the network.
c. tolerance error checking
6 Indirect functional association
We are interested in finding out how often a protein
shares function with its level-2 neighbours instead of its level-1 neighbours.
Functional similarity weight. The common interacting partners
of two proteins are used as an estimate of their functional similarity, with the
Czekanowski-Dice distance (CD-distance) as a metric for functional
linkage. The CD-distance between two proteins u and v is given by
$D(u, v) = \dfrac{|N_u \,\Delta\, N_v|}{|N_u \cup N_v| + |N_u \cap N_v|}$.
Integrating reliability of experimental sources
As shown in Nabieva et al. (2005), different experimental sources of
protein–protein interaction data may have different reliability.
7 The Gibbs Sampler
$M_0^{(i)}$ and $M_1^{(i)}$ are the
numbers of interaction partners of protein $P_i$ labelled with 0 and 1, respectively. When all the functions of the interaction partners
of a protein are given, they can be used to derive the probability that the protein
has the function, which is the basis of the Gibbs Sampler.
Assume that the parameters $\theta$
are given. For a given protein $P_i$, conditional on the functional labelling of
all the other proteins, we can use the conditional probability $\Pr(X_i \mid X_{[-i]}; \theta)$
to generate samples
to update the functional labelling of protein $P_i$. Repeating this procedure many times will generate
samples for the functional labelling of all the unannotated proteins.
For a function of interest, we first estimate the probability, $\pi$, that a
protein has the function (without the information on the interaction network) by
the fraction of all the proteins having that function. Secondly, we estimate the
8 Estimating the reliability of a putative protein interaction
data set. The reliability of a set of putative protein interactions is defined
as the fraction of real protein interactions over all the putative protein
interactions. To find the precision of the estimation, we use the
following formula to calculate,
Prediction using a-priori probabilities
Our predictions are based on the idea of “guilt by association”:
when a hypothetical protein X interacts with a
protein of known function, X may
share that function, with a probability determined by the high-throughput data
supporting the relationship between X and its partner. The reliability
score for function F is given by,
14 The protein of interest is
assigned the function with the highest $\chi^2$ value among functions of
all n-neighbouring proteins. For each member of the function category,
the $\chi^2$ value is calculated using the following formula:
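The formula itself is missing from these notes. The sketch below therefore assumes the usual (observed - expected)^2 / expected form of the statistic, with the expected count taken from the frequency of the function among all annotated proteins. It also restricts the neighbourhood to direct interaction partners for simplicity. All names and the toy annotations are illustrative.

```python
from collections import Counter


def chi_square_scores(graph, annotations, protein):
    """Score each function for `protein` from its annotated neighbours.

    graph: dict protein -> iterable of neighbouring proteins.
    annotations: dict protein -> set of functions (annotated proteins only).
    Assumed statistic: (observed - expected)^2 / expected, where `expected`
    is the neighbour count times the overall frequency of the function.
    """
    total = len(annotations)
    frequency = Counter(f for funcs in annotations.values() for f in funcs)
    neighbours = [n for n in graph[protein] if n in annotations]
    observed = Counter(f for n in neighbours for f in annotations[n])
    scores = {}
    for function, count in frequency.items():
        expected = len(neighbours) * count / total
        if expected > 0:
            scores[function] = (observed[function] - expected) ** 2 / expected
    return scores


# Hypothetical query protein x with three annotated neighbours.
toy_graph = {"x": ["a", "b", "c"]}
toy_annotations = {
    "a": {"kinase"}, "b": {"kinase"}, "c": {"transport"},
    "d": {"transport"}, "e": {"transport"}, "f": {"transport"},
}
scores = chi_square_scores(toy_graph, toy_annotations, "x")
print(max(scores, key=scores.get))  # kinase
```

Here "kinase" occurs twice among the neighbours against an expected count of 1, giving a score of 1.0, while "transport" occurs once against an expected count of 2, giving 0.5.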
Overview of the prediction
method. The black circle represents a query protein whose function is to be
predicted, and white circles represent proteins.
(A) Assignment of function to a query protein.
This is done based on the functions of neighboring proteins on the map.
(B) Physical interaction data deposited in the
(C) Development of the protein interaction map
by integrating all physical interaction data.
15 Functional similarity weight
The CD-distance between two
proteins u and v is given by
$D(u, v) = \dfrac{|N_u \,\Delta\, N_v|}{|N_u \cup N_v| + |N_u \cap N_v|}$.
b. Functional similarity weighted
The $\chi^2$ statistic of function j for protein i
is computed by
We consider all neighbouring proteins and sum up the number of times
each annotation occurs for each protein, as described in Schwikowski et al.
(2000). In the case of weighted interaction graphs, we simply extend the method
by taking a weighted sum instead. For each protein, the score of a particular
function is the corresponding sum.
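The counting scheme described above can be sketched as follows. The data layout, a dictionary of weighted neighbours plus a dictionary of annotations, and the function name are assumptions made for illustration. Setting every weight to 1 gives the unweighted count.

```python
from collections import defaultdict


def majority_scores(weights, annotations, protein):
    """Sum, per function, the weights of the neighbours annotated with it.

    weights: dict protein -> dict neighbour -> interaction weight
             (use 1.0 everywhere for the unweighted variant).
    annotations: dict protein -> set of functions; unannotated proteins are absent.
    """
    scores = defaultdict(float)
    for neighbour, weight in weights[protein].items():
        for function in annotations.get(neighbour, ()):
            scores[function] += weight
    return dict(scores)


toy_annotations = {"a": {"kinase"}, "b": {"transport"}, "c": {"transport"}}

unweighted = {"x": {"a": 1.0, "b": 1.0, "c": 1.0}}
print(majority_scores(unweighted, toy_annotations, "x"))
# {'kinase': 1.0, 'transport': 2.0}

weighted = {"x": {"a": 1.0, "b": 0.5, "c": 0.25}}
print(majority_scores(weighted, toy_annotations, "x"))
# {'kinase': 1.0, 'transport': 0.75}
```

The example shows how weighting can change the ranking: "transport" wins the plain count, but "kinase" wins once the two "transport" neighbours are connected by low-weight interactions.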
In multiway k-cut, the task is to partition a graph in such a way that
each of k terminal nodes belongs to a different subset of the partition and so
that the (weighted) number of edges that are ‘cut’ in the process is minimized.
In the more general version of the multiway k-cut problem considered here, the
goal is to assign a unique function to all the unannotated nodes so as to
minimize the sum of the costs of the edges joining nodes that do not share a function.
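The objective can be made concrete with a brute-force sketch for very small networks. It assumes each protein receives exactly one function and that an edge is cut when its endpoints receive different functions. Practical methods use integer programming or heuristics rather than enumeration. All names and the toy weights are illustrative.

```python
from itertools import product


def cut_cost(edges, assignment):
    """Total weight of edges whose endpoints are assigned different functions."""
    return sum(w for u, v, w in edges if assignment[u] != assignment[v])


def best_assignment(edges, known, unknown, functions):
    """Try every labelling of the unannotated proteins and keep the cheapest.

    edges: list of (protein, protein, weight) tuples.
    known: dict protein -> function for annotated proteins.
    unknown: list of unannotated proteins.
    functions: list of candidate functions.
    Exponential in len(unknown), so only suitable for toy examples.
    """
    best = None
    for labels in product(functions, repeat=len(unknown)):
        assignment = dict(known)
        assignment.update(zip(unknown, labels))
        cost = cut_cost(edges, assignment)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


toy_edges = [("a", "x", 3.0), ("x", "b", 1.0), ("x", "y", 1.0), ("y", "b", 3.0)]
toy_known = {"a": "F1", "b": "F2"}
cost, assignment = best_assignment(toy_edges, toy_known, ["x", "y"], ["F1", "F2"])
print(cost, assignment["x"], assignment["y"])  # 2.0 F1 F2
```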
17 NEIGHBORHOOD FUNCTION
The MRF framework needs the specification of neighbourhood functions
that describe the dependence of the label probability of a node on the labels
of its neighbours. Different types of neighbourhood conditional probability
functions can be used to model different types of local dependency structure.
The following datasets were used in our analysis: Protein–Protein
Interactions: The GRID dataset contained 20 985 distinct interactions
catalogued between 13 607 distinct pairs of proteins. 4708 proteins participated
in interactions. 1442 unlabeled, connected proteins were potential labelling
targets. 4588 of these are in a single connected component; the second largest
component has 4 proteins.
18 The MRF model-based function prediction method is detailed elsewhere; here
we describe it only briefly. Let a function of interest be category 1 and the
rest be category 0. All the known proteins can be classified into one of the
two categories according to their function. Thus, an interaction between two
known proteins can be classified into one of the three groups: (1, 1), (1, 0)
and (0, 0). Given a protein physical interaction network (Net1) and a genetic
interaction network (Net2), the belief can be represented by a Gibbs
distribution (Li, 1995) for this function by considering the classification of
all the proteins.