a. Testing on the Protein Docking Benchmark shows improved performance of the
new potential compared with reduced and detailed scoring functions. Furthermore,
the new potential performs well on antibody-antigen complexes, with most
predictions clustering around the Complementarity Determining Regions of
antibodies without any manual intervention.

b. The authors propose a method in which all direct and indirect interactions
are first weighted using a topological weight (FS-Weight), which estimates the
strength of functional association. Interactions with low weight are removed
from the network, while level-2 interactions with high weight are introduced
into the interaction network. Existing clustering algorithms can then be applied
to this modified network. The authors also propose a novel algorithm that
searches for cliques in the modified network and merges cliques to form clusters
using a “partial clique merging” method. Experiments show that the
complex-finding algorithm performs very well on interaction networks modified in
this way. Because only the original protein interaction network is used, the
approach is most useful for protein complex prediction, especially for the
prediction of novel protein complexes.

c. MCODE is one of the first, and therefore most influential, computational
methods for identifying complexes in PPI networks. The MCODE algorithm works in
two main steps, vertex weighting and complex prediction, with an optional third
stage for post-processing.

d. This paper proposes a technique for filtering the PPIN that uses structural
interface data of protein pairs for complex prediction. A simultaneous protein
interaction network (SPIN) is introduced to specify mutually exclusive
interactions (MEIs) arising from overlapping interfaces and to eliminate the
competition from MEIs that arises during the detection of protein complexes.
After the SPINs are constructed, a naive clustering algorithm is applied to them
to predict protein complexes. The results show that the proposed method
outperforms the simple PPIN-based method in removing false positive proteins
from the predicted complexes. This shows that discounting the competition
between MEIs can be effective for improving prediction accuracy in general
computational approaches involving protein interactions.

e. This article proposes a novel unsupervised approach that does not rely on
knowledge of existing complexes. The method probabilistically calculates the
affinity between two proteins, where the affinity is evaluated by a co-complexed
score (C2S). In particular, the method uses the log-likelihood ratio of two
proteins being co-complexed to being drawn randomly, and then determines protein
complexes by applying a hierarchical clustering algorithm to the C2S score
matrix.

f. Based on these insights, the authors propose a method (MCL-CA) that adds
core-attachment based refinement steps to refine the clusters produced by MCL.
They evaluate the effectiveness of the method on two different datasets and
compare the quality of the refined complexes with that of the complexes produced
by MCL. The results show that the approach significantly improves the accuracy
of predicted complexes when matched with known complexes, and that MCL-CA covers
a larger number of known complexes than MCL. The method is also compared with
two recently proposed methods, CORE and COACH, which likewise focus on the
core-attachment structure, and several instances are discussed to show that the
predicted complexes clearly adhere to the core-attachment structure.

g. This work introduces a combinatorial approach for the prediction of protein
complexes that focuses not only on determining the member proteins of complexes
but also on the PPI organization of the complexes. The method analyses complex
candidates predicted by existing methods. It searches for optimal combinations
of domain-domain interactions in the candidates, based on the assumption that
the proteins in a candidate can form a true protein complex if each of the
domains is used by a single protein interaction. This optimization problem was
formulated mathematically and solved using binary integer linear programming.
Using publicly available sets of yeast protein-protein interactions and
domain-domain interactions, the authors succeeded in extracting protein complex
candidates with an accuracy twice the average accuracy of the existing methods
(MCL, MCODE and clustering coefficient). Although tuning the parameters of each
algorithm resulted in slightly improved precision, the method showed better
precision for most values of the parameters.

h. This paper proposes a protein complex prediction method, CFA, that is based
on the connectivity number of subgraphs. The results of CFA are evaluated on
several protein networks against reference protein complexes in two benchmark
data sets (MIPS and Aloy), containing 1142 and 61 known complexes respectively.
CFA is compared with existing protein complex prediction methods (CMC, MCL, PCP
and RNSC) in terms of recall and precision. CFA predicts more complexes
correctly at a competitive level of precision.

i. This work presents a fast algorithm for filtering docked conformations with
good surface complementarity and ranking them based on their clustering
properties. The energy filters select complex structures with the lowest
desolvation and electrostatic energies. Clustering is then used to smooth out
the local minima and to select the ones with the broadest energy wells, a
property associated with the free energy at the binding site. The method was
tested on sets of 2000 docked conformations generated for 48 pairs of
interacting proteins.

j. The molecular complex detection algorithm (MCODE) works in three stages:
vertex weighting, complex prediction and an optional post-processing step. The
weight of a node is based on the core clustering coefficient. Using this
coefficient instead of the standard clustering coefficient amplifies the weights
of heavily interconnected graph regions while giving small weights to the less
connected vertices, which are abundant in scale-free protein interaction
networks. After computing the weights, the algorithm traverses the weighted
graph in a greedy fashion to isolate densely connected regions. The
post-processing step refines the complexes and adds proteins based on
connectivity rules. MCODE is widely used in mapping large-scale interaction
networks. It is available as a plug-in for the Cytoscape network visualization
software.
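
The vertex-weighting stage described above can be sketched as follows. This is a
minimal illustration, not the MCODE plug-in's code: it assumes the weight of a
vertex is k times the density of the highest k-core of its closed neighbourhood,
uses a plain edge density without self-loops, and all function names are
hypothetical.

```python
def density(adj, nodes):
    """Edge density of the subgraph induced by `nodes` (no self-loops)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(len(adj[v] & nodes) for v in nodes) / 2
    return 2.0 * edges / (n * (n - 1))


def highest_k_core(adj, nodes):
    """Return (k, core) for the highest k-core of the subgraph induced by `nodes`."""
    current = set(nodes)
    best_k, best_core = 0, set(nodes)
    k = 1
    while current:
        # Repeatedly peel off vertices whose degree inside `current` is below k.
        while True:
            low = {u for u in current if len(adj[u] & current) < k}
            if not low:
                break
            current -= low
        if current:
            best_k, best_core = k, set(current)
        k += 1
    return best_k, best_core


def mcode_vertex_weight(adj, v):
    """Assumed weight: k * density of the highest k-core around v."""
    neighbourhood = adj[v] | {v}
    k, core = highest_k_core(adj, neighbourhood)
    return k * density(adj, core)


# Toy network: a triangle (a, b, c) with one pendant vertex d attached to a.
toy_network = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
weights = {v: mcode_vertex_weight(toy_network, v) for v in toy_network}
```

In the toy network the triangle vertices receive a higher weight than the
pendant vertex, which is the behaviour the core clustering coefficient is meant
to produce.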

k. For predicting protein complexes in an interaction network, an algorithm
called RRW was proposed, which repeatedly enlarges a current cluster of proteins
according to the stationary vector of a random walk with restarts from the
cluster, whose proteins are equally weighted. In the cluster expansion, all the
proteins within the cluster have equal influence on the choice of the protein
newly added to the cluster. This work extends the RRW algorithm by introducing a
random walk with restarts from a cluster of proteins, each of which is weighted
by the sum of the strengths of the supporting evidence for the direct physical
interactions involving the protein. The resulting algorithm is called NWE
(Node-Weighted Expansion of clusters of proteins). The interaction data are
obtained from the WI-PHI database.

l. Molecular Complex Detection (MCODE): MCODE uses an agglomerative method that
works in the following stages: protein (vertex) weighting, complex extraction
and an optional post-processing of complexes.

l1. Markov clustering (MCL) is a fast, highly scalable graph clustering method.
Besides clustering protein sequences, MCL has proved effective for clustering
large PPI networks due to its scalability. MCL works by simulating random walks
(called a flow) to extract dense regions from the network. To simulate the flow,
MCL iteratively manipulates the adjacency matrix of the network using two
operators, expansion and inflation, that control the spread and thickness of the
flow, respectively.
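
The expansion and inflation operators can be illustrated with a small pure-Python
sketch. It is not the reference MCL implementation: the self-loops, the default
parameter values, the convergence test and the way clusters are read off the
final matrix are assumptions made for this example, and the function names are
hypothetical.

```python
def normalize_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    out = [row[:] for row in matrix]
    for j in range(n):
        total = sum(out[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] /= total
    return out


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-9,
        threshold=1e-6):
    """Cluster a graph given as a square 0/1 (or weighted) adjacency matrix."""
    n = len(adjacency)
    # Add self-loops, then turn the matrix into random-walk probabilities.
    flow = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    flow = normalize_columns(flow)
    for _ in range(max_iter):
        # Expansion: matrix power, lets the flow spread along longer walks.
        expanded = flow
        for _step in range(expansion - 1):
            expanded = matmul(expanded, flow)
        # Inflation: entry-wise power plus renormalization, thickens strong flow.
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - flow[i][j])
                    for i in range(n) for j in range(n))
        flow = inflated
        if delta < tol:
            break
    # Rows that still carry flow act as attractors; their columns form a cluster.
    clusters = {frozenset(j for j in range(n) if flow[i][j] > threshold)
                for i in range(n)}
    clusters.discard(frozenset())
    return sorted(sorted(c) for c in clusters)


# Two disconnected triangles: nodes 0-2 and nodes 3-5.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
triangle_clusters = mcl(two_triangles)
```

Raising the inflation value generally yields smaller, tighter clusters, while
lowering it lets the flow spread further before it is cut.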

l2. Clustering based on merging Maximal Cliques (CMC): CMC works by repeated
merging of maximal cliques extracted from the PPI network. CMC incorporates
reliability scores for PPIs and improves on earlier clique-merging methods, such
as CFinder and the Local Clique Merging Algorithm (LCMA), which work only on
unscored networks. CMC begins by enumerating all maximal cliques in the protein
interaction network using the fast search-space pruning-based Cliques algorithm.
Cliques are ranked in non-increasing order of their weighted densities. CMC then
iteratively merges highly overlapping cliques depending on the extent of their
inter-connection.

l3. Clustering with Overlapping Neighborhood Expansion (ClusterONE): ClusterONE
works similarly to MCODE, by seeding and greedy neighborhood expansion.
ClusterONE first identifies seed proteins and greedily expands them into groups
V based on a cohesiveness measure f. At each step, new proteins are added to V
until f(V) no longer increases. V is then denoted as a locally cohesive group.
Highly overlapping groups are merged to produce candidate complexes. Since this
step allows for overlapping complexes, ClusterONE improves on the performance of
MCODE and MCL.
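
The greedy expansion step can be sketched as below. The notes do not define the
cohesiveness measure f, so this example assumes f(V) = w_in / (w_in + w_bound +
p|V|), where w_in is the total weight of edges inside V, w_bound the total
weight of edges leaving V, and p a penalty term. The sketch only adds proteins,
and the function names are hypothetical.

```python
def cohesiveness(graph, group, penalty=0.0):
    """Assumed measure: internal weight / (internal + boundary + penalty * size)."""
    w_in = 0.0
    w_bound = 0.0
    for u in group:
        for v, weight in graph[u].items():
            if v in group:
                w_in += weight  # every internal edge is visited from both ends
            else:
                w_bound += weight
    w_in /= 2.0
    denominator = w_in + w_bound + penalty * len(group)
    return w_in / denominator if denominator else 0.0


def grow_from_seed(graph, seed, penalty=0.0):
    """Greedily add the neighbour that raises cohesiveness most, until none does."""
    group = {seed}
    score = cohesiveness(graph, group, penalty)
    while True:
        frontier = {v for u in group for v in graph[u]} - group
        best, best_score = None, score
        for candidate in sorted(frontier):
            candidate_score = cohesiveness(graph, group | {candidate}, penalty)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
        if best is None:
            return group, score
        group.add(best)
        score = best_score


# Two triangles (a, b, c) and (d, e, f) joined by the single edge c-d.
bridge_graph = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 1.0},
    "d": {"c": 1.0, "e": 1.0, "f": 1.0},
    "e": {"d": 1.0, "f": 1.0},
    "f": {"d": 1.0, "e": 1.0},
}
grown_group, grown_score = grow_from_seed(bridge_graph, "a")
```

Starting from seed "a", the expansion stops at the first triangle, because
crossing the bridge would lower the cohesiveness of the group.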

m. The method involves finding the articulation points and biconnected
components using backtracking and branch-and-bound algorithms.
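
As an illustration of what an articulation point is, the sketch below finds them
with the standard depth-first-search low-link technique. This is a swapped-in
textbook method, not the backtracking or branch-and-bound procedure named in the
notes, and the function name is hypothetical. It assumes a simple undirected
graph given as adjacency sets.

```python
def articulation_points(graph):
    """Return the vertices whose removal disconnects their component."""
    discovery = {}   # DFS discovery time of each vertex
    low = {}         # lowest discovery time reachable from the DFS subtree
    points = set()
    counter = [0]

    def dfs(u, parent):
        discovery[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for v in graph[u]:
            if v == parent:
                continue
            if v in discovery:
                # Back edge: u can reach an earlier vertex.
                low[u] = min(low[u], discovery[v])
            else:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # The subtree of v cannot climb above u, so u separates it.
                if parent is not None and low[v] >= discovery[u]:
                    points.add(u)
        # The DFS root is an articulation point only with two or more subtrees.
        if parent is None and children > 1:
            points.add(u)

    for node in graph:
        if node not in discovery:
            dfs(node, None)
    return points


# Two triangles (a, b, c) and (d, e, f) joined by the single edge c-d.
joined_triangles = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}
cut_vertices = articulation_points(joined_triangles)
```

The recursion depth grows with the size of the network, so an iterative
traversal would be preferable for large PPI graphs.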

n. Predicting complexes from the PPI network and understanding their function
involves several families of methods, as follows:

n1. Ensemble clustering combines multiple algorithms (such as MCL, CMC,
ClusterONE and HACO) using majority-voting-based scoring. Incorporating
complementary information into the analysis of PPI networks helps overcome noise
in the data, which helps in the prediction of protein complexes.

n2. Methods based on network clustering combined with biological insights.
CORE, CACHET, COACH and MCL-CAw look for clusters that adhere to the
core-attachment organization, noted originally in yeast complexes. Large-scale
pull-down of yeast complexes using TAP-MS revealed that proteins within
complexes are organized as two distinct sets, cores and attachments.

n3. Methods incorporating functional information. Proteins within a complex
generally perform the same or similar functions, so combining the functional
annotations available for the proteins with the topology of PPI networks helps
improve complex prediction. RNSC, Protein Complex Prediction (PCP) and Dense
neighborhood Extraction using Connectivity make use of functional annotations
from Gene Ontology to predict complexes.

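1 a. Functional similarity weight
The CD-distance between two proteins u and v is given by

$$D(u,v) = \frac{|N_u \,\Delta\, N_v|}{|N_u \cup N_v| + |N_u \cap N_v|},$$

where $\Delta$ denotes the symmetric difference of the two neighbour sets.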
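
The CD-distance can be computed directly from two neighbour sets, as in the
sketch below. The function name is hypothetical, whether a protein counts as
its own neighbour is left to the caller, and returning 1.0 for two empty sets
is a convention chosen for this example.

```python
def cd_distance(neighbours_u, neighbours_v):
    """Czekanowski-Dice distance between two neighbour sets."""
    n_u, n_v = set(neighbours_u), set(neighbours_v)
    denominator = len(n_u | n_v) + len(n_u & n_v)
    if denominator == 0:
        # Convention for this sketch: no neighbours means no evidence of similarity.
        return 1.0
    # `^` is the symmetric difference: partners of exactly one of the proteins.
    return len(n_u ^ n_v) / denominator


shared_two_of_three = cd_distance({"a", "b", "c"}, {"b", "c", "d"})
```

Identical neighbour sets give a distance of 0 and disjoint sets give 1, so
smaller values indicate a stronger functional linkage.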

b. Functional similarity weighted averaging
The $\chi^2$ statistic of function j for protein i is computed by

$$S_i(j) = \frac{\left(n_i(j) - e_i(j)\right)^2}{e_i(j)}$$
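
A minimal sketch of this score follows. The notes do not define the terms, so
the example assumes n_i(j) is the observed number of neighbours of protein i
annotated with function j, and e_i(j) is the expected number, taken here as the
neighbourhood size times the background frequency of j. The function name and
input layout are hypothetical.

```python
def chi_square_scores(neighbour_annotations, background_frequency):
    """Score each function j by (n_i(j) - e_i(j))^2 / e_i(j) for one protein i.

    neighbour_annotations: one set of function labels per neighbour of protein i.
    background_frequency: assumed fraction of all proteins carrying each function.
    """
    n_neighbours = len(neighbour_annotations)
    scores = {}
    for function, frequency in background_frequency.items():
        observed = sum(1 for labels in neighbour_annotations if function in labels)
        expected = n_neighbours * frequency
        if expected > 0:
            scores[function] = (observed - expected) ** 2 / expected
    return scores


example_scores = chi_square_scores(
    [{"kinase"}, {"kinase"}, {"transport"}, {"kinase"}],
    {"kinase": 0.25, "transport": 0.25},
)
```

Because the difference is squared, a function that is strongly depleted among
the neighbours also scores highly, so the sign of n_i(j) - e_i(j) may need to be
checked before a function is assigned.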

2 a. Majority: all neighboring proteins are considered, and the number of times
each annotation occurs is summed for each protein, as described in Schwikowski
et al.
b. Neighborhood: for each protein, the score of a particular function is given
by the value of the χ²-test.
c. The functional flow algorithm generalizes the principle of ‘guilt by
association’ to groups of proteins that may or may not interact with each other
physically.

3 a. Protein-protein interaction data are represented using a graphical method
called a functional linkage graph, in which an edge (link) between two nodes
(proteins) represents that they might share the same function.

b. PROPAGATION OF PROBABILITIES

Since the label probability of a protein depends on its neighbors which

depend in turn on their neighbors, we would like a rigorous method of

increasing our estimate of the label probability of a protein if our estimate

of its unlabeled neighbors’ label probability increases. The Markov random

field inference problem that corresponds to this is the computation of the

marginal label probabilities of the unlabeled (hidden) nodes given some labelled

nodes.

4 All the

known proteins can be classified into one of the two categories according to

their function. Therefore, an interaction between two known proteins can be

classified into one of the three groups: (1, 1), (1, 0) and (0, 0)

Z(θ) is called the partition function in the general theory of MRF.

5. a. Global optimization principle: a score or energy is associated to any
given assignment (configuration) of functions for the whole set of unclassified
proteins. The score is lower in configurations that maximize the presence of
the same functional annotation in interacting proteins.
b. Self-consistency test: concerns the presence of errors in the topology of the
protein network. It is known that protein-protein interaction data obtained from
two-hybrid experiments contain a number of false positives and negatives, and
these could in principle appreciably alter the quality of predictions by
introducing spurious connectivity into the network.
c. Tolerance error checking

6 Indirect functional association
We are interested in finding out how often a protein shares function with its
level-2 neighbours instead of its level-1 neighbours.
Functional similarity weight: the common interacting partners between two
proteins are used as an estimate of their functional similarity, with the
Czekanowski-Dice distance (CD-distance) as a metric for functional linkage. The
CD-distance between two proteins u and v is given by

$$D(u,v) = \frac{|N_u \,\Delta\, N_v|}{|N_u \cup N_v| + |N_u \cap N_v|}.$$

Integrating reliability of experimental sources: as shown in Nabieva et al.
(2005), different experimental sources of protein-protein interaction data may
have different reliability.

7 The Gibbs Sampler
$M_0^{(i)}$ and $M_1^{(i)}$ are the numbers of interaction partners of protein
$P_i$ labelled with 0 and 1, respectively. When all the functions of the
interaction partners of a protein are given, they can be used to derive the
probability that the protein has the function, which is the basis of the Gibbs
sampler.
Assume that the parameters $\theta = (\alpha, \beta, \gamma)$ are given. For a
given protein $P_i$, conditional on the functional labelling of all the other
proteins, we can use the conditional probability
$\Pr(X_i \mid X_{[-i]}, \theta)$ to generate samples to update the functional
labelling of protein $P_i$. Repeating this procedure many times will generate
samples for the functional labelling of all the unannotated proteins.
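
The update loop can be sketched as follows. The equation for the conditional
probability is not reproduced in these notes, so the logistic form used here,
logit Pr(X_i = 1) = alpha + (beta - 1) M_0 + (gamma - beta) M_1, is an
assumption, as are the parameter values, the burn-in length and all function
names.

```python
import math
import random


def conditional_prob(m0, m1, alpha, beta, gamma):
    """Assumed Pr(X_i = 1 | neighbours): logistic in the neighbour label counts."""
    z = alpha + (beta - 1.0) * m0 + (gamma - beta) * m1
    return 1.0 / (1.0 + math.exp(-z))


def gibbs_sample(graph, known, theta, sweeps=200, burn_in=50, seed=0):
    """Estimate Pr(label = 1) for every unannotated protein.

    graph: protein -> set of interaction partners (every protein is a key).
    known: protein -> fixed label (0 or 1) for annotated proteins.
    """
    rng = random.Random(seed)
    alpha, beta, gamma = theta
    unknown = [p for p in graph if p not in known]
    labels = dict(known)
    for p in unknown:
        labels[p] = rng.randint(0, 1)  # random initial labelling
    counts = {p: 0 for p in unknown}
    for sweep in range(sweeps):
        for p in unknown:
            m1 = sum(labels[q] for q in graph[p])  # partners labelled 1
            m0 = len(graph[p]) - m1                # partners labelled 0
            prob = conditional_prob(m0, m1, alpha, beta, gamma)
            labels[p] = 1 if rng.random() < prob else 0
        if sweep >= burn_in:
            for p in unknown:
                counts[p] += labels[p]
    kept = sweeps - burn_in
    return {p: counts[p] / kept for p in unknown}


# Protein x is unannotated; its three partners all have the function (label 1).
star_graph = {"x": {"a", "b", "c"}, "a": {"x"}, "b": {"x"}, "c": {"x"}}
estimates = gibbs_sample(star_graph, {"a": 1, "b": 1, "c": 1}, (0.0, 1.0, 3.0))
```

With these illustrative parameters, a protein whose partners all carry the
function receives an estimated probability close to 1.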

b. Bayesian analysis
For a function of interest, first we estimate the probability, π, that a
protein has the function (without the information on the interaction network) by
the fraction of all the proteins having that function. Secondly, we estimate the
parameters θ = (α, β, γ).

8 Estimating the reliability of a putative protein interaction

data set. The reliability of a set of putative protein interactions is defined

as the fraction of real protein interactions over all the putative protein

interactions. To find the precision of the estimation, we use the

following formula to calculate,

9 Prediction Using a-priori Probabilities
The predictions are based on the idea of “guilt by association” and use a-priori
probabilities: when a hypothetical protein X interacts with a protein of known
function, X may share that function, with a probability determined from the
high-throughput data describing the relationship between X and its partner. The
function F of the reliability score is given by,

14 The protein of interest is assigned the function with the highest χ² value
among the functions of all n-neighbouring proteins. For each member of the
function category, the χ² value is calculated using the following formula:

Overview of the prediction method. The black circle represents a query protein
for which function is predicted, and white circles represent proteins.

(A) Assignment of function to a query protein.

This is done based on the functions of neighboring proteins on the map.

(B) Physical interaction data deposited in the

public databases.

(C) Development of the protein interaction map

by integrating all physical interaction data.

15 a. Functional similarity weight
The CD-distance between two proteins u and v is given by

$$D(u,v) = \frac{|N_u \,\Delta\, N_v|}{|N_u \cup N_v| + |N_u \cap N_v|},$$

b. Functional similarity weighted averaging
The $\chi^2$ statistic of function j for protein i is computed by

$$S_i(j) = \frac{\left(n_i(j) - e_i(j)\right)^2}{e_i(j)}$$

16 Majority
We consider all neighbouring proteins and sum up the number of times each
annotation occurs for each protein, as described in Schwikowski et al. (2000).
In the case of weighted interaction graphs, we simply extend the method by
taking a weighted sum instead. For each protein, the score of a particular
function is the corresponding sum.
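
The weighted form of the Majority score can be sketched as below. The data
layout (a dictionary of weighted neighbours and a dictionary of annotation sets)
and the function name are assumptions made for this example. With all weights
set to 1 it reduces to the plain count.

```python
from collections import defaultdict


def majority_scores(graph, annotations, protein):
    """Sum edge weights over the neighbours of `protein` carrying each function."""
    scores = defaultdict(float)
    for neighbour, weight in graph[protein].items():
        for function in annotations.get(neighbour, ()):
            scores[function] += weight
    return dict(scores)


weighted_graph = {"x": {"a": 1.0, "b": 1.0, "c": 0.5}}
known_annotations = {
    "a": {"kinase"},
    "b": {"kinase", "transport"},
    "c": {"transport"},
}
x_scores = majority_scores(weighted_graph, known_annotations, "x")
predicted_function = max(x_scores, key=x_scores.get)
```

Neighbours without any annotation contribute nothing, so a protein surrounded
only by unannotated partners receives no prediction.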

Gen Multicut
In multiway k-cut, the task is to partition a graph in such a way that each of
the k terminal nodes belongs to a different subset of the partition and so that
the (weighted) number of edges that are ‘cut’ in the process is minimized. In
the more general version of the multiway k-cut problem considered here, the goal
is to assign a unique function to all the unannotated nodes so as to minimize
the sum of the costs of the edges joining nodes.

17 NEIGHBORHOOD FUNCTION

The MRF framework needs the specification of neighbourhood functions

that describe the dependence of the label probability of a node on the labels

of its neighbours. Different types of neighbourhood conditional probability

functions can be used to model different types of local dependency structure.

DATA SOURCES
The following datasets were used in our analysis. Protein-Protein Interactions:
The GRID dataset contained 20 985 distinct interactions catalogued between
13 607 distinct pairs of proteins. 4708 proteins participated in interactions.
1442 unlabeled, connected proteins were potential labelling targets. 4588 of the
participating proteins are in a single connected component; the second largest
component has 4 proteins.

18 The MRF-model-based function prediction method is detailed elsewhere; here it
is described only briefly. Let a function of interest be category 1 and the rest
be category 0. All the known proteins can be classified into one of the two
categories according to their function. Thus, an interaction between two known
proteins can be classified into one of three groups: (1, 1), (1, 0) and (0, 0).
Given a protein physical interaction network (Net1) and a genetic interaction
network (Net2), the belief can be represented by a Gibbs distribution (Li, 1995)
for this function by considering the classification of all the proteins.