Mapping proteins in the presence of paralogs using units of coevolution

Patterson, Murray
2013

Abstract

We study the problem of mapping proteins between two protein families in the presence of paralogs. This problem arises as a difficult subproblem in coevolution-based computational approaches to protein-protein interaction prediction. Like prior approaches, our method is based on the idea that coevolution implies equal rates of sequence evolution among the interacting proteins, and we provide a first attempt to quantify this notion in a formal statistical manner. We call the units that are central to this quantification scheme the units of coevolution. A unit consists of two mapped protein pairs, and its score quantifies the coevolution of those pairs. This quantification allows us to give a maximum likelihood formulation of the paralog mapping problem and to cast it as a binary quadratic program. CUPID, our software tool based on a Lagrangian relaxation of this formulation, makes it possible, for the first time, to compute pairings of state-of-the-art quality in a few minutes of runtime. In summary, we suggest a novel alternative to the previously available approaches, one that is statistically sound and computationally feasible.
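To make the idea of a quadratic objective over units concrete, here is a minimal sketch in Python. It is not CUPID and does not reproduce the paper's statistical score or its Lagrangian relaxation. The function names, the toy distance matrices, and the unit score (the negative absolute difference between two evolutionary distances, standing in for "equal rates of sequence evolution") are assumptions made for illustration only. A one-to-one mapping is written as a permutation, which is equivalent to a binary assignment matrix x with x[i][j] = 1 when protein i of the first family is mapped to protein j of the second.

```python
from itertools import combinations, permutations


def unit_score(dist_a, dist_b, pair1, pair2):
    """Hypothetical score of one unit (two mapped protein pairs).

    The score is highest (zero) when the two proteins in family A are
    exactly as far apart as their mapped partners in family B.
    """
    (i, j), (k, l) = pair1, pair2
    return -abs(dist_a[i][k] - dist_b[j][l])


def mapping_score(dist_a, dist_b, mapping):
    """Quadratic objective: sum of unit scores over all unordered
    pairs of mapped pairs (i, mapping[i])."""
    pairs = list(enumerate(mapping))
    return sum(unit_score(dist_a, dist_b, p, q)
               for p, q in combinations(pairs, 2))


def best_mapping(dist_a, dist_b):
    """Exhaustive search over one-to-one mappings (toy sizes only)."""
    n = len(dist_a)
    return max(permutations(range(n)),
               key=lambda m: mapping_score(dist_a, dist_b, m))


# Toy data (assumed): family B has the same distances as family A,
# with its proteins relabelled by the hidden mapping 0->2, 1->0, 2->1.
DIST_A = [[0, 1, 4],
          [1, 0, 6],
          [4, 6, 0]]
DIST_B = [[0, 6, 1],
          [6, 0, 4],
          [1, 4, 0]]

if __name__ == "__main__":
    print(best_mapping(DIST_A, DIST_B))
```

Exhaustive search grows factorially with family size, so it only serves to show what is being optimised. According to the abstract, a Lagrangian relaxation of the binary quadratic program is what makes realistic instances tractable.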
Journal article - Scientific article
Amino Acid Sequence; Molecular Sequence Data; Proteins; Sequence Alignment; Sequence Analysis, Protein; Software; Structural Biology; Biochemistry; Molecular Biology; Computer Science Applications; Computer Vision and Pattern Recognition; Applied Mathematics
Language: English
Year: 2013
Volume: 14
Issue: S15
Article number: S18
Access: open
El-Kebir, M., Marschall, T., Wohlers, I., Patterson, M., Heringa, J., Schönhuth, A., et al. (2013). Mapping proteins in the presence of paralogs using units of coevolution. BMC Bioinformatics, 14(S15), S18 [10.1186/1471-2105-14-S15-S18].
Files in this item:
File: mapping_2013.pdf
Access: open access
Attachment type: Publisher's Version (Version of Record, VoR)
Size: 923 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/217371
Citations
  • Scopus 3
  • Web of Science (ISI) 2