Todeschini, R., Consonni, V., Hua, X., Holliday, J., Buscema, M., Willett, P. (2012). Similarity coefficients for binary chemoinformatics data: overview and extended comparison using simulated and real datasets. JOURNAL OF CHEMICAL INFORMATION AND MODELING, 52(11), 2884-2901 [10.1021/ci300261r].

Similarity coefficients for binary chemoinformatics data: overview and extended comparison using simulated and real datasets

Todeschini, Roberto; Consonni, Viviana
2012

Abstract

This paper reports an analysis and comparison of the use of 51 different similarity coefficients for computing the similarities between binary fingerprints for both simulated and real chemical datasets. Five pairs and a triplet of coefficients were found to yield identical similarity values, leading to the elimination of seven of the coefficients. The remaining 44 coefficients were then compared in two ways: by their theoretical characteristics using simple descriptive statistics, correlation analysis, multi-dimensional scaling, Hasse diagrams, and the recently described atemporal target diffusion model; and by their effectiveness for similarity-based virtual screening using MDDR, WOMBAT and MUV data. The comparisons demonstrate the general utility of the well-known Tanimoto method, but also suggest other coefficients that may be worthy of further attention.
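As an illustration of what a binary similarity coefficient computes (a minimal sketch, not the paper's code), the snippet below derives the usual 2×2 counts for two fingerprints and evaluates three standard coefficients: Tanimoto, Dice, and simple matching. It assumes the convention a = bits set in both fingerprints, b = set only in x, c = set only in y, d = set in neither; letter assignments differ between authors. The function names and the zero-denominator handling are hypothetical choices for illustration.

```python
from typing import Sequence, Tuple


def contingency(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) for two equal-length binary fingerprints.

    Assumed convention: a = on in both, b = on only in x,
    c = on only in y, d = off in both.
    """
    if len(x) != len(y):
        raise ValueError("fingerprints must have the same length")
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        if xi and yi:
            a += 1
        elif xi:
            b += 1
        elif yi:
            c += 1
        else:
            d += 1
    return a, b, c, d


def tanimoto(x: Sequence[int], y: Sequence[int]) -> float:
    """Tanimoto (Jaccard) coefficient a / (a + b + c); ignores shared absences."""
    a, b, c, _ = contingency(x, y)
    denom = a + b + c
    # Hypothetical choice: two all-zero fingerprints score 0.0 here.
    return a / denom if denom else 0.0


def dice(x: Sequence[int], y: Sequence[int]) -> float:
    """Dice coefficient 2a / (2a + b + c)."""
    a, b, c, _ = contingency(x, y)
    denom = 2 * a + b + c
    return 2 * a / denom if denom else 0.0


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Simple matching (a + d) / n; counts shared absences as agreement."""
    a, b, c, d = contingency(x, y)
    return (a + d) / (a + b + c + d)


if __name__ == "__main__":
    x = [1, 1, 0, 1, 0, 0, 0, 0]
    y = [1, 0, 0, 1, 1, 0, 0, 0]
    print(contingency(x, y))      # (2, 1, 1, 4)
    print(tanimoto(x, y))         # 0.5
    print(round(dice(x, y), 3))   # 0.667
    print(simple_matching(x, y))  # 0.75
```

The example shows why the choice of coefficient matters: the same pair of fingerprints scores 0.5, about 0.667 or 0.75 depending on the formula, and simple matching rises further as more bits are off in both fingerprints, whereas Tanimoto and Dice ignore those shared absences.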
Type: Journal article - Scientific article
Keywords: binary similarity coefficients; multivariate comparison; ATDM method; virtual screening
Language: English
Year: 2012
Volume: 52
Issue: 11
First page: 2884
Last page: 2901
Access: reserved
Files in this item: Binary_sim_coeff_JCIM.pdf (Adobe PDF, 4.32 MB; access restricted to archive managers)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/37943
Citations
  • Scopus 148
  • Web of Science (ISI) 130