Similarity/diversity measures play a fundamental role in library searching, virtual screening, and quantitative structure-activity relationship/quantitative structure-property relationship modeling, as well as in genomics and proteomics. In this paper, a new similarity/diversity measure is proposed for the analysis of sequential data, where useful information can also be obtained from the ordering relationships between the sequence elements. This methodology can be applied to evaluate molecular similarity/diversity, using sets of sequential descriptors, and to evaluate the similarity between spectra, sensor arrays, and other sequential data such as DNA and protein sequences. The proposed distance (weighted standardized Hasse distance) is evaluated between pairs of Hasse matrices derived from the classical partial-ordering rules. It can be naturally standardized, which allows these distances to be interpreted as absolute values (e.g., percentages) and simple similarity and correlation indices to be derived from them. A simple example highlights the behavior of the new similarity/diversity measure on DNA sequences taken from the first exons of the β-globin genes of eight different species. A sensitivity analysis has also been performed, showing that the measure readily accounts for small modifications of the DNA sequences. Finally, the results are compared with those reported in the literature and with matrix invariants derived from the Hasse matrix. © 2006 American Chemical Society.
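As a rough illustration of the idea described in the abstract, the following Python sketch builds a Hasse matrix from the classical partial-ordering (dominance) rule and compares two such matrices entry by entry. The abstract does not give the paper's formulas, so everything here is an assumption: the function names `hasse_matrix` and `standardized_distance`, the +1/-1/0 encoding, the toy data, and the unweighted normalization by the largest possible disagreement. In particular, this is not the weighted standardized Hasse distance itself, and no DNA encoding is attempted.

```python
from itertools import combinations


def hasse_matrix(elements):
    """Return an n x n matrix of +1 / -1 / 0 dominance relations.

    Assumed encoding (not taken from the paper):
      +1 if element i dominates element j (>= on every variable, > on one),
      -1 if element j dominates element i,
       0 if the two elements are equal or incomparable.
    """
    n = len(elements)
    h = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        a, b = elements[i], elements[j]
        if all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b)):
            h[i][j], h[j][i] = 1, -1
        elif all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b)):
            h[i][j], h[j][i] = -1, 1
    return h


def standardized_distance(h1, h2):
    """Unweighted entry-wise distance between two Hasse matrices, in [0, 1].

    Each pair of elements contributes |h1 - h2|, which is at most 2,
    so the sum is divided by 2 * (number of pairs).
    """
    n = len(h1)
    pairs = list(combinations(range(n), 2))
    if not pairs:
        return 0.0
    total = sum(abs(h1[i][j] - h2[i][j]) for i, j in pairs)
    return total / (2 * len(pairs))


# Hypothetical toy data: three sequence elements, two variables each.
seq_a = [(1, 2), (2, 3), (3, 1)]
seq_b = [(1, 2), (2, 3), (3, 4)]

h_a = hasse_matrix(seq_a)
h_b = hasse_matrix(seq_b)
d = standardized_distance(h_a, h_b)
print(f"distance = {d:.1%}, similarity = {1 - d:.1%}")
```

Because the distance is bounded by a known maximum, it can be read directly as a percentage, and a similarity index follows as its complement. That is the kind of interpretation the abstract attributes to the standardized measure, although the paper's actual weighting scheme may differ from this sketch.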

Todeschini, R., Consonni, V., Mauri, A., Ballabio, D. (2006). Characterization of DNA primary sequences by a new similarity/diversity measure based on the partial ordering. Journal of Chemical Information and Modeling, 46(5), 1905-1911 [10.1021/ci060099e].

Characterization of DNA primary sequences by a new similarity/diversity measure based on the partial ordering

Todeschini, Roberto; Consonni, Viviana; Mauri, Andrea; Ballabio, Davide
2006

Type: Journal article - Scientific article
Keywords: DNA sequences; similarity; diversity; partial ranking; Hasse diagrams
Language: English
Year: 2006
Volume: 46
Issue: 5
First page: 1905
Last page: 1911
Files in this record:
There are no files associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/2725
Citations
  • Scopus 12
  • ISI 9