In the last few years, molecular biology has produced a large amount of data, mainly in the form of sequences, that is, strings over an alphabet of four symbols (DNA/RNA) or twenty symbols (proteins). For computational biologists, the main challenge now is to provide efficient tools for the analysis and comparison of these sequences. In this paper, we introduce and briefly discuss some open problems, and present a parallel algorithm that finds repeated substrings in a DNA sequence or common substrings in a set of sequences. The occurrences of a substring can be approximate, that is, they can differ by up to a maximum number of mismatches that depends on the length of the substring itself. The output of the algorithm is sorted according to different statistical measures of significance. The algorithm has been successfully implemented on a cluster of workstations.
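
To make the notion of an approximate occurrence concrete, here is a minimal brute-force sketch in Python. It is not the parallel algorithm of the paper, and it does not compute any statistical measure of significance. It assumes that mismatches are counted as Hamming distance and that the allowed number of mismatches is floor(eps * length); the error rate `eps` and all function names are hypothetical, chosen only for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def approximate_occurrences(pattern: str, sequence: str, eps: float = 0.25) -> list[int]:
    """Start positions where `pattern` occurs in `sequence` with at most
    floor(eps * len(pattern)) mismatches (an assumed, length-dependent threshold)."""
    m = len(pattern)
    max_mismatches = int(eps * m)  # longer substrings tolerate more mismatches
    return [
        i
        for i in range(len(sequence) - m + 1)
        if hamming(pattern, sequence[i:i + m]) <= max_mismatches
    ]


def common_in_all(pattern: str, sequences: list[str], eps: float = 0.25) -> bool:
    """True if `pattern` has at least one approximate occurrence in every sequence."""
    return all(approximate_occurrences(pattern, s, eps) for s in sequences)
```

With `eps = 0.25`, a substring of length 4 may differ in one position, so `ACCT` counts as an occurrence of `ACGT`. A substring is "repeated" in a single sequence when `approximate_occurrences` returns more than one position, and "common" to a set when `common_in_all` is true.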

Mauri, G., Pavesi, G. (2001). Parallel algorithms for the analysis of biological sequences. In Parallel Computing Technologies: 6th International Conference, PaCT 2001, Novosibirsk, Russia, September 3-7, 2001, Proceedings (pp. 456-468). Springer [10.1007/3-540-44743-1_48].

Parallel algorithms for the analysis of biological sequences

Mauri, G; Pavesi, G
2001

Type: slide + paper
Keywords: bioinformatics; sequence analysis
Language: English
Conference: 6th International Conference, PaCT 2001, September 3-7, 2001
Editor: Malyshkin, V
Proceedings: Parallel Computing Technologies 6th International Conference, PaCT 2001, Novosibirsk, Russia, September 3-7, 2001 Proceedings
ISBN: 9783540425229
Year: 2001
Series and volume: LNCS 2127
Pages: 456-468
none

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/17535