Bicocca Open Archive

Parallel algorithms for the analysis of biological sequences

Mauri, G; Pavesi, G

2001

Abstract

In the last few years, molecular biology has produced a large amount of data, mainly in the form of sequences, that is, strings over an alphabet of four symbols (DNA/RNA) or twenty symbols (proteins). For computational biologists, the main challenge now is to provide efficient tools for the analysis and comparison of these sequences. In this paper, we introduce and briefly discuss some open problems, and present a parallel algorithm that finds repeated substrings in a DNA sequence or common substrings in a set of sequences. The occurrences of the substrings can be approximate, that is, they can differ by up to a maximum number of mismatches that depends on the length of the substring itself. The output of the algorithm is sorted according to different statistical measures of significance. The algorithm has been successfully implemented on a cluster of workstations.
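To make the notion of approximate occurrences concrete, the following is a minimal sequential sketch, not the algorithm from the paper. It assumes that "mismatches" means Hamming distance and that the mismatch budget is a fixed fraction of the substring length. The names `max_mismatches`, `hamming`, `approximate_repeats` and the `rate` parameter are hypothetical and chosen only for illustration.

```python
def max_mismatches(length: int, rate: float = 0.25) -> int:
    """Hypothetical mismatch budget that grows with the substring length.

    The fixed fraction `rate` is an assumption; the abstract only says the
    budget depends on the length of the substring.
    """
    return int(length * rate)


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def approximate_repeats(seq: str, length: int, rate: float = 0.25) -> dict[str, list[int]]:
    """Map each substring of `seq` with the given length to the start
    positions of its approximate occurrences (possibly overlapping),
    keeping only substrings that occur at least twice."""
    budget = max_mismatches(length, rate)
    # Every window of the requested length, with its start position.
    windows = [(i, seq[i:i + length]) for i in range(len(seq) - length + 1)]
    repeats = {}
    # Brute force: compare each distinct candidate against every window.
    for pattern in sorted({w for _, w in windows}):
        positions = [i for i, w in windows if hamming(pattern, w) <= budget]
        if len(positions) >= 2:
            repeats[pattern] = positions
    return repeats
```

For example, `approximate_repeats("ACGTTTACGA", 4)` reports `ACGT` and `ACGA` as repeats at positions 0 and 6, since the two substrings differ in a single position and the assumed budget for length 4 is one mismatch. In this brute-force sketch each candidate is checked independently of the others, so the loop over candidates could be split across workers; how the paper actually distributes the work and ranks the output by statistical significance is not shown here.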

Contribution type: slide + paper
Keywords: bioinformatics; sequence analysis
Content language: English
Conference name: 6th International Conference, PaCT 2001 - September 3-7, 2001
Conference year: 2001
Volume editors: Malyshkin, V
Proceedings title: Parallel Computing Technologies 6th International Conference, PaCT 2001, Novosibirsk, Russia, September 3-7, 2001 Proceedings
ISBN of the proceedings volume: 9783540425229
Series: LECTURE NOTES IN COMPUTER SCIENCE
Publication date: 2001
Volume number: 2127 LNCS
First page: 456
Last page: 468
DOI of the contribution: https://dx.doi.org/10.1007/3-540-44743-1_48
Full text: none
Citation: Mauri, G., Pavesi, G. (2001). Parallel algorithms for the analysis of biological sequences. In Parallel Computing Technologies 6th International Conference, PaCT 2001, Novosibirsk, Russia, September 3-7, 2001 Proceedings (pp.456-468). Springer [10.1007/3-540-44743-1_48].
Appears in types: 02 - Conference contribution

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/17535
