Bicocca Open Archive

Mauri, G., Pavesi, G. (2002). A parallel algorithm for pattern discovery in biological sequences. FUTURE GENERATION COMPUTER SYSTEMS, 18(6), 849-854 [10.1016/S0167-739X(02)00057-2].

A parallel algorithm for pattern discovery in biological sequences

Mauri, Giancarlo; Pavesi, G.

2002

Abstract

Software has become as essential to molecular biologists as the Bunsen burner was a few decades ago. Biological data come mainly in the form of DNA or protein sequences, i.e., strings over alphabets of four or twenty symbols, respectively. The main challenge now is to develop efficient and powerful algorithms that extract as much meaning as possible from the huge amount of data generated in the last few years. In this paper we present a parallel pattern discovery algorithm that, given a set of functionally related sequences, finds the substrings that occur in all (or most) of the sequences in the set. The occurrences of a substring can be approximate: they can differ from it by up to a maximum number of mismatches that depends on the length of the substring.
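The problem statement in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the record's keywords indicate that the paper relies on suffix trees and a parallel formulation, whereas the code below is a sequential brute-force enumeration meant only to make the definitions concrete. The function names, the `error_rate` parameter (mismatch budget as a fraction of pattern length), and the `quorum` parameter (fraction of sequences that must contain the pattern) are assumptions introduced for illustration.

```python
import math
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_approx(pattern, sequence, max_mismatches):
    """True if some window of `sequence` is within `max_mismatches` of `pattern`."""
    m = len(pattern)
    return any(
        hamming(pattern, sequence[i:i + m]) <= max_mismatches
        for i in range(len(sequence) - m + 1)
    )


def discover_patterns(sequences, length, error_rate=0.25, quorum=1.0,
                      alphabet="ACGT"):
    """Brute-force sketch: try every string of the given length over `alphabet`.

    Assumption: the mismatch budget grows with pattern length as
    floor(error_rate * length); the paper may define it differently.
    """
    max_mismatches = int(error_rate * length)
    needed = math.ceil(quorum * len(sequences))  # sequences that must match
    found = []
    for letters in product(alphabet, repeat=length):
        pattern = "".join(letters)
        hits = sum(occurs_approx(pattern, s, max_mismatches) for s in sequences)
        if hits >= needed:
            found.append(pattern)
    return found


if __name__ == "__main__":
    seqs = ["ACGTAC", "TTACGA", "GACGTT"]
    # Exact occurrences in every sequence.
    print(discover_patterns(seqs, 3, error_rate=0.0))
    # Exact occurrences in at least half of the sequences (rounded up).
    print(discover_patterns(seqs, 3, error_rate=0.0, quorum=0.5))
```

The enumeration costs time exponential in the pattern length, which is why an index such as a suffix tree, and a parallel search over it, is attractive for realistic inputs.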

	Subtype
				Journal article - Scientific article
	Keywords
				approximate pattern matching; pattern discovery; suffix trees
	Content language
				English
	Publication date
				May 2002
	Journal
				FUTURE GENERATION COMPUTER SYSTEMS
	Volume
				18
	Issue
				6
	First page
				849
	Last page
				854
	Article DOI
				https://dx.doi.org/10.1016/S0167-739X(02)00057-2
	Full text
				none
	Appears in collections:
				01 - Journal article

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/2413
