Bicocca Open Archive

PIntron: A fast method for gene structure prediction via maximal pairings of a pattern and a text

Bonizzoni, Paola; Della Vedova, Gianluca; Pirola, Yuri; Rizzi, Raffaella

2011

Abstract

A challenging issue in designing computational methods for predicting the exon-intron structure of a gene from a cluster of transcript (EST, mRNA) sequences is guaranteeing both accuracy and efficiency in time and space when large clusters of more than 20,000 ESTs and genes longer than 1 Mb are processed. Traditionally, the problem has been addressed by combining different tools that were not specifically designed for this task. We propose a fast method based on ad hoc procedures for solving the problem. Our method combines two ideas: a novel algorithm of provably low time complexity for computing spliced alignments of a transcript against a genome, and an efficient algorithm that exploits the inherent redundancy of information in a cluster of transcripts to select, among all possible factorizations of EST sequences, those that allow the inference of splice site junctions that are largely confirmed by the input data. The EST alignment procedure is based on the construction of maximal embeddings, which are sequences obtained from paths of a graph structure, called the embedding graph, whose vertices are the maximal pairings of a genomic sequence T and an EST P. The procedure runs in time linear in the lengths of P and T and in the size of the output. PIntron, the software tool implementing our methodology, is available at http://www.algolab.eu/PIntron and can process in a few seconds some critical genes that are not manageable by other gene structure prediction tools. At the same time, PIntron exhibits high accuracy (sensitivity and specificity) when compared with ENCODE data.
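
To make the notion of a maximal pairing more concrete, the following is a minimal sketch, not the procedure of the paper. It assumes that a maximal pairing of P and T is a triple (p, t, l) such that P[p..p+l-1] equals T[t..t+l-1] and the match cannot be extended to the left or to the right. The function name `maximal_pairings` and the `min_len` parameter are hypothetical, and the code uses a naive dynamic programme that takes time proportional to |P|·|T|, rather than the linear-time construction claimed in the abstract.

```python
def maximal_pairings(pattern, text, min_len=1):
    """Return sorted (p, t, length) triples with
    pattern[p:p+length] == text[t:t+length], where the match cannot be
    extended on either side. Naive O(|P|*|T|) illustration only."""
    n, m = len(pattern), len(text)
    pairings = []
    # prev[j] holds the length of the longest common suffix of
    # pattern[:i-1] and text[:j]; curr[j] does the same for pattern[:i].
    prev = [0] * (m + 1)
    for i in range(1, n + 1):
        curr = [0] * (m + 1)
        for j in range(1, m + 1):
            if pattern[i - 1] == text[j - 1]:
                curr[j] = prev[j - 1] + 1
                # The common suffix is left-maximal by construction;
                # report it only where it cannot be extended to the right.
                right_maximal = i == n or j == m or pattern[i] != text[j]
                if right_maximal and curr[j] >= min_len:
                    length = curr[j]
                    pairings.append((i - length, j - length, length))
        prev = curr
    return sorted(pairings)


# Toy genome: two 5-base exons separated by a 12-base intron (GT...AG).
genome = "ACGGA" + "GTAAGTTTTCAG" + "TCCGA"
est = "ACGGA" + "TCCGA"
print(maximal_pairings(est, genome, min_len=4))
# prints [(0, 0, 5), (5, 17, 5)]
```

In this toy case the two reported pairings correspond to the two exons, and chaining them in order gives a factorization of the EST with an implied intron between genome positions 5 and 16. Building and traversing the embedding graph over such pairings is beyond the scope of this sketch.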

	Contribution type
	
				paper
			
	Keywords
	
				Alternative splicing; Gene structure; Maximal pairing; Transcript alignment;
			
	Keywords
	
				algorithms; alternative splicing; gene structure; EST
			
	Content language
	
				English
			
	Conference name
	
				1st IEEE International Conference on Computational Advances in Bio and Medical Sciences, ICCABS 2011
			
	Conference year
	
				2011
			
	Volume editors
	
				Mandoiu, I; Miyano, S; Przytycka, T; Rajasekaran, S
			
	Proceedings title
	
				Computational Advances in Bio and Medical Sciences (ICCABS), 2011 IEEE 1st International Conference on
			
	Proceedings ISBN
	
				9781612848525
			
	Publication date
	
				2011
			
	First page
	
				33
			
	Last page
	
				39
			
	Article number
	
				5729935
			
	DOI
	
				https://dx.doi.org/10.1109/ICCABS.2011.5729935
			
	Fulltext
	
				reserved
			
	Citation
	
				Bonizzoni, P., Della Vedova, G., Pirola, Y., Rizzi, R. (2011). PIntron: A fast method for gene structure prediction via maximal pairings of a pattern and a text. In Computational Advances in Bio and Medical Sciences (ICCABS), 2011 IEEE 1st International Conference on (pp.33-39). IEEE Computer Society [10.1109/ICCABS.2011.5729935].
			
	Appears in types:
	
				02 - Conference contribution

Files in this item:

File	Description	Access	Size	Format
conf-paper-11-iccabs.pdf	Main article	Archive managers only	140.42 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/19835
