Bicocca Open Archive

Pirola, Y., Rizzi, R., Picardi, E., Pesole, G., Della Vedova, G., Bonizzoni, P. (2012). PIntron: a fast method for detecting the gene structure due to alternative splicing via maximal pairings of a pattern and a text. BMC Bioinformatics, 13(Suppl 5), S2 [10.1186/1471-2105-13-S5-S2].

PIntron: a fast method for detecting the gene structure due to alternative splicing via maximal pairings of a pattern and a text

Pirola, Yuri; Rizzi, Raffaella; Picardi, E.; Pesole, G.; Della Vedova, Gianluca; Bonizzoni, Paola

2012

Abstract

Background: A challenging issue in designing computational methods that predict the exon-intron structure of a gene from a cluster of transcript (EST, mRNA) sequences is guaranteeing accuracy together with efficiency in time and space when processing large clusters of more than 20,000 ESTs and genes longer than 1 Mb. Traditionally, the problem has been addressed by combining different tools that were not specifically designed for this task.

Results: We propose a fast method based on ad hoc procedures for solving the problem. Our method combines two ideas: a novel algorithm with provably low time complexity for computing spliced alignments of a transcript against a genome, and an efficient algorithm that exploits the inherent redundancy of information in a cluster of transcripts to select, among all possible factorizations of the EST sequences, those that allow inferring splice site junctions largely confirmed by the input data. The EST alignment procedure is based on the construction of maximal embeddings, which are sequences obtained from paths of a graph structure, called the embedding graph, whose vertices are the maximal pairings of a genomic sequence T and an EST P. The procedure runs in time linear in the length of P and T and in the size of the output. The method is implemented in the PIntron package. PIntron takes as input a genomic sequence or region and a set of EST and/or mRNA sequences. Besides predicting the full-length transcript isoforms potentially expressed by the gene, the PIntron package includes a module for the CDS annotation of the predicted transcripts.

Conclusions: PIntron, the software tool implementing our methodology, is available at http://www.algolab.eu/PIntron under the GNU AGPL. PIntron has been shown to outperform state-of-the-art methods and to quickly process some critical genes. At the same time, PIntron exhibits high accuracy (sensitivity and specificity) when benchmarked against ENCODE annotations.
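The abstract does not define a maximal pairing formally. As a rough illustration only, the sketch below assumes that a pairing is a triple (p, t, l) such that the substring of P starting at p and the substring of T starting at t agree for l characters, and that it is maximal when it cannot be extended to the left or to the right. The function name `maximal_pairings` and the `min_len` filter are hypothetical, and the enumeration is a naive quadratic scan, not the linear-time procedure described above.

```python
from typing import List, Tuple


def maximal_pairings(P: str, T: str, min_len: int = 1) -> List[Tuple[int, int, int]]:
    """Naively list triples (p, t, l) with P[p:p+l] == T[t:t+l] that cannot
    be extended by one character on either side (assumed definition)."""
    pairings = []
    for p in range(len(P)):
        for t in range(len(T)):
            if P[p] != T[t]:
                continue
            # A match that also agrees one position to the left is not a
            # maximal start; it is reported from its leftmost position.
            if p > 0 and t > 0 and P[p - 1] == T[t - 1]:
                continue
            # Extend to the right as far as the two strings agree.
            l = 0
            while p + l < len(P) and t + l < len(T) and P[p + l] == T[t + l]:
                l += 1
            if l >= min_len:
                pairings.append((p, t, l))
    return pairings


# Toy example: an "EST" P made of two exon-like pieces of a "genome" T.
est = "ACGTTGCA"
genome = "ACGTAAAATTGCA"
print(maximal_pairings(est, genome, min_len=3))
```

On this toy input the two reported pairings overlap by one character of P, which hints at why a further step is needed to choose among several possible factorizations of an EST into exons.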

Subtype: Journal article - Scientific article
Keywords: algorithms; alternative splicing; gene structure; EST
Content language: English
Publication date: 2012
Journal: BMC Bioinformatics
Volume: 13
Issue: Suppl 5
First page: S2
Article DOI: https://dx.doi.org/10.1186/1471-2105-13-S5-S2
Full text: open
Appears in types: 01 - Journal article

Files in this record:

File	Description	Attachment type	Access	Size	Format
1471-2105-13-S5-S2.pdf	Main article	Publisher's Version (Version of Record, VoR)	Open access	676.88 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/30735
