Bicocca Open Archive

Tangherloni, A., Spolaor, S., Rundo, L., Nobile, M., Cazzaniga, P., Mauri, G., et al. (2019). GenHap: A novel computational method based on genetic algorithms for haplotype assembly. BMC Bioinformatics, 20(Suppl 4) [10.1186/s12859-019-2691-y].

GenHap: A novel computational method based on genetic algorithms for haplotype assembly

Tangherloni, A; Spolaor, S; Rundo, L; Nobile, MS; Cazzaniga, P; Mauri, G; Liò, P; Merelli, I; Besozzi, D

2019

Abstract

Background: Fully characterizing the genome of an individual requires reconstructing the two distinct copies of each chromosome, called haplotypes. The computational problem of inferring the full haplotypes of a cell from sequencing read data is known as haplotype assembly, and consists of assigning every heterozygous Single Nucleotide Polymorphism (SNP) to exactly one of the two chromosomes. Knowledge of complete haplotypes is generally more informative than the analysis of single SNPs and plays a fundamental role in many medical applications.

Results: To reconstruct the two haplotypes, we addressed the weighted Minimum Error Correction (wMEC) problem, which is a successful approach for haplotype assembly. This NP-hard problem consists of computing the two haplotypes that partition the sequencing reads into two disjoint subsets with the least number of corrections to the SNP values. To this aim, we propose GenHap, a novel computational method for haplotype assembly based on Genetic Algorithms, yielding optimal solutions by means of a global search process. To evaluate the effectiveness of our approach, we ran GenHap on two synthetic (yet realistic) datasets, based on the Roche/454 and PacBio RS II sequencing technologies. We compared the performance of GenHap against HapCol, an efficient state-of-the-art algorithm for haplotype phasing. Our results show that GenHap always obtains high-accuracy solutions (in terms of haplotype error rate), and is up to 4× faster than HapCol on the Roche/454 instances and up to 20× faster on the PacBio RS II dataset. Finally, we assessed the performance of GenHap on two different real datasets.

Conclusions: Future-generation sequencing technologies, producing longer reads with higher coverage, can greatly benefit from GenHap, thanks to its capability of efficiently solving large instances of the haplotype assembly problem. Moreover, the optimization approach proposed in GenHap can be extended to the study of allele-specific genomic features, such as expression, methylation and chromatin conformation, by exploiting multi-objective optimization techniques. The source code and the full documentation are available at the following GitHub repository: https://github.com/andrea-tango/GenHap.
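To make the wMEC objective described in the abstract concrete, the following is a minimal sketch of how the cost of one candidate partition of the reads could be scored. It is not taken from the GenHap source code: the `Read` representation, the function name `wmec_cost`, the per-allele weights and the toy reads are all assumptions made for illustration, and each group's consensus is computed independently rather than forcing the two haplotypes to be complementary.

```python
from collections import defaultdict

# Hypothetical read representation: SNP position -> (allele, weight).
# The allele is 0 or 1; the weight is an assumed confidence score.
Read = dict[int, tuple[int, float]]

def wmec_cost(reads: list[Read], partition: list[int]) -> float:
    """Weighted cost of correcting the reads so each group matches its consensus."""
    # weights[group][position] accumulates the support for allele 0 and allele 1.
    weights = [defaultdict(lambda: [0.0, 0.0]) for _ in range(2)]
    for read, group in zip(reads, partition):
        for pos, (allele, w) in read.items():
            weights[group][pos][allele] += w
    # The consensus takes the heavier allele at each position, so the
    # lighter one is the total weight that would have to be corrected.
    return sum(min(support) for group in weights for support in group.values())

# Toy instance with four reads over three SNP positions (illustrative only).
reads = [
    {0: (0, 1.0), 1: (1, 1.0)},
    {0: (0, 1.0), 1: (1, 1.0), 2: (0, 1.0)},
    {1: (0, 1.0), 2: (1, 1.0)},
    {0: (1, 1.0), 1: (0, 1.0), 2: (0, 0.5)},
]

good = wmec_cost(reads, [0, 0, 1, 1])  # reads 1-2 vs reads 3-4
bad = wmec_cost(reads, [0, 1, 0, 1])   # interleaved assignment
print(good, bad)
```

A search procedure such as a genetic algorithm would explore assignments like `[0, 0, 1, 1]` and prefer those with a lower cost; here the first partition scores lower than the interleaved one.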

Subtype	Journal article - Scientific article
Keywords	Combinatorial optimization; Future-generation sequencing; Genetic algorithms; Haplotype assembly; Weighted minimum error correction problem
Content language	English
Publication date	2019
Journal	BMC Bioinformatics
Volume	20
Issue	Suppl 4
Article number	172
Article DOI	https://dx.doi.org/10.1186/s12859-019-2691-y
Full text	Partially open
Appears in types	01 - Journal article

Files in this record:

File	Access	Attachment type	Size	Format
R187-BMC GenHap.pdf	Open access	Publisher’s Version (Version of Record, VoR)	1.85 MB	Adobe PDF
GenHap A novel computational method based on genetic algorithms for haplotype assembly.pdf	Archive managers only	Publisher’s Version (Version of Record, VoR)	1.85 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/227775