The aim of the thesis is the development of algorithms to study the evolution of genomic information starting from data produced by Next Generation Sequencing (NGS) technologies. We address the problem of reconstructing evolutionary histories following two research directions which both explore algorithms for the generation (or reconstruction) of ancestral recombination graphs (or phylogenetic trees) modeling the evolution in presence of evolutionary events, such as recombination and Single Nucleotide Polymorphisms (SNPs). The first research direction regards the development of efficient algorithms for simulating complex scenarios of multiple population evolution with admixture. The aim of simulations is to obtain the resulting extant population samples and their common relevant evolutionary history captured by an ARG. We propose a backward simulation algorithm, named SimRA, for modeling complex evolutionary scenarios, which improves time and space requirements of the classical algorithm of single populations. Through extensive simulation experiments, we show that SimRA produces ARGs in compact form without compromising any accuracy. Moreover, we present the first combinatorial approach, based on persistency in topology, which detects admixture in populations. We show, based on efficient and controlled simulations computed by SimRA, that the topological framework has the potential for detecting admixture in related populations. The second research direction regards the development of efficient algorithms to reconstruct phylogenesis of contemporary species described by genomic binary characters. Established maximum parsimony models are Dollo and Camin-Sokal, both leading to NP-hard reconstruction problems. On the other hand, the perfect phylogeny, which has very efficient polynomial time algorithmic solutions, is often too restrictive for explaining the evolution of real biological data where homoplasy is present. We address the problem of reconstructing a variant of the perfect phylogeny model, the persistent phylogeny, that is more widely applicable, with the aim of retaining the computational efficiency. For this purpose, we introduce the Constrained Persistent Perfect Phylogeny problem (CPPP) which generalizes the Persistent Perfect Phylogeny (PPP) problem, by adding constraints for some observed characters. We provide a polynomial time algorithm for a particular class of instances and a parameterized algorithm for solving the general problem. We conclude the thesis with results concerning the scaffold filling computational problem which derives from the necessity of filling incomplete genomic sequences in order to maximize their similarity with a known reference genome. We consider two scaffold filling problems (One-sided and Two-sided) that are NP-hard under the maximum number of common adjacencies similarity. We design two Fixed Parameterized Tractable (FPT)-algorithms respectively for the One-side and Two-side scaffold filling problem, with only one parameter representing the number of common adjacencies between the two filled genomes.

(2016). Sampling Ancestral Recombination Graphs and Reconstruction of Phylogenetic Trees for Explaining Evolution. (Tesi di dottorato, Università degli Studi di Milano-Bicocca, 2016).

Sampling Ancestral Recombination Graphs and Reconstruction of Phylogenetic Trees for Explaining Evolution

CARRIERI, ANNA PAOLA
2016

Abstract

The aim of the thesis is the development of algorithms to study the evolution of genomic information starting from data produced by Next Generation Sequencing (NGS) technologies. We address the problem of reconstructing evolutionary histories following two research directions which both explore algorithms for the generation (or reconstruction) of ancestral recombination graphs (or phylogenetic trees) modeling the evolution in presence of evolutionary events, such as recombination and Single Nucleotide Polymorphisms (SNPs). The first research direction regards the development of efficient algorithms for simulating complex scenarios of multiple population evolution with admixture. The aim of simulations is to obtain the resulting extant population samples and their common relevant evolutionary history captured by an ARG. We propose a backward simulation algorithm, named SimRA, for modeling complex evolutionary scenarios, which improves time and space requirements of the classical algorithm of single populations. Through extensive simulation experiments, we show that SimRA produces ARGs in compact form without compromising any accuracy. Moreover, we present the first combinatorial approach, based on persistency in topology, which detects admixture in populations. We show, based on efficient and controlled simulations computed by SimRA, that the topological framework has the potential for detecting admixture in related populations. The second research direction regards the development of efficient algorithms to reconstruct phylogenesis of contemporary species described by genomic binary characters. Established maximum parsimony models are Dollo and Camin-Sokal, both leading to NP-hard reconstruction problems. On the other hand, the perfect phylogeny, which has very efficient polynomial time algorithmic solutions, is often too restrictive for explaining the evolution of real biological data where homoplasy is present. We address the problem of reconstructing a variant of the perfect phylogeny model, the persistent phylogeny, that is more widely applicable, with the aim of retaining the computational efficiency. For this purpose, we introduce the Constrained Persistent Perfect Phylogeny problem (CPPP) which generalizes the Persistent Perfect Phylogeny (PPP) problem, by adding constraints for some observed characters. We provide a polynomial time algorithm for a particular class of instances and a parameterized algorithm for solving the general problem. We conclude the thesis with results concerning the scaffold filling computational problem which derives from the necessity of filling incomplete genomic sequences in order to maximize their similarity with a known reference genome. We consider two scaffold filling problems (One-sided and Two-sided) that are NP-hard under the maximum number of common adjacencies similarity. We design two Fixed Parameterized Tractable (FPT)-algorithms respectively for the One-side and Two-side scaffold filling problem, with only one parameter representing the number of common adjacencies between the two filled genomes.
POMELLO CHINAGLIA POMELLO, LUCIA
ancestral recombination graphs, phylogenesis, algorithms
INF/01 - INFORMATICA
English
22-feb-2016
INFORMATICA - 22R
28
2014/2015
open
(2016). Sampling Ancestral Recombination Graphs and Reconstruction of Phylogenetic Trees for Explaining Evolution. (Tesi di dottorato, Università degli Studi di Milano-Bicocca, 2016).
File in questo prodotto:
File Dimensione Formato  
phd_unimib_733668.pdf

accesso aperto

Descrizione: Tesi dottorato
Tipologia di allegato: Doctoral thesis
Dimensione 2.18 MB
Formato Adobe PDF
2.18 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10281/102072
Citazioni
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
Social impact