A Model of Selective Advantage for the Efficient Inference of Cancer Clonal Evolution

RAMAZZOTTI, DANIELE
2016

Abstract

Recently, there has been a resurgence of interest in rigorous and scalable algorithms for efficient inference of cancer progression using genomic patient data. The motivations are manifold: (i) rapidly growing NGS and single-cell data from cancer patients, (ii) a long-felt need for novel Data Science and Machine Learning algorithms well suited to inferring models of cancer progression, and finally, (iii) a desire to understand the temporal and heterogeneous structure of tumors so as to tame their natural progression through the most efficacious therapeutic intervention. This thesis presents a multi-disciplinary effort to algorithmically and efficiently model tumor progression involving the successive accumulation of genetic alterations, each resulting in populations that manifest a novel cancer phenotype. The framework presented in this work, along with the efficient algorithms derived from it, represents a novel and versatile approach for inferring cancer progression, whose accuracy and convergence rates surpass those of existing techniques. The approach derives its power from insights from, and contributes to, several fields, including algorithms in machine learning, theory of causality, and cancer biology. Furthermore, an optimal, versatile and modular pipeline to extract ensemble-level progression models from cross-sectional sequenced cancer genomes is also proposed. The pipeline combines state-of-the-art techniques for sample stratification, driver selection, identification of fitness-equivalent exclusive alterations and progression model inference. Finally, the results are rigorously validated using synthetic data created with realistic generative models, and empirically interpreted in the context of real cancer datasets; in the latter case, biologically significant conclusions revealed by the reconstructed progressions are also highlighted.
Specifically, the pipeline's ability to reproduce much of the current knowledge on colorectal cancer progression, as well as to suggest novel experimentally verifiable hypotheses, is also demonstrated. Lastly, it is also proved that the proposed framework can be applied, mutatis mutandis, in reconstructing the evolutionary history of cancer clones in single patients, as illustrated by an example with multiple biopsy data from clear cell renal carcinomas.
STELLA, FABIO ANTONIO
Bioinformatics; Algorithms; Cancer evolution; Causality; Graphical models; Data analysis; Data mining; Machine learning
INF/01 - INFORMATICA
English
22-feb-2016
INFORMATICA - 22R
28
2014/2015
open
(2016). A Model of Selective Advantage for the Efficient Inference of Cancer Clonal Evolution. (Doctoral thesis, Università degli Studi di Milano-Bicocca, 2016).
Files in this item:
File: phd_unimib_725339.pdf
Access: open access
Description: Doctoral thesis
Attachment type: Doctoral thesis
Size: 37.6 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/100453