For many important classes of biomolecules such as RNA and proteins, a direct relationship exists between structure and function. On the contrary the relationships between genomic sequences and molecular structures are still poorly understood. The determination of the three dimensional structure of biomolecules on a genome-scale is hence one of the major challenges in modern biology. Indeed, today genomic data are easily achievable, thanks to next generation sequencing technology, while structural data are still obtained through complex experimental protocols. As a result, the disproportion between the available amount of genomic and structural data limits the progress in several fields such as drug discovery and synthetic biology. The use of computational methods and mathematical optimization in structural biology is fundamental to reduce the amount of data required from experiments speeding up experimental protocols and to define in silico protocols for the prediction of three dimensional structures. This thesis introduces novel heuristic approaches to tackle two important problems in structural biology: the protein structure prediction (PSP) and the molecular distance geometry (MDG) problem. %The MDG problem consists in reconstructing a three dimensional structure using a set of distance restraints obtained through a nuclear magnetic resonance (NMR) experiment. Both these problems are known to have a complex combinatorial structure and are classified as NP-hard. Therefore the proposed approaches are based on stochastic optimization heuristics (SOH), which provide a powerful framework to tackle complex combinatorial problems that do not allow for exact approaches. The PSP problem have been treated in the simplified representation provided by the hydrophobic polar (HP) model; a new perturbation strategy has been introduced to mimic off-lattice approaches and to provide a complementary benchmark to the existing move sets. Two heuristics, based on the principle of \emph{local landscape mapping}, have been tested on several benchmark instances both in combination with the new perturbation strategy and with standard move sets. The results show that one of the proposed heuristics outperforms state of the art methods on the majority of the considered instances. In the case of the MDG problem, results show that the proposed methodology is able to achieve a performance comparable to the state of the art and to overcome most limitations of the existing approaches.

La determinazione della struttura tridimensionale delle biomolecole su scala genomica è uno dei più importanti obbiettivi della biologia moderna con potenziali ricadute in differenti contesti applicativi che spaziano dalla farmacologia, alla biologia sintetica. Il ruolo dei metodi computazionali ed in particolare dei metodi di ottimizzazione in quest'ambito è fondamentale per l'interpretazione dei dati sperimentali. Inoltre nell'ultimo decennio la predizione computazionale della struttura di importanti classi di biomolecole come RNA e proteine è diventata una prospettiva concreta. Questa tesi presenta due nuovi metodi di ottimizzazione stocastica progettati rispettivamente per il problema della predizione della struttura delle proteine nel modello idrofobico polare e per il problema della ricostruzione della struttura da dati NMR. Il primo problema consiste nel trovare un assegnamento in un reticolo di una stringa binaria tale da minimizzare una data funzione di costo e senza violare un insieme di vincoli. Il secondo, consiste nel identificare una disposizione di atomi nello spazio tridimensionale che rispetti un insieme di vincoli di distanza. Entrambi questi problemi sono rilevanti dal punto di vista computazionale in quanto è stata dimostrata la NP-completezza del problema di decisione associato. Pertanto essi rappresentano un ottimo banco di prova per le euristiche di ottimizzazione stocastica. Nel caso della predizione della struttura nel modello idrofobico polare, i risultati ottenuti su una serie di istanze di benchmark mostrano che la strategia proposta può essere adattata a differenti modelli di rappresentazione migliorando in alcuni casi la performance rispetto allo stato dell'arte. Per quanto riguarda la ricostruzione di strutture da dati NMR, i risultati suggeriscono che il metodo proposto sia in grado di raggiungere un'accuratezza comparabile a quella dello stato dell'arte offrendo altresì numerosi vantaggi in termini di applicabilità rispetto agli approcci esistenti.

(2015). Novel Computational Approaches for Protein Structure Prediction and Optimization. (Tesi di dottorato, Università degli Studi di Milano-Bicocca, 2015).

Novel Computational Approaches for Protein Structure Prediction and Optimization

CITROLO, ANDREA GAETANO
2015

Abstract

For many important classes of biomolecules such as RNA and proteins, a direct relationship exists between structure and function. On the contrary the relationships between genomic sequences and molecular structures are still poorly understood. The determination of the three dimensional structure of biomolecules on a genome-scale is hence one of the major challenges in modern biology. Indeed, today genomic data are easily achievable, thanks to next generation sequencing technology, while structural data are still obtained through complex experimental protocols. As a result, the disproportion between the available amount of genomic and structural data limits the progress in several fields such as drug discovery and synthetic biology. The use of computational methods and mathematical optimization in structural biology is fundamental to reduce the amount of data required from experiments speeding up experimental protocols and to define in silico protocols for the prediction of three dimensional structures. This thesis introduces novel heuristic approaches to tackle two important problems in structural biology: the protein structure prediction (PSP) and the molecular distance geometry (MDG) problem. %The MDG problem consists in reconstructing a three dimensional structure using a set of distance restraints obtained through a nuclear magnetic resonance (NMR) experiment. Both these problems are known to have a complex combinatorial structure and are classified as NP-hard. Therefore the proposed approaches are based on stochastic optimization heuristics (SOH), which provide a powerful framework to tackle complex combinatorial problems that do not allow for exact approaches. The PSP problem have been treated in the simplified representation provided by the hydrophobic polar (HP) model; a new perturbation strategy has been introduced to mimic off-lattice approaches and to provide a complementary benchmark to the existing move sets. Two heuristics, based on the principle of \emph{local landscape mapping}, have been tested on several benchmark instances both in combination with the new perturbation strategy and with standard move sets. The results show that one of the proposed heuristics outperforms state of the art methods on the majority of the considered instances. In the case of the MDG problem, results show that the proposed methodology is able to achieve a performance comparable to the state of the art and to overcome most limitations of the existing approaches.
DE PAOLI, FLAVIO MARIA
Protein Folding; Combinatorial Optimization; Heuristics
Ripiegamento Proteico; Ottimizzazione Combinatoria; Euristiche
INF/01 - INFORMATICA
English
18-set-2015
Scuola di dottorato di Scienze
INFORMATICA - 22R
27
2013/2014
open
(2015). Novel Computational Approaches for Protein Structure Prediction and Optimization. (Tesi di dottorato, Università degli Studi di Milano-Bicocca, 2015).
File in questo prodotto:
File Dimensione Formato  
phd_unimib_077738.pdf

accesso aperto

Descrizione: Tesi dottorato
Tipologia di allegato: Doctoral thesis
Dimensione 9.43 MB
Formato Adobe PDF
9.43 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10281/88167
Citazioni
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
Social impact