Thanks to improvements in current -omics technologies and a growing understanding of molecular relationships, more and more studies are evaluating biological systems through approaches based on graph theory, which build models that capture just the molecular interactions. In this context, protein-protein interaction (PPI) networks are viable tools for understanding cell functions and disease mechanisms, and for supporting drug design and repositioning. As PPI networks involve hundreds to thousands of components, the use of these models in clinical and biological applications depends strictly on bioinformatics. PPI networks are based on physical or functional interactions identified by experimental and computational techniques. However, PPIs often lack reliability, do not cover all the interactions of an organism, and, because of their biological nature, are condition-specific. Thus, the PPIs detected in a specific biological context may not be valid for modeling a system under different conditions. To overcome these issues, an alternative way to build protein interaction network models is to use large-scale quantitative proteomic data, i.e. the expression levels of sets of proteins detected in condition-specific organic samples. While co-expression analysis, based on the correlation of gene expression levels, has been widely used to build gene co-expression networks, it has rarely been applied to proteomic data. Nevertheless, it represents a complementary procedure that makes it possible to evaluate a biological context at the system level, including for organisms that lack information on PPIs. The structure of PPI networks is routinely analyzed with algorithms and tools to identify key proteins and groups of functionally linked proteins, called modules. Interpreting a PPI network, however, is a particularly challenging task because of network complexity and the limitations arising from its biological nature.
Several algorithms have been proposed for the automatic interpretation of PPI networks, at first by considering network topology alone, and later by integrating Gene Ontology (GO) to take the biological nature of the problem into account. However, to date these methods provide only a topological interpretation of the networks, and further analysis is needed to infer biological knowledge about the phenomenon represented. Based on these premises, this dissertation introduces the reader to protein interaction networks, in particular PPI-based and co-expression-based networks, in order to address aspects of their reconstruction and analysis. Regarding reconstruction, the new idea of evaluating large-scale proteomic data by means of co-expression networks has been investigated, focusing on some state-of-the-art studies. As a result, an analysis pipeline based on protein co-expression networks, specific to amyloidosis, has been developed. Regarding analysis, special attention has been devoted to topological and module analysis. First, the most widely used metrics and mathematical models have been reviewed. Second, the problem of module identification in PPI networks has been addressed, considering the characteristics and limitations of state-of-the-art techniques. As a result of this study, a novel algorithm has been developed, called MTGO, which stands for Module detection via Topological information and GO knowledge. MTGO brings out the biomolecular machinery underpinning PPI networks by leveraging both biological knowledge and topological properties. A software version of MTGO, freely available at https://gitlab.com/d1vella/MTGO, has been produced, and some example applications have been explored. Moreover, to validate the method, MTGO has been compared with state-of-the-art algorithms.
Finally, the stability of the algorithm has been investigated, considering both the random components on which it relies and the presence of noisy interactions in the PPI network models.

Thanks to the improvement of omics technologies, increasing attention is being paid to the evaluation of biological systems through approaches based on graph theory. In this context, so-called Protein-Protein Interaction (PPI) networks have become established as a useful tool for understanding cell functions and for studying diseases and therapies. Since PPI networks involve hundreds to thousands of components, the use of these models in clinical and biological applications is closely tied to the field of bioinformatics. PPI networks are based on physical/functional interactions between proteins, identified through experimental and computational techniques. However, PPIs often lack reliability and can describe only part of the interactions of an organism. Moreover, because of their biological nature, they depend strictly on the experimental conditions. Therefore, the PPIs detected in a specific biological context may not be valid for building a model of a system under different conditions. To overcome these problems, an alternative way to build protein interaction networks consists in using quantitative proteomic data, that is, the expression levels of groups of proteins detected in condition-specific organic samples. Co-expression analysis, i.e. the correlation of molecular expression levels, is a well-established method in genomics but is rarely used in proteomics. Nevertheless, it represents a complementary procedure that offers the opportunity to evaluate a biological context at the system level and to analyze organisms whose PPIs are poorly known. For research purposes, PPI networks are analyzed with algorithms in order to identify the most important proteins and groups of functionally linked proteins, called modules.
The proposed algorithms are generally based only on the topology/structure of the network; recently, to take the biological nature of these models into account, knowledge from the Gene Ontology (GO) has also been integrated into the module identification process. However, to date these methods provide only a topological interpretation of the networks (that is, one concerning the structure of the model); further analyses are therefore needed to infer biological knowledge about the phenomenon under study. Based on these premises, this thesis introduces the reader to protein interaction networks, in particular networks based on PPIs and on co-expression, in order to then address aspects of the reconstruction and analysis of these models. Regarding reconstruction, the new approach of evaluating proteomic data by means of co-expression networks was investigated through a review of recent scientific work. Subsequently, an analysis pipeline specific to amyloidosis, based on these models, was developed. Regarding analysis, the most widely used mathematical models and metrics were first reviewed. Next, the problem of module identification was addressed by analyzing the characteristics and limitations of the techniques commonly applied to PPI networks. This study then led to the development of a new algorithm, MTGO (Module detection via Topological information and GO knowledge). MTGO aims to identify the biomolecular mechanisms underlying PPI networks by exploiting both biological knowledge and the topological properties of the model. A software version of MTGO has been produced and is freely available at https://gitlab.com/d1vella/MTGO. For validation purposes, MTGO was compared with the most widely used algorithms, and its stability was studied.

Vella, D. (2018). Protein Interaction Networks: from construction methods to the development of a novel algorithm for functional module identification. (Doctoral thesis, Università degli Studi di Milano-Bicocca, 2018).

Protein Interaction Networks: from construction methods to the development of a novel algorithm for functional module identification

VELLA, DANILA
2018

MAURI, GIANCARLO
LEPORATI, ALBERTO OTTAVIO
BELLAZZI, RICCARDO
Network; Protein; Module; identification; algorithm
INF/01 - INFORMATICA
English
8 March 2018
INFORMATICA - 87R
30
2016/2017
open
Files in this item:
phd_unimib_798697.pdf (open access)

Description: doctoral thesis
Attachment type: Doctoral thesis
Size: 19.04 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/199009