Farinaccio, A. (2011). Computational Intelligence Approaches: from Time Series to Data Driven Gene Regulatory Network. (Doctoral thesis, Università degli Studi di Milano-Bicocca, 2011).

Computational Intelligence Approaches: from Time Series to Data Driven Gene Regulatory Network

FARINACCIO, ANTONELLA
2011

Abstract

Over the past decade, Computational Intelligence has been a very active topic among researchers working in biomedicine and bioinformatics. Despite many successful applications, many problems in these fields still require advanced and efficient computational methods able to handle the very large amounts of data that this research produces. To fill this gap, many Systems Biology tools have been developed over the last decade to process the large quantities of data generated by high-throughput experimental techniques with an increasingly sophisticated range of mathematical modelling techniques. Many models have been proposed to describe gene regulatory networks. One of the most widely used is the Boolean network, which, despite its numerous successes, can in some cases be too coarse, since it represents each gene only as on or off. Another widely studied candidate is a system of differential equations, a very powerful and flexible model for describing complex relations among components. However, it is not easy to determine a suitable form for the equations that represent the network. For this reason, previous studies kept the form of the differential equations fixed during the learning phase, so their goal was simply to optimize parameters, i.e., the coefficients of the fixed equations. In the analysis of gene expression time series presented in this thesis, a mathematical model has been identified and a system for reconstructing data-driven gene regulatory networks has been implemented. The system is based on Genetic Programming, and its goal is to extract knowledge and properties from the data and thereby generate the network that underlies the behaviour of the genes. For this reason, the system is called Data Driven Gene Regulatory Network Generator.
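The abstract contrasts its Genetic Programming approach with earlier work in which the form of the differential equations is fixed and only the coefficients are optimized. The following is a minimal sketch of that fixed-form setting, under simplifying assumptions that are not taken from the thesis: a hypothetical linear model dx_i/dt = Σ_j W_ij x_j, explicit Euler discretization, and least-squares fitting of W on finite differences. The functions `simulate`, `solve` and `fit_coefficients` are illustrative names only.

```python
"""Sketch (hypothetical): fitting only the coefficients of a fixed-form ODE model.

Assumed model form, fixed in advance and never changed during learning:
    dx_i/dt = sum_j W[i][j] * x_j
Discretised with an explicit Euler step of size dt.
"""


def simulate(W, x0, dt, steps):
    """Generate a time series by explicit Euler integration of dx/dt = W x."""
    series = [list(x0)]
    for _ in range(steps):
        x = series[-1]
        n = len(x)
        series.append([x[i] + dt * sum(W[i][j] * x[j] for j in range(n)) for i in range(n)])
    return series


def solve(A, b):
    """Solve the square linear system A y = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        y[i] = (M[i][n] - sum(M[i][c] * y[c] for c in range(i + 1, n))) / M[i][i]
    return y


def fit_coefficients(series, dt):
    """Least-squares estimate of W from finite differences, one gene (row of W) at a time."""
    n = len(series[0])
    X = series[:-1]  # states x(t)
    # Finite-difference derivative estimates (x(t+1) - x(t)) / dt.
    D = [[(series[t + 1][i] - series[t][i]) / dt for i in range(n)] for t in range(len(X))]
    # Normal-equations matrix X^T X, shared by every gene.
    XtX = [[sum(x[a] * x[b] for x in X) for b in range(n)] for a in range(n)]
    W = []
    for i in range(n):
        Xtd = [sum(X[t][a] * D[t][i] for t in range(len(X))) for a in range(n)]
        W.append(solve(XtX, Xtd))
    return W


if __name__ == "__main__":
    true_W = [[-0.5, 0.8], [-0.6, -0.2]]
    data = simulate(true_W, [1.0, 0.5], dt=0.1, steps=50)
    print(fit_coefficients(data, dt=0.1))
```

Because the equation form is fixed, the search space is limited to the entries of W; a structure-learning method such as Genetic Programming would instead also search over the form of the right-hand side.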
To identify the mutual interactions between genes, a Genetic Programming application that extracts the best activation function for each gene has also been developed. The system has been tested on a time-series dataset of breast cancer microarray gene expression data; in addition, a study aimed at predicting the survival of a set of cancer patients has been performed, which led to the definition of a Medical Decision Support System. The gene activation functions produced by this system have subsequently been used to reconstruct the gene regulatory network that underlies the development, response and regulation of the biological system. To test this reconstruction, a synthetic gene regulatory network has been reverse engineered and a dynamic simulation has been performed, which allowed the corresponding time series to be reconstructed. The network used for reverse engineering is the recently published IRMA network, a synthetic network built in yeast for assessing network reverse-engineering and modelling approaches. Finally, to apply this system to realistic gene regulatory networks composed of thousands of genes, a new cluster kernel method has been identified and a framework built on it has been developed. This approach uses Gene Ontology to facilitate the detection of similar patterns among interacting genes, with the aim of reducing the dimensionality of the corresponding time-series data.
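The abstract states that the gene activation functions are used in a dynamic simulation that reconstructs time series. The following is a minimal sketch of such a simulation under assumptions that are not taken from the thesis: a hypothetical three-gene network (not the IRMA topology), Hill-type activation and repression functions standing in for the activation functions that Genetic Programming would evolve, a shared first-order decay term, and explicit Euler integration. The names `hill_activation`, `hill_repression`, `ACTIVATION` and `simulate` are illustrative only.

```python
"""Sketch (hypothetical): simulating expression time series from per-gene activation functions.

Assumed 3-gene network, NOT the IRMA topology:
    gene A is activated by C, gene B is activated by A, gene C is repressed by B.
Each gene follows dx_i/dt = f_i(x) - decay * x_i, where f_i is its activation function.
"""


def hill_activation(x, k=1.0, n=2, vmax=1.0):
    """Increasing Hill function: vmax * x^n / (k^n + x^n)."""
    return vmax * x**n / (k**n + x**n)


def hill_repression(x, k=1.0, n=2, vmax=1.0):
    """Decreasing Hill function: vmax * k^n / (k^n + x^n)."""
    return vmax * k**n / (k**n + x**n)


# Hypothetical activation functions, one per gene; each receives the full state dict.
ACTIVATION = {
    "A": lambda s: hill_activation(s["C"]),
    "B": lambda s: hill_activation(s["A"]),
    "C": lambda s: hill_repression(s["B"]),
}


def simulate(initial, decay=0.5, dt=0.05, steps=400):
    """Explicit Euler integration; returns one dict of expression levels per time point."""
    series = [dict(initial)]
    for _ in range(steps):
        s = series[-1]
        # Synchronous update: every gene is computed from the previous state s.
        series.append({g: s[g] + dt * (f(s) - decay * s[g]) for g, f in ACTIVATION.items()})
    return series


if __name__ == "__main__":
    ts = simulate({"A": 0.1, "B": 0.1, "C": 0.1})
    for t in range(0, len(ts), 100):
        print(t, {g: round(v, 3) for g, v in ts[t].items()})
```

In a data-driven setting, the entries of `ACTIVATION` could be replaced by functions learned from measured data, and the simulated series compared with the observed ones to assess the reconstructed network.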
MAURI, GIANCARLO
VANNESCHI, LEONARDO
Keywords: Computational Intelligence, Systems Biology, Microarray, Time Series, Gene Expression Data, Gene Regulatory Network, Reverse Engineering, Genetic Programming, Machine Learning
INF/01 - INFORMATICA
English
8-feb-2011
Doctoral School of Sciences
INFORMATICA - 22R
23
2009/2010
open
File in this item: Phd_unimib_716388 .pdf (doctoral thesis, open access, Adobe PDF, 3.7 MB)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/19257