Farinaccio, A. (2011). Computational Intelligence Approaches: from Time Series to Data Driven Gene Regulatory Network. (Doctoral thesis, Università degli Studi di Milano-Bicocca, 2011).
Computational Intelligence Approaches: from Time Series to Data Driven Gene Regulatory Network
FARINACCIO, ANTONELLA
2011
Abstract
Over the past decade, Computational Intelligence has been a very active research topic in biomedicine and bioinformatics. Despite many successful applications, numerous problems in these fields still require advanced and efficient computational methods to handle the very large amounts of data they produce. To address this need, many Systems Biology tools have been developed over the last decade to process the large quantity of data generated by high-throughput experimental techniques, using an increasingly sophisticated range of mathematical modelling techniques. Many models have been proposed to describe gene regulatory networks. One of the most widely used is the Boolean network, which, despite its numerous successes, can be too coarse in some cases. Another widely studied candidate is the system of differential equations, a powerful and flexible model for describing complex relations among components. However, it is not necessarily easy to determine a suitable form for the equations that represent the network. For this reason, previous studies fixed the form of the differential equations during the learning phase, so that their goal was simply to optimize the parameters, i.e., the coefficients of the fixed equations. In the analysis of gene expression time series presented in this thesis, a mathematical model has been identified and a system for the data-driven reconstruction of a gene regulatory network has been implemented. Based on Genetic Programming, it aims to extract knowledge and properties from the data and thereby generate the network that underlies the behaviour of the genes. For this reason the system is called Data Driven Gene Regulatory Network Generator.
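As a rough illustration of why a Boolean network can be too coarse, the sketch below simulates a toy three-gene network with synchronous updates. The genes `A`, `B`, `C` and their rules are hypothetical, chosen only for illustration; they are not the IRMA network and not the model developed in the thesis.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]

# Hypothetical three-gene network; the rules are illustrative assumptions,
# not taken from the thesis.
RULES: Dict[str, Callable[[State], bool]] = {
    "A": lambda s: not s["C"],         # C represses A
    "B": lambda s: s["A"],             # A activates B
    "C": lambda s: s["A"] and s["B"],  # A and B jointly activate C
}


def step(state: State) -> State:
    """Synchronous update: every gene is computed from the previous state."""
    return {gene: rule(state) for gene, rule in RULES.items()}


def simulate(state: State, n_steps: int) -> List[State]:
    """Return the trajectory of n_steps updates, including the initial state."""
    trajectory = [state]
    for _ in range(n_steps):
        state = step(state)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    start = {"A": True, "B": False, "C": False}
    for t, s in enumerate(simulate(start, 5)):
        print(t, s)
```

Each gene here is either fully on or fully off, so intermediate expression levels and reaction rates cannot be represented. That limitation is what motivates continuous models such as systems of differential equations.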
To identify the mutual interactions between genes, a Genetic Programming application for extracting the best activation function of each gene has also been developed. To test this system, it has been applied to a time-series dataset of microarray gene expression data from breast cancer, and a study aimed at predicting the survival of a set of cancer patients has also been performed. This study has led to the definition of a Medical Decision Support System. The gene activation functions produced by this system have subsequently been used to reconstruct the gene regulatory network that underlies the development, response and regulation of the biological system. To test the reconstruction, a synthetic gene regulatory network has been reverse engineered and a dynamic simulation has been performed, allowing the related time series to be reconstructed. The network used for the reverse engineering is the recently published IRMA network, a synthetic yeast network for the assessment of reverse engineering and modelling approaches. Finally, in order to apply the system to a realistic gene regulatory network composed of thousands of genes, a new cluster kernel method has been identified and a framework driven by it has been developed. It is based on Gene Ontology to facilitate the detection of similar patterns of interacting genes, with the aim of reducing the dimension of the related time-series data.

File | Size | Format |
---|---|---|
Phd_unimib_716388.pdf (open access; attachment type: Doctoral thesis) | 3.7 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.