High-throughput bioinformatic approaches to study tumorigenesis in mammalian cells

Balestrieri, C

The analysis of transcriptional data has become increasingly popular in the last decade owing to the advent of new high-throughput technologies in genome research, beginning with the DNA microarrays reported by the Pat Brown laboratory in 1995 (Schena et al., 1995) and by Affymetrix in 1996 (Lockhart et al., 1996). The DNA microarray is a versatile technology: different technologies are employed to produce the microarray chips, and different technical approaches are used to analyze microarray data, ranging from statistical models of the decision process to machine-learning methods for identifying class predictors. The underlying technology is extremely complex, and DNA microarrays generate large amounts of numerical data that must be analyzed effectively. Therefore, the choice of an appropriate analysis for DNA microarray experiments is key to performing the assay and using the data correctly. Genome-wide expression profiling is a powerful tool for the investigation of novel gene ensembles in cellular mechanisms of health and disease. In particular, DNA microarray expression analysis can be used to study complex multigenic diseases such as cancer. The great challenge in understanding the genetics of such disorders is the identification of susceptibility genes, that is, genes that increase a person's risk of developing the disease. Decades of molecular genetics research have shown that cancer is a heterogeneous cellular disorder caused by the deregulation of many interacting cellular pathways that converge to drive tumor formation and growth. Since the draft sequence of the human genome was published in 2001 (Lander et al., 2001), the Cancer Genome Anatomy Project index of tumor genes has classified more than 40,000 genes directly or indirectly involved in one or more cancers (Strausberg, 2001; Strausberg et al., 2000).
The rapid accumulation of high-resolution cancer genetic data now promises to enable far more comprehensive and unbiased inference of uncharacterized cancer genes linked to complex tumor traits such as metastasis and angiogenesis (Vogelstein and Kinzler, 2004). During the last three years, I have focused on the analysis and interpretation of GeneChip data, with the aim of setting up workflows to characterize different physiological and pathological (i.e., cancer) cellular conditions, to dissect the effects of nutrient perturbations on cell culture models, to interpret time-dependent fluctuations in gene expression, and to identify, by orthologous comparisons, the phylogenetic conservation of promoter regulatory sequences and cancer cell signatures. Because efficient methods that facilitate the biological interpretation of these data are crucial, the work in this thesis focuses on new ideas and analytical methods for the efficient identification of cancer regulatory mechanisms. To this end, it proposes several approaches for the analysis and interpretation of gene expression data, based on the integration of different types of related biological information with software tools for efficient data analysis. The most important contribution of this thesis to the scientific community is the proposal to integrate different "omic" approaches for the study of systemic diseases such as cancer. It is worth pointing out that proceeding in this way requires gathering information from several fields, such as molecular biology, biochemistry, mathematics, informatics, and statistics, which together provide the fundamental knowledge needed to establish the framework in which the study is set.

Balestrieri, C. (2011). High-throughput bioinformatic approaches to study tumorigenesis in mammalian cells. (Doctoral thesis, Università degli Studi di Milano-Bicocca, 2011).