Computational epigenomics in gene regulation and cancer research

Abstract

DNA is packaged together with proteins, such as histones, in the nucleus of a cell to form a fiber called chromatin. The nature of this packaging, the "chromatin structure", is essential for proper cell functioning. This is illustrated by the fact that perturbation of chromatin can be associated with many diseases. Hence, artificial perturbation of chromatin may give important new insights into its function. In this dissertation, we perturbed chromatin by 1) inducing mutations by integrating retroviruses and transposons into DNA, and 2) evicting histones from chromatin and inducing DNA breaks through the application of anti-cancer drugs.

As a means of perturbing chromatin, DNA integrating elements such as retroviruses and transposons are used in gene regulation and cancer research, among other fields. In cancer research, DNA integrating elements are used for detecting cancer genes from tumor screens. We presented a novel algorithm that fully automates this detection, thus removing any potential for bias introduced by manual analysis. In gene regulation, DNA integrating elements can be used for studying the chromatin position effect through the location-dependent activation of transgenes present within randomly integrated transposons. We presented a high-throughput method for studying the chromatin position effect using DNA integrating elements, and studied the genome-wide transgene expression values generated with this method, especially in relation to enhancers and domains associated with the nuclear lamina. For both applications of DNA integrating elements, it is important to realize that integrations are distributed randomly, but not uniformly, across the genome. To characterize this non-uniformity, we generated large datasets of integrations that were under minimal selective pressure, for two transposons and one retrovirus. We compared the integration profiles with a wide range of (epi)genomic features to generate bias maps across multiple genomic scales. This revealed a hierarchical organization in target site selection, and showed that a substantial fraction of cancer genes retrieved from tumor screens may be false positives.

The application of anti-cancer drugs to directly perturb chromatin structure allowed us to take a very low-level approach in studying chromatin. We showed that different drugs target different types of chromatin when evicting histones and/or inducing DNA breaks, which can have implications for their chemotherapeutic efficacy.

Central themes throughout this dissertation were computational epigenomics and data integration. Due to the complexity of the biology and the data, many of the computational methods were highly customized, although some are more generally applicable. Examples of the latter include a method for the normalization of genome-wide sequencing data with a control, and a feature ranking method. In general, however, high levels of customization are unavoidable. In conclusion, we therefore illustrated the careful consideration that must go into decisions regarding this customization by demonstrating the substantial impact that these decisions can have on research outcomes.