<p>This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged into a profile.</p>
Journal article(2019)
-
Marco Ranzani, Constantine Alifrangis, Nicola A. Thompson, Alistair G. Rust, Amin Allahyar, Vivek Iyer, Stacey Price, Peter Ellis, Gemma Turner, et al.
Journal article(2019)
-
Amin Allahyar, Joske Ubels, Jeroen de Ridder
Robustly predicting outcome for cancer patients from gene expression is an important challenge on the road to better personalized treatment. Network-based outcome predictors (NOPs), which consider the cellular wiring diagram in the classification, hold much promise to improve the performance, stability and interpretability of identified marker genes. Problematically, reports on the efficacy of NOPs are conflicting and, for instance, suggest that random networks perform on par with networks that describe biologically relevant interactions. In this paper we turn the prediction problem around: instead of using a given biological network in the NOP, we aim to identify the network of genes that truly improves outcome prediction. To this end, we propose SyNet, a gene network constructed ab initio from synergistic gene pairs derived from survival-labelled gene expression data. To obtain SyNet, we evaluate synergy for all 69 million pairwise combinations of genes, resulting in a network that is specific to the dataset and phenotype under study and can be used in a NOP model. We evaluated SyNet and 11 other networks on a compendium dataset of >4000 survival-labelled breast cancer samples. For this purpose, we used cross-study validation, which more closely emulates real-world application of these outcome predictors. We find that SyNet is the only network that truly improves performance, stability and interpretability in several existing NOPs. We show that SyNet overlaps significantly with existing gene networks and can be confidently predicted (~85% AUC) from graph-topological descriptions of these networks, in particular the breast tissue-specific network. Due to its data-driven nature, SyNet is not biased towards well-studied genes and thus facilitates post-hoc interpretation. We find that SyNet is highly enriched for known breast cancer genes and for genes related to, e.g., histological grade and tamoxifen resistance, suggestive of a role in determining breast cancer outcome.
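The abstract above describes scoring every gene pair for synergy. As a hedged illustration only, the sketch below assumes synergy means the gain in AUC of a two-gene classifier over the better of its two genes alone. The pair classifier (a ridge-regularized Fisher LDA), in-sample AUC instead of cross-study validation, the `top_k` edge cutoff and all function names are assumptions for illustration, not the published SyNet pipeline.

```python
from itertools import combinations


def auc(scores, labels):
    """Mann-Whitney AUC: chance a positive sample outscores a negative one (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def single_gene_auc(x, labels):
    # Direction-agnostic: high expression may mark either good or poor outcome.
    a = auc(x, labels)
    return max(a, 1.0 - a)


def pair_auc(x1, x2, labels, ridge=1e-3):
    """In-sample AUC of a two-gene Fisher LDA score (hypothetical pair classifier)."""
    groups = {c: [(u, v) for u, v, y in zip(x1, x2, labels) if y == c] for c in (0, 1)}
    means = {c: (sum(u for u, _ in g) / len(g), sum(v for _, v in g) / len(g))
             for c, g in groups.items()}
    s11 = s12 = s22 = 0.0
    for c, g in groups.items():
        m1, m2 = means[c]
        for u, v in g:
            s11 += (u - m1) ** 2
            s12 += (u - m1) * (v - m2)
            s22 += (v - m2) ** 2
    dof = len(labels) - 2
    # The ridge term keeps the pooled 2x2 covariance invertible for collinear genes.
    s11, s12, s22 = s11 / dof + ridge, s12 / dof, s22 / dof + ridge
    det = s11 * s22 - s12 * s12
    d1 = means[1][0] - means[0][0]
    d2 = means[1][1] - means[0][1]
    # w = S^-1 (mu1 - mu0) via the closed-form inverse of a 2x2 matrix.
    w1 = (s22 * d1 - s12 * d2) / det
    w2 = (s11 * d2 - s12 * d1) / det
    scores = [w1 * u + w2 * v for u, v in zip(x1, x2)]
    a = auc(scores, labels)
    return max(a, 1.0 - a)


def synergy(x1, x2, labels):
    """Gain of the gene pair over the better of its two genes on their own."""
    best_single = max(single_gene_auc(x1, labels), single_gene_auc(x2, labels))
    return pair_auc(x1, x2, labels) - best_single


def synergy_network(expr, labels, top_k=1):
    """Score all gene pairs and keep the top_k most synergistic ones as edges."""
    scored = [(synergy(expr[g], expr[h], labels), g, h)
              for g, h in combinations(sorted(expr), 2)]
    scored.sort(reverse=True)
    return [(g, h, s) for s, g, h in scored[:top_k]]


if __name__ == "__main__":
    labels = [0, 1, 0, 1, 0, 1, 0, 1]
    z = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]  # shared background variation
    expr = {"g1": [v + y for v, y in zip(z, labels)], "g2": z, "g3": [5.0] * 8}
    print(synergy(expr["g1"], expr["g2"], labels))  # 0.375
    print(synergy_network(expr, labels, top_k=1))  # [('g1', 'g2', 0.375)]
```

In the toy data, `g1` and `g2` each separate the classes only weakly (AUC 0.625), but their difference separates them perfectly, so the pair earns a synergy of 0.375 while pairs involving the constant `g3` earn none. Exhaustive enumeration grows as n(n-1)/2, so a count of ~69 million pairs implies roughly 11,750 genes.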
Journal article(2018)
-
Amin Allahyar, Carlo Vermeulen, Britta A.M. Bouwman, Peter H.L. Krijger, Marjon J.A.M. Verstegen, Geert Geeven, Mark Pieterse, Roy Straver, Kees Jalink, et al.
Chromatin folding contributes to the regulation of genomic processes such as gene activity. Existing conformation capture methods characterize genome topology through analysis of pairwise chromatin contacts in populations of cells but cannot discern whether individual interactions occur simultaneously or competitively. Here we present multi-contact 4C (MC-4C), which applies Nanopore sequencing to study multi-way DNA conformations of individual alleles. MC-4C distinguishes cooperative from random and competing interactions and identifies previously missed structures in subpopulations of cells. We show that individual elements of the β-globin superenhancer can aggregate into an enhancer hub that can simultaneously accommodate two genes. Neighboring chromatin domain loops can form rosette-like structures through collision of their CTCF-bound anchors, as seen most prominently in cells lacking the cohesin-unloading factor WAPL. Here, massive collision of CTCF-anchored chromatin loops is believed to reflect ‘cohesin traffic jams’. Single-allele topology studies thus help us understand the mechanisms underlying genome folding and functioning.
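To make the distinction between cooperative, random and competing interactions concrete, a simple independence baseline compares how often two regions appear on the same multi-contact read with the product of their individual frequencies. This is a hypothetical stand-in, not the statistical model published with MC-4C: representing each read as a set of captured regions, the function names and the `tol` threshold are assumptions, and a real analysis would also need a significance test and controls for confounders, which this sketch omits.

```python
def cooccurrence_ratio(reads, a, b):
    """Observed / expected rate at which regions a and b share a read.

    Each read is the set of regions captured together on one molecule.
    Under independence, P(a and b) = P(a) * P(b), giving a ratio of 1.
    """
    n = len(reads)
    p_a = sum(a in r for r in reads) / n
    p_b = sum(b in r for r in reads) / n
    observed = sum(a in r and b in r for r in reads) / n
    return observed / (p_a * p_b)


def classify(ratio, tol=0.1):
    """Label an interaction by how far it departs from independence (hypothetical cutoff)."""
    if ratio > 1 + tol:
        return "cooperative"  # regions co-occur more often than chance
    if ratio < 1 - tol:
        return "competing"  # regions tend to exclude each other
    return "random"


if __name__ == "__main__":
    hub = [{"A", "B"}, {"A", "B"}, {"C"}, {"C"}]
    print(classify(cooccurrence_ratio(hub, "A", "B")))  # cooperative
```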
In the last two decades, our understanding of the molecular mechanisms within the cell has taken a great leap forward. This is largely due to rapid innovation in genomic measurement technologies and the widespread use of computational methods that enable knowledge extraction from the massive datasets these measurements produce. A notable example of a field that has substantially benefited from this progress is cancer patient outcome prediction, in which the aim is to predict patient prognosis from common clinical variables such as tumor size, age or histological parameters. Applying machine learning methods to the gene expression profiles of tumors yielded a major improvement in prediction accuracy. These models were later succeeded by Network-based Outcome Predictors (NOPs), which incorporate the cellular wiring diagram into the model to identify stable and relevant markers that can accurately estimate patient outcome. Problematically, after a decade of research in this area, NOPs have not found extensive application compared to classical models, owing to contradictory reports in the literature regarding their performance, stability and marker relevance. In this thesis, we introduce a new NOP, called FERAL, that alleviates several fundamental issues that have prevented state-of-the-art NOPs from reaching optimal prediction performance, stability and marker relevance. We furthermore demonstrate that generic biological networks do not contain sufficiently informative interactions to truly aid NOPs. We therefore infer a phenotype-specific network, called SyNet, which connects pairs of genes that together achieve patient outcome prediction performance beyond what is attainable by the individual genes. We show that a NOP using identical gene expression datasets yields superior performance merely by considering the groups of genes suggested by SyNet.
Moreover, we show that model performance is severely reduced when the nodes in SyNet are shuffled, which confirms that the links in SyNet are also relevant to outcome prediction. An important limitation of current biological networks is that they are restricted to pairwise interactions. We show that higher-order interactions between functional elements in the cell are relevant in outcome prediction. We then introduce a novel genomics method called Multi-Contact 4C (MC-4C) to measure and investigate multi-way interactions between functional elements. In contrast to existing methods, MC-4C exploits long-read third-generation sequencing technologies and detects higher-order interactions that occur in a region of interest at the level of a single allele. We further devise a well-founded statistical model for estimating the significance of observed interactions. Using MC-4C, we experimentally confirm a 26-year-old hypothesis regarding the looping and co-localization of enhancers in the β-globin region of the mouse genome. Additionally, we provide the first experimental explanation for the “vermicelli” phenomenon observed through microscopic inspection of cells depleted of WAPL, the factor responsible for unloading cohesin, and thereby releasing chromatin loops, in mammalian cells. Targeted multi-way conformation analysis methods such as MC-4C therefore promise to uncover how the multitude of regulatory sequences and genes coordinate their activity in the spatial context of the genome.
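The node-shuffling control mentioned above tests whether the specific gene-gene links matter, not just the network's shape. A minimal sketch, assuming "shuffling nodes" means permuting gene labels over an unchanged topology and that the network is stored as a plain edge list; the function names are hypothetical and this is not the thesis' code.

```python
import random
from collections import Counter


def shuffle_node_labels(edges, seed=0):
    """Permute gene names across nodes while keeping the graph topology fixed.

    Every node keeps its degree and neighbourhood structure, but the gene it
    stands for is replaced by another gene from the same network.
    """
    nodes = sorted({g for edge in edges for g in edge})
    permuted = nodes[:]
    random.Random(seed).shuffle(permuted)
    relabel = dict(zip(nodes, permuted))
    return [(relabel[a], relabel[b]) for a, b in edges]


def degree_sequence(edges):
    """Sorted node degrees; unchanged by label shuffling."""
    return sorted(Counter(g for edge in edges for g in edge).values())


if __name__ == "__main__":
    net = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")]
    shuffled = shuffle_node_labels(net, seed=1)
    print(degree_sequence(net) == degree_sequence(shuffled))  # True
```

Because only the labels move, any drop in predictive performance on the shuffled network can be attributed to losing the gene-specific links rather than to a change in graph structure.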
Journal article(2016)
-
Kathryn L. Gilroy, Anne Terry, Anna Kilbey, James C. Neil, Asif Naseer, Jeroen de Ridder, Amin Allahyar, Weiwei Wang, Eric Carpenter, Andrew Mason, Gane K.S. Wong, Ewan R. Cameron
Retroviruses have been foundational in cancer research since early studies identified protooncogenes as targets for insertional mutagenesis. Integration of murine gamma-retroviruses into the host genome favours promoters and enhancers and entails interaction of viral integrase with host BET/bromodomain factors. We report that this integration pattern is conserved in feline leukaemia virus (FeLV), a gamma-retrovirus that infects many human cell types. Analysis of FeLV insertion sites in the MCF-7 mammary carcinoma cell line revealed strong bias towards active chromatin marks with no evidence of significant post-integration growth selection. The most prominent FeLV integration targets had little overlap with the most abundantly expressed transcripts, but were strongly enriched for annotated cancer genes. A meta-analysis based on several gamma-retrovirus integration profiling (GRIP) studies in human cells (CD34+, K562, HepG2) revealed a similar cancer gene bias but also remarkable cell-type specificity, with prominent exceptions including a universal integration hotspot at the long non-coding RNA MALAT1. Comparison of GRIP targets with databases of super-enhancers from the same cell lines showed that these have only limited overlap and that GRIP provides unique insights into the upstream drivers of cell growth. These observations elucidate the oncogenic potency of the gamma-retroviruses and support the wider application of GRIP to identify the genes and growth regulatory circuits that drive distinct cancer types.