Joana Gonçalves
Motivation
Controlling the outcomes of CRISPR editing is crucial for the success of gene therapy. Since donor template-based editing is often inefficient, alternative strategies have emerged that leverage mutagenic end-joining repair instead. Existing machine learning models can accurately predict end-joining repair outcomes; however, generalisability beyond the specific cell line used for training remains a challenge, and interpretability is typically limited by suboptimal feature representation and model architecture.
Results
We propose X-CRISP, a flexible and interpretable neural network for predicting repair outcome frequencies based on a minimal set of outcome and sequence features, including microhomologies (MH). Outperforming prior models on detailed and aggregate outcome predictions, X-CRISP prioritised MH location over MH sequence properties such as GC content for deletion outcomes. Through transfer learning, we adapted X-CRISP pre-trained on wild-type mESC data to target human cell lines K562, HAP1, U2OS, and mESC lines with altered DNA repair function. Adapted X-CRISP models improved over direct training on target data from as few as 50 samples, suggesting that this strategy could be leveraged to build models for new domains using a fraction of the data required to train models from scratch.
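To make the microhomology (MH) features mentioned above concrete, the following is a minimal sketch of how the MH flanking a deletion could be extracted, together with its length, GC content and location relative to the cut site. It is not X-CRISP's feature pipeline: the function names, the feature names, the left-aligned deletion convention and the toy sequence are all assumptions made for illustration.

```python
def microhomology(seq: str, start: int, end: int) -> str:
    """Longest microhomology for the deletion seq[start:end] (end exclusive).

    Assumption: the deletion is left-aligned, so only rightward matches count.
    """
    k = 0
    # Bases at the start of the deleted segment must match the bases that
    # immediately follow the deletion; k is capped at the deletion length.
    while k < end - start and end + k < len(seq) and seq[start + k] == seq[end + k]:
        k += 1
    return seq[start:start + k]


def mh_features(seq: str, start: int, end: int, cut: int) -> dict:
    """Hypothetical per-outcome features: MH length, GC content and location."""
    mh = microhomology(seq, start, end)
    gc = sum(base in "GC" for base in mh) / len(mh) if mh else 0.0
    return {
        "mh": mh,
        "mh_length": len(mh),
        "mh_gc_content": gc,
        "deletion_start_offset": start - cut,  # location relative to the cut
        "deletion_length": end - start,
    }


# Toy sequence (hypothetical): "ACGG" occurs at positions 2-5 and 10-13.
features = mh_features("TTACGGATCCACGGTT", start=2, end=10, cut=6)
print(features)
```

In this toy case the deletion removes one copy of "ACGG" plus the intervening bases, so the repaired product retains a single copy of the repeat, which is the signature of microhomology-mediated deletion.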
The presence of amyloid pathology can have a profound effect on the surrounding cellular neighborhood. While this impact has been mainly investigated for amyloid plaques in the context of Alzheimer's disease (AD), other forms of amyloid deposits can also be found in the brain and in other organs. In the pancreas, amyloid deposits consist of islet amyloid polypeptide (IAPP) and are a hallmark of type 2 diabetes (T2D). Notably, T2D has been associated with an increased risk of developing AD, and as such T2D is a common comorbidity of AD. It has therefore been suggested that these diseases may share pathophysiological processes. To advance our understanding in this respect, we compared the cellular and transcriptomic responses related to the proximity of amyloid pathology across the AD brain and T2D pancreas.
Method
Xenium single-cell spatial transcriptomic profiling was applied to tissue sections from a human post-mortem AD brain (150,060 cells) and a T2D pancreas (256,907 cells). Spatial transcriptomics images were integrated with amyloid histopathology images to determine the proximity of individual cells to amyloid deposits. Together with cell type predictions, this enabled the investigation and cross-organ comparison of amyloid-associated changes in cell type composition and gene expression.
Result
With respect to cell type composition, in the brain a higher proportion of microglia could be observed close to amyloid pathology, while in the pancreas this was mirrored by a higher proportion of macrophages as well as a higher proportion of activated stellate cells. Cell type-specific differential gene expression analysis based on amyloid proximity revealed many cell types with altered gene expression, including astrocytes, microglia, oligodendrocytes and endothelial cells in the brain and acinar, alpha and activated stellate cells in the pancreas. Comparison across organs revealed 16 shared genes differentially expressed with proximity to amyloid deposits, including CAV1, CXCR4, MS4A6A, SNCG, and SOX2.
Conclusion
Here we spatially investigate the impact of amyloid deposits on the cellular and transcriptomic microenvironment in the brain and pancreas. Our analysis revealed a common set of amyloid proximity-related genes, providing insight into potentially shared pathological pathways underlying AD and T2D.
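The proximity step described in the Method can be illustrated with a minimal sketch: each cell is assigned the distance to its nearest amyloid deposit and then labelled by a distance cut-off. This is not the pipeline used in the study. Representing cells and deposits as 2D centroid coordinates, the function names and the threshold value are all assumptions, since real deposits are image regions rather than points.

```python
import math


def nearest_deposit_distance(cells, deposits):
    """Distance from each cell centroid to its closest deposit centroid."""
    return [min(math.dist(cell, dep) for dep in deposits) for cell in cells]


def proximity_labels(distances, near_threshold):
    """Label cells as 'near' or 'far' using a hypothetical distance cut-off."""
    return ["near" if d <= near_threshold else "far" for d in distances]


# Toy coordinates in arbitrary units (hypothetical values).
cells = [(0.0, 0.0), (10.0, 0.0)]
deposits = [(3.0, 4.0)]

distances = nearest_deposit_distance(cells, deposits)
labels = proximity_labels(distances, near_threshold=6.0)
print(list(zip(distances, labels)))
```

With labels of this kind, changes in cell type composition and differential expression can be compared between cells near and far from deposits. For real tissue sections with hundreds of thousands of cells, a spatial index would be preferable to this brute-force search.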
Correction to
Advances and prospects for the Human BioMolecular Atlas Program (HuBMAP) (Nature Cell Biology, (2023), 25, 8, (1089-1100), 10.1038/s41556-023-01194-w)
Correction to: Nature Cell Biology https://doi.org/10.1038/s41556-023-01194-w. Published online 19 July 2023. In the version of this article originally published, the name of Tianyang Xu was misspelled as Tiangyang Xu. The name has been corrected in the HTML and PDF versions of the article.
Anti-cancer therapies based on synthetic lethality (SL) exploit tumour vulnerabilities for treatment with reduced side effects, by targeting a gene that is jointly essential with another whose function is lost. Computational prediction is key to expedite SL screening, yet existing methods are vulnerable to prevalent selection bias in SL data and reliant on cancer or tissue type-specific omics, which can be scarce. Notably, sequence similarity remains underexplored as a proxy for related gene function and joint essentiality.
Results
We propose ELISL, Early–Late Integrated SL prediction with forest ensembles, using context-free protein sequence embeddings and context-specific omics from cell lines and tissue. Across eight cancer types, ELISL showed superior robustness to selection bias and recovery of known SL genes, as well as promising cross-cancer predictions. Co-occurring mutations in a BRCA gene and ELISL-predicted pairs from the HH, FGF, WNT, or NEIL gene families were associated with longer patient survival times, revealing therapeutic potential.
The Human BioMolecular Atlas Program (HuBMAP) aims to create a multi-scale spatial atlas of the healthy human body at single-cell resolution by applying advanced technologies and disseminating resources to the community. As the HuBMAP moves past its first phase, creating ontologies, protocols and pipelines, this Perspective introduces the production phase: the generation of reference spatial maps of functional tissue units across many organs from diverse populations and the creation of mapping tools and infrastructure to advance biomedical research.
Synthetic lethality (SL) between two genes occurs when simultaneous loss of function leads to cell death. This holds great promise for developing anti-cancer therapeutics that target synthetic lethal pairs of endogenously disrupted genes. Identifying novel SL relationships through exhaustive experimental screens is challenging, due to the vast number of candidate pairs. Computational SL prediction is therefore sought to identify promising SL gene pairs for further experimentation. However, current SL prediction methods lack consideration for generalizability in the presence of selection bias in SL data.
Results
We show that SL data exhibit considerable gene selection bias. Our experiments designed to assess the robustness of SL prediction reveal that models driven by the topology of known SL interactions (e.g. graph, matrix factorization) are especially sensitive to selection bias. We introduce selection bias-resilient synthetic lethality (SBSL) prediction using regularized logistic regression or random forests. Each gene pair is described by 27 molecular features derived from cancer cell line, cancer patient tissue and healthy donor tissue samples. SBSL models are built and tested using approximately 8000 experimentally derived SL pairs across breast, colon, lung and ovarian cancers. Compared to other SL prediction methods, SBSL showed higher predictive performance, better generalizability and robustness to selection bias. Gene dependency, quantifying the essentiality of a gene for cell survival, contributed most to SBSL predictions. Random forests were superior to linear models in the absence of dependency features, highlighting the relevance of mutual exclusivity of somatic mutations, co-expression in healthy tissue and differential expression in tumour samples.
Availability and implementation
https://github.com/joanagoncalveslab/sbsl
Supplementary information
Supplementary data are available at Bioinformatics online.
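The notion of gene selection bias in SL data can be illustrated with a minimal sketch: counting how often each gene appears among labelled pairs, and holding out a set of genes so that test pairs share no gene with the training pairs. This is only one plausible way to probe robustness and is not necessarily the evaluation protocol of the SBSL study. The function names and the placeholder gene identifiers (G1 to G6) are hypothetical.

```python
from collections import Counter


def gene_counts(pairs):
    """Number of labelled pairs in which each gene occurs."""
    return Counter(gene for pair in pairs for gene in pair)


def gene_disjoint_split(pairs, held_out):
    """Split pairs so that train and test sets share no gene.

    Train pairs contain no held-out gene, test pairs contain only held-out
    genes, and mixed pairs are dropped.
    """
    held = set(held_out)
    train = [p for p in pairs if p[0] not in held and p[1] not in held]
    test = [p for p in pairs if p[0] in held and p[1] in held]
    return train, test


# Placeholder pairs (hypothetical): G1 is heavily over-represented.
pairs = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G5", "G6")]

counts = gene_counts(pairs)
train, test = gene_disjoint_split(pairs, held_out={"G5", "G6"})
print(counts.most_common(1), train, test)
```

A model that mainly memorises which genes occur frequently in known pairs would be expected to perform worse on a gene-disjoint test set than on a random split.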
Understanding the impact of guide RNA (gRNA) and genomic locus on CRISPR-Cas9 activity is crucial to design effective gene editing assays. However, it is challenging to profile Cas9 activity in the endogenous cellular environment. Here we leverage our TRIP technology to integrate ~1k barcoded reporter genes in the genomes of mouse embryonic stem cells. We target the integrated reporters (IRs) using RNA-guided Cas9 and characterize induced mutations by sequencing. We report that gRNA sequence and IR locus explain most variation in mutation efficiency. Predominant insertions of a gRNA-specific nucleotide are consistent with template-dependent repair of staggered DNA ends with 1-bp 5′ overhangs. We confirm that such staggered ends are induced by Cas9 in mouse pre-B cells. To explain observed insertions, we propose a model generating primarily blunt and occasionally staggered DNA ends. Mutation patterns indicate that gRNA sequence controls the fraction of staggered ends, which could be used to optimize Cas9-based insertion efficiency.
Understanding the relationship between diseases based on the underlying biological mechanisms is one of the greatest challenges in modern biology and medicine. Exploring disease-disease associations by using system-level biological data is expected to improve our current knowledge of disease relationships, which may lead to further improvements in disease diagnosis, prognosis and treatment.
Results
We took advantage of diverse biological data including disease-gene associations and a large-scale molecular network to gain novel insights into disease relationships. We analysed and compared four publicly available disease-gene association datasets, then applied three disease similarity measures, namely annotation-based, function-based and topology-based measures, to estimate the similarity scores between diseases. We systematically evaluated disease associations obtained by these measures against a statistical measure of comorbidity which was derived from a large number of medical patient records. Our results show that the correlation between our similarity measures and comorbidity scores is substantially higher than expected at random, confirming that our similarity measures are able to recover comorbidity associations. We also demonstrated that the correlation between our predicted disease associations and disease associations generated from genome-wide association studies was significantly higher than expected at random. Furthermore, we evaluated our predicted disease associations by mining the literature on PubMed, and presented case studies to demonstrate how these novel disease associations can be used to enhance our current knowledge of disease relationships.
Conclusions
We present three similarity measures for predicting disease associations. The strong correlation between our predictions and known disease associations demonstrates the ability of our measures to provide novel insights into disease relationships.
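As an illustration of what an annotation-based similarity between diseases could look like, the sketch below scores two diseases by the overlap of their associated gene sets using the Jaccard index. The abstract does not define the measures, so the choice of the Jaccard index, the function name and the placeholder disease and gene identifiers are all assumptions.

```python
def jaccard_similarity(genes_a, genes_b):
    """Jaccard index of two gene sets: shared genes over all genes."""
    a, b = set(genes_a), set(genes_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty annotations
    return len(a & b) / len(a | b)


# Placeholder disease-gene annotations (hypothetical).
disease_genes = {
    "disease_A": {"G1", "G2", "G3"},
    "disease_B": {"G2", "G3", "G4"},
    "disease_C": {"G5"},
}

sim_ab = jaccard_similarity(disease_genes["disease_A"], disease_genes["disease_B"])
sim_ac = jaccard_similarity(disease_genes["disease_A"], disease_genes["disease_C"])
print(sim_ab, sim_ac)
```

Scores of this kind, computed for all disease pairs, could then be correlated with an independent measure such as comorbidity, as described in the Results.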
The YEASTRACT database
An upgraded information system for the analysis of gene and genomic transcription regulation in Saccharomyces cerevisiae
LateBiclustering
Efficient Heuristic Algorithm for Time-Lagged Bicluster Identification
Identifying patterns in temporal data supports complex analyses in several domains, including stock markets (finance) and social interactions (social science). Clinical and biological applications, such as monitoring patient response to treatment or characterizing activity at the molecular level, are also of interest. In particular, researchers seek to gain insight into the dynamics of biological processes, and potential perturbations of these leading to disease, through the discovery of patterns in time series gene expression data. For many years, clustering has remained the standard technique to group genes exhibiting similar response profiles. However, clustering defines similarity across all time points, focusing on global patterns which tend to characterize rather broad and unspecific responses. It is widely believed that local patterns offer additional insight into the underlying intricate events leading to the overall observed behavior. Efficient biclustering algorithms have been devised for the discovery of temporally aligned local patterns in gene expression time series, but the extraction of time-lagged patterns remains a challenge due to the combinatorial explosion of pattern occurrence combinations when delays are considered. We present heuristic approaches enabling polynomial rather than exponential time solutions for the problem.
Regulatory Snapshots
Integrative Mining of Regulatory Modules from Expression Time Series and Regulatory Networks
AliBiMotif
Integrating alignment and biclustering to unravel Transcription Factor Binding Sites in DNA sequences