A.M.E.T.A. Mahfouz | TU Delft Repository

Host-microbial interactions at the nasal mucosa in young children and adults

A retrospective, cross-sectional study

Journal article (2026) - Jesús Reiné, Lisa A. King, Youvika Singh, Wouter A.A. de Steenhuijsen Piters, Beatriz F. Carniel, Carla Solórzano, H.H. Smits, Elissavet Nikolaou, Ahmed Mahfouz, More Authors

Young children are at increased risk for respiratory tract infections and are frequently colonized by respiratory pathogens. However, how the mucosal immune system differs between children and adults is relatively unknown. We collected nasal samples from 50 young children (aged 1–5 years) and 318 young adults (aged 18–34 years) to study how the mucosal immune system and host-microbe interactions differ with age. We used multi-omics data integration to combine host (immunophenotyping, transcriptomic, and cytokines) and microbial (16S-rRNA amplicon sequencing, viral PCRs, and pneumococcal culture) datasets. Young children had a paucity of mucosal granulocytes, while B and T cell subsets were increased. Children also had increased immune activation and inflammation, which associated with the presence of Haemophilus spp. and pneumococcus, but not viruses. In adults, Haemophilus spp. associated with T cell and monocyte recruitment, while Dolosigranulum negatively associated with neutrophil degranulation. Thus, nasal immune composition and host-pathogen interactions were clearly age dependent. ...

FOXP1 is differentially active during development of murine vasopressin and oxytocin magnocellular neurons

Journal article (2026) - Jari B. Berkhout, Sophie Trender, Ferdinand Althammer, Onno C. Meijer, Ahmed Mahfouz, Quirin Krabichler, Yuval Podpecan, Felix Franke, Tim Schubert, Peter Burbach, Valery Grinevich, Roger Adan, Henning Fröhlich

Hypothalamic arginine vasopressin (AVP) and oxytocin (OXT) magnocellular neurons (MCNs) share a developmental lineage. The transcription factors driving their specification are as yet unknown. Using gene regulatory network analysis on published single-cell RNA-sequencing data of the developing mouse hypothalamus, we identified RORA, EBF3, FOXP1, FOXP2, and BCL11B as candidate transcription factors for differential MCN specification. We modeled developmental gene expression dynamics using computational cell fate mapping, revealing enrichment of EBF3 and BCL11B in the Avp lineage, and FOXP1 and FOXP2 in the Oxt lineage. In silico analysis of Avp and Oxt promoters predicted a binding site for FOXP1 and FOXP2, and an in vitro reporter assay identified regulation of both Avp and Oxt genomic promoters. Finally, heterozygous FOXP1 knockout mice exhibited a significant reduction in AVP and OXT neuron abundance, with OXT neurons disproportionately affected. We conclude that FOXP1 participates in MCN development, while being differentially active in OXT MCNs relative to AVP MCNs. ...

Novel genes associated with hypocretin-producing neurons identified by human gene expression profiling

Journal article (2026) - Marieke Vringer, Ahmed Mahfouz, Maartje G. Huijbers, Gert Jan Lammers, Jari Berkhout, Frits Koning, Rolf Fronczek, Mink Schinkelshoek

Narcolepsy type 1 (NT1) is a sleep-wake disorder characterized by hypocretin deficiency. It has been considered an autoimmune disorder for decades due to its strong association with the HLA-DQB1*06:02 allele and possible relations to the H1N1 pandemic in 2009. However, the pathophysiological mechanisms underlying the loss of hypocretin neurons are not understood. We hypothesize that a hypocretin neuron-specific antigen, other than hypocretin itself but sharing its expression pattern, may be the target of the autoimmune response leading to disease development in individuals with narcolepsy type 1. In this study, we employed an in silico method to identify novel candidate antigens for an autoimmune response leading to the destruction of hypocretin cells. A combination of multiple publicly available datasets, based on human brain tissue from healthy individuals, was used to map the expression profile of hypocretin. Genes were categorized based on their expression pattern and its association with hypocretin expression. Fifteen candidate genes were identified as potentially relevant targets in the development of NT1, with varying degrees of confidence regarding the likelihood of their involvement. Six candidate genes also showed higher expression within hypocretin cells compared to other cells in the hypothalamus, of which NPVF seems most promising. This study provides important new directions and potential targets for investigating and understanding the pathophysiology of narcolepsy type 1. ...

Unraveling the spatial landscape of dystrophinopathies

A transcriptomic approach to Becker and Duchenne muscular dystrophies

Journal article (2026) - Laura G.M. Heezen, Qirong Mao, Maaike van Putten, Annemieke Aartsma-Rus, Kevin M. Flanigan, Ahmed Mahfouz, Pietro Spitali, Stefan Nicolau, Claudio Novella Rausell, Julia van der Weerd, Jan Kueckelhaus, Rasya Gokul Nath, Jordi Diaz Manera, Hermien E. Kan, Erik H. Niks

Dystrophinopathies are caused by pathogenic variants in the DMD gene, resulting in partial (Becker) or complete (Duchenne) loss of dystrophin. Becker (BMD) and Duchenne muscular dystrophy (DMD) are characterized by progressive muscle wasting, fatty replacement, fibrosis, and loss of function. To study histopathological changes, we used Visium spatial transcriptomics to profile skeletal muscle biopsies of patients affected by dystrophinopathy (n = 8) and healthy controls (n = 4). We estimated the proportion of cell types and their spatial localization across samples by applying a deconvolution strategy using previously published single-nucleus RNA-sequencing data. We identified genes enriched in fat patches and cell types such as fibroadipogenic progenitors (FAPs) in areas of active pathology. Using expression data of ligand–receptor pairs, we highlight cell–cell communications leading to fibrotic and adipogenic lesions. Finally, analysis of gene expression gradients in areas of adjacent muscle and fat allowed the identification of genes associated with muscle areas committed to becoming fat. ...

Decoding exon inclusion in the human brain reveals more divergent splicing mechanisms in neurons than glia

Journal article (2026) - Lieke Michielsen, Justine Hsu, Anoushka Joglekar, Natan Belchikov, Marcel J.T. Reinders, Hagen U. Tilgner, Ahmed Mahfouz

BACKGROUND: Alternative splicing contributes to molecular diversity across brain cell types. RNA-binding proteins (RBPs) regulate splicing, but the genome-wide mechanisms underlying cell-type-specific splicing remain poorly understood. RESULTS: Here, we aim to unravel cell-type-specific splicing mechanisms by using RBP binding sites and/or the genomic sequence to predict exon inclusion in neurons and glia as measured by long-read single-cell data in the human hippocampus and frontal cortex. We found that exon inclusion of variable exons is harder to predict in neurons than in glia in both brain regions. Comparing neurons and glia, the position of RBP binding sites in alternatively spliced exons in neurons differs more from that in non-variable exons, indicating distinct splicing mechanisms. Model interpretation pinpointed RBPs, including QKI, potentially regulating alternative splicing between neurons and glia. Finally, we accurately predict and prioritize the effect of splicing QTLs. CONCLUSIONS: Our results indicate that the splicing mechanisms in variable exons in neurons diverged more from the standard mechanisms. Splicing in neurons might be less sequence-dependent and influenced more by, for instance, chromatin accessibility or methylation. Taken together, these results provide new insights into the mechanisms regulating cell-type-specific alternative splicing in the brain. ...

Optimized summary-statistic-based single-cell eQTL meta-analysis

Journal article (2025) - Maryna Korshevniuk, Harm Jan Westra, Roy Oelen, Monique G.P. van der Wijst, Lude Franke, Marc Jan Bonder, L.C.M. Michielsen, Ahmed Mahfouz, More Authors

The identification of expression quantitative trait loci (eQTLs) holds great potential to improve the interpretation of disease-associated genetic variation. As many such disease-associated variants act in a context-, tissue- or even cell-type-specific manner, single-cell RNA-sequencing (scRNA-seq) data is uniquely suitable for identifying the specific cell type or context in which these genetic variants act. However, due to the limited sample sizes in single-cell studies, discovery of cell-type-specific eQTLs is currently limited. To improve power to detect such eQTLs, large-scale joint analyses are needed. These are, however, complicated by privacy constraints on sharing genotype data and by measurement and technical variation across scRNA-seq datasets resulting from differences in mRNA capture efficiency, experimental protocols, and sequencing strategies. A solution to these issues is a federated weighted meta-analysis (WMA) approach in which summary statistics are integrated using dataset-specific weights. Here, we compare different strategies and provide best practice recommendations for eQTL WMA across scRNA-seq datasets. ...
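As a rough illustration of the weighted meta-analysis idea described in this abstract, the following minimal sketch combines per-dataset Z-scores for one gene-variant pair using dataset-specific weights (a Stouffer-style weighted Z-score). The function name `weighted_z_meta`, the example numbers, and the choice of square-root-of-sample-size weights are assumptions made for illustration; the abstract does not state which weighting strategies the study compares or recommends.

```python
from math import sqrt
from statistics import NormalDist


def weighted_z_meta(z_scores, weights):
    """Combine per-dataset Z-scores into one meta-analysis Z-score.

    Uses the weighted Z-score formula
        Z_meta = sum(w_i * z_i) / sqrt(sum(w_i ** 2))
    and returns (Z_meta, two-sided p-value under a standard normal null).
    """
    if not z_scores or len(z_scores) != len(weights):
        raise ValueError("z_scores and weights must be non-empty and equally long")
    denominator = sqrt(sum(w * w for w in weights))
    if denominator == 0:
        raise ValueError("at least one weight must be non-zero")
    z_meta = sum(w * z for w, z in zip(weights, z_scores)) / denominator
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z_meta)))
    return z_meta, p_value


# Hypothetical summary statistics for one eQTL in three datasets.
# Only these numbers would need to be shared, not individual genotypes.
z_per_dataset = [2.1, 1.4, 2.8]
donors_per_dataset = [120, 80, 300]

# Assumption: weight each dataset by the square root of its sample size.
weights = [sqrt(n) for n in donors_per_dataset]

z_meta, p_value = weighted_z_meta(z_per_dataset, weights)
print(f"meta Z = {z_meta:.3f}, two-sided p = {p_value:.2e}")
```

Other weights (for example, ones reflecting the number of cells per donor or data quality) can be passed to the same function without changing the combination step.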

Activated CD27⁺PD-1⁺ CD8 T Cells and CD4 T Regulatory Cells Dominate the Tumor Microenvironment in Refractory Celiac Disease Type II

Journal article (2025) - Tessa Dieckman, Mette Schreurs, Ciska Lindelauf, Ahmed Mahfouz, Caroline R. Meijer, Louise Pigeaud, Vincent van Unen, Gerd Bouma, Frits Koning

Background and Aims: Refractory celiac disease type II (RCDII) is characterized by a clonally expanded aberrant cell population in the small intestine. The role of other tissue-resident immune subsets in RCDII is unknown. Here, we characterized CD8 and CD4 T cells in RCDII duodenum at the single-cell level and in situ. Methods: We applied mass cytometry on CD45⁺ duodenal cells derived from intestinal biopsies (n = 23) and blood samples (n = 20) from RCDII patients and controls. Additionally, we analyzed intestinal biopsies from celiac disease (n = 11) and RCDI (n = 2) patients. We performed single-cell RNA-sequencing on CD45⁺ duodenal cells derived from an RCDII patient, immunofluorescence staining for in situ analysis and flow cytometry for phenotyping of RCDII aberrant and CD8 T cells. Results: Compared to healthy controls, we observed that CD27⁺PD-1⁺ memory CD8αβ cells and CD4 regulatory T cells (Tregs) were more abundant in RCDII duodenum (CD8 ∗∗0.0029; CD4 ∗∗∗0.0001). The CD27⁺PD-1⁺ memory CD8αβ cells expressed the tissue-resident marker CD69, immunoregulatory markers (TIGIT, HAVCR2, TNFRSF9), and NKG2A; were enriched for activated pathways; and displayed cytotoxic gene signatures (NKG7, PRF1, GZMA). The absence of CD103 accords with their localization in the lamina propria as determined by in situ analysis. The CD25⁺FoxP3⁺CD27⁺CD127^dim/- CD4 Tregs expressed IL1R2 and IL32 and costimulatory molecules (TNFRSF4, ICOS and TNFRSF18) and resided in the lamina propria as well. Flow cytometry confirmed the presence of the inhibitory receptor NKG2A on expanded duodenal CD8 T cells and HLA-E, the ligand for NKG2A, on expanded aberrant cells. Conclusion: RCDII is characterized by the simultaneous presence of an activated CD27⁺PD-1⁺ memory CD8αβ T cell subset and CD4 Tregs, suggesting that checkpoint blockade with anti-NKG2A/PD-1 and/or anti-cytotoxic T lymphocyte antigen 4 may be an attractive treatment option. ...

DUX4 activates common and context-specific intergenic transcripts and isoforms

Journal article (2025) - Dongxu Zheng, Anita van den Heuvel, Judit Balog, Iris M. Willemsen, Susan Kloet, Stephen J. Tapscott, Ahmed Mahfouz, Silvère M. van der Maarel

DUX4 regulates the expression of genic and nongenic elements and modulates chromatin accessibility during zygotic genome activation in cleavage stage embryos. Its misexpression in skeletal muscle causes facioscapulohumeral dystrophy (FSHD). By leveraging full-length RNA isoform sequencing with short-read RNA sequencing of DUX4-inducible myoblasts, we elucidate an isoform-resolved transcriptome featuring numerous unannotated isoforms from known loci and novel intergenic loci. While DUX4 activates similar programs in early embryos and FSHD muscle, the isoform usage of known DUX4 targets is notably distinct between the two contexts. DUX4 also activates hundreds of previously unannotated intergenic loci dominated by repetitive elements. The transcriptional and epigenetic profiles of these loci in myogenic and embryonic contexts indicate that the usage of DUX4-binding sites at these intergenic loci is influenced by the cellular environment. These findings demonstrate that DUX4 induces context-specific transcriptomic programs, enriching our understanding of DUX4-induced muscle pathology. ...

Selective changes in vasopressin neurons and astrocytes in the suprachiasmatic nucleus of Prader–Willi syndrome subjects

Journal article (2025) - Felipe Correa-da-Silva, Jari B. Berkhout, Pim Schouten, Margje Sinnema, Constance T.R.M. Stumpel, Leopold M.G. Curfs, Charlotte Höybye, Ahmed Mahfouz, Onno C. Meijer, More Authors...

The hypothalamic suprachiasmatic nucleus (SCN) hosts the central circadian pacemaker and regulates daily rhythms in physiology and behavior. The SCN is composed of peptidergic neuron populations expressing arginine vasopressin (AVP) and vasoactive intestinal polypeptide (VIP), as well as glial cells. Patients with Prader–Willi Syndrome (PWS) commonly experience circadian disturbances, which are particularly evident in their sleep/wake patterns. Using publicly available single-cell RNA sequencing data, we assessed the cell-type specificity of PWS-causative genes in murine SCN, which revealed the differential presence of PWS-related genes in glial and neural subpopulations. We then investigated neurons and glial cells in the SCN using immunohistochemistry in the postmortem hypothalami of PWS subjects and matched controls. We profiled neural populations characterized by AVP and VIP, astroglia characterized by glial fibrillary acidic protein (GFAP), and microglia marked by ionized calcium-binding adapter molecule 1 (Iba1) and NADPH oxidase 2 (NOX2). Our analysis revealed an increased total number, neuronal density, and relative staining intensity of AVP-containing neurons in PWS subjects compared to controls, while VIP-containing cells were unaltered. In contrast, GFAP-expressing astroglial cells were significantly reduced in PWS subjects. Moreover, we did not detect any differences in microglia between PWS subjects and controls. Collectively, our findings show that PWS selectively affects AVP-containing neurons and GFAP-expressing astrocytes in the SCN. As each of these cell populations can affect the daily rhythmicity of the SCN biological clock machinery, the disruption of these cells may contribute to the circadian disturbances in patients with PWS. ...

SIRV

Spatial inference of RNA velocity at the single-cell resolution

Journal article (2024) - Tamim Abdelaal, Laurens M. Grossouw, R. Jeroen Pasterkamp, Boudewijn P.F. Lelieveldt, Marcel J.T. Reinders, Ahmed Mahfouz

RNA Velocity allows the inference of cellular differentiation trajectories from single-cell RNA sequencing (scRNA-seq) data. It would be highly interesting to study these differentiation dynamics in the spatial context of tissues. Estimating spatial RNA velocities is, however, limited by the inability to spatially capture spliced and unspliced mRNA molecules in high-resolution spatial transcriptomics. We present SIRV, a method to spatially infer RNA velocities at the single-cell resolution by enriching spatial transcriptomics data with the expression of spliced and unspliced mRNA from reference scRNA-seq data. We used SIRV to infer spatial differentiation trajectories in the developing mouse brain, including the differentiation of midbrain-hindbrain boundary cells and marking the forebrain origin of the cortical hem and diencephalon cells. Our results show that SIRV reveals spatial differentiation patterns not identifiable with scRNA-seq data alone. Additionally, we applied SIRV to mouse organogenesis data and obtained robust spatial differentiation trajectories. Finally, we verified the spatial RNA velocities obtained by SIRV using 10x Visium data of the developing chicken heart and MERFISH data from human osteosarcoma cells. Altogether, SIRV allows the inference of spatial RNA velocities at the single-cell resolution to facilitate studying tissue development. ...

Predicting cell population-specific gene expression from genomic sequence

Journal article (2024) - Lieke Michielsen, Marcel J. T. Reinders, Ahmed Mahfouz

Most regulatory elements, especially enhancer sequences, are cell population-specific. One could even argue that a distinct set of regulatory elements is what defines a cell population. However, discovering which non-coding regions of the DNA are essential in which context, and as a result, which genes are expressed, is a difficult task. Some computational models tackle this problem by predicting gene expression directly from the genomic sequence. These models are currently limited to predicting bulk measurements and mainly make tissue-specific predictions. Here, we present a model that leverages single-cell RNA-sequencing data to predict gene expression. We show that cell population-specific models outperform tissue-specific models, especially when the expression profile of a cell population and the corresponding tissue are dissimilar. Further, we show that our model can prioritize GWAS variants and learn motifs of transcription factor binding sites. We envision that our model can be useful for delineating cell population-specific regulatory elements. ...

snRNA-seq analysis in multinucleated myogenic FSHD cells identifies heterogeneous FSHD transcriptome signatures associated with embryonic-like program activation and oxidative stress-induced apoptosis

Journal article (2024) - Dongxu Zheng, Annelot Wondergem, Susan Kloet, Iris Willemsen, Judit Balog, Stephen J. Tapscott, Ahmed Mahfouz, Anita Van Den Heuvel, Silvère M. Van Der Maarel

The sporadic nature of DUX4 expression in FSHD muscle challenges comparative transcriptome analyses between FSHD and control samples. A variety of DUX4 and FSHD-associated transcriptional changes have been identified, but bulk RNA-seq strategies prohibit comprehensive analysis of their spatiotemporal relation, interdependence and role in the disease process. In this study, we used single-nucleus RNA-sequencing of nuclei isolated from patient- and control-derived multinucleated primary myotubes to investigate the cellular heterogeneity in FSHD. Taking advantage of the increased resolution in snRNA-sequencing of fully differentiated myotubes, two distinct populations of DUX4-affected nuclei could be defined by their transcriptional profiles. Our data provides insights into the differences between these two populations and suggests heterogeneity in two well-known FSHD-associated transcriptional aberrations: increased oxidative stress and inhibition of myogenic differentiation. Additionally, we provide evidence that DUX4-affected nuclei share transcriptome features with early embryonic cells beyond the well-described cleavage stage, progressing into the 8-cell and blastocyst stages. Altogether, our data suggests that the FSHD transcriptional profile is defined by a mixture of individual and sometimes mutually exclusive DUX4-induced responses and cellular state-dependent downstream effects. ...

Combined plasma protein and memory T cell profiling discern IBD-patient-immunotypes related to intestinal disease and treatment outcomes

Journal article (2024) - Maud Heredia, Mohammed Charrout, Renz C.W. Klomberg, Martine A. Aardoom, Maria M.E. Jongsma, Polychronis Kemos, Danielle H. Hulleman-van Haaften, Ahmed Mahfouz, Marcel J.T. Reinders, More authors...

Inflammatory bowel disease (IBD) chronicity results from memory T helper cell (Tmem) reactivation. Identifying patient-specific immunotypes is crucial for tailored treatment. We conducted a comprehensive study integrating circulating immune proteins and circulating Tmem with intestinal tissue histology and mRNA analysis in therapy-naïve pediatric IBD (Crohn's disease, CD: n = 62; ulcerative colitis, UC: n = 20; age-matched controls n = 43), and after 10–12 weeks’ induction therapy. At diagnosis, plasma protein profiles unveiled two UC and three CD clusters with distinct disease courses. UC patients displayed unchanged circulating Tmem, while CD exhibited increased frequencies of gut-homing ex-Th17, known for high IFN-γ production. UC#2 had elevated Th17/neutrophil-pathway-related proteins and severe disease, with higher endoscopic and histological damage and Th17/neutrophil infiltration. Although both UC#1 and UC#2 responded to therapy, UC#2 required earlier immunomodulation. CD#3 had lower plasma protein concentrations, especially IFN-γ pathway proteins, fewer gut-homing ex-Th17 and clinically milder disease, confirmed by intestinal gene expression. CD#1 and CD#2 had comparably high Th1-related immune profiles, but CD#1 exhibited higher concentrations of proteins previously associated with poorer prognosis. Both CD clusters responded to induction therapy, with similar one-year outcomes. This study highlights the feasibility of discriminating patient-specific immunotypes in IBD, advancing our understanding of immune pathogenesis, which is needed for tailored treatment strategies. ...

Integration of mass cytometry and mass spectrometry imaging for spatially resolved single-cell metabolic profiling

Journal article (2024) - Joana B. Nunes, Marieke E. Ijsselsteijn, Tamim Abdelaal, Rick Ursem, Manon van der Ploeg, Martin Giera, Bart Everts, Ahmed Mahfouz, Bram Heijs, Noel F.C.C. de Miranda

The integration of spatial omics technologies can provide important insights into the biology of tissues. Here we combined mass spectrometry imaging-based metabolomics and imaging mass cytometry-based immunophenotyping on a single tissue section to reveal metabolic heterogeneity at single-cell resolution within tissues and its association with specific cell populations such as cancer cells or immune cells. This approach has the potential to greatly increase our understanding of tissue-level interplay between metabolic processes and their cellular components. ...

An integrated single-cell RNA-seq atlas of the mouse hypothalamic paraventricular nucleus links transcriptomic and functional types

Journal article (2024) - J. B. Berkhout, D. Poormoghadam, C. Yi, A. Kalsbeek, O. C. Meijer, A. Mahfouz

The hypothalamic paraventricular nucleus (PVN) is a highly complex brain region that is crucial for homeostatic regulation through neuroendocrine signaling, outflow of the autonomic nervous system, and projections to other brain areas. In recent years, single-cell datasets of the hypothalamus have contributed immensely to the current understanding of the diverse hypothalamic cellular composition. While the PVN has been adequately classified functionally, its molecular classification is currently still insufficient. To address this, we created a detailed atlas of PVN transcriptomic cell types by integrating various PVN single-cell datasets into a recently published hypothalamus single-cell transcriptome atlas. Furthermore, we functionally profiled transcriptomic cell types, based on relevant literature, existing retrograde tracing data, and existing single-cell data of a PVN-projection target region. Finally, we validated our findings with immunofluorescent stainings. In our PVN atlas dataset, we identify the well-known different neuropeptide types, each composed of multiple novel subtypes. We identify Avp-Tac1, Avp-Th, Oxt-Foxp1, Crh-Nr3c1, and Trh-Nfib as the most important neuroendocrine subtypes based on markers described in literature. To characterize the preautonomic functional population, we integrated a single-cell retrograde tracing study of spinally projecting preautonomic neurons into our PVN atlas. We find that these (presympathetic) neurons co-cluster with the Adarb2⁺ clusters in our dataset. Further, we identify the expression of receptors for Crh, Oxt, Penk, Sst, and Trh in the dorsal motor nucleus of the vagus, a key region that the pre-parasympathetic PVN neurons project to. Finally, we identify Trh-Ucn3 and Brs3-Adarb2 as some centrally projecting populations.
In conclusion, our study presents a detailed overview of the transcriptomic cell types of the murine PVN and provides a first attempt to resolve functionality for the identified populations. ...

Identification of kidney cell types in scRNA-seq and snRNA-seq data using machine learning algorithms

Journal article (2024) - Adam Tisch, Siddharth Madapoosi, Stephen Blough, Jan Rosa, Sean Eddy, Laura Mariani, Abhijit Naik, Ahmed Mahfouz, Fadhl Alakwaa, More authors...

Introduction
Single-cell RNA sequencing (scRNA-seq) and single-nucleus RNA sequencing (snRNA-seq) provide valuable insights into the cellular states of kidney cells. However, the annotation of cell types often requires extensive domain expertise and time-consuming manual curation, limiting scalability and generalizability. To facilitate this process, we tested the performance of five supervised classification methods for automatic cell type annotation.

Results
We analyzed publicly available sc/snRNA-seq datasets from five expert-annotated studies, comprising 62,120 cells from 79 kidney biopsy samples. Datasets were integrated by harmonizing cell type annotations across studies. Five different supervised machine learning algorithms (support vector machines, random forests, multilayer perceptrons, k-nearest neighbors, and extreme gradient boosting) were applied to automatically annotate cell types using four training datasets and one testing dataset. Performance metrics, including accuracy (F1 score) and rejection rates, were evaluated. All five machine learning algorithms demonstrated high accuracies, with a median F1 score of 0.94 and a median rejection rate of 1.8 %. The algorithms performed equally well across different datasets and successfully rejected cell types that were not present in the training data. However, F1 scores were lower when models trained primarily on scRNA-seq data were tested on snRNA-seq data.

Conclusions
Despite limitations including the number of biopsy samples, our findings demonstrate that machine learning algorithms can accurately annotate a wide range of adult kidney cell types in scRNA-seq/snRNA-seq data. This approach has the potential to standardize cell type annotation and facilitate further research on cellular mechanisms underlying kidney disease. ...
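The evaluation described in this abstract, classification with a rejection option scored by F1 and rejection rate, can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names, the 0.7 confidence threshold, the toy probabilities, and the two cell type labels are all hypothetical, and the abstract does not specify how the study implemented rejection.

```python
from statistics import median

REJECTED = "rejected"


def predict_with_rejection(probabilities, labels, threshold=0.7):
    """Return the most probable label per cell, or REJECTED when the
    classifier's top probability falls below the threshold (assumed value)."""
    predictions = []
    for row in probabilities:
        best = max(range(len(row)), key=row.__getitem__)
        predictions.append(labels[best] if row[best] >= threshold else REJECTED)
    return predictions


def per_class_f1(y_true, y_pred):
    """Compute the F1 score of each true class: 2*TP / (2*TP + FP + FN).
    Rejected cells count as misses (false negatives) for their true class."""
    scores = {}
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


# Toy example: class probabilities from any supervised classifier.
labels = ["podocyte", "proximal tubule"]
probabilities = [[0.90, 0.10], [0.55, 0.45], [0.20, 0.80], [0.10, 0.90]]
y_true = ["podocyte", "podocyte", "proximal tubule", "proximal tubule"]

y_pred = predict_with_rejection(probabilities, labels)
rejection_rate = y_pred.count(REJECTED) / len(y_pred)
f1_by_class = per_class_f1(y_true, y_pred)

print("predictions:", y_pred)
print("rejection rate:", rejection_rate)
print("median F1:", median(f1_by_class.values()))
```

Raising the threshold trades a higher rejection rate for fewer confident misclassifications, which is why the two metrics are reported together.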

Cell type matching across species using protein embeddings and transfer learning

Journal article (2023) - Kirti Biharie, Lieke Michielsen, Marcel J.T. Reinders, Ahmed Mahfouz

Motivation: Knowing the relation between cell types is crucial for translating experimental results from mice to humans. Establishing cell type matches, however, is hindered by the biological differences between the species. A substantial amount of evolutionary information between genes that could be used to align the species is discarded by most of the current methods since they only use one-to-one orthologous genes. Some methods try to retain the information by explicitly including the relation between genes, however, not without caveats. Results: In this work, we present a model to transfer and align cell types in cross-species analysis (TACTiCS). First, TACTiCS uses a natural language processing model to match genes using their protein sequences. Next, TACTiCS employs a neural network to classify cell types within a species. Afterward, TACTiCS uses transfer learning to propagate cell type labels between species. We applied TACTiCS on scRNA-seq data of the primary motor cortex of human, mouse, and marmoset. Our model can accurately match and align cell types on these datasets. Moreover, our model outperforms Seurat and the state-of-the-art method SAMap. Finally, we show that our gene matching method results in better cell type matches than BLAST in our model. ...
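The first step named in this abstract, matching genes across species through embeddings of their protein sequences, can be illustrated with a small sketch. Everything concrete here is an assumption for illustration: the three-dimensional toy vectors stand in for embeddings from a protein language model, and cosine similarity with a top-k cutoff is one plausible matching rule, not necessarily the one TACTiCS uses.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def match_genes(embeddings_a, embeddings_b, top_k=1):
    """For each gene in species A, return the top_k most similar genes in
    species B, ranked by cosine similarity of their protein embeddings."""
    matches = {}
    for gene_a, vec_a in embeddings_a.items():
        ranked = sorted(
            embeddings_b,
            key=lambda gene_b: cosine_similarity(vec_a, embeddings_b[gene_b]),
            reverse=True,
        )
        matches[gene_a] = ranked[:top_k]
    return matches


# Hypothetical toy embeddings; real protein embeddings have many more dimensions.
human = {"GAD1": [1.0, 0.1, 0.0], "SLC17A7": [0.0, 1.0, 0.2]}
mouse = {
    "Gad1": [0.9, 0.2, 0.0],
    "Slc17a7": [0.1, 0.9, 0.3],
    "Gad2": [0.8, 0.0, 0.5],
}

print(match_genes(human, mouse, top_k=2))
```

Allowing more than one match per gene (top_k greater than 1) is what lets such an approach keep many-to-many gene relations that a strict one-to-one ortholog table would discard.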

Spatial transcriptomics reveal markers of histopathological changes in Duchenne muscular dystrophy mouse models

Journal article (2023) - L. G.M. Heezen, T. Abdelaal, M. van Putten, A. Aartsma-Rus, A. Mahfouz, P. Spitali

Duchenne muscular dystrophy is caused by mutations in the DMD gene, leading to lack of dystrophin. Chronic muscle damage eventually leads to histological alterations in skeletal muscles. The identification of genes and cell types driving tissue remodeling is a key step to developing effective therapies. Here we use spatial transcriptomics in two Duchenne muscular dystrophy mouse models differing in disease severity to identify gene expression signatures underlying skeletal muscle pathology and to directly link gene expression to muscle histology. We perform deconvolution analysis to identify cell types contributing to histological alterations. We show increased expression of specific genes in areas of muscle regeneration (Myl4, Sparc, Hspg2), fibrosis (Vim, Fn1, Thbs4) and calcification (Bgn, Ctsk, Spp1). These findings are confirmed by smFISH. Finally, we use differentiation dynamic analysis in the D2-mdx muscle to identify muscle fibers in the present state that are predicted to become affected in the future state. ...

A comprehensive mouse kidney atlas enables rare cell population characterization and robust marker discovery

Journal article (2023) - Claudio Novella-Rausell, Magda Grudniewska, Dorien J.M. Peters, Ahmed Mahfouz

The kidney's cellular diversity is on par with its physiological intricacy; yet identifying cell populations and their markers remains challenging. Here, we created a comprehensive atlas of the healthy adult mouse kidney (MKA: Mouse Kidney Atlas) by integrating 140,000 cells and nuclei from 59 publicly available single-cell and single-nucleus RNA-sequencing datasets from eight independent studies. To harmonize annotations across datasets, we built a hierarchical model of the cell populations. Our model allows the incorporation of novel cell populations and the refinement of known profiles as more datasets become available. Using MKA and the learned model of cellular hierarchies, we predicted previously missing cell annotations from several studies. The MKA allowed us to identify reproducible markers across studies for poorly understood cell types and transitional states, which we verified using existing data from micro-dissected samples and spatial transcriptomics. ...

Benchmarking variational AutoEncoders on cancer transcriptomics data

Journal article (2023) - Mostafa Eltager, Tamim Abdelaal, Mohammed Charrout, A.M.E.T.A. Mahfouz, M.J.T. Reinders, Stavros Makrodimitris

Deep generative models, such as variational autoencoders (VAE), have gained increasing attention in computational biology due to their ability to capture complex data manifolds, which can subsequently be used to achieve better performance in downstream tasks, such as cancer type prediction or subtyping of cancer. However, these models are difficult to train due to the large number of hyperparameters that need to be tuned. To get a better understanding of the importance of the different hyperparameters, we examined six different VAE models when trained on TCGA transcriptomics data and evaluated on the downstream tasks of cluster agreement with cancer subtypes and survival analysis. We studied the effect of the latent space dimensionality, learning rate, optimizer, initialization and activation function on the quality of subsequent downstream tasks on the TCGA samples. We found β-TCVAE and DIP-VAE to perform well, on average, despite being more sensitive to hyperparameter selection. Based on these experiments, we derived recommendations for selecting the different hyperparameter settings. To ensure generalization, we tested all hyperparameter configurations on the GTEx dataset. We found a significant correlation (ρ = 0.7) between the hyperparameter effects on clustering performance in the TCGA and GTEx datasets. This highlights the robustness and generalizability of our recommendations. In addition, we examined whether the learned latent spaces capture biologically relevant information. To this end, we measured the correlation and mutual information of the different representations with various data characteristics such as gender, age, days to metastasis, immune infiltration, and mutation signatures. We found that for all models the latent factors, in general, do not uniquely correlate with one of the data characteristics nor capture separable information in the latent factors even for models specifically designed for disentanglement. ...
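
As a rough illustration of the cross-dataset check described in the abstract above, the sketch below computes a rank correlation between per-configuration hyperparameter effects measured on two datasets. The abstract does not state which correlation coefficient was used; Spearman's ρ is an assumption here, and the effect values and the names `effects_tcga` and `effects_gtex` are invented for illustration only, not taken from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical effects of five hyperparameter settings on clustering
# performance in two datasets (made-up numbers, not results from the paper).
effects_tcga = [0.12, 0.05, -0.03, 0.20, 0.01]
effects_gtex = [0.10, 0.07, 0.02, 0.15, -0.01]

rho = spearman_rho(effects_tcga, effects_gtex)
print(f"Spearman rho between datasets: {rho:.2f}")
```

A high ρ in such a comparison would indicate that settings that help or hurt on one dataset tend to do the same on the other, which is the sense in which the abstract speaks of generalizable recommendations.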
