A.M.E.T.A. Mahfouz | TU Delft Repository

Computational Approaches to Deciphering the Molecular and Cellular Heterogeneity of Alzheimer's Disease

Doctoral thesis (2025) - G.A. Bouland, M.J.T. Reinders, A.M.E.T.A. Mahfouz

In summary, the contributions within this thesis advance Alzheimer's research by introducing new computational tools and methods to better understand the disease's genetics and cellular mechanisms. Additionally, the thesis shows that single-cell gene expression can be effectively analyzed in a binary format (expressed or not), which simplifies genomic data analysis and makes it more accessible, efficient, and applicable to a range of diseases and conditions. ...
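
As a rough illustration of the binary representation mentioned above, the following sketch converts a small count matrix into "expressed or not" values and computes a per-gene detection rate. The toy matrix, the function names, and the threshold (any count above zero is treated as expressed) are assumptions made for illustration; the thesis may define and use the binary format differently.

```python
# Hypothetical toy count matrix: rows are cells, columns are genes.
counts = [
    [0, 3, 12],
    [1, 0, 0],
    [0, 0, 7],
]


def binarize(matrix):
    """Return 1 where a gene has at least one count in a cell, else 0.

    Assumption: "expressed" means count > 0.
    """
    return [[1 if value > 0 else 0 for value in row] for row in matrix]


def detection_rate(binary_matrix):
    """Fraction of cells in which each gene is detected."""
    n_cells = len(binary_matrix)
    return [sum(column) / n_cells for column in zip(*binary_matrix)]


binary = binarize(counts)
rates = detection_rate(binary)
```

Here `binary` keeps only presence or absence per cell and gene, and `rates` summarizes how often each gene is detected across cells.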

Improving cell type matching across species in scRNA-seq data using protein embeddings and transfer learning

Master thesis (2022) - K.S. Biharie, M.J.T. Reinders, A.M.E.T.A. Mahfouz, L.C.M. Michielsen, E. Isufi

Knowing the relation between cell types is crucial for translating experimental results from mice to humans. Establishing cell type matches, however, is hindered by the biological differences between the species. Most current methods use only one-to-one orthologous genes and thereby discard a substantial amount of evolutionary information between genes that could be used to align the species. Some methods try to retain this information by explicitly including the relation between genes, but not without caveats. In this work, we present a model to Transfer and Align Cell Types in Cross-Species (TACTiCS). First, TACTiCS uses a natural language processing model to match genes using their protein sequences. Next, TACTiCS employs a neural network to classify cell types within a species. Afterwards, TACTiCS uses transfer learning to propagate cell type labels between species. We applied TACTiCS to scRNA-seq data of the primary motor cortex and the ventral tegmental area. Our model can accurately match and align cell types on these datasets. Moreover, at a high resolution, our model outperforms two state-of-the-art methods, SAMap and CAME. Finally, we show that our gene matching method results in better matches than BLAST, both in our model and in SAMap. ...
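
The gene matching step can be pictured with a minimal sketch. It assumes each gene already has a protein embedding vector from some sequence model, and matches genes across species by cosine similarity. The gene names, the three-dimensional vectors, and the `match_genes` helper are hypothetical; the abstract does not state which embedding model or matching rule TACTiCS actually uses.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical protein embeddings (real embeddings have many more dimensions).
mouse = {"GeneA_m": [1.0, 0.0, 0.2], "GeneB_m": [0.0, 1.0, 0.1]}
human = {
    "GENEA_h": [0.9, 0.1, 0.2],
    "GENEB_h": [0.1, 0.9, 0.0],
    "GENEC_h": [0.5, 0.5, 0.5],
}


def match_genes(source, target, top_k=1):
    """For each source gene, list the top_k most similar target genes."""
    matches = {}
    for name, emb in source.items():
        scored = sorted(
            ((cosine(emb, t_emb), t_name) for t_name, t_emb in target.items()),
            reverse=True,
        )
        matches[name] = [t_name for _, t_name in scored[:top_k]]
    return matches


matches = match_genes(mouse, human)
```

Setting `top_k` above 1 keeps several candidate matches per gene, which is one way to go beyond one-to-one orthologs.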

Predicting immune responses on multi-modal single-cell data with variational inference

Master thesis (2022) - F.K. Drummer, Ahmed Mahfouz, M.J.T. Reinders, T. Höllt

Single-cell sequencing allows measuring individual cells' molecular features and their responses to perturbations. Understanding which cells respond to a particular perturbation and how these responses vary across populations can be used to, for example, improve vaccine immunogenicity. However, an exhaustive exploration of single-cell perturbation responses in every population is usually experimentally infeasible. Several machine learning models have been developed to predict perturbation responses, but they are limited to single-modality data. Single-modality data alone, such as transcriptomics only, is not suited to capture all cell responses accurately. For example, the identification of immune responses requires transcriptomic and proteomic measurements. Here, we introduce cellPMVI, a method built to predict perturbation responses from multi-modality data. cellPMVI combines the single-cell data modeling from scVI with a mixture-of-experts posterior integration to allow for multi-modality input data. In this work, we validate cellPMVI for immune response prediction of adjuvants across populations. The model is trained on two-modality CITE-seq data containing gene and protein measurements from three different populations. We show that cellPMVI can model both modalities of the CITE-seq data without information loss in either modality and predict immune responses with a high correlation to the observed responses across different populations. Hence, cellPMVI is the first model to capture and predict immune responses from multi-modality data, with the potential to be applied to other perturbations, such as drugs. ...
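
To give a feel for mixture-of-experts posterior integration, the sketch below treats each modality's encoder output as a one-dimensional Gaussian "expert" and samples the joint latent variable from a weighted mixture of those experts. The one-dimensional latent space, the equal weights, the example means and standard deviations, and the function names are all assumptions for illustration, not cellPMVI's actual implementation.

```python
import random


def uniform_weights(experts):
    """Equal mixture weights, one per modality (an assumption of this sketch)."""
    return [1.0 / len(experts)] * len(experts)


def sample_moe_posterior(experts, weights=None, rng=random):
    """Draw one latent sample from a mixture of 1-D Gaussian experts.

    Each expert is a (mean, std) pair standing in for one modality's
    approximate posterior.
    """
    if weights is None:
        weights = uniform_weights(experts)
    mean, std = rng.choices(experts, weights=weights, k=1)[0]
    return rng.gauss(mean, std)


def moe_mean(experts, weights=None):
    """Mean of the mixture: the weighted average of the expert means."""
    if weights is None:
        weights = uniform_weights(experts)
    return sum(w * mean for w, (mean, _) in zip(weights, experts))


# Hypothetical per-modality posteriors for one cell: (mean, std).
experts = [(0.0, 1.0), (4.0, 0.5)]  # e.g. RNA expert, protein expert

rng = random.Random(0)
samples = [sample_moe_posterior(experts, rng=rng) for _ in range(20000)]
empirical_mean = sum(samples) / len(samples)
```

Because each sample comes from a single expert, every modality can produce a latent sample on its own, which is what lets a mixture of this kind accept input from more than one modality.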

Comparing the performance of variant calling algorithms based on HiFi reads

Master thesis (2022) - M. Mikuš, Ahmed Mahfouz

Unsupervised Manifold Alignment with TopoGAN

Aligning multi-modal biological data without correspondence information available across modalities

Master thesis (2021) - A. Singh, A.M.E.T.A. Mahfouz, M.J.T. Reinders, C. Lofi, T.R.M. Abdelaal

Single-cell multi-modal omics promises to open new doors in bioinformatics by measuring different aspects of cells, thus offering multiple perspectives on the underlying biological phenomenon. Although simultaneous multi-modal measurement protocols do exist, their inherent technical limitations necessitate a focus on single-modality measurements. These single-modality measurements, however, destroy the cell in question, which makes it impossible to measure another modality on the same cell. This gives rise to an abundance of multi-modal biological data with no sample or feature correspondence between data sets. This work proposes a novel approach to align multi-modal data sets in an unsupervised fashion, using an Autoencoder to obtain latent embeddings of the modalities and a Generative Adversarial Network to align these latent representations. Minimising the topological error between the original and latent representations of a data set is central to this approach, which enables not just the superposition but also the alignment of different modalities. Two recently published methods, UnionCom and MMD-MA, have been used for comparison and benchmarking. The approach, termed TopoGAN, has been demonstrated to give consistently stable alignments, to give better quantitative performance in realistic unsupervised settings, and to scale much better in terms of memory requirements than these state-of-the-art methods. ...

Automatic cell identification in single-cell RNA-sequencing data

Master thesis (2020) - Lieke Michielsen, Marcel Reinders, Ahmed Mahfouz

Since the revolution of single-cell RNA-sequencing, the number of available datasets has increased enormously. In these datasets, cell identification is mainly done manually, which is subjective and time-consuming. As a consequence, most datasets are annotated at a different resolution. This is not surprising as cell types form a hierarchy, but it can be problematic for downstream analysis or comparison of datasets. Several supervised methods have already been developed to overcome the drawbacks of unsupervised learning. None of these, however, combines the information found in multiple datasets while preserving the definition of cell populations in each dataset, even though this consistency is necessary for downstream analysis. Furthermore, a supervised classifier should be able to detect new cell populations in an unlabeled dataset. Here, we introduce a hierarchical progressive learning pipeline with a one-class classifier to face these challenges. Using this pipeline, it is possible to construct a hierarchical classification tree by combining the information of multiple datasets. If datasets are annotated at a different resolution, their cell populations will be at different levels in the tree and all definitions are thus preserved. By using a one-class classifier for each cell population, it is also possible to have a correctly working rejection option and discover new cell populations. In this paper, we show that it is possible to construct a classification tree for simulated data and immune cells. When comparing the pipeline with a one-class classifier to the pipeline with a linear classifier, we show that a one-class classifier can indeed improve the rejection option. Using a linear classifier, on the other hand, results in a higher accuracy. Choosing between a one-class and a linear classifier is a trade-off between the ability to discover new cell populations and a higher performance. ...
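
The idea of a classification tree with a rejection option can be sketched as follows. Each cell population is a tree node with its own one-class acceptance rule, and a cell descends the tree until no child accepts it. The centroid-plus-radius rule is a deliberately simple stand-in for a real one-class classifier, and the population names, coordinates, and radii are hypothetical; this is not the pipeline from the thesis.

```python
import math


class Node:
    """A cell population with a toy one-class acceptance rule."""

    def __init__(self, name, center=None, radius=None, children=()):
        self.name = name
        self.center = center
        self.radius = radius
        self.children = list(children)

    def accepts(self, x):
        # Stand-in for a one-class classifier: accept points near the centroid.
        return math.dist(x, self.center) <= self.radius


def classify(root, x):
    """Descend the tree; stop where no child accepts the cell.

    Returns "rejected" if not even a top-level population accepts it,
    which flags a potentially new cell population.
    """
    node = root
    while True:
        accepted = [child for child in node.children if child.accepts(x)]
        if not accepted:
            return node.name if node is not root else "rejected"
        # If several children accept, follow the closest one.
        node = min(accepted, key=lambda child: math.dist(x, child.center))


# Hypothetical two-level tree over a 2-D feature space.
tree = Node("root", children=[
    Node("T cell", (0.0, 0.0), 3.0, children=[
        Node("CD4 T", (-1.0, 0.0), 1.0),
        Node("CD8 T", (1.0, 0.0), 1.0),
    ]),
    Node("B cell", (10.0, 0.0), 2.0),
])

fine_label = classify(tree, (1.2, 0.1))
coarse_label = classify(tree, (0.0, 2.5))
unknown_label = classify(tree, (5.0, 5.0))
```

A cell that fits a parent population but none of its children keeps the coarser label, which mirrors how datasets annotated at different resolutions can sit at different levels of the same tree.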