Improving cell type matching across species in scRNA-seq data using protein embeddings and transfer learning

Master Thesis (2022)
Author(s)

K.S. Biharie (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Marcel J.T. Reinders – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Ahmed Mahfouz – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Lieke C.M. Michielsen – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

E. Isufi – Graduation committee member (TU Delft - Multimedia Computing)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2022 Kirti Biharie
Publication Year
2022
Language
English
Graduation Date
06-07-2022
Awarding Institution
Delft University of Technology
Project
Artificial Intelligence Technology
Programme
Computer Science | Bioinformatics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Knowing how cell types relate across species is crucial for translating experimental results from mice to humans. Establishing cell type matches, however, is hindered by the biological differences between the species. Most current methods use only one-to-one orthologous genes and thereby discard a substantial amount of evolutionary information between genes that could be used to align the species. Some methods try to retain this information by explicitly including the relations between genes, but not without caveats. In this work, we present a model to Transfer and Align Cell Types in Cross-Species (TACTiCS). First, TACTiCS uses a natural language processing model to match genes based on their protein sequences. Next, TACTiCS employs a neural network to classify cell types within a species. Afterwards, TACTiCS uses transfer learning to propagate cell type labels between species. We applied TACTiCS to scRNA-seq data of the primary motor cortex and the ventral tegmental area. Our model can accurately match and align cell types on these datasets. Moreover, at a high resolution, our model outperforms two state-of-the-art methods, SAMap and CAME. Finally, we show that our gene matching method results in better matches than BLAST, both in our model and in SAMap.
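
The gene matching step can be illustrated with a minimal sketch. It assumes each gene is already represented by a fixed-length protein embedding vector and that genes are matched by cosine similarity; the abstract does not state how TACTiCS scores matches, so the similarity measure, the function name `match_genes`, the `top_k` parameter, and the toy vectors below are all hypothetical and are not taken from the thesis.

```python
import numpy as np


def match_genes(emb_a, emb_b, top_k=1):
    """For each gene in species A, find the top_k most similar genes in species B.

    emb_a, emb_b: arrays of shape (n_genes, dim) holding protein embeddings.
    Cosine similarity is an assumption made for this sketch.
    Returns (indices, scores), each of shape (n_genes_a, top_k).
    """
    # Normalise rows so that a dot product equals cosine similarity.
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sim = a @ b.T

    # Sort each row by descending similarity and keep the first top_k columns.
    order = np.argsort(-sim, axis=1)[:, :top_k]
    scores = np.take_along_axis(sim, order, axis=1)
    return order, scores


# Toy 3-dimensional "embeddings"; real protein embeddings are far larger.
mouse = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
human = np.array([[0.0, 2.0, 0.1],
                  [3.0, 0.1, 0.0],
                  [0.0, 0.0, 1.0]])

idx, score = match_genes(mouse, human)
print(idx[:, 0].tolist())
```

Unlike a one-to-one ortholog table, a similarity matrix of this kind lets a gene match several genes in the other species (by raising `top_k`), which is the kind of information the abstract says is otherwise discarded.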

Files

MSc_Thesis_Kirti_Biharie.pdf
(pdf | 6.12 Mb)
- Embargo expired on 30-06-2023
License info not available