Improving cell type matching across species in scRNA-seq data using protein embeddings and transfer learning


Abstract

Knowing the relation between cell types is crucial for translating experimental results from mice to humans. Establishing cell type matches, however, is hindered by the biological differences between the species. Most current methods use only one-to-one orthologous genes and therefore discard a substantial amount of evolutionary information between genes that could be used to align the species. Some methods try to retain this information by explicitly including the relations between genes, but not without caveats. In this work, we present a model to Transfer and Align Cell Types in Cross-Species (TACTiCS). First, TACTiCS uses a natural language processing model to match genes using their protein sequences. Next, TACTiCS employs a neural network to classify cell types within a species. Then, TACTiCS uses transfer learning to propagate cell type labels between species. We applied TACTiCS to scRNA-seq data of the primary motor cortex and the ventral tegmental area. Our model can accurately match and align cell types on these datasets. Moreover, at a high resolution, our model outperforms two state-of-the-art methods, SAMap and CAME. Finally, we show that our gene matching method results in better matches than BLAST, both in our model and in SAMap.
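The abstract does not say how protein embeddings are turned into gene matches. One plausible reading, sketched here purely as an illustration, is to embed each gene's protein sequence with a protein language model and then pair genes across species by cosine similarity. Keeping the top k candidates lets a gene match more than one gene in the other species, rather than only its one-to-one ortholog. The gene names, the three-dimensional vectors, and the helpers `cosine` and `match_genes` are hypothetical. Real embeddings would come from a pretrained model and have far more dimensions, and this is not the TACTiCS implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_genes(emb_a, emb_b, top_k=1):
    """Hypothetical helper: for each gene in species A, rank genes in species B.

    emb_a, emb_b: dicts mapping gene name -> protein embedding vector.
    Returns a dict gene_a -> list of (gene_b, similarity), best match first,
    truncated to top_k entries. With top_k > 1 a gene can map to several
    genes, i.e. the matching is not restricted to one-to-one pairs.
    """
    matches = {}
    for gene_a, vec_a in emb_a.items():
        scored = [(gene_b, cosine(vec_a, vec_b)) for gene_b, vec_b in emb_b.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        matches[gene_a] = scored[:top_k]
    return matches


# Toy, made-up embeddings standing in for protein language model outputs.
mouse = {"m1": [1.0, 0.0, 0.0], "m2": [0.0, 1.0, 0.1]}
human = {"h1": [0.9, 0.1, 0.0], "h2": [0.0, 0.8, 0.2], "h3": [0.5, 0.5, 0.5]}

best = match_genes(mouse, human)             # one candidate per mouse gene
top_two = match_genes(mouse, human, top_k=2)  # allows one-to-many matches

for gene, hits in top_two.items():
    print(gene, [(g, round(s, 3)) for g, s in hits])
```

In this toy example `m1` is closest to `h1` and `m2` to `h2`, while `top_k=2` also surfaces `h3` as a weaker candidate for `m1`. A similarity threshold could be used instead of a fixed k. Either way the idea is that sequence-level similarity, not only annotated orthology, decides which genes are aligned.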