Attention on Genes

Unveiling Key Genes For Cancer Cell-state Predictions of the Geneformer Model by Inspecting the Attention Weights

Abstract

Geneformer is a transformer pretrained on Geneformer-30M, a dataset of 29.9 million healthy cells. This paper examines how Geneformer's attention shifts when the model is fine-tuned on a dataset of cancer cells, whose gene expression is expected to be distinct, and which genes are key to cell-state predictions in that setting. Using a weight-based analysis, we measure the shift in attention and identify the genes that receive the most attention.

The observed shift in attention was significant; however, prediction accuracy increased only minimally. Mapping the attention weights back to individual genes showed that, while Geneformer shifts its attention towards key genes, it remains largely subject to a batch effect, namely the number of expressed genes. Further research into designing a data representation of consistent size might be beneficial.
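
As a rough illustration of what mapping attention weights back to genes can look like, and of why the number of expressed genes can act as a confounder, here is a minimal sketch. It is not the paper's pipeline: the function names, the tensor layout (layers, heads, queries, keys), the averaging scheme, and the gene symbols are all assumptions chosen for illustration. Because each softmax row sums to 1, the average attention a position receives is 1 divided by the sequence length, so a gene in a cell with few expressed genes scores higher even when attention is uniform.

```python
from collections import defaultdict

import numpy as np


def attention_received(attn):
    """Mean attention each key position receives.

    attn: array of shape (layers, heads, seq, seq); rows are queries,
    columns are keys (assumed layout). Returns an array of shape (seq,).
    """
    return attn.mean(axis=(0, 1, 2))


def gene_attention(cells):
    """Average received attention per gene over the cells it appears in.

    cells: iterable of (gene_ids, attn) pairs, where gene_ids[i] is the
    gene at sequence position i of that cell (hypothetical input format).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for gene_ids, attn in cells:
        received = attention_received(attn)
        for gene, score in zip(gene_ids, received):
            totals[gene] += float(score)
            counts[gene] += 1
    return {gene: totals[gene] / counts[gene] for gene in totals}


def uniform_attention(seq_len, layers=1, heads=1):
    """Toy attention in which every query attends equally to every key."""
    return np.full((layers, heads, seq_len, seq_len), 1.0 / seq_len)


# Two toy cells with different numbers of expressed genes (illustrative symbols).
short_cell = (["TP53", "MYC"], uniform_attention(2))
long_cell = (["TP53", "MYC", "GAPDH", "ACTB"], uniform_attention(4))
scores = gene_attention([short_cell, long_cell])
```

In this toy setup the genes that also occur in the short cell end up with a higher score than those seen only in the long cell, even though no gene is preferred within either cell. Normalising each cell's scores by its sequence length before aggregating is one possible way to reduce this effect.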