Geneformer is a transformer model pretrained on Geneformer-30M, a dataset of 29.9 million healthy cells. This paper examines how Geneformer's attention shifts when the model is fine-tuned on a dataset of cancer cells, whose gene expression is expected to differ from that of healthy cells, and which genes are key to cell-state predictions in that setting. Using a weight-based analysis, we quantify the shift in attention and identify the genes that receive the most attention.
The observed shift in attention was significant, but prediction accuracy increased only minimally. Mapping the attention weights back to individual genes showed that, while Geneformer shifts its attention towards key genes, it remains largely subject to a batch effect, namely the number of expressed genes per cell. Further research into a data representation of consistent size might be beneficial.
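As an illustration of what mapping attention weights back to individual genes can look like, the following is a minimal sketch and not the pipeline used in this paper. It assumes each cell is given as an attention array of shape (heads, sequence, sequence) from a single layer, together with the gene token ids of that cell; the function names, the padding id of 0, and the choice to average over heads and query positions are all hypothetical.

```python
import numpy as np
from collections import defaultdict


def attention_per_gene(attn, token_ids, pad_id=0):
    """Mean attention received by each gene token in one cell.

    attn:      array of shape (heads, seq, seq); rows are queries, columns are keys.
    token_ids: array of shape (seq,) holding the gene token id at each position.
    pad_id:    token id used for padding (assumed to be 0 here).
    """
    attn = np.asarray(attn, dtype=float)
    token_ids = np.asarray(token_ids)
    mask = token_ids != pad_id

    # Average over heads, leaving a (seq, seq) matrix.
    mean_heads = attn.mean(axis=0)
    # Attention received by each key position, averaged over non-padding queries.
    received = mean_heads[mask].mean(axis=0)

    return {int(t): float(r) for t, r in zip(token_ids[mask], received[mask])}


def mean_attention_across_cells(cells, pad_id=0):
    """Average the per-gene scores over every cell in which the gene appears.

    cells: iterable of (attn, token_ids) pairs, one pair per cell.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for attn, token_ids in cells:
        for gene, score in attention_per_gene(attn, token_ids, pad_id).items():
            totals[gene] += score
            counts[gene] += 1
    return {gene: totals[gene] / counts[gene] for gene in totals}


# Toy example: two cells with uniform attention and different sequence lengths.
cell_a = (np.full((2, 4, 4), 0.25), np.array([5, 7, 9, 0]))  # last position is padding
cell_b = (np.full((2, 2, 2), 0.5), np.array([5, 7]))
per_gene = mean_attention_across_cells([cell_a, cell_b])
```

Because each attention row sums to 1, a gene in a cell with n tokens receives about 1/n under uniform attention, so raw per-gene scores depend on sequence length. This is one way the number of expressed genes could enter such an analysis; normalizing by sequence length before comparing cells would be one option to consider.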