Evaluating Machine Learning Approaches for Predicting Drug Response in Cancer Cells

A Comparative Analysis of Geneformer and Support Vector Machine

Abstract

Accurately predicting how cancer cells respond to drug treatment is important for advancing drug development. This paper presents a comparative analysis of Geneformer, a deep-learning transformer pre-trained on transcriptomic data, and a Support Vector Machine (SVM). Using the Sciplex2 dataset, which includes transcriptomic data from lung cancer cells treated with three drugs, both models were trained to predict the response of cancer cells to drug treatments.

This paper investigates four questions: how Geneformer and SVM perform in predicting the treatment label of cells across different drugs and doses; which drug doses are suitable for conducting single-gene perturbation experiments; how accurately these experiments can replicate drug effects; and how Geneformer and SVM differ in their ability to identify significant genes affecting drug response.

Results indicate that while SVM generally achieves higher accuracy in predicting the treatment labels of cells, Geneformer is better at identifying genes whose perturbation mimics drug effects. Geneformer's embeddings shift significantly towards treated cell states after single-gene perturbations, suggesting that it captures gene interactions in drug response more fully. SVM's predictions, by contrast, rely more on differential gene expression. This comparative analysis underscores the strengths and limitations of each approach in modelling complex biological systems and predicting the drug response of cancer cells.