Evaluating Multi-Modal Drug Embeddings Across Diverse Oncology Prediction Tasks
M. Mahmoudi (TU Delft - Electrical Engineering, Mathematics and Computer Science)
M.J.T. Reinders – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
N. Brouwer – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
M. Khosla – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Predictive models in computational oncology are fundamentally limited by their reliance on uni-modal drug representations. To overcome this bottleneck, we developed DrugZip, a uniform, task-agnostic, 128-dimensional representation that compresses 25 diverse modalities from the Chemical Checker across 1.2 million molecules. By using a modified autoencoder, DrugZip stabilises the latent space and avoids the posterior collapse observed with a standard variational autoencoder. We evaluated DrugZip on three downstream tasks. In drug synergy prediction, it achieved an AUC of 0.844 and resisted performance collapse in unseen cell environments, where it retained a mean AUC of 0.62. In drug sensitivity prediction, DrugZip avoided the extreme overfitting that high-dimensional baselines show on unseen drugs. Finally, in cellular perturbation modelling via ChemCPA, DrugZip demonstrated representational sufficiency by nearly matching state-of-the-art transcriptomic prediction accuracy ($R^2$ of 0.776 vs 0.792). Geometric and information-content analyses confirm that DrugZip produces a continuous, balanced embedding space in which drugs remain individually distinguishable. Ultimately, DrugZip shifts the paradigm from engineering task-specific features toward using a robust, generalisable, multi-modal representation for computational oncology.