Evaluating Multi-Modal Drug Embeddings Across Diverse Oncology Prediction Tasks

Master Thesis (2026)
Author(s)

M. Mahmoudi (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M.J.T. Reinders – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

N. Brouwer – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

M. Khosla – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2026
Language
English
Graduation Date
29-06-2026
Awarding Institution
Delft University of Technology
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Predictive models in computational oncology are fundamentally limited by their uni-modal drug input representations. To overcome this bottleneck, we developed DrugZip, a uniform, task-agnostic, 128-dimensional drug representation that compresses 25 diverse modalities from the Chemical Checker, covering 1.2 million molecules. By using a modified autoencoder, DrugZip stabilises the latent space and avoids the posterior collapse observed with a standard variational autoencoder. We evaluated DrugZip on three downstream tasks. In drug synergy prediction, it achieved an AUC of 0.844 and resisted performance collapse in unseen cell environments, where it retained a mean AUC of 0.62. In drug sensitivity prediction, DrugZip avoided the extreme overfitting that high-dimensional baselines showed on unseen drugs. Finally, in cellular perturbation modelling via ChemCPA, DrugZip demonstrated representational sufficiency by nearly matching state-of-the-art transcriptomic prediction accuracy ($R^2$ of 0.776 vs 0.792). Geometric and information-content analyses confirm that DrugZip produces a continuous, balanced embedding space in which drugs remain individually distinguishable. Ultimately, DrugZip shifts the paradigm from engineering task-specific features toward using a robust, generalisable, multi-modal representation for computational oncology.
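
The compression step described in the abstract can be illustrated with a toy sketch. This is not the DrugZip architecture, whose modifications are not specified here; it is a plain deterministic autoencoder, shown only to make the idea of "25 modalities in, one 128-dimensional vector out" concrete. The per-modality width (`MODALITY_DIM`), the single-layer encoder and decoder, the random data, and all function names are assumptions made for illustration. A deterministic bottleneck has no KL term, which is the term involved in posterior collapse in a variational autoencoder.

```python
import numpy as np

N_MODALITIES = 25   # number of Chemical Checker modalities named in the abstract
MODALITY_DIM = 16   # hypothetical per-modality width, chosen only for this toy
LATENT_DIM = 128    # embedding size stated in the abstract

rng = np.random.default_rng(0)


def init_autoencoder(input_dim, latent_dim, rng):
    """Random weights for a one-layer encoder and a one-layer decoder (assumed shape)."""
    return {
        "W_enc": rng.normal(0.0, 1.0 / np.sqrt(input_dim), size=(input_dim, latent_dim)),
        "b_enc": np.zeros(latent_dim),
        "W_dec": rng.normal(0.0, 1.0 / np.sqrt(latent_dim), size=(latent_dim, input_dim)),
        "b_dec": np.zeros(input_dim),
    }


def encode(params, x):
    """Map concatenated modality vectors to a bounded latent embedding."""
    return np.tanh(x @ params["W_enc"] + params["b_enc"])


def decode(params, z):
    """Map a latent embedding back to the concatenated modality space."""
    return z @ params["W_dec"] + params["b_dec"]


def reconstruction_loss(params, x):
    """Mean squared reconstruction error, the only loss term in this sketch."""
    return float(np.mean((decode(params, encode(params, x)) - x) ** 2))


# Toy stand-in for real signatures: 8 molecules, 25 random modality vectors each.
modalities = [rng.normal(size=(8, MODALITY_DIM)) for _ in range(N_MODALITIES)]
x = np.concatenate(modalities, axis=1)  # one row per molecule

params = init_autoencoder(x.shape[1], LATENT_DIM, rng)
z = encode(params, x)                   # one 128-dimensional vector per molecule
loss = reconstruction_loss(params, x)   # untrained, so this is only a starting value
```

In a downstream task, a vector such as `z[i]` would replace a task-specific drug descriptor as the drug input; training the weights and any stabilising modifications are outside the scope of this sketch.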
