Introduction: This study explores the application of the ChemSpaceAL pipeline, an AI-driven molecular generation tool, to the discovery of drug candidates for several molecular targets, including the HNH domain of Cas9, Fibroblast Activation Protein-alpha (FAP-alpha), and Trophoblast cell surface antigen 2 (TROP2). The goal was to evaluate the pipeline’s capacity to generate promising molecules and to compare them with known inhibitors, including FDA-approved drugs for c-Abl kinase.
Methods: The ChemSpaceAL pipeline uses a deep learning model, specifically a Generative Pretrained Transformer (GPT), to generate novel molecules over multiple iterations. Active learning steered each iteration: generated molecules were docked to the target protein, scored by predicted binding affinity, and the top-scoring complexes were used to refine the model. For FAP-alpha, known patented inhibitors were also scored to provide a benchmark for the AI-generated molecules. The process was tuned by adjusting learning parameters such as the number of epochs and the active learning selection threshold.
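The selection threshold described above (keeping the top 10% or 20% of scored complexes) can be illustrated with a minimal sketch. It assumes that each docked molecule has already been reduced to an (identifier, score) pair and that higher scores mean stronger predicted binding. The function name `select_top_percent` is hypothetical, and this is not ChemSpaceAL's own code: the real pipeline may build its fine-tuning set differently (for example, by also sampling across clusters of chemical space).

```python
from typing import List, Tuple

# Hypothetical helper, not part of ChemSpaceAL: keeps the best-scoring
# `top_percent` of (molecule_id, score) pairs for the next fine-tuning round.
def select_top_percent(
    scored: List[Tuple[str, float]], top_percent: int
) -> List[Tuple[str, float]]:
    """Return the highest-scoring `top_percent` (1-100) of scored molecules.

    Higher scores are assumed to be better. At least one molecule is kept
    whenever the input is non-empty.
    """
    if not 1 <= top_percent <= 100:
        raise ValueError("top_percent must be between 1 and 100")
    if not scored:
        return []
    # Integer arithmetic avoids floating-point surprises such as 30 * 0.1 > 3.
    k = max(1, len(scored) * top_percent // 100)
    # Sort best first and keep the top k.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

With 10 scored molecules, `top_percent=10` keeps only the single best one, while `top_percent=20` keeps the best two. Widening the threshold therefore gives the model more (but on average lower-scoring) examples per iteration.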
Results: For c-Abl kinase, the pipeline generated molecules with a maximum score of 77, surpassing the highest-scoring FDA-approved inhibitor (bafetinib, 67.5), although the average score of the generated molecules (48.5) was lower than that of the FDA-approved inhibitors (53.1). For FAP-alpha, known patented inhibitors scored between 10.5 and 21. In the first iteration, AI-generated molecules achieved an average score of 19.19, within that range, and a maximum of 38.5. Performance fluctuated over subsequent iterations, settling at an average score of 18.62 and a maximum of 39 by the third iteration. Widening the active learning selection threshold from the top 10% to the top 20% of scored complexes yielded more substantial improvements in molecular generation.
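The benchmark comparisons above reduce to simple summary statistics over two score lists. The following is a minimal sketch, assuming each list holds one docking score per molecule (higher is better). The names `BenchmarkSummary` and `summarize` are hypothetical illustrations, not part of the pipeline.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class BenchmarkSummary:
    """Hypothetical summary of generated vs. known-inhibitor scores."""
    generated_mean: float
    generated_max: float
    known_mean: float
    known_max: float

    @property
    def best_generated_beats_best_known(self) -> bool:
        # True when the top generated molecule outscores the top known inhibitor.
        return self.generated_max > self.known_max


def summarize(generated: Sequence[float], known: Sequence[float]) -> BenchmarkSummary:
    """Compute mean and maximum scores for both groups."""
    if not generated or not known:
        raise ValueError("both score lists must be non-empty")
    return BenchmarkSummary(
        generated_mean=mean(generated),
        generated_max=max(generated),
        known_mean=mean(known),
        known_max=max(known),
    )
```

Reporting the mean and the maximum separately matters here. In the c-Abl results, the best generated molecule (77) beat the best FDA-approved inhibitor (67.5), even though the generated set's average (48.5) was below the inhibitors' average (53.1).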
Conclusion: The results suggest that ChemSpaceAL can explore chemical space beyond known inhibitors, occasionally identifying novel molecules with superior predicted binding affinity. However, the study also highlights the limitations of relying solely on computational scoring, which does not capture properties such as bioavailability or off-target effects. Future work will focus on reducing the number of active learning epochs to balance model performance, and on exploring more diverse chemical spaces. The development of a graphical user interface (GUI) and interdisciplinary collaboration for experimental validation will further enhance the pipeline’s accessibility and effectiveness, accelerating the drug discovery process.