Improving Chemical Reaction Completion using Atom-Balance Constraints in Transformer Models

Master Thesis (2025)

Author(s)

M.T.W. Noordsij (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J.M. Weber – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

M.J.T. Reinders – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

G. Vogel – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

J. Yang – Graduation committee member (TU Delft - Web Information Systems)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

https://resolver.tudelft.nl/uuid:42971194-dce2-44c7-ac25-7f164bbfa1ef

Publication Year

2025

Language

English

Graduation Date

27-06-2025

Awarding Institution

Delft University of Technology

Programme

Computer Science

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Online databases contain extensive collections of (bio)chemical reactions that serve as valuable resources for a variety of applications. However, these large datasets often contain incomplete reactions in which, for example, co-reactants and by-products are missing. Machine learning can help predict the missing molecules in such partial reactions. In this study, we adapt an existing transformer model to improve its ability to complete them. We retrain the model on a more diverse dataset of atom-balanced ground-truth reactions and introduce both soft and hard atom-balance constraints to improve the completeness and chemical validity of the predictions. Our findings indicate that models trained with soft constraints in their loss function do not show improved balancing performance and require further tuning. In contrast, hard atom-balance constraints applied during constrained beam search, where we block tokens that would violate the atom balance of the prediction, effectively improve the performance of transformer-based models on reaction completion tasks. However, this approach also carries the risk of balancing reactions incorrectly, a limitation that is difficult to identify without chemical expertise and that underscores the need for reliable ground-truth data to evaluate the predictions.
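To make the idea of an atom-balance constraint concrete, the following is a minimal sketch and not the thesis implementation. It assumes molecules are given as flat molecular formulas (no brackets, charges, or isotopes) rather than the tokenized representation a transformer would use, and all function names (`atom_counts`, `atom_deficit`, `is_balanced`, `allowed_tokens`) are hypothetical. It shows how a partial reaction's missing atoms can be computed, and how a hard constraint could filter out candidate atom tokens that would exceed that remaining budget.

```python
import re
from collections import Counter

# Matches an element symbol followed by an optional count, e.g. "C2", "H", "Cl2".
_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula):
    """Count atoms in a flat molecular formula such as 'C2H6O'.

    Assumption: no brackets, charges, or isotopes appear in the formula.
    """
    counts = Counter()
    for element, number in _ELEMENT.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def side_counts(formulas):
    """Sum the atom counts over all molecules on one side of a reaction."""
    total = Counter()
    for formula in formulas:
        total += atom_counts(formula)
    return total


def atom_deficit(reactants, products):
    """Return reactant-side atoms minus product-side atoms, per element.

    Positive values are atoms the products still have to account for;
    negative values mean the products contain more atoms than the reactants.
    Elements that are already balanced are omitted.
    """
    deficit = Counter(side_counts(reactants))
    deficit.subtract(side_counts(products))
    return {element: n for element, n in deficit.items() if n != 0}


def is_balanced(reactants, products):
    """A reaction is atom-balanced when no element has a non-zero deficit."""
    return not atom_deficit(reactants, products)


def allowed_tokens(remaining, candidates):
    """Hard-constraint filter (illustrative): keep only candidate atom tokens
    for which the remaining atom budget is still positive."""
    return [token for token in candidates if remaining.get(token, 0) > 0]


# Esterification: ethanol + acetic acid -> ethyl acetate (+ water, missing here).
reactants = ["C2H6O", "C2H4O2"]
partial_products = ["C4H8O2"]

missing = atom_deficit(reactants, partial_products)
print(missing)                                     # atoms the completion must supply
print(allowed_tokens(missing, ["C", "H", "O", "N"]))
print(is_balanced(reactants, partial_products + ["H2O"]))
```

In a real decoder the same budget check would be applied at every beam search step to the model's vocabulary tokens; as the abstract notes, a prediction that passes this check is balanced but not necessarily chemically correct.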

Files

MSc_Thesis_Minouk_Noordsij.pdf

(pdf | 7.09 MB)

License info not available