Chemical reaction completion: a hybrid rule-based and language model-based approach

Author: van Wijngaarden, Matthijs (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributor: Weber, J.M. (mentor); Reinders, M.J.T. (graduation committee); Vogel, G. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science
Date: 2023-11-13

Abstract: Large chemical reaction databases often suffer from incompleteness, such as missing molecules or missing stoichiometric information. At the same time, numerous computational models being developed in predictive chemistry rely on reaction databases and would benefit greatly from complete reaction equations. In addition, research in sustainable chemistry often focuses on automated mass balance tasks, which require complete reaction equations to be evaluated properly. In this work, we present a hybrid approach for the computational completion of reaction equations. Specifically, we combine a rule-based method and a machine learning (ML) model to complete reactions. The rule-based approach constructs a balance of atoms and charge on either side of the reaction in an attempt to find missing molecules. We tailor a pre-trained transformer model on the chemical language domain to take partial reactions as input and predict the missing molecules. Furthermore, we present a novel approach to measuring the correctness of our model, which is useful when we apply it to an uncurated dataset for which the ground truth is unknown.

Subject: Language Models, Cheminformatics, Chemical Reaction
To reference this document use: http://resolver.tudelft.nl/uuid:fb806f47-1c5a-46d3-a585-b0b95eb626bc
Part of collection: Student theses
Document type: master thesis
Rights: © 2023 Matthijs van Wijngaarden
Files: PDF, MScThesis_MatthijsVanWijn ... aarden.pdf, 1.08 MB
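
The atom and charge balance mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names (`parse_formula`, `side_totals`, `imbalance`), the representation of each species as a `(formula, charge, coefficient)` triple, and the restriction to flat formulas without parentheses are all assumptions made for illustration.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a flat formula such as 'C2H4O2' (no parentheses; assumed input format)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_totals(species):
    """Sum atoms and charge over hypothetical (formula, charge, coefficient) triples."""
    atoms, charge = Counter(), 0
    for formula, q, coeff in species:
        for element, n in parse_formula(formula).items():
            atoms[element] += coeff * n
        charge += coeff * q
    return atoms, charge


def imbalance(reactants, products):
    """Return (atom difference, charge difference), computed as reactants minus products.

    Positive entries are unaccounted for on the product side,
    negative entries on the reactant side.
    """
    r_atoms, r_charge = side_totals(reactants)
    p_atoms, p_charge = side_totals(products)
    elements = set(r_atoms) | set(p_atoms)
    diff = {e: r_atoms[e] - p_atoms[e] for e in elements if r_atoms[e] != p_atoms[e]}
    return diff, r_charge - p_charge


# Esterification recorded without its by-product:
# acetic acid + ethanol -> ethyl acetate
reactants = [("C2H4O2", 0, 1), ("C2H6O", 0, 1)]
products = [("C4H8O2", 0, 1)]
missing_atoms, missing_charge = imbalance(reactants, products)
```

In this example the product side is short by two hydrogen atoms and one oxygen atom with no net charge, which is consistent with a missing water molecule. Mapping such a residue to actual candidate molecules is a separate step that this sketch does not attempt.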