From Post-Validation to Grammar-Level Pruning: Valence Constraints for Molecule Synthesis
M.A. van Veen (TU Delft - Electrical Engineering, Mathematics and Computer Science)
S. Dumančić – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
R.J. Gardos Reid – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
J.M. Weber – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
This report studies valence constraints for molecule synthesis in a staged program-synthesis framework for chemical reaction network discovery. The baseline system already performs limited molecule validation, but only as a final filter after candidate SMILES strings have been generated. The change evaluated here is a valence-constrained grammar for ringless molecules that encodes atom valence directly in its production rules, so that derivations can only take locally valence-consistent construction steps.
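The difference between post-validation and grammar-level pruning can be illustrated with a toy enumerator of ringless molecules. This is a minimal sketch and not the repository's grammar. The element set, the assumed default valences (C 4, N 3, O 2, with implicit hydrogens filling the rest), the limit of two children per atom, and all function names (`free_trees`, `constrained_trees`, `post_validated`, `grammar_pruned`) are hypothetical. `depth` is a tree-depth bound that may not match the depth parameter used in the benchmarks.

```python
from typing import Iterator

# Assumed default valences for neutral atoms in the SMILES organic subset;
# implicit hydrogens fill whatever valence a heavy atom leaves unused.
VALENCE = {"C": 4, "N": 3, "O": 2}
BOND_SYMBOL = {1: "", 2: "=", 3: "#"}  # single bonds are implicit in SMILES

# Hypothetical representation: a ringless molecule is a tree
# (element, children), where children is a tuple of (bond_order, subtree).
# Each atom gets at most two children in this toy grammar.
Tree = tuple


def to_smiles(tree: Tree) -> str:
    """Write a tree as SMILES: every child except the last becomes a branch."""
    elem, children = tree
    parts = [elem]
    for i, (bond, sub) in enumerate(children):
        text = BOND_SYMBOL[bond] + to_smiles(sub)
        parts.append(text if i == len(children) - 1 else f"({text})")
    return "".join(parts)


# --- Baseline style: unconstrained grammar plus a final validity filter ---
def free_trees(depth: int) -> Iterator[Tree]:
    """Unconstrained grammar: any element, any bond order 1..3."""
    subs = list(free_trees(depth - 1)) if depth > 0 else []
    for elem in VALENCE:
        yield (elem, ())
        for b1 in BOND_SYMBOL:
            for s1 in subs:
                yield (elem, ((b1, s1),))
                for b2 in BOND_SYMBOL:
                    for s2 in subs:
                        yield (elem, ((b1, s1), (b2, s2)))


def is_valence_valid(tree: Tree, parent_bond: int = 0) -> bool:
    """Post-hoc check: bonds used by every atom must not exceed its valence."""
    elem, children = tree
    used = parent_bond + sum(bond for bond, _ in children)
    return used <= VALENCE[elem] and all(
        is_valence_valid(sub, bond) for bond, sub in children
    )


def post_validated(depth: int) -> set[str]:
    return {to_smiles(t) for t in free_trees(depth) if is_valence_valid(t)}


# --- Grammar-level pruning: valence is tracked during derivation ---
def constrained_trees(depth: int, parent_bond: int = 0) -> Iterator[Tree]:
    """Each expansion only offers bonds that fit the atom's remaining valence."""
    for elem, valence in VALENCE.items():
        remaining = valence - parent_bond  # valence left after incoming bond
        if remaining < 0:
            continue  # this element cannot accept the incoming bond order
        yield (elem, ())
        if depth == 0:
            continue
        for b1 in range(1, min(3, remaining) + 1):
            for s1 in constrained_trees(depth - 1, b1):
                yield (elem, ((b1, s1),))
                for b2 in range(1, min(3, remaining - b1) + 1):
                    for s2 in constrained_trees(depth - 1, b2):
                        yield (elem, ((b1, s1), (b2, s2)))


def grammar_pruned(depth: int) -> set[str]:
    return {to_smiles(t) for t in constrained_trees(depth)}
```

At depth 1 the unconstrained grammar produces 273 candidate trees, of which the final filter keeps 109; the constrained grammar derives those 109 directly, and both routes yield the same SMILES set (for example `C(=O)N` and `C#N` are kept, while `O#C` never appears). The sets are compared as generated strings, not as canonicalised molecules. The toy only shows that pruning moves earlier in the search; it does not model the bookkeeping cost of a larger grammar, which the benchmark results identify as the source of the methane and urea slowdown.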
The resulting grammar reproduces the same ringless molecule sets as the legacy baseline on the audited tests and benchmarks, and all generated molecules pass the repository’s valence-validity checks. Runtime results are mixed: on the small water benchmark, the new grammar becomes faster from depth 5 onward and reaches a 2.87× speedup at depth 10, while on methane and urea it remains slower throughout the measured ringless depth series. A fixed-count methane benchmark shows that this slowdown is caused mainly not by legacy validity checking but by the added search overhead of the larger, valence-aware grammar.
The main conclusion is that the new grammar preserves the audited ringless output behaviour while shifting pruning earlier in the search, but this does not translate into consistent runtime gains.