From Post-Validation to Grammar-Level Pruning: Valence Constraints for Molecule Synthesis

Bachelor Thesis (2026)
Author(s)

M.A. van Veen (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

S. Dumančić – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

R.J. Gardos Reid – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

J.M. Weber – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2026
Language
English
Graduation Date
23-06-2026
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project, Herb, Chemistry through the computer scientist's lens
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This report studies valence constraints for molecule synthesis in a staged program-synthesis framework for chemical reaction network discovery. The baseline system already performs limited molecule validation, but only as a final filter after candidate SMILES strings have been generated. The change evaluated here is a valence-constrained grammar for ringless molecules: atom valence is encoded directly in the grammar rules, so derivations are restricted to locally valence-consistent construction steps.

The resulting grammar reproduces the same ringless molecule sets as the legacy baseline on the audited tests and benchmarks, and all generated molecules pass the repository’s valence-validity checks. Runtime results are mixed: on the small water benchmark, the new grammar becomes faster from depth 5 onward and reaches a 2.87× speedup at depth 10, while on methane and urea it remains slower throughout the measured ringless depth series. A fixed-count methane benchmark shows that this slowdown is caused mainly by the added search overhead of the larger valence-aware grammar, not by legacy validity checking.

The main conclusion is that the new grammar preserves the audited ringless output behaviour while shifting pruning earlier in the search, but this does not translate into consistent runtime gains.
