Circular Image

R.J. Gardos Reid

info

Please Note

17 records found

This report studies valence constraints for molecule synthesis in a staged program-synthesis framework for chemical reaction network discovery. The baseline system already performs limited molecule validation, but it does so as a final filter after candidate SMILES strings have been generated. The change evaluated here is a valence-constrained grammar for ringless molecules that encodes atom valence directly in the grammar, restricting derivations to locally valence-consistent construction steps.

The resulting grammar reproduces the same ringless molecule sets as the legacy baseline on the audited tests and benchmarks, and all generated molecules pass the repository’s valence-validity checks. Runtime results are mixed: on the small water benchmark, the new grammar becomes faster from depth 5 onward and reaches a 2.87× speedup at depth 10, while on methane and urea it remains slower throughout the measured ringless depth series. A fixed-count methane benchmark shows that this slowdown is not mainly caused by legacy validity checking, but by the added search overhead of the larger valence-aware grammar.

The main conclusion is that the new grammar preserves the audited ringless output behaviour while shifting pruning earlier in the search, but this does not translate into consistent runtime gains. ...
Recovery of full Chemical Reaction Networks from an incomplete (partially observable) problem is an extensive and combinatorial expensive process. This work investigates some of the possible methods to restrict the overall search space of program synthesis and obtain a better rank for the target network, through the use of reaction templates. As such, filter-based network pruning and ranking, as well as reaction size constraint are compared to the existing base synthesizer that this work builds upon, both in terms of search-space and target rank. Additionally, runtime is also considered as a second metric for these experiments. Overall, these methods have obtained some improvements in the small subset of benchmarked networks, but achieving integration of reaction templates into CRN program synthesis still remains an open challenge. ...
Bachelor thesis (2026) - T.M. Bood, S. Dumančić, R.J. Gardos Reid, J.M. Weber
Chemical Reaction Network (CRN) discovery is a time-consuming task that can be automated using program synthesis. However, the search space for realistic CRNs is large, making exhaustive search intractable. This paper investigates whether incorporating bond-breaking energy as a search heuristic can improve the efficiency of grammar-based CRN discovery. This paper proposes two heuristics: a maximum-bond-order heuristic and a more sophisticated delta-energy heuristic, which prioritise reactions with lower net energy change. Both are benchmarked against naive breadth-first search across an entire synthesis pipeline and the three individual stages, for seven Chemical Reaction Networks: water formation, methane combustion, photosynthesis, ethylene glycol formation, methyl acetate hydrolysis, an esterification reaction, and fermentation of glucose. Results show that both the Delta-Energy and Max-Bond heuristics consistently reduce the number of candidates explored and total runtime compared to BFS, with Delta-Energy generally outperforming the Max-Bond heuristic. However, neither heuristic guarantees improvement in all cases. These findings suggest that energy-guided search is a promising direction for scalable CRN discovery, with more accurate bond-energy estimation being a natural avenue for future work. ...

Where Reaction-Database Knowledge is most effective in reducing search

Recovering a chemical reaction network (CRN) from concentration data can be framed as program synthesis, but the search scales poorly: reaching the ground-truth network requires enumerating a very large number of candidates. Hard constraints on the synthesizer's grammar prune candidates by chemical rules such as atom valence and mass balance, yet among the valid candidates a uniform search has no sense of which reactions are plausible. We ask where in a top-down CRN synthesizer knowledge from a reaction database most reduces the candidates explored before the target is found? We compare three integration points across five benchmarks, using the USPTO-50K and Rhea databases: (1) a database-derived building-block vocabulary, (2) a probabilistic context-free grammar (PCFG) over the grammar rules, and (3) an output reranker. The building-block vocabulary is the most positive result: it decides whether the target network is recovered at all, and a shuffle control attributes this to the database's frequency content rather than vocabulary size. Each corpus unlocks only the chemistry it covers, and rank-normalised merging recovers targets that a naive union dilutes. The PCFG adds ordering gain, while the reranker, acting only on the finished list, helps under corpus match and hurts under mismatch. Matching a corpus to the target chemistry, not enlarging it, is what turns a reaction database into useful search guidance.

...
Chemical Reaction Networks (CRNs) are essential for understanding complex reactive processes, yet incomplete experimental data often leave many networks only partially known. Grammar-driven program synthesis offers an approach to completing partial CRNs, but atom-by-atom construction of molecular candidates causes a severe combinatorial explosion, and the baseline synthesiser lacks awareness of the structural context of target molecules. It is not yet known whether substructure-aware heuristics can improve the computational tractability of CRN discovery via program synthesis. To investigate this, BRICS (Breaking of Retrosynthetically Interesting Chemical Substructures) fragments were incorporated into the molecular context-free grammar, and Tanimoto similarity of Morgan2 fingerprints was used to guide reaction and network synthesis. The results show that BRICS fragmentation achieves a 13.8 percentage-point improvement in completing partial reactions on a dataset missing complex organic compounds by encoding large substructures as single grammar rules. Conversely, the enhanced grammar solves 27.4 percentage points fewer problems than the baseline on reactions missing small non-carbon species, as increased branching delays discovery of small molecules. Moreover, molecular similarity guidance does not improve performance in reaction rebalancing from SynRXN datasets, but it substantially reduces the search space in an example esterification CRN synthesis problem, requiring 2,565 fewer candidate reactions and 409 fewer candidate networks before discovering the targets. Thus, BRICS fragmentation and similarity-guided heuristics have distinct strengths. Future frameworks should split the candidate molecule pool between fragment-enhanced and atom-by-atom methods to successfully capture both large structural fragments and small, dissimilar species. ...
This thesis proposes a method that leverages incremental inference to improveinference efficiency in complex probabilistic programs, providing an algorithm-agnostic approach that does not rely on any single sampling method. The methodbuilds on an existing incremental inference framework, which samples from oneprobabilistic program and uses the results to perform inference on a related pro-gram. Our approach uses this framework to sample from simplified programs,thereby bypassing direct inference on complex programs. These simplificationsare constructed by automatically detecting certain program patterns, both struc-tural and parametric, that increase program complexity, and applying correspond-ing changes to simplify them. On a set of input programs, we empirically showwhich patterns increase complexity and demonstrate how the constructed changesachieve more efficient inference than baseline sampling methods. ...
This thesis explores the automated construction of Chemical Reaction Networks (CRNs) from incomplete experimental data, a task traditionally dependent on expert knowledge and manual effort. CRNs model the interactions between chemical species through a network of reactions and are essential in fields such as medicine and chemistry. However, many real-world systems include unobserved or unmeasurable species, making CRN construction challenging. To address this, this thesis frames CRN discovery as a program synthesis problem, using grammars and constraints to define the space of possible CRNs. A modular synthesis pipeline is developed that incrementally builds candidate molecules, reactions, and networks given a problem definition. Experimental results demonstrate that constraints effectively reduce the search space and that the solver is capable of identifying the correct reaction networks. Moreover, a scoring mechanism ranks the expected CRN highly among generated candidates. ...

Creating robust plans for the Minecraft planner of the PDDL Gym library using Probablistisitic Inference

All over the world, people plan their daily activities. These plans include a lot of different tasks and can vary widely in kinds of activities. These plans must account for uncertainties and unknowns in the world. Planning around these uncertainties is difficult and hard to accomplish with traditional means of programming. For this set of problems, probabilistic programming is proposed. Given the Minecraft planner from the Planning Domain Definition Language (PDDL) gym library, is it possible to create Robust plans that incorporate inference without changing the underlying planner? "PDDL is a human-readable format for problems in automated planning that gives a description of the possible states of the world, a description of the set of possible actions, a specific initial state of the world, and a specific set of desired goals." [6] The current approach is using heuristics to find the optimal plan for the problem. In this research paper, an alternative method is proposed; using probabilistic programming and the existing planner to create a simulated world of Minecraft. This model introduces inference without changing the already existing planner. ...
Planning problems are a set of problems in which an objective must be reached by a sequence of actions. Planning problems traditionally do not consider uncertainty, however for most real-world planning problems uncertainty must be considered to create effective plans. The objective of this paper is to use an existing deterministic planning algorithm in order to create robust plans that solve an uncertain version of Sokoban called Uncertain Move Sokoban. To this end a probabilistic programming language, Gen.jl, is used, which enables creating probabilistic models and inferring its parameters using code. A probabilistic model is created in Gen.jl, that generates plans for the problem as well as robustness scores using a simulator embedded in the model. Probabilistic inference techniques are then used to obtain a robust plan for the uncertain problem, namely: importance sampling and Metropolis-Hastings. We find that the technique is able to create robust plans for small to medium-sized problems and that Metropolis-Hastings is the better-performing inference technique. ...

Creating Robust Plans using Replanning

Planning is very important in everyday life, whether it would be creating schedules for planes or plans for manufacturing. These domains contain uncertainties requiring plans that are robust. However, there is a need for an approach which creates robust plans regardless of the domain and without changing its planning agent. Here, a replanning approach is proposed akin to the Metropolis-Hastings algorithm and its performance is compared to the performance of importance sampling. Replanning works by iteratively trying to improve the previously generated plan. The performance is compared by means of the Keys and Doors problem. It is found that replanning performed better than importance sampling in the two analysed problems. Furthermore, changing the parameter, σ, used in the replanning approach showed a significant difference in its corresponding performance. While the replanning approach has only been tested on the Keys and Doors problem, the results show that replanning is a promising approach which could work irrespective of the domain at hand. ...
Train Unit Shunting is a complex process that directs trains through a shunting yard. In real-world railway operations, disturbances are common, requiring shunting schedules to be robust against uncertainties such as delays. Previous research has proposed algorithms for the Train Unit Shunting Problem (TUSP) and one study attempted to create robust shunting plans by defining a probabilistic model of the uncertainties involved and inferring a distribution of robust solutions for the TUSP. Following this approach, this paper investigates the use of probabilistic programming in increasing robustness of shunting plans using an advanced TUSP solver. This research develops a model for uncertain shunting scenarios, solves these scenarios, and applies importance sampling to infer the posterior distribution, producing a distribution of robust shunting plans instead of a single plan. The paper presents examples demonstrating that it is beneficial to use one of the output robust plans over the plan made for the deterministic scenario, revealing the potential of integrating probabilistic programming techniques into the planning process to improve railway efficiency and reduce delays. ...
Scheduling problems are present in many real-world situations, such as construction projects, manufacturing processes, or train timetabling. One common formalization is the Resource Constrained Project Scheduling Problem (RCPSP), where the goal is to find an optimal schedule given limited resources. Traditional algorithms optimize for project duration under deterministic assumptions, which could lead to poor performance under uncertainty. This thesis explores how to create robust schedules, meaning they can withstand uncertainty, for stochastic task durations. Robust schedules minimize delays, measured as the difference between a task's completion time and its deadline. The method proposed uses probabilistic programming as a tool to accomplish this. A robustness distribution over schedules is inferred using importance sampling, and a robust schedule can be selected from that explored distribution. Importantly, this approach presented in this thesis can be applied on top of existing scheduling and simulation algorithms without requiring any knowledge or changes to themselves. Created schedules can also be abstracted, thus do not need to be analyzed or seen. This makes the proposed method general and easy to adopt in practice. ...
Bachelor thesis (2024) - F.M.T. Molnár, S. Dumančić, R.J. Gardos Reid
Program synthesis aims to automatically generate programs that fulfil user-specified constraints. The field has developed many different techniques to enable program synthesis in various application domains. This work will focus on the enumerative approach, which iteratively explores solutions in the program space from the smallest programs to larger ones. This technique works well on small to medium-sized problems. With the exponentially growing program space performance of the enumerative solver rapidly declines.

This work aims to generalize the divide and conquer approach to combine partial solutions found by an enumerative solver. Firstly by forming the smallest subset that satisfies every example in a problem and then using decision trees to find the final program that satisfies all input-output examples. Our prototype named SubsetDT will be incorporated into the Julia library Herb.jl which provides a flexible environment for testing and developing program synthesis ideas.

This methodology was evaluated on SyGuS competition benchmarks from two different tracks. Results indicate that the use of a decision tree on top of enumeration search improves solver performance by 33% on the string manipulation track and significantly on the bit-vector manipulation track, though both tracks showed some overfitting. The experiments demonstrate that the SubsetDT algorithm can substantially improve enumeration solvers, its effectiveness highly depends on the iterators used. ...
Bachelor thesis (2024) - D. Heijmans, S. Dumančić, R.J. Gardos Reid
Program synthesis is the task of generating a program that suffices the intent of a user based on a set of input-output examples. Searching over the set of all possible programs becomes intractable very quickly. Therefore, divide and conquer techniques have become popular within the field, but have mainly been applied on the set of examples. However, this paper focuses on applying the strategy on the problem’s context-free grammar by splitting it into subgrammars.
Our new technique first splits the grammar by making a dependency graph showing how all rules relate the different types of symbols. Afterwards, there is an exploration and exploitation phase where different sets of subgrammars will be given a score and get allocated an amount of enumerations to generate programs based on that score.
The new technique is implemented as an iterator in Herb.jl which is a program synthesis framework. The iterator is then benchmarked against a plain BFS iterator using 100 string-manipulation problems. The grammar splitting strategy needs on average more enumerations to find a program solving all examples compared to the BFS iterator. However, running the different grammars from the iterator in parallel could allow the iterator to find a solution from one of the grammars earlier. ...

Combining Programs Learned on Subsets of Examples

Program synthesis tackles the challenge of generating programs from user specifications, a task proven undecidable due to the exponential search space growth. In program synthesis the Divide and Conquer technique can be employed to prune this search. By decomposing specifications into individual examples, multiple programs are synthesized to solve them separately. Later these small programs are combined into a bigger program which should satisfy all input-output pairs by itself. There have been previous successful attempts at using Divide and Conquer, including combining programs using decision trees. However, a gap persists in the program synthesis community regarding comprehensive frameworks for integrating diverse implementation strategies. This project addresses this gap by incorporating a Divide and Conquer algorithm into a program synthesis library, enabling users to explore different ways of generating the small programs while abstracting the unification procedure. Moreover, we also discuss a "greedy" strategy to generate the programs that will be combined. We found that a Divide and Conquer manages to improve naive search, especially when given more than 10 examples. ...

How can such approach be implemented in a synthesis system that cannot benefit from in-advance refinement of the synthesis algorithm parameters

Program synthesis is an important problem in computer science. One method often employed is enumerative program synthesis, which produces a sequence of programs in the target language until one solves the required input-output examples. This can yield undesirable runtimes for some problems that require complex programs to solve them. This research is about how simpler problem solutions can be used to build a library of useful helper functions and to use it for further synthesis in order to reduce the depth of the enumerative search for more complex problems. Drawing inspiration from the work by Ellis, et al on DreamCoder, this research describes ways of obtaining simpler programs, and extending the grammar using parts of the solutions. Experiments have been performed on the proposed synthesis algorithm, which is implemented using the Herb.jl framework, and results for each part of the synthesis algorithm being replaced by different strategies are provided. Allowing holes in the new grammar substitutions and using smaller size or frequency as their utility has proven superior. There needs to be more work done about the conciseness of the produced programs. ...

Enhancing Domain-Specific Language-Based Synthesis by Identifying and Utilizing Common Patterns

Program synthesis is the process of constructing programs that provably satisfy a given high-level user specification. Recent work in this domain has focused on utilizing domain-specific languages to guide the search procedure. This study proposes a novel approach to enhance the efficiency of such search procedures. By utilizing anti-unification, which is the process of generating the least general pattern between two symbolic expressions, this work aims to find common sub-components to enhance the language used in program synthesis to reduce search depth and improve performance. ...