Grounding Large Language Models in Reaction Knowledge Graphs for Synthesis Retrieval
O.A. Bunkova (TU Delft - Electrical Engineering, Mathematics and Computer Science)
M.J.T. Reinders – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Jana M. Weber – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
L. Di Fruscia – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
S. Rupprecht – Mentor (TU Delft - ChemE/Process Systems Engineering)
C. Lofi – Graduation committee member (TU Delft - Web Information Systems)
Abstract
Grounding Large Language Models (LLMs) in chemical knowledge graphs (KGs) offers a promising way to support synthesis planning, but reliably retrieving information from these complex structures remains a challenge. This work addresses that challenge by constructing a bipartite reaction KG and evaluating Text2Cypher query generation on both single-step and multi-step retrieval tasks. Several prompting strategies were tested: zero-shot prompting; one-shot prompting with static, random, or embedding-based example selection; and a checklist-driven self-correction pipeline. Results indicate that one-shot prompting is most effective when the exemplar aligns with the target query both structurally and logically. When such an exemplar is provided as context to the Cypher-generation prompt, self-correction yields no significant additional gains. Overall, this study introduces a reproducible setup for Text2Cypher experimentation and evaluation.
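To make the one-shot strategy concrete, below is a minimal sketch of embedding-based exemplar selection for Text2Cypher prompting. It is not the thesis's actual pipeline: the exemplar pool, graph schema, and embedding model name are illustrative assumptions, standing in for whatever bipartite reaction-KG schema and example set the study used.

```python
# Sketch: one-shot Text2Cypher prompting with embedding-based exemplar
# selection. The exemplar pool, schema string, and embedding model are
# hypothetical placeholders, not the thesis's actual configuration.
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed embedder

# Hypothetical (question, Cypher) exemplars over a bipartite reaction KG
# with (:Molecule) and (:Reaction) nodes.
EXAMPLES = [
    ("Which reactions produce aspirin?",
     "MATCH (r:Reaction)-[:PRODUCES]->(m:Molecule {name: 'aspirin'}) RETURN r"),
    ("Which molecules are reactants of reaction R1?",
     "MATCH (m:Molecule)-[:REACTANT_OF]->(r:Reaction {id: 'R1'}) RETURN m"),
]

model = SentenceTransformer("all-MiniLM-L6-v2")

def select_exemplar(question: str) -> tuple[str, str]:
    """Return the pool exemplar whose question is closest in cosine similarity."""
    vecs = model.encode([question] + [q for q, _ in EXAMPLES])
    q_vec, ex_vecs = vecs[0], vecs[1:]
    sims = ex_vecs @ q_vec / (
        np.linalg.norm(ex_vecs, axis=1) * np.linalg.norm(q_vec)
    )
    return EXAMPLES[int(np.argmax(sims))]

def build_prompt(question: str, schema: str) -> str:
    """Assemble a one-shot Cypher-generation prompt around the chosen exemplar."""
    ex_q, ex_cypher = select_exemplar(question)
    return (
        f"Graph schema:\n{schema}\n\n"
        f"Example question: {ex_q}\nExample Cypher: {ex_cypher}\n\n"
        f"Question: {question}\nCypher:"
    )

if __name__ == "__main__":
    schema = "(:Molecule)-[:REACTANT_OF]->(:Reaction)-[:PRODUCES]->(:Molecule)"
    print(build_prompt("Which reactions consume benzene?", schema))
```

The design intuition matches the abstract's finding: an exemplar retrieved by semantic similarity is more likely to share the target query's structure and logic than a static or random one, which is precisely when one-shot prompting helps most.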