Grounding Large Language Models in Reaction Knowledge Graphs for Synthesis Retrieval

Master Thesis (2025)
Author(s)

O.A. Bunkova (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M.J.T. Reinders – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Jana M. Weber – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

L. Di Fruscia – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

S. Rupprecht – Mentor (TU Delft - ChemE/Process Systems Engineering)

C. Lofi – Graduation committee member (TU Delft - Web Information Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
15-09-2025
Awarding Institution
Delft University of Technology
Programme
Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Grounding Large Language Models (LLMs) in chemical knowledge graphs (KGs) offers a promising way to support synthesis planning, but reliably retrieving information from these complex structures remains a challenge. This work addresses that gap by constructing a bipartite reaction KG and evaluating Text2Cypher query generation on both single- and multi-step retrieval tasks. Several prompting strategies were tested: zero-shot prompting, one-shot prompting with static, random, or embedding-based example selection, and a checklist-driven self-correction pipeline. Results indicate that one-shot prompting is most effective when the exemplar aligns with the query both structurally and logically. When such an exemplar is provided as context to the Cypher generation prompt, self-correction yields no significant further gains. Overall, this study introduces a reproducible setup for Text2Cypher experimentation and evaluation.
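The embedding-based one-shot strategy mentioned in the abstract can be sketched as follows: embed the incoming natural-language question, retrieve the most similar (question, Cypher) exemplar from a pool, and prepend it to the generation prompt. This is a minimal illustrative sketch, not the thesis's implementation; the exemplar pool, the graph schema (Reaction/Molecule labels), and the toy bag-of-words embedding (a stand-in for a real sentence encoder) are all assumptions.

```python
# Hypothetical sketch of embedding-based one-shot exemplar selection for
# Text2Cypher prompting. The exemplar questions/queries and the bipartite
# Reaction/Molecule schema are illustrative, not taken from the thesis.
from collections import Counter
import math

# Small pool of (question, Cypher) exemplars over an assumed bipartite KG.
EXEMPLARS = [
    ("Which reactions produce aspirin?",
     "MATCH (r:Reaction)-[:PRODUCES]->(m:Molecule {name: 'aspirin'}) RETURN r"),
    ("Which molecules are reactants of reaction R1?",
     "MATCH (m:Molecule)-[:REACTANT_OF]->(r:Reaction {id: 'R1'}) RETURN m"),
]

def embed(text):
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_exemplar(question):
    """Pick the exemplar whose question is most similar to the new question."""
    q = embed(question)
    return max(EXEMPLARS, key=lambda ex: cosine(q, embed(ex[0])))

def build_prompt(question):
    """Assemble a one-shot Text2Cypher prompt from the selected exemplar."""
    ex_q, ex_cypher = select_exemplar(question)
    return ("Translate the question into a Cypher query.\n"
            f"Example question: {ex_q}\n"
            f"Example Cypher: {ex_cypher}\n"
            f"Question: {question}\nCypher:")

print(build_prompt("Which reactions produce caffeine?"))
```

The prompt built this way pairs the new question with a structurally similar exemplar, which is the condition under which the abstract reports one-shot prompting to work best.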
