Metadata Extraction from Scientific Lab Notebooks
Design and Evaluation of a Multi-Agent System
S.J.M. Backer (TU Delft - Electrical Engineering, Mathematics and Computer Science)
C. Lofi – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
E.E. Ferradosa – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
M.J.T. Reinders – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
High-quality, interpretable metadata is essential for enabling data reuse and reproducibility in the life sciences. Frameworks such as the Investigation–Study–Assay (ISA) model provide a structured way to describe experimental workflows, yet in practice metadata remains incomplete, inconsistent, and costly to produce. Lab notebooks, which record experimental procedures, are a promising source of such metadata, but their unstructured nature makes extraction challenging.
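To make the nesting in the ISA model concrete, the sketch below shows one possible way to represent it in Python: an investigation contains studies, and each study contains assays. It is a deliberately simplified illustration, not the schema used in this thesis or the full ISA specification; the class names, the reduced set of fields, the `count_entities` helper, and the example values are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    # Simplified: real ISA assays carry many more attributes.
    measurement_type: str  # what is measured, e.g. "gene expression"
    technology_type: str   # how it is measured, e.g. "RNA-Seq"


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)


def count_entities(inv: Investigation) -> int:
    """Count the investigation itself plus all nested studies and assays."""
    return 1 + sum(1 + len(study.assays) for study in inv.studies)


# Hypothetical example values, not taken from any real notebook.
example = Investigation(
    title="Hypothetical investigation",
    studies=[
        Study(
            title="Hypothetical study",
            assays=[Assay("gene expression", "RNA-Seq")],
        )
    ],
)
```

An extraction system would need to populate each of these nested entities from free-text notebook entries, which is where the unstructured nature of the source becomes a problem.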
This thesis investigates extracting ISA metadata from lab notebooks using LLM-based multi-agent systems. It also addresses the lack of suitable methods for evaluating performance on such extraction tasks. To this end, this thesis contributes two artefacts: (1) a prototype multi-agent system that extracts ISA metadata from lab notebooks, and (2) a rubric-based evaluation framework that formalises and assesses extraction performance.
A feasibility study is conducted on a set of real-world lab notebooks to evaluate the proposed multi-agent system. The results show that partial extraction is achievable with the current prototype, with 41% of ISA entities represented well and another 27% represented sufficiently. The prototype can therefore provide a practical starting point for researchers creating ISA metadata. Fully automated extraction remains challenging with the current prototype because some ISA entities are missing or insufficiently represented, so this work provides recommendations for future system design based on the feasibility study. Specifically, to operationalise multi-agent ISA extraction in practical workflows, this thesis recommends building a human-in-the-loop system in which researchers and agents collaborate to construct metadata.
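As an illustration of how per-entity rubric grades could be aggregated into percentages like those reported above, the following sketch tallies the share of entities at each grade. It is not the evaluation framework from this thesis: the four level names, the `grade_shares` function, and the toy input are assumptions chosen for this example, and the numbers it produces are not thesis results.

```python
from collections import Counter
from typing import Dict, List

# Assumed rubric levels for illustration; the thesis rubric may differ.
LEVELS = ("well", "sufficient", "insufficient", "missing")


def grade_shares(grades: List[str]) -> Dict[str, float]:
    """Return the percentage of entities assigned to each rubric level."""
    unknown = set(grades) - set(LEVELS)
    if unknown:
        raise ValueError(f"unknown rubric levels: {sorted(unknown)}")
    if not grades:
        return {level: 0.0 for level in LEVELS}
    counts = Counter(grades)
    return {level: 100.0 * counts[level] / len(grades) for level in LEVELS}


# Toy input: one grade per extracted ISA entity (hypothetical data).
toy_grades = ["well", "well", "sufficient", "missing"]
shares = grade_shares(toy_grades)
```

Reporting the share at every level, rather than a single score, keeps the distinction between entities that are usable as extracted and those that would need correction by a researcher.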