Leveraging large language models for enzymatic reaction prediction and characterization

Journal Article (2025)

Author(s)

Lorenzo Di Fruscia (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Jana M. Weber (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group

Pattern Recognition and Bioinformatics

DOI related publication

https://doi.org/10.1039/d5dd00187k Final published version

To reference this document use

https://resolver.tudelft.nl/uuid:0db200d6-d98d-441f-a8b2-d29b38743dae

More Info

Publication Year

2025

Language

English

Journal title

Digital Discovery

Issue number

12

Volume number

4

Pages (from-to)

3588-3609

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Predicting enzymatic reactions is crucial for applications in biocatalysis, metabolic engineering, and drug discovery, yet it remains a complex and resource-intensive task. Large Language Models (LLMs) have recently demonstrated remarkable success in various scientific domains, e.g., through their ability to generalize knowledge, reason over complex structures, and leverage in-context learning strategies. In this study, we systematically evaluate the capability of LLMs, particularly the Llama-3.1 family (8B and 70B), on three core biochemical tasks: enzyme commission (EC) number prediction, forward synthesis, and retrosynthesis. We compare single-task and multitask learning strategies, using parameter-efficient fine-tuning with low-rank adaptation (LoRA) adapters. Additionally, we assess performance across different data regimes to explore the models' adaptability to low-data settings. Our results demonstrate that fine-tuned LLMs capture biochemical knowledge, with multitask learning improving forward- and retrosynthesis predictions by leveraging shared enzymatic information. We also identify key limitations, such as challenges posed by the hierarchical EC classification scheme, highlighting areas for further improvement in LLM-driven biochemical modeling.
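An EC number has four dot-separated levels (class, subclass, sub-subclass, serial number), e.g. EC 1.1.1.1 for alcohol dehydrogenase, so a predicted EC number can be correct at the coarse levels and wrong at the finer ones. The sketch below illustrates one way to score such hierarchical predictions level by level. It is a minimal illustration under stated assumptions, not the evaluation protocol used in the article: the function names `ec_levels`, `matched_depth` and `per_level_accuracy` are hypothetical, and treating a `-` placeholder (as in `1.1.1.-`) as a non-match is an assumption.

```python
from typing import Sequence


def ec_levels(ec: str) -> list[str]:
    """Split an EC number such as 'EC 1.1.1.1' into its four level fields."""
    cleaned = ec.strip()
    # Drop an optional "EC" prefix before splitting on dots.
    if cleaned.upper().startswith("EC"):
        cleaned = cleaned[2:].strip()
    parts = cleaned.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec!r}")
    return parts


def matched_depth(predicted: str, true: str) -> int:
    """Number of leading EC levels (0-4) on which prediction and label agree."""
    depth = 0
    for p, t in zip(ec_levels(predicted), ec_levels(true)):
        # Stop at the first disagreement; a '-' placeholder is assumed
        # to carry no information and therefore never counts as a match.
        if p != t or p == "-":
            break
        depth += 1
    return depth


def per_level_accuracy(preds: Sequence[str], truths: Sequence[str]) -> list[float]:
    """Fraction of examples that are correct down to each level 1..4."""
    if len(preds) != len(truths) or not preds:
        raise ValueError("need equal-length, non-empty prediction and label lists")
    depths = [matched_depth(p, t) for p, t in zip(preds, truths)]
    return [sum(d >= level for d in depths) / len(depths) for level in range(1, 5)]


if __name__ == "__main__":
    # Hypothetical predictions and labels, for illustration only.
    preds = ["1.1.1.1", "1.1.1.2", "2.7.1.1", "3.1.1.1"]
    truths = ["1.1.1.1", "1.1.1.1", "2.7.4.3", "1.1.1.1"]
    print(per_level_accuracy(preds, truths))  # [0.75, 0.75, 0.5, 0.25]
```

Scoring each level separately shows where accuracy drops off as the hierarchy gets finer, which a single exact-match score would hide.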

Files

D5dd00187k.pdf

(pdf | 3.2 MB)