InstaNovo enables diffusion-powered de novo peptide sequencing in large-scale proteomics experiments

Journal Article (2025)
Authors

Kevin Eloff (InstaDeep Ltd)

Konstantinos Kalogeropoulos (Technical University of Denmark (DTU))

Amandla Mabona (InstaDeep Ltd)

Oliver Morell (Technical University of Denmark (DTU))

Rachel Catzel (InstaDeep Ltd)

Esperanza Rivera-de-Torre (Technical University of Denmark (DTU))

Sam P.B. van Beljouw (TU Delft - BN/Stan Brouns Lab, Kavli Institute of Nanoscience Delft)

Stan J.J. Brouns (Kavli Institute of Nanoscience Delft, TU Delft - BN/Stan Brouns Lab)

Timothy P. Jenkins (Technical University of Denmark (DTU))

More authors (not listed)

Research Group
BN/Stan Brouns Lab
To reference this document use:
https://doi.org/10.1038/s42256-025-01019-5
Publication Year
2025
Language
English
Volume number
7
Pages (from-to)
565-579
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Mass spectrometry-based proteomics focuses on identifying the peptide that generates a tandem mass spectrum. Traditional methods rely on protein databases but are often limited or inapplicable in certain contexts. De novo peptide sequencing, which assigns peptide sequences to spectra without prior information, is valuable for diverse biological applications; however, owing to a lack of accuracy, it remains challenging to apply. Here we introduce InstaNovo, a transformer model that translates fragment ion peaks into peptide sequences. We demonstrate that InstaNovo outperforms state-of-the-art methods and showcase its utility in several applications. We also introduce InstaNovo+, a diffusion model that improves performance through iterative refinement of predicted sequences. Using these models, we achieve improved therapeutic sequencing coverage, discover novel peptides and detect unreported organisms in diverse datasets, thereby expanding the scope and detection rate of proteomics searches. Our models unlock opportunities across domains such as direct protein sequencing, immunopeptidomics and exploration of the dark proteome.
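
To make the phrase "fragment ion peaks" concrete, the sketch below computes the theoretical singly charged b- and y-ion m/z values for a short peptide from standard monoisotopic residue masses. De novo sequencing is the inverse problem: recovering the sequence from observed peaks like these. This is an illustration only and is not taken from InstaNovo; the function name `fragment_mz` and the small residue table are assumptions made for this example, and modifications and higher charge states are ignored.

```python
# Monoisotopic residue masses in Da (standard values, small subset only).
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
    "L": 113.08406,
    "K": 128.09496,
    "E": 129.04259,
    "R": 156.10111,
}
WATER = 18.01056   # mass of H2O added to y-ions
PROTON = 1.00728   # mass of the charge-carrying proton


def fragment_mz(peptide):
    """Return (b_ions, y_ions): singly charged m/z values for a peptide.

    Hypothetical helper for illustration; unmodified residues only.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    # Cleave between residue i-1 and residue i for every internal bond.
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)          # N-terminal fragment
        y_ions.append(sum(masses[i:]) + WATER + PROTON)  # C-terminal fragment
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_mz("PEAK")
    for idx, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
        print(f"bond {idx}: b = {b_mz:.4f}, y = {y_mz:.4f}")
```

For any cleavage site, the b- and y-ion pair sums to the neutral peptide mass plus two protons. This complementarity is one reason the peak list carries enough information to constrain the sequence.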