baseLess: lightweight detection of sequences in raw MinION data

Journal Article (2023)
Author(s)

Ben Noordijk (Wageningen University & Research)

Reindert Nijland (Wageningen University & Research)

Victor J. Carrion (Universiteit Leiden, Universidad de Málaga, Netherlands Institute of Ecology)

Jos M. Raaijmakers (Netherlands Institute of Ecology, Universiteit Leiden)

Dick De Ridder (Wageningen University & Research)

Carlos De Lannoy (Wageningen University & Research, TU Delft - BN/Chirlmin Joo Lab)

Research Group
BN/Chirlmin Joo Lab
DOI related publication
https://doi.org/10.1093/bioadv/vbad017
Publication Year
2023
Language
English
Journal title
Bioinformatics Advances
Issue number
1
Volume number
3
Article number
vbad017
Collections
Institutional Repository
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

With its candybar form factor and low initial investment cost, the MinION brought affordable, portable nucleic acid analysis within reach. However, translating the electrical signal it outputs into a sequence of bases still requires mid-tier computer hardware, which remains a drawback when deploying many devices at once or working in remote areas. For applications that focus on detecting a target sequence, such as infectious disease monitoring or species identification, the computational cost of analysis may be reduced by detecting the target sequence directly in the electrical signal instead. Here, we present baseLess, a computational tool that enables such target-detection-only analysis. BaseLess uses an array of small neural networks, each of which efficiently detects a fixed-size subsequence of the target sequence directly from the electrical signal. We show that baseLess can accurately distinguish reads from three closely related fish species and can classify sequences in mixtures of 20 bacterial species, all on an inexpensive single-board computer.
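The abstract describes splitting a target sequence into fixed-size subsequences (k-mers), each handled by its own small detector, and then deciding from the detections whether a read contains the target. The Python sketch below illustrates only that bookkeeping, under loudly stated assumptions. The names `target_kmers`, `detect_kmers` and `classify_read` and the `min_fraction` threshold are hypothetical. The detectors are plain callables standing in for baseLess's neural networks. The fraction-of-k-mers decision rule is an illustrative assumption, not necessarily the rule baseLess applies.

```python
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical stand-in for one small neural network: takes a raw signal
# chunk and returns True if it "sees" its k-mer in that signal.
Detector = Callable[[Sequence[float]], bool]


def target_kmers(target: str, k: int) -> List[str]:
    """Split a target sequence into its distinct overlapping k-mers, in order of first occurrence."""
    if k <= 0 or k > len(target):
        raise ValueError("k must be between 1 and len(target)")
    seen: Dict[str, None] = {}  # dict preserves insertion order and drops duplicates
    for i in range(len(target) - k + 1):
        seen.setdefault(target[i:i + k], None)
    return list(seen)


def detect_kmers(signal: Sequence[float], detectors: Dict[str, Detector]) -> Set[str]:
    """Run every per-k-mer detector on the raw signal; return the k-mers reported as present."""
    return {kmer for kmer, detector in detectors.items() if detector(signal)}


def classify_read(detected: Set[str], kmers: List[str], min_fraction: float = 0.5) -> bool:
    """Assumed decision rule: call the target present if enough of its k-mers were detected."""
    hits = sum(1 for kmer in kmers if kmer in detected)
    return hits / len(kmers) >= min_fraction


if __name__ == "__main__":
    kmers = target_kmers("ACGTACGGA", 4)
    # Toy detectors that fire for the first three k-mers regardless of the signal.
    fires = set(kmers[:3])
    detectors = {kmer: (lambda signal, kmer=kmer: kmer in fires) for kmer in kmers}
    found = detect_kmers([0.12, 0.40, 0.33], detectors)
    print(classify_read(found, kmers))  # 3 of 6 k-mers detected -> True at min_fraction=0.5
```

Keeping one independent detector per k-mer means the set of detectors can be changed per target without retraining a full basecaller, which matches the motivation in the abstract; how the real tool trains and thresholds its networks is not shown here.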