Thomas Abeel | TU Delft Repository

Predicted meta-omics

A potential solution to multi-omics data scarcity in microbiome studies

Journal article (2026) - Bianca Maria Cosma, Stephanie Pillay, David Calderón-Franco, Thomas Abeel

Imbalances in the gut microbiome have been linked to conditions such as inflammatory bowel disease, diabetes, and cancer. While metagenomics and amplicon sequencing are commonly used to study the microbiome, they do not capture all layers of microbial functions. Other meta-omics data can provide more insights, but these are more costly and laborious to procure. The growing availability of paired meta-omics data offers an opportunity to develop machine learning models that can infer connections between metagenomics data and other forms of meta-omics data, enabling the prediction of these other forms of meta-omics data from metagenomics. We evaluated several machine learning models for predicting meta-omics features from various meta-omics inputs. Simpler architectures such as elastic net regression and random forests generated reliable predictions of transcript and metabolite abundances, with correlations of up to 0.77 and 0.74, respectively, but predicting protein profiles was more challenging. We also identified a core set of well-predicted features for each meta-omics output type, and showed that multi-output regression neural networks performed similarly when trained using fewer output features. Lastly, our experiments demonstrated that predicted features can be used for the downstream task of inflammatory bowel disease classification, with performance comparable to that of experimental data.
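
The abstract does not describe how the models are implemented. The sketch below rests on two assumptions:

- an elastic net is fitted independently for each output feature;
- predictions are scored by Pearson correlation against measured abundances (the abstract does not state which correlation was used).

`fit_elastic_net`, `predict` and `pearson` are hypothetical helpers, not the authors' code. The objective follows the convention documented for scikit-learn's `ElasticNet`.

```python
import math
from typing import List, Tuple


def fit_elastic_net(X: List[List[float]], y: List[float], alpha: float = 0.1,
                    l1_ratio: float = 0.5, n_iter: int = 200
                    ) -> Tuple[List[float], List[float], float]:
    """Fit one output feature by cyclic coordinate descent.

    Objective (scikit-learn convention):
      (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||^2
    X and y are centred internally, so the intercept is the mean of y.
    Returns (weights, column means of X, mean of y).
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual y - Xw while all weights are 0
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = [Xc[i][j] for i in range(n)]
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, resid)) / n
            denom = sum(c * c for c in col) / n + alpha * (1.0 - l1_ratio)
            # Soft-thresholding implements the L1 (lasso) part of the penalty.
            shrunk = math.copysign(max(abs(rho) - alpha * l1_ratio, 0.0), rho)
            new_wj = shrunk / denom if denom > 0 else 0.0
            for i in range(n):
                resid[i] -= col[i] * (new_wj - w[j])
            w[j] = new_wj
    return w, x_mean, y_mean


def predict(X: List[List[float]], w: List[float], x_mean: List[float],
            y_mean: float) -> List[float]:
    """Apply a fitted model to (uncentred) input rows."""
    return [y_mean + sum((xi - m) * wi for xi, m, wi in zip(row, x_mean, w))
            for row in X]


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation between predicted and measured abundances."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)
```

Under this reading, each output feature (a transcript, protein or metabolite) gets its own model trained on the metagenomic inputs. Each model is then scored on held-out samples. A library routine such as scikit-learn's `ElasticNet` would normally replace the hand-written loop.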

AILMENT

A novel ML framework for prediction and analysis of microbiota associations in colorectal cancer

Journal article (2026) - N. Strepis, Z. Lu, W. de Koning, B. J.M. Rijvers, A. A. de Souza, C. Verhoef, B. Fosso, M. Doukas, T. Abeel, More Authors

Objective Colorectal cancer (CRC) is one of the most common cancers in the world, with research suggesting a potential association with the human microbiota. However, simply comparing relative microbial abundances could overlook connections between microbes and specific clinical characteristics of CRC. Methods Here, we present the machine learning (ML) framework ‘AILMENT’ (AI-linked Microbiota Exploration of Nascent Tumours) that efficiently associates microbiota profiles with CRC metadata. The Random Forest and Extreme Gradient Boosting machine learning methods incorporated in AILMENT were used to identify associations between the microbiota and CRC phenotypes relating to clinical outcomes. Results Using AILMENT, sixteen ML models were generated from public data of 778 individuals, indicating associations between the microbiota and several different clinical characteristics of CRC, including microsatellite instability (MSI) and BRAF mutations (median AUROC and F1 scores of the ML models reached up to 0.90 and 0.85, respectively). Additionally, associations of Odoribacter, Leptotrichia, Granulicella, Parvimonas, Fusobacterium and other genera with CRC were observed. Distinct microbial compositions were observed between tissue and faecal samples, indicating fundamental differences in microbiota composition between these sample types. The AILMENT framework pinpointed an association of pathogens such as Porphyromonas and Parvimonas with CRC, confirming their role as microbial signatures of the disease. Moreover, the framework could identify microbes linked to a healthy gut, as distinct from the CRC state, such as the butyrate producers Lactobacillus, Eubacterium and Ruminococcus. To validate the performance and utility of AILMENT, we applied it to a publicly available dataset of bacterial species abundance and associated metadata, successfully replicating the key findings.
Conclusion The AILMENT framework can efficiently predict associations between different clinical characteristics of CRC and complex microbial relative abundance data. AILMENT enables the identification of specific microbes at the genus level for detailed clinical characterisation of CRC, demonstrating its potential as a tool for better understanding cancer-microbiota interactions.
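
The abstract reports median AUROC and F1 scores. The sketch below only shows how these two metrics are computed from a classifier's outputs; it does not reproduce the Random Forest or XGBoost models. It assumes:

- binary labels for a clinical characteristic (for example MSI status);
- the median is taken over repeated cross-validation runs, which the abstract does not state.

The helper names and per-run values are hypothetical.

```python
from statistics import median
from typing import List


def auroc(labels: List[int], scores: List[float]) -> float:
    """AUROC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation; tied scores count as half a win)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def f1(labels: List[int], preds: List[int]) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for lab, p in zip(labels, preds) if lab == 1 and p == 1)
    fp = sum(1 for lab, p in zip(labels, preds) if lab == 0 and p == 1)
    fn = sum(1 for lab, p in zip(labels, preds) if lab == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-run AUROCs from repeated cross-validation, summarised by the median.
run_aurocs = [0.7, 0.9, 0.8]
summary_auroc = median(run_aurocs)
```

AUROC is threshold-free and uses the predicted probabilities. F1 instead depends on the hard class predictions, so the two metrics can rank models differently.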


Phylogenetic analysis reveals diversity in glycan biosynthesis in “Candidatus Accumulibacter”

Journal article (2026) - Simon A. Eerden, Thomas Abeel, Mark C.M. van Loosdrecht, Samarpita Roy

Although biofilms are widespread in nature, the ecological roles and compositional diversity of the extracellular polymeric substances (EPS) forming these structures remain poorly understood. Here, we apply a bottom-up genomic approach by investigating the biosynthetic potential for glycan precursors in the genus “Candidatus Accumulibacter”, with a focus on assessing the intra-genus variability. Within a curated set of 61 “Ca. Accumulibacter” MAGs, our analysis revealed a dichotomy in glycan precursors between a conserved core group of 9 nucleotide-sugars and a variable accessory set of 12 nucleotide-sugars, out of 50 nucleotide-sugars tested. The core nucleotide-sugars in “Ca. Accumulibacter” are related to nucleotide-sugars that are also widely distributed across the tree of life, whereas the accessory set is enriched in rare nucleotide-sugars. The accessory nucleotide-sugars show an irregular distribution across the “Ca. Accumulibacter” phylogeny and divergent evolutionary histories. This raises the possibility that distinct evolutionary pressures act on different parts of the EPS-formation metabolism, leading to genotypic diversification driven by complex biological phenomena such as horizontal gene transfer, which could explain the observed divergent evolutionary histories.
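
The core/accessory dichotomy can be expressed as a presence/absence count across MAGs. The sketch below makes three assumptions:

- each MAG has already been annotated with the set of nucleotide-sugars it can synthesise;
- "core" means present in every MAG by default;
- the `core_fraction` parameter could relax that rule to tolerate incomplete MAGs.

The MAG identifiers and sugar sets in the test are toy values.

```python
from typing import Dict, Iterable, Set, Tuple


def split_core_accessory(presence: Dict[str, Set[str]], tested: Iterable[str],
                         core_fraction: float = 1.0
                         ) -> Tuple[Set[str], Set[str], Set[str]]:
    """Classify tested nucleotide-sugars as core, accessory, or absent.

    presence maps a MAG identifier to the nucleotide-sugars it can synthesise.
    Core: found in at least core_fraction of the MAGs.
    Accessory: found in some MAGs, but fewer than that.
    Absent: found in none.
    """
    n_mags = len(presence)
    counts = {sugar: 0 for sugar in tested}
    for sugars in presence.values():
        for sugar in sugars:
            if sugar in counts:
                counts[sugar] += 1
    cutoff = core_fraction * n_mags
    core = {s for s, c in counts.items() if c > 0 and c >= cutoff}
    accessory = {s for s, c in counts.items() if 0 < c < cutoff}
    absent = {s for s, c in counts.items() if c == 0}
    return core, accessory, absent
```

Nucleotide-sugars found in no MAG fall into neither group. This matches the abstract's counts: 9 core plus 12 accessory out of 50 tested.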

Antibiotic growth promoter and phytogenic feed additive consistently alter microbial community structure in chicken cecum

Journal article (2026) - C. Peng, G. delle Grazie, M. Ghanbari, A. May, T. Abeel

Background
Efforts to replace antibiotic growth promoters (AGPs) in livestock are often hindered by a limited mechanistic understanding of how sub-therapeutic antibiotic doses enhance animal growth. Since AGP concentrations are typically too low to directly suppress pathogens, their effects on the gut microbiome, particularly its ecological dynamics, warrant closer investigation. A critical but underexplored dimension is how these additives influence the structure and stability of microbial communities as interconnected ecosystems.

Methods
We conducted a comparative network-based analysis to examine the effects of zinc bacitracin, a commonly used AGP, and Digestarom®, an alternative phytogenic feed additive (PFA), on cecal microbiome dynamics in broiler chickens. We used metagenomic data from a repeated cross-sectional randomized controlled trial in which 96 broiler chickens were assigned to three dietary groups: Basal (Control), AGP and PFA. From these data, we constructed microbial co-occurrence networks based on Spearman's correlation for each diet at key developmental stages (days 3, 14, 21, and 35). We assessed changes in network topology, modular organization and node centrality. We also evaluated whether the network-prioritized keystone taxa could discriminate among diets using a Random Forest classifier.

Results
Compared to the Control group, both AGP and PFA treatments induced consistent shifts in network topology. These included reduced connectivity, increased modularity, a higher percentage of positive interactions, enhanced mucosa connectivity, and improved structural robustness over the course of the experiment. Overall, these treatment-induced changes were more pronounced under AGP than under PFA. Despite these changes, we identified conserved subgraphs with stable interconnections across diets and time points. The node centrality analysis revealed condition-specific keystone taxa. However, Linear Discriminant Analysis (LDA) and Random Forest (RF) struggled to differentiate between diets based on the abundance of these taxa, particularly between PFA and the other two groups.

Conclusion
Our findings reveal that feed additives can reshape gut microbial dynamics without producing marked compositional shifts. The consistent network-level changes observed for both AGP and PFA highlight the value of ecological network analysis in uncovering microbial community responses. These insights improve our understanding of cecal microbiome responses in chickens, highlight potential modes of action of AGPs, and offer a comparative framework for assessing the microbial impacts of alternative feed additives.
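
The network construction described in the Methods can be sketched as follows. The sketch assumes:

- one abundance profile per taxon across the samples of a single diet and time point;
- an edge wherever |Spearman ρ| reaches a cut-off of 0.6, which is an arbitrary choice (the abstract gives neither the study's threshold nor any significance filtering);
- two simple topology summaries: average degree and share of positive edges.

The helper names are hypothetical. Modularity could be added with NetworkX's `community.modularity`; it is left out here to keep the sketch free of dependencies.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(a: List[float], b: List[float]) -> float:
    """Spearman's rho as the Pearson correlation of rank vectors (0.0 if constant)."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    scale = math.sqrt(sum((x - ma) ** 2 for x in ra) * sum((y - mb) ** 2 for y in rb))
    return cov / scale if scale > 0 else 0.0


def cooccurrence_edges(abundance: Dict[str, List[float]],
                       threshold: float = 0.6) -> Dict[Tuple[str, str], float]:
    """Keep taxon pairs whose |rho| across samples reaches the (assumed) cut-off."""
    edges = {}
    for t1, t2 in combinations(sorted(abundance), 2):
        rho = spearman(abundance[t1], abundance[t2])
        if abs(rho) >= threshold:
            edges[(t1, t2)] = rho
    return edges


def topology_summary(edges: Dict[Tuple[str, str], float], n_taxa: int) -> Dict[str, float]:
    """Average degree (a connectivity measure) and share of positive associations."""
    n_edges = len(edges)
    n_pos = sum(1 for rho in edges.values() if rho > 0)
    return {
        "average_degree": 2 * n_edges / n_taxa if n_taxa else 0.0,
        "percent_positive": 100 * n_pos / n_edges if n_edges else 0.0,
    }
```

Comparing `topology_summary` outputs between the Control and treated networks at each time point reproduces, in miniature, two of the reported comparisons: connectivity and the percentage of positive interactions.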


Long-term multi-meta-omics resolves the ecophysiological controls of seasonal N2O emissions during wastewater treatment

Journal article (2025) - Nina Roothans, Martin Pabst, Menno van Diemen, Claudia Herrera Mexicano, Marcel Zandvoort, Thomas Abeel, Mark C.M. van Loosdrecht, Michele Laureni

Nitrous oxide (N2O) is the third most important greenhouse gas and originates primarily from natural and engineered microbiomes. Effective emission mitigations are currently hindered by the largely unresolved ecophysiological controls of coexisting N2O-converting metabolisms in complex communities. To address this, we used biological wastewater treatment as a model ecosystem and combined long-term metagenome-resolved metaproteomics with ex situ kinetic and full-scale operational characterization over nearly 2 years. By leveraging the evidence independently obtained at multiple ecophysiological levels, from individual genetic potential to actual metabolism and emergent community phenotype, the cascade of environmental and operational triggers driving seasonal N2O emissions has ultimately been resolved. We identified nitrifier denitrification as the dominant N2O-producing pathway and dissolved O2 as the prime operational parameter, paving the way to the design and fostering of robust emission control strategies. This work exemplifies the untapped potential of multi-meta-omics in the mechanistic understanding and ecological engineering of microbiomes towards reducing anthropogenic impacts and advancing sustainable biotechnological developments.

Circling in on plasmids

Benchmarking plasmid detection and reconstruction tools for short-read data from diverse species

Journal article (2025) - Marco Teixeira, Célia Souque, Colin J. Worby, Terrance Shea, Nicoletta Commins, Joshua T. Smith, Arjun M. Miklos, Thomas Abeel, Ashlee M. Earl, Abigail L. Manson

The ability to detect and reconstruct plasmids from genome assemblies is crucial for studying the evolution and spread of antimicrobial resistance and virulence in bacteria. Though long-read sequencing technologies have made reconstructing plasmids easier, most (97%) of the bacterial genome assemblies in the public domain are generated from short-read data. Work to compare plasmid reconstruction tools has focused primarily on Escherichia coli, leaving gaps in our understanding of how well these tools perform on other, less well-characterized, taxa. Using high-quality assemblies as ground truth, we benchmarked 12 plasmid detection tools (which identify plasmid contigs in assemblies) and four plasmid reconstruction tools (which group contigs from the same plasmid together). We tested their ability to characterize diverse plasmids from short-read assemblies representing a wide range of Enterobacterales and Enterococcus species, including newly discovered and poorly characterized species collected from nonhuman hosts. Plasmer, PlasmidEC, PlaScope, and gplas2 were the highest-scoring plasmid detection tools, performing well for both Enterobacterales and enterococci. The two major determinants of accurate plasmid detection were representation in plasmid databases—with Enterobacterales plasmids being more easily detected than those from enterococci—and assembly contiguity, which was also key for successful plasmid reconstruction. Gplas2 performed best for plasmid reconstruction; however, less than half of plasmids were perfectly reconstructed, suggesting that substantial room for improvement remains in this class of tools.
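
The benchmark scores tools against high-quality assemblies used as ground truth. The abstract does not give the exact metrics, so the sketch below shows one plausible scoring scheme:

- length-weighted precision, recall and F1 for contigs labelled as plasmid;
- a count of true plasmids whose contig set is recovered exactly as one predicted bin, mirroring the "perfectly reconstructed" criterion.

Contig names and lengths in the test are toy values.

```python
from typing import Dict, Iterable, Set


def detection_scores(predicted: Set[str], truth: Set[str],
                     lengths: Dict[str, int]) -> Dict[str, float]:
    """Length-weighted precision, recall and F1 of plasmid contig calls."""
    def bp(contigs: Set[str]) -> int:
        return sum(lengths[c] for c in contigs)

    tp = bp(predicted & truth)   # plasmid bases correctly called plasmid
    fp = bp(predicted - truth)   # chromosomal bases wrongly called plasmid
    fn = bp(truth - predicted)   # plasmid bases that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def perfectly_reconstructed(predicted_bins: Iterable[Set[str]],
                            true_bins: Iterable[Set[str]]) -> int:
    """Number of true plasmids whose contig set appears exactly as a predicted bin."""
    predicted = {frozenset(b) for b in predicted_bins}
    return sum(1 for b in true_bins if frozenset(b) in predicted)
```

Weighting by length stops a tool from scoring well by calling many tiny contigs correctly while missing large plasmid contigs. Whether the study weighted by length is not stated in the abstract.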

Performance and interaction assessment of neural network architectures and bivariate smart predict-then-optimize

Journal article (2025) - Junhan Wen, Thomas Abeel, Mathijs de Weerdt

Smart “predict, then optimize” (SPO) (Elmachtoub in Manag Sci 68(1): 9–26, 2022) is an end-to-end learning strategy for models that predict parameters in optimization problems. Minimizing the mean squared error (MSE) targets prediction accuracy; SPO instead aims to ensure that predictions lead to the best possible decisions. The associated loss function, termed SPO loss, measures the regret of a decision relative to the optimal decision under the realized parameter values. Existing literature has demonstrated the viability of SPO. However, these studies often focus on classical optimization problems and employ a limited set of models for benchmarking. In this study, we tackled a decision-making task inspired by real-world challenges and evaluated a wide range of neural network models on it. Unlike classical problems, our task requires collaboratively training two models to predict different variables. Moreover, one of the decision variables also affects the feasibility of the decisions, further increasing the complexity. While our implementation validates the benefits of SPO, we were surprised to find that models trained exclusively on SPO loss do not consistently attain the minimum regret. Our further investigation into hyperparameters shows that the well-tuned models learned very similar patterns from the feature set, irrespective of whether MSE or SPO loss was used. In other words, switching from MSE to SPO loss in training primarily affected the layer biases. Therefore, to improve the learning efficacy with SPO loss, we propose prioritizing the learning of feature patterns as the fundamental step. Possible strategies include using specialized neural network layers to capture deeper patterns more effectively, or simply warming up by training with MSE. A warm-up is particularly advantageous for models whose outputs are closely tied to constraints, as their prediction accuracy significantly affects decision feasibility.
The insights are investigated empirically through two real-world trading scenarios. By leveraging datasets with diverse properties, we demonstrate the novelty and generalizability of our investigation.
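
The sketch below makes the MSE/SPO difference concrete on the simplest possible decision problem: choosing the single cheapest of several options. This problem is an illustrative assumption and is far simpler than the bivariate, feasibility-constrained task studied above. The SPO loss is the regret, defined as:

- the true cost of the decision made with predicted costs,
- minus the true cost of the optimal decision.

All function names are hypothetical.

```python
from typing import List


def decide(costs: List[float]) -> int:
    """Optimisation step: pick the option with the lowest (predicted) cost."""
    return min(range(len(costs)), key=lambda i: costs[i])


def spo_regret(true_costs: List[float], predicted_costs: List[float]) -> float:
    """SPO loss: extra true cost incurred by acting on the predictions."""
    chosen = decide(predicted_costs)
    optimal = decide(true_costs)
    return true_costs[chosen] - true_costs[optimal]


def mse(true_costs: List[float], predicted_costs: List[float]) -> float:
    """Conventional prediction-accuracy loss, for comparison."""
    return sum((t - p) ** 2 for t, p in zip(true_costs, predicted_costs)) / len(true_costs)


true_c = [3.0, 1.0, 2.0]
accurate_but_wrong = [2.9, 1.5, 1.2]      # low MSE, but picks option 2
inaccurate_but_right = [10.0, 2.0, 9.0]   # high MSE, but picks the optimal option 1
```

Here the low-MSE prediction incurs a regret of 1, while the high-MSE prediction incurs none. This regret only changes when the chosen decision changes, so it is piecewise constant in the predictions. For this reason, the cited Elmachtoub work trains with a convex surrogate, SPO+, rather than the raw regret.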


Fast and exact gap-affine partial order alignment with POASTA

Journal article (2025) - Lucas R. van Dijk, Abigail L. Manson, Ashlee M. Earl, Kiran V. Garimella, Thomas Abeel

Motivation
Partial order alignment is a widely used method for computing multiple sequence alignments, with applications in genome assembly and pangenomics, among many others. Current algorithms to compute the optimal, gap-affine partial order alignment do not scale well to larger graphs and sequences. While heuristic approaches exist, they do not guarantee optimal alignment and sacrifice alignment accuracy.

Results
We present POASTA, a new optimal algorithm for partial order alignment that exploits long stretches of matching sequence between the graph and a query. We benchmarked POASTA against the state-of-the-art on several diverse bacterial gene datasets and demonstrated an average speed-up of 4.1x and up to 9.8x, using less memory. POASTA’s memory scaling characteristics enabled the construction of much larger POA graphs than previously possible, as demonstrated by megabase-length alignments of 342 Mycobacterium tuberculosis sequences.
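
For context on what POASTA computes, the sketch below is the textbook dynamic-programming formulation of optimal gap-affine alignment of a query to a partial order graph: Gotoh-style recurrences applied over each node's graph predecessors. It is not POASTA's algorithm. It fills full matrices of size nodes × query length, which is the time and memory cost POASTA reduces by exploiting long matching stretches.

Assumptions:

- costs are mismatch 4, gap open 4 and gap extend 2, so a gap of length L costs open + L × extend;
- nodes are given in topological order;
- the toy graphs in the test are illustrative only.

```python
from typing import Sequence

INF = float("inf")


def poa_affine_cost(bases: Sequence[str], preds: Sequence[Sequence[int]], query: str,
                    mismatch: int = 4, gap_open: int = 4, gap_extend: int = 2) -> float:
    """Minimum gap-affine cost of globally aligning `query` to a partial order graph.

    bases[v] is the nucleotide of node v; preds[v] lists its predecessors.
    Nodes must be in topological order. Sink nodes end the alignment.
    """
    n_nodes, n = len(bases), len(query)
    rows = n_nodes + 1  # row 0 is a virtual start node preceding all source nodes
    M = [[INF] * (n + 1) for _ in range(rows)]  # node aligned to a query character
    I = [[INF] * (n + 1) for _ in range(rows)]  # query characters absent from the graph
    D = [[INF] * (n + 1) for _ in range(rows)]  # graph nodes skipped by the query
    M[0][0] = 0
    for j in range(1, n + 1):
        I[0][j] = gap_open + j * gap_extend
    has_succ = [False] * n_nodes
    for v in range(n_nodes):
        for p in preds[v]:
            has_succ[p] = True
    for v in range(n_nodes):
        r = v + 1
        pred_rows = [p + 1 for p in preds[v]] or [0]
        for j in range(n + 1):
            # Deletion: consume node v without consuming a query character.
            D[r][j] = min(min(M[p][j] + gap_open + gap_extend, D[p][j] + gap_extend,
                              I[p][j] + gap_open + gap_extend) for p in pred_rows)
            if j > 0:
                sub = 0 if bases[v] == query[j - 1] else mismatch
                M[r][j] = min(min(M[p][j - 1], I[p][j - 1], D[p][j - 1])
                              for p in pred_rows) + sub
                # Insertion: consume a query character while staying at node v.
                I[r][j] = min(M[r][j - 1] + gap_open + gap_extend, I[r][j - 1] + gap_extend,
                              D[r][j - 1] + gap_open + gap_extend)
    ends = [v + 1 for v in range(n_nodes) if not has_succ[v]]
    return min(min(M[r][n], I[r][n], D[r][n]) for r in ends)
```

In a branching graph, the recurrence takes the minimum over all predecessors. A query that follows either branch therefore aligns without penalty.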

Jaxkineticmodel

Neural ordinary differential equations inspired parameterization of kinetic models

Journal article (2025) - Paul van Lent, Olga Bunkova, Bálint Magyar, Léon Planken, Joep Schmitz, Thomas Abeel

Motivation: Metabolic kinetic models are widely used to model biological systems. Despite their widespread use, parameterizing these systems of ordinary differential equations (ODEs) remains challenging for large-scale kinetic models. Recent work on neural ODEs has shown the potential of modeling time-series data with neural networks, and many methodological developments in this field can similarly be applied to kinetic models. Results: We have implemented a simulation and training framework for Systems Biology Markup Language (SBML) models using JAX/Diffrax, which we named jaxkineticmodel. JAX provides automatic differentiation and just-in-time compilation, which speed up the parameterization of kinetic models. It also allows kinetic models to be hybridized with neural networks. We demonstrate robust training of kinetic models with this framework on a large collection of SBML models with different degrees of prior information on parameter initialization. We furthermore showcase the training framework on a complex model of glycolysis. Finally, we show an example of hybridizing a kinetic model with a neural network when a reaction mechanism is unknown. These results show that our framework can be used to fit large metabolic kinetic models efficiently and provides a strong platform for modeling biological systems. Implementation: jaxkineticmodel is available as a Python package at https://github.com/AbeelLab/jaxkineticmodel.
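
jaxkineticmodel relies on JAX automatic differentiation and Diffrax ODE solvers; neither is reproduced here, and the sketch does not use the package's API. It is a dependency-free illustration of the underlying idea: fitting kinetic parameters by differentiating a simulated trajectory with respect to those parameters. The code integrates a single Michaelis-Menten reaction with fixed-step RK4 and recovers Vmax from synthetic data.

The following are all assumptions made for illustration:

- the toy model and its parameter values;
- the log-parameterisation of Vmax;
- damped Gauss-Newton steps on a finite-difference Jacobian, used in place of autodiff and a gradient-based optimiser.

```python
import math
from typing import List


def rate(s: float, vmax: float, km: float) -> float:
    """Michaelis-Menten consumption rate of substrate S (toy model)."""
    return -vmax * s / (km + s)


def simulate(vmax: float, km: float, s0: float = 2.0, n_obs: int = 11,
             steps_per_obs: int = 50, dt: float = 0.01) -> List[float]:
    """Fixed-step RK4; returns S at t = 0, 0.5, 1.0, ... (steps_per_obs * dt apart)."""
    s, trajectory = s0, []
    for _obs in range(n_obs):
        trajectory.append(s)
        for _step in range(steps_per_obs):
            k1 = rate(s, vmax, km)
            k2 = rate(s + 0.5 * dt * k1, vmax, km)
            k3 = rate(s + 0.5 * dt * k2, vmax, km)
            k4 = rate(s + dt * k3, vmax, km)
            s += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return trajectory


def fit_vmax(observed: List[float], km: float, vmax0: float = 0.5,
             n_iter: int = 50, h: float = 1e-5) -> float:
    """Damped Gauss-Newton on theta = log(vmax), which keeps vmax positive."""
    theta = math.log(vmax0)
    for _ in range(n_iter):
        pred = simulate(math.exp(theta), km)
        up = simulate(math.exp(theta + h), km)
        down = simulate(math.exp(theta - h), km)
        jac = [(u - d) / (2 * h) for u, d in zip(up, down)]  # dS(t)/dtheta
        resid = [o - p for o, p in zip(observed, pred)]
        denom = sum(j * j for j in jac)
        if denom == 0.0:
            break
        step = sum(j * r for j, r in zip(jac, resid)) / denom
        theta += max(-0.5, min(0.5, step))  # clip large steps for robustness
    return math.exp(theta)
```

With automatic differentiation, the finite-difference Jacobian would be replaced by exact gradients through the solver. Exact gradients become important once a model has many parameters rather than one.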

Metagenomic analysis of antibiotic resistance across the wastewater process

Journal article (2025) - Stephanie Pillay, Ramin Shirali Hossein Zade, Paul van Lent, David Calderón-Franco, Thomas Abeel

Bacterial resistance to antimicrobials is a global health threat. Within the One Health context, water from regions with high antibiotic usage, such as clinical and urban areas, collects at wastewater treatment plants (WWTPs). In the WWTP, the activated sludge becomes a complex environment where various antimicrobials and microorganisms converge. While significant research has focused on the influent, activated sludge, and effluent, upstream and downstream sectors around the WWTP are often neglected. We conducted a systematic analysis using five publicly available metagenomic datasets (n=164) from different WWTP sectors and adjacent freshwater systems: upstream (n=14), influent (n=14), activated sludge (n=109), effluent (n=14), and downstream (n=13) to identify and characterise the microbiome, resistome, and mobilome. Opportunistic pathogenic bacteria, such as Pseudomonas, Aeromonas, and Acidovorax, were found in all WWTP sectors, with abundances exceeding 9% in the influent. ESKAPE pathogens, including Klebsiella pneumoniae and Enterobacter species, were identified in the effluent with abundances over 1%. We detected 230 antibiotic resistance genes (ARGs) throughout the WWTP. FTU and CKO β-lactamase gene families dominated the upstream, effluent, and downstream sectors, while the OXA β-lactamase gene family was highly abundant in the influent and activated sludge. ARGs, such as the OXA β-lactamase gene family, were linked to plasmids. Class-1 integrons, associated with the sul1 gene, a marker for anthropogenic pollution, were prevalent in the effluent and downstream sectors. Integrative elements (ICEclc, Tn4371, and PGI2), linked to ARGs, were identified in all sectors, increasing AMR dissemination. These integrative elements conferred resistance to antibiotics, including sulfonamides, tetracyclines and carbapenems. 
Our findings highlight the presence of ARGs and mobile genetic elements in WWTPs and nearby freshwater systems, raising concerns about AMR transmission to humans, animals, and the environment. This study emphasises the need for effective AMR monitoring and strategies in wastewater treatment to protect public and environmental health.


The Growing Strawberries Dataset

Tracking Multiple Objects with Biological Development over an Extended Period

Conference paper (2024) - Junhan Wen, Camiel R. Verschoor, Chengming Feng, Irina Mona Epure, Thomas Abeel, Mathijs De Weerdt

Multiple Object Tracking (MOT) is a rapidly developing research field that targets precise and reliable tracking of objects. Unfortunately, most available MOT datasets contain only short video clips, which cannot adequately capture the substantial long-term variations found in real-world scenarios. Long-term MOT poses unique challenges due to changes in both the objects and the environment, which remain relatively unexplored. To fill this gap, we propose a time-lapse image dataset inspired by the growth monitoring of strawberries, dubbed The Growing Strawberries Dataset (GSD). The data were captured hourly by six cameras over a span of 16 months in 2021 and 2022, covering a total of 24 plants in two separate greenhouses. The changes in appearance, weight, and position during the ripening process, along with variations in illumination during data collection, distinguish the task from previous MOT research. These practical issues caused a drastic drop in the performance of state-of-the-art MOT algorithms on the track identification and association tasks. We believe The Growing Strawberries Dataset will provide a platform for evaluating such long-term MOT tasks and inspire future research. The dataset is available at https://doi.org/10.4121/e3b31ece-cc88-4638-be10-8ccdd4c5f2f7.v1.
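
Track association between consecutive frames is often based on bounding-box overlap. The sketch below shows greedy intersection-over-union (IoU) matching. It is a deliberately simple stand-in, not one of the state-of-the-art algorithms evaluated on GSD. The 0.3 threshold and the (x1, y1, x2, y2) box format are assumptions.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed box format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(tracks: Dict[int, Box], detections: List[Box],
              min_iou: float = 0.3) -> Dict[int, int]:
    """Greedily match existing tracks to new detections, best overlaps first.

    Returns {track_id: detection_index}; unmatched detections would start new tracks.
    """
    candidates = sorted(
        ((iou(box, det), tid, d) for tid, box in tracks.items()
         for d, det in enumerate(detections)),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used_dets = set()
    for score, tid, d in candidates:
        if score < min_iou:
            break  # all remaining pairs overlap even less
        if tid in matches or d in used_dets:
            continue
        matches[tid] = d
        used_dets.add(d)
    return matches
```

With frames an hour apart, a strawberry that grows, ripens or shifts between captures can fall below the overlap threshold. Its identity is then lost, which illustrates why long-term tracking is harder than tracking in short, high-frame-rate clips.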

Global diversity of enterococci and description of 18 previously unknown species

Journal article (2024) - Julia A. Schwartzman, Francois Lebreton, Rauf Salamzade, Terrance Shea, Melissa J. Martin, Katharina Schaufler, Aysun Urhan, Thomas Abeel, Ilana L.B.C. Camargo, More authors...

Enterococci are gut microbes of most land animals. Likely appearing first in the guts of arthropods as they moved onto land, they diversified over hundreds of millions of years adapting to evolving hosts and host diets. Over 60 enterococcal species are now known. Two species, Enterococcus faecalis and Enterococcus faecium, are common constituents of the human microbiome. They are also now leading causes of multidrug-resistant hospital-associated infection. The basis for host association of enterococcal species is unknown. To begin identifying traits that drive host association, we collected 886 enterococcal strains from widely diverse hosts, ecologies, and geographies. This identified 18 previously undescribed species expanding genus diversity by >25%. These species harbor diverse genes including toxins and systems for detoxification and resource acquisition. Enterococcus faecalis and E. faecium were isolated from diverse hosts highlighting their generalist properties. Most other species showed a more restricted distribution indicative of specialized host association. The expanded species diversity permitted the Enterococcus genus phylogeny to be viewed with unprecedented resolution, allowing features to be identified that distinguish its four deeply rooted clades, and the entry of genes associated with range expansion such as B-vitamin biosynthesis and flagellar motility to be mapped to the phylogeny. This work provides an unprecedentedly broad and deep view of the genus Enterococcus, including insights into its evolution, potential new threats to human health, and where substantial additional enterococcal diversity is likely to be found.

Integrated omics of Saccharomyces cerevisiae CENPK2-1C reveals pleiotropic drug resistance and lipidomic adaptations to cannabidiol

Journal article (2024) - Erin Noel Jordan, Ramin Shirali Hossein Zade, Stephanie Pillay, Paul van Lent, Thomas Abeel, Oliver Kayser

Yeast metabolism can be engineered to produce xenobiotic compounds, such as cannabinoids, the principal isoprenoids of the plant Cannabis sativa, through heterologous metabolic pathways. However, cannabinoid production in yeast cell factories remains low. This study employed an integrated omics approach to investigate the physiological effects of cannabidiol (CBD) on S. cerevisiae CENPK2-1C yeast cultures. We treated the experimental group with 0.5 mM CBD and monitored CENPK2-1C cultures. We observed a latent-stationary phase post-diauxic shift in the experimental group and harvested samples at the inflection point of this growth phase for transcriptomic and metabolomic analysis. We compared the transcriptomes of the CBD-treated yeast and the positive control, identifying eight significantly overexpressed genes with a log fold change of at least 1.5 and a significant adjusted p-value. Three notable genes were PDR5 (an ABC-steroid and cation transporter), CIS1, and YGR035C. These genes are all regulated by promoters linked to pleiotropic drug resistance (PDR). Knockout and rescue of PDR5 showed that it is a causal factor in the post-diauxic shift phenotype. Metabolomic analysis revealed 48 significant spectra associated with CBD-fed cell pellets, 20 of which were identifiable as non-CBD compounds, including fatty acids, glycerophospholipids, and phosphate-salvage indicators. Our results suggest that mitochondrial regulation and lipidomic remodeling, employed in tandem with PDR, play a role in yeast’s response to CBD. We conclude that bioengineers should account for off-target product C-flux, energy use from ABC-transport, and post-stationary phase cell growth when developing cannabinoid-biosynthetic yeast strains. ...
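
The gene-selection criterion above (a log fold change of at least 1.5 plus a significant adjusted p-value) can be sketched as a simple filter. The abstract does not name the multiple-testing correction, the log base, or the significance cut-off; Benjamini-Hochberg adjustment and alpha = 0.05 are assumptions here, as are the function names and the hypothetical gene values used to exercise it.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def overexpressed(genes: Dict[str, Tuple[float, float]],
                  min_lfc: float = 1.5, alpha: float = 0.05) -> List[str]:
    """genes maps a gene name to (log fold change, raw p-value)."""
    names = list(genes)
    padj = benjamini_hochberg([genes[n][1] for n in names])
    return sorted(n for n, q in zip(names, padj)
                  if genes[n][0] >= min_lfc and q < alpha)
```

For example, with the hypothetical input `{"GENE_A": (2.0, 0.001), "GENE_B": (1.6, 0.04), "GENE_C": (0.5, 0.0001), "GENE_D": (3.0, 0.5)}`, only `GENE_A` passes: `GENE_B` has a raw p-value of 0.04 but an adjusted value of about 0.053, and `GENE_C` is significant but falls short of the fold-change cut-off.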

Aerobic denitrification as an N₂O source from microbial communities

Journal article (2024) - Nina Roothans, Minke Gabriëls, Thomas Abeel, Martin Pabst, Mark C.M. van Loosdrecht, Michele Laureni

Nitrous oxide (N₂O) is a potent greenhouse gas of primarily microbial origin. Oxic and anoxic emissions are commonly ascribed to autotrophic nitrification and heterotrophic denitrification, respectively. Beyond this established dichotomy, we quantitatively show that heterotrophic denitrification can significantly contribute to aerobic nitrogen turnover and N₂O emissions in complex microbiomes exposed to frequent oxic/anoxic transitions. Two planktonic, nitrification-inhibited enrichment cultures were established under continuous organic carbon and nitrate feeding, and cyclic oxygen availability. Over a third of the influent organic substrate was respired with nitrate as electron acceptor at high oxygen concentrations (>6.5 mg/L). N₂O accounted for up to one-quarter of the nitrate reduced under oxic conditions. The enriched microorganisms maintained a constitutive abundance of denitrifying enzymes due to the oxic/anoxic frequencies exceeding their protein turnover—a common scenario in natural and engineered ecosystems. The aerobic denitrification rates are ascribed primarily to the residual activity of anaerobically synthesised enzymes. From an ecological perspective, the selection of organisms capable of sustaining significant denitrifying activity during aeration shows their competitive advantage over other heterotrophs under varying oxygen availabilities. Ultimately, we propose that the contribution of heterotrophic denitrification to aerobic nitrogen turnover and N₂O emissions is currently underestimated in dynamic environments. ...

Aerobic denitrification as N2O source in microbial communities

Journal article (2024) - Nina Roothans, Minke Gabriëls, T.E.P.M.F. Abeel, Martin Pabst, Mark C.M. van Loosdrecht, Michele Laureni

Nitrous oxide (N2O) is a potent greenhouse gas of primarily microbial origin. Aerobic and anoxic emissions are commonly ascribed to nitrification and denitrification, respectively. Beyond this established dichotomy, we quantitatively prove that heterotrophic denitrification can significantly contribute to aerobic nitrogen turnover and N2O emissions in complex microbiomes exposed to frequent oxic/anoxic transitions. Planktonic, nitrification-inhibited denitrifying enrichments respired over a third of the influent organic substrate with nitrate at high oxygen concentrations. N2O accounted for up to one quarter of the aerobically respired nitrate. The constitutive detection of all denitrification enzymes in both anoxic and oxic periods highlights the selective advantage offered by metabolic preparedness in dynamic environments. We posit that aerobic denitrification and associated N2O formation are currently underestimated in dynamic microbial ecosystems. ...

SAFPred

Synteny-aware gene function prediction for bacteria using protein embeddings

Journal article (2024) - Aysun Urhan, Bianca-Maria Cosma, Ashlee M. Earl, Abigail L. Manson, Thomas Abeel

Motivation: Today, we know the function of only a small fraction of the protein sequences predicted from genomic data. This problem is even more salient for bacteria, which represent some of the most phylogenetically and metabolically diverse taxa on Earth. This low rate of bacterial gene annotation is compounded by the fact that most function prediction algorithms have focused on eukaryotes, and conventional annotation approaches rely on the presence of similar sequences in existing databases. However, often there are no such sequences for novel bacterial proteins. Thus, we need improved gene function prediction methods tailored for bacteria. Recently, transformer-based language models, adopted from the natural language processing field, have been used to obtain new representations of proteins as an alternative to raw amino acid sequences. These representations, referred to as protein embeddings, have shown promise for improving annotation of eukaryotes, but there have been only limited applications on bacterial genomes. Results: To predict gene functions in bacteria, we developed SAFPred, a novel synteny-aware gene function prediction tool based on protein embeddings from state-of-the-art protein language models. SAFPred also leverages the unique operon structure of bacteria through conserved synteny. SAFPred outperformed both conventional sequence-based annotation methods and state-of-the-art methods on multiple bacterial species, including for distant homolog detection, where the sequence similarity to the proteins in the training set was as low as 40%. Using SAFPred to identify gene functions across diverse enterococci, of which some species are major clinical threats, we identified 11 previously unrecognized putative toxins, with potential significance to human and animal health. ...
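
As a rough illustration of embedding-based annotation transfer, the sketch below labels a query protein by a similarity-weighted vote over its k most cosine-similar reference embeddings. This is a generic nearest-neighbour baseline, not SAFPred: it ignores the synteny information SAFPred adds, and the function names, the three-dimensional vectors, the labels, and k = 3 are assumptions (real protein-language-model embeddings have hundreds to thousands of dimensions).

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norms if norms else 0.0


def predict_function(query: Sequence[float], reference: Dict[str, List[float]],
                     labels: Dict[str, str], k: int = 3) -> str:
    """Similarity-weighted vote over the k most similar reference proteins."""
    sims = {pid: cosine(query, emb) for pid, emb in reference.items()}
    nearest = sorted(sims, key=sims.get, reverse=True)[:k]
    votes: Dict[str, float] = {}
    for pid in nearest:
        votes[labels[pid]] = votes.get(labels[pid], 0.0) + sims[pid]
    return max(votes, key=votes.get)
```

Because the vote uses embedding similarity rather than sequence identity, a query can inherit a label from a distant homolog whose sequence differs substantially, which is the property the abstract highlights for low-similarity proteins.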

Effects of antibiotic growth promoter and its natural alternative on poultry cecum ecosystem: an integrated analysis of gut microbiota and host expression

Journal article (2024) - C. Peng, Mahdi Ghanbari, Ali May, T.E.P.M.F. Abeel

Background: In-feed antibiotic growth promoters (AGPs) have been a cornerstone in the livestock industry due to their role in enhancing growth and feed efficiency. However, concerns over antibiotic resistance have driven a shift away from AGPs toward natural alternatives. Despite their widespread use, the exact mechanisms of AGPs and their alternatives are not fully understood. This necessitates holistic studies that investigate microbiota dynamics, host responses, and the interactions between these elements in the context of AGPs and alternative feed additives. Methods: In this study, we conducted a multifaceted investigation of how Bacitracin, a common AGP, and a natural alternative impact both cecum microbiota and host expression in chickens. In addition to univariate and static differential abundance and expression analyses, we employed multivariate and time-course analyses to study this problem. To reveal host-microbe interactions, we assessed their overall correspondence and identified treatment-specific pairs of species and host expressed genes that showed significant correlations over time. Results: Our analysis revealed that factors such as developmental age substantially impacted the cecum ecosystem more than feed additives. While feed additives significantly altered microbial compositions in the later stages, they did not significantly affect overall host gene expression. The differential expression indicated that with AGP administration, host transmembrane transporters and metallopeptidase activities were upregulated around day 21. Together with the modulated kininogen binding and phenylpyruvate tautomerase activity over time, this likely contributes to the growth-promoting effects of AGPs. The difference in responses between AGP and PFA supplementation suggests that these additives operate through distinct mechanisms. 
Conclusion: We investigated the impact of a common AGP and its natural alternative on poultry cecum ecosystem through an integrated analysis of both the microbiota and host responses. We found that AGP appears to enhance host nutrient utilization and modulate immune responses. The insights we gained are critical for identifying and developing effective AGP alternatives to advance sustainable livestock farming practices. ...
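
The pairing step in the Methods (species and host genes whose levels correlate over time) could be sketched with Spearman rank correlation. The abstract does not state which correlation measure or significance test was used; Spearman's rho, and a fixed |rho| cut-off standing in for a significance test, are assumptions here, and the function names and time series are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def average_ranks(xs: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var = sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var) if var else 0.0


def correlated_pairs(species: Dict[str, List[float]], genes: Dict[str, List[float]],
                     min_abs_rho: float = 0.8) -> List[Tuple[str, str, float]]:
    """All (species, gene, rho) pairs whose time series correlate strongly."""
    pairs = []
    for s, abundance in species.items():
        for g, expression in genes.items():
            rho = spearman(abundance, expression)
            if abs(rho) >= min_abs_rho:
                pairs.append((s, g, round(rho, 3)))
    return pairs
```

Running this separately on each treatment group is one way to obtain treatment-specific pairs; rank correlation is a natural choice for compositional abundances because it is insensitive to monotone rescaling.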

Simulated Design-Build-Test-Learn Cycles for Consistent Comparison of Machine Learning Methods in Metabolic Engineering

Journal article (2023) - Paul van Lent, Joep Schmitz, Thomas Abeel

Combinatorial pathway optimization is an important tool in metabolic flux optimization. Simultaneous optimization of a large number of pathway genes often leads to combinatorial explosions. Strain optimization is therefore often performed using iterative design-build-test-learn (DBTL) cycles. The aim of these cycles is to develop a product strain iteratively, every time incorporating learning from the previous cycle. Machine learning methods provide a potentially powerful tool to learn from data and propose new designs for the next DBTL cycle. However, due to the lack of a framework for consistently testing the performance of machine learning methods over multiple DBTL cycles, evaluating the effectiveness of these methods remains a challenge. In this work, we propose a mechanistic kinetic model-based framework to test and optimize machine learning for iterative combinatorial pathway optimization. Using this framework, we show that gradient boosting and random forest models outperform the other tested methods in the low-data regime. We demonstrate that these methods are robust to training set biases and experimental noise. Finally, we introduce an algorithm for recommending new designs using machine learning model predictions. We show that when the number of strains to be built is limited, starting with a large initial DBTL cycle is favorable over building the same number of strains for every cycle. ...
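
A toy version of a simulated DBTL loop is sketched below. It is only a stand-in for the framework described above: `toy_flux` is a hypothetical closed-form function replacing the mechanistic kinetic model, and a k-nearest-neighbour regressor replaces the gradient boosting and random forest models so that the sketch needs only the Python standard library. The first cycle builds a random library; later cycles build the untested designs with the highest predicted flux. All names and parameter values are assumptions.

```python
import itertools
import random
from typing import Callable, Dict, List, Tuple

Design = Tuple[int, ...]  # one expression level per pathway gene


def toy_flux(design: Design) -> float:
    """Hypothetical stand-in for a kinetic model of a 3-gene pathway."""
    e1, e2, e3 = design
    # Arbitrary shape: first term saturates with the weaker of e1 and e2 + 1;
    # e3 has an optimum at level 2.
    return 2.0 * min(e1, e2 + 1) + e2 - 0.5 * (e3 - 2) ** 2


def knn_predict(tested: Dict[Design, float], design: Design, k: int = 3) -> float:
    """Mean measured flux of the k closest tested designs (squared Euclidean)."""
    nearest = sorted(tested, key=lambda t: sum((a - b) ** 2 for a, b in zip(t, design)))[:k]
    return sum(tested[t] for t in nearest) / len(nearest)


def run_dbtl(cycle_sizes: List[int], levels: int = 4, seed: int = 0,
             oracle: Callable[[Design], float] = toy_flux) -> float:
    """Run one DBTL campaign and return the best flux observed."""
    rng = random.Random(seed)
    space = list(itertools.product(range(levels), repeat=3))
    tested: Dict[Design, float] = {}
    for cycle, size in enumerate(cycle_sizes):
        untested = [d for d in space if d not in tested]
        if cycle == 0:
            batch = rng.sample(untested, size)  # Design: random initial library
        else:
            # Learn + Design: rank untested designs by predicted flux
            batch = sorted(untested, key=lambda d: knn_predict(tested, d), reverse=True)[:size]
        for d in batch:  # Build + Test: query the simulator
            tested[d] = oracle(d)
    return max(tested.values())
```

Comparing, for example, `run_dbtl([16, 4, 4])` with `run_dbtl([8, 8, 8])` averaged over many seeds is one way to explore the trade-off between a large first cycle and equal-sized cycles; this toy setup is not expected to reproduce the paper's results.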

Pan-genome de Bruijn graph using the bidirectional FM-index

Journal article (2023) - Lore Depuydt, Luca Renders, Thomas Abeel, Jan Fostier

Background: Pan-genome graphs are gaining importance in the field of bioinformatics as data structures to represent and jointly analyze multiple genomes. Compacted de Bruijn graphs are inherently suited for this purpose, as their graph topology naturally reveals similarity and divergence within the pan-genome. Most state-of-the-art pan-genome graphs are represented explicitly in terms of nodes and edges. Recently, an alternative, implicit graph representation was proposed that builds directly upon the unidirectional FM-index. As such, a memory-efficient graph data structure is obtained that inherits the FM-index's backward search functionality. However, this representation suffers from a number of shortcomings in terms of functionality and algorithmic performance. Results: We present a data structure for a pan-genome compacted de Bruijn graph that aims to address these shortcomings. It is built on the bidirectional FM-index, extending the ability of its unidirectional counterpart to navigate and search the graph in both directions. All basic graph navigation steps can be performed in constant time. Based on these features, we implement subgraph visualization as well as lossless approximate pattern matching to the graph using search schemes. We demonstrate that we can retrieve all occurrences corresponding to a read within a certain edit distance in a very efficient manner. Through a case study, we show the potential of exploiting the information embedded in the graph’s topology through visualization and sequence alignment. Conclusions: We propose a memory-efficient representation of the pan-genome graph that supports subgraph visualization and lossless approximate pattern matching of reads against the graph using search schemes. The C++ source code of our software, called Nexus, is available at https://github.com/biointec/nexus under AGPL-3.0 license. ...
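
For readers unfamiliar with the FM-index, the sketch below shows the classic unidirectional backward search that both graph representations build on: each pattern character, processed right to left, narrows a suffix-array interval using the C table and occurrence counts of the Burrows-Wheeler transform. It is a didactic, memory-hungry version (naive suffix sorting, full occurrence tables, hypothetical function names) and does not implement the bidirectional index or the graph navigation of Nexus.

```python
from typing import Dict, List, Tuple


def build_fm_index(text: str) -> Tuple[List[int], Dict[str, int], Dict[str, List[int]]]:
    """Suffix array, C table and occurrence table for text + '$'."""
    text += "$"  # sentinel; sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive, for clarity only
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' for suffix 0
    C: Dict[str, int] = {}
    total = 0
    for c in sorted(set(text)):
        C[c] = total  # number of characters in text smaller than c
        total += text.count(c)
    occ = {c: [0] * (len(bwt) + 1) for c in C}  # occ[c][i] = count of c in bwt[:i]
    for i, ch in enumerate(bwt):
        for c in C:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ


def backward_search(pattern: str, C: Dict[str, int], occ: Dict[str, List[int]],
                    n: int) -> Tuple[int, int]:
    """Half-open suffix-array interval [lo, hi) of suffixes starting with pattern."""
    lo, hi = 0, n
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in C:
            return 0, 0
        lo, hi = C[c] + occ[c][lo], C[c] + occ[c][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def locate(pattern: str, sa: List[int], C: Dict[str, int],
           occ: Dict[str, List[int]]) -> List[int]:
    """Sorted start positions of pattern in the original text."""
    lo, hi = backward_search(pattern, C, occ, len(sa))
    return sorted(sa[lo:hi])
```

For example, `locate("ACG", *build_fm_index("ACGTACGA"))` returns `[0, 4]`. A bidirectional index additionally indexes the reversed text and keeps the two intervals synchronized, so a match can also be extended to the right; that part is not shown here.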

SHIP

Identifying antimicrobial resistance gene transfer between plasmids

Journal article (2023) - Marco Teixeira, Stephanie Pillay, Aysun Urhan, Thomas Abeel

Motivation: Plasmids are carriers for antimicrobial resistance (AMR) genes and can exchange genetic material with other structures, contributing to the spread of AMR. There is no reliable approach to identify the transfer of AMR genes across plasmids. This is mainly due to the absence of a method to assess the phylogenetic distance of plasmids, as they show large DNA sequence variability. Identifying and quantifying such transfer can provide novel insight into the role of small mobile elements and resistant plasmid regions in the spread of AMR. Results: We developed SHIP, a novel method to quantify plasmid similarity based on the dynamics of plasmid evolution. This allowed us to find conserved fragments containing AMR genes in structurally different and phylogenetically distant plasmids, which is evidence for lateral transfer. Our results show that regions carrying AMR genes are highly mobilizable between plasmids through transposons, integrons, and recombination events, and contribute to the spread of AMR. Identified transferred fragments include a multi-resistant complex class 1 integron in Escherichia coli and Klebsiella pneumoniae, and a region encoding tetracycline resistance transferred through recombination in Enterococcus faecalis. ...