AILMENT
A novel ML framework for prediction and analysis of microbiota associations in colorectal cancer
N. Strepis (Erasmus MC)
Z. Lu (Wageningen University & Research)
W. de Koning (Erasmus MC)
B. J.M. Rijvers (Erasmus MC)
A. A. de Souza (Erasmus MC)
C. Verhoef (Erasmus MC)
B. Fosso (Università degli Studi di Bari Aldo Moro)
M. Doukas (Erasmus MC)
T. Abeel (Erasmus MC, TU Delft - Electrical Engineering, Mathematics and Computer Science)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Objective Colorectal cancer (CRC) is one of the most common cancers in the world, with research suggesting a potential association with the human microbiota. However, simply comparing relative microbial abundances could overlook connections between microbes and specific clinical characteristics of CRC.
Methods Here, we present the machine learning (ML) framework ‘AILMENT’ (AI-linked Microbiota Exploration of Nascent Tumours) that efficiently associates microbiota profiles with CRC metadata. The Random Forest and Extreme Gradient Boosting machine learning methods incorporated in AILMENT were used to identify associations between the microbiota and CRC phenotypes relating to clinical outcomes.
Results Sixteen ML models were generated from public data of 778 individuals using AILMENT, indicating associations between the microbiota and several different clinical characteristics of CRC, including microsatellite instability (MSI) and BRAF mutations (median AUROC and F1 scores of the ML models reached up to 0.90 and 0.85, respectively). Additionally, associations between Odoribacter, Leptotrichia, Granulicella, Parvimonas, Fusobacterium and other genera with CRC were observed. With respect to sample type, distinct microbial compositions were observed between tissue and faecal samples, indicating fundamental differences in microbiota composition between these sample types. The AILMENT framework pinpointed an association between pathogens such as Porphyromonas and Parvimonas and CRC, confirming their role as microbial signatures in the disease. Moreover, the framework could indicate microbes linked to a healthy gut distinct from the CRC state, such as the butyrate-producers Lactobacillus, Eubacterium and Ruminococcus. To validate the performance and utility of AILMENT, we applied it to a publicly available dataset of bacterial species abundance and associated metadata, successfully replicating the key findings.
Conclusion The AILMENT framework can efficiently predict associations between different clinical characteristics of CRC and complex microbial relative abundance data. AILMENT enables the identification of specific microbes at the genus level for detailed clinical characterisation of CRC, demonstrating its potential as a tool for a better understanding of cancer-microbiota interactions.
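Illustrative note The abstract reports model quality as median AUROC and F1 scores. The sketch below shows, under stated assumptions, how these two metrics are commonly computed for a binary phenotype (for example MSI versus non-MSI). It is not the AILMENT code: the function names `auroc` and `f1_score`, the 0.5 decision threshold, and all labels and scores are hypothetical toy values chosen only to make the arithmetic easy to follow.

```python
from statistics import median


def auroc(labels, scores):
    """Rank-based AUROC: the chance that a random positive sample is
    scored above a random negative sample (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(labels, predictions):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 1 = phenotype present, 0 = absent.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]  # made-up classifier probabilities
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

# A second made-up evaluation split, used only to show how a median
# over several evaluations is taken.
folds = [(labels, scores), ([1, 0, 1, 0], [0.7, 0.2, 0.6, 0.4])]
median_auroc = median(auroc(y, s) for y, s in folds)

print(f"AUROC (first split): {auroc(labels, scores):.3f}")
print(f"F1 (first split): {f1_score(labels, predictions):.3f}")
print(f"Median AUROC over splits: {median_auroc:.3f}")
```

AUROC is threshold-free and summarises how well the scores rank samples, whereas F1 depends on the chosen decision threshold and balances precision against recall. Reporting both, as the abstract does, gives a fuller picture when phenotype classes are imbalanced.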