A multi-center, multi-vendor study to evaluate the generalizability of a radiomics model for classifying prostate cancer: High grade vs. low grade

Journal Article (2021)
Author(s)

J.M. Castillo (Erasmus MC)

M.P.A. Starmans (Erasmus MC)

Muhammad Arif (Erasmus MC)

W.J. Niessen (TU Delft - ImPhys/Medical Imaging, TU Delft - ImPhys/Computational Imaging, Erasmus MC)

Stefan C. Klein (Erasmus MC)

Chris H. Bangma (Erasmus MC)

Ivo G. Schoots (Erasmus MC)

J.F. Veenland (Erasmus MC)

Research Group
ImPhys/Medical Imaging
Copyright
© 2021 J.M. Castillo, M.P.A. Starmans, M. Arif, W.J. Niessen, Stefan Klein, Chris H. Bangma, Ivo G. Schoots, J.F. Veenland
DOI related publication
https://doi.org/10.3390/diagnostics11020369
Publication Year
2021
Language
English
Issue number
2
Volume number
11
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Radiomics applied to MRI has shown promising results in classifying prostate cancer lesions. However, many papers describe single-center studies without external validation, and the issues that arise when radiomics models are applied to unseen data have not yet been sufficiently addressed. The aim of this study is to evaluate the generalizability of radiomics models for prostate cancer classification and to compare the performance of these models with that of radiologists. Multiparametric MRI, photographs and histology of radical prostatectomy specimens, and pathology reports of 107 patients were obtained from three healthcare centers in the Netherlands. By spatially correlating the MRI with histology, 204 lesions were identified. For each lesion, radiomics features were extracted from the MRI data. Radiomics models for discriminating high-grade (Gleason score ≥ 7) from low-grade lesions were automatically generated using open-source machine learning software. Performance was tested both in a single-center setting, through cross-validation, and in a multi-center setting, using the two unseen datasets as external validation. For comparison with clinical practice, a multi-center classifier was tested and compared with the Prostate Imaging Reporting and Data System version 2 (PIRADS v2) scoring performed by two expert radiologists. The three single-center models obtained a mean AUC of 0.75, which decreased to 0.54 when the models were applied to the external data; the radiologists obtained a mean AUC of 0.46. In the multi-center setting, the radiomics model obtained a mean AUC of 0.75, while the radiologists obtained a mean AUC of 0.47 on the same subset. While radiomics models perform reasonably well when tested on data from the same center(s), they may show a significant drop in performance when applied to external data. On a multi-center dataset, our radiomics model outperformed the radiologists and thus may represent a more accurate alternative for malignancy prediction.
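As an illustrative sketch only, and not the authors' pipeline (which used open-source machine learning software not reproduced here), the Python below shows how the quantities in the abstract relate. Lesions are labeled high grade when the Gleason score is ≥ 7. AUC is computed in its Mann-Whitney form, as the probability that a randomly chosen high-grade lesion receives a higher model score than a randomly chosen low-grade one. A hypothetical `split_internal_external` helper separates one center's development data from the unseen external centers. The lesion records, the `center`, `gleason` and `score` keys, and all toy values are assumptions invented for illustration, not study data.

```python
from itertools import product


def label_high_grade(gleason_score: int) -> int:
    """Binary target from the abstract: 1 = high grade (Gleason score >= 7), 0 = low grade."""
    return 1 if gleason_score >= 7 else 0


def auc(scores: list[float], labels: list[int]) -> float:
    """AUC via the Mann-Whitney formulation: the fraction of (high-grade, low-grade)
    lesion pairs in which the high-grade lesion gets the higher score; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one high-grade and one low-grade lesion")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def split_internal_external(lesions: list[dict], train_center: str) -> tuple[list[dict], list[dict]]:
    """Hypothetical helper: lesions from train_center form the development set
    (assessed by cross-validation in the single-center setting); lesions from
    every other center are held out as unseen external validation data."""
    internal = [les for les in lesions if les["center"] == train_center]
    external = [les for les in lesions if les["center"] != train_center]
    return internal, external


if __name__ == "__main__":
    # Toy records invented for illustration: center name, Gleason score, model score.
    lesions = [
        {"center": "A", "gleason": 8, "score": 0.9},
        {"center": "A", "gleason": 6, "score": 0.3},
        {"center": "B", "gleason": 7, "score": 0.4},
        {"center": "B", "gleason": 6, "score": 0.6},
        {"center": "C", "gleason": 9, "score": 0.7},
        {"center": "C", "gleason": 6, "score": 0.5},
    ]
    internal, external = split_internal_external(lesions, "A")
    for name, subset in (("internal", internal), ("external", external)):
        labels = [label_high_grade(les["gleason"]) for les in subset]
        scores = [les["score"] for les in subset]
        print(name, auc(scores, labels))
    # prints:
    # internal 1.0
    # external 0.5
```

With this definition, an AUC of 0.5 corresponds to chance-level ranking, which is the reference point for reading the external AUC of 0.54 and the radiologists' AUCs of 0.46 and 0.47 reported above. The pairwise loop is quadratic in the number of lesions; that is acceptable for a few hundred lesions, and a rank-based formula gives the same value more efficiently on larger sets.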