Automatic algorithm selection and hyperparameter optimization for medical image classification

Abstract

Recent years have seen a tremendous increase in the application of Artificial Intelligence to radiology, often through the extraction and analysis of large numbers of quantitative features from medical images. These applications increase the demand for machine learning models that extract information from such images. To provide these models, improve their performance, and reduce the time experts spend tuning them manually, the field of Automated Machine Learning (AutoML) aims to automate the design of machine learning models by optimizing the selection of algorithms and their hyperparameters for each application. This work applies an AutoML approach to medical image classification, using a Bayesian optimization strategy to automatically select preprocessing and classification algorithms and tune their hyperparameters. Its performance is compared with that of a random search optimization strategy on three datasets from three different clinical applications. The results show that Bayesian optimization and random search return models that achieve similar performance on the unseen test sets. We show that a random search with relatively few evaluations and a simple ensemble strategy is sufficient to achieve performance comparable to a more sophisticated and more computationally demanding Bayesian optimization approach, which validates the use of a random search optimization strategy in this medical image classification setting. All found models generalize poorly, with average F1-scores on the validation sets used for optimizing the models being at least 20% lower than the average F1-scores on the unseen test sets. Finally, we further emphasize the difficulty of generalizing in this setting by showing that the differences between subsets of the evaluated datasets are large and that increasing the computation time of the optimization does not benefit the test set performance of the final solution.
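
As a rough illustration of the random search with a simple ensemble strategy mentioned in the abstract, the following minimal sketch samples pipelines (a preprocessing step, a classifier, and that classifier's hyperparameters), scores each by F1 on a validation set, and combines the best few by majority vote. It is not the pipeline used in this work: the search space, the two toy classifiers, and all names (`sample_config`, `fit`, `random_search`, `ensemble_predict`) are hypothetical stand-ins, chosen so the example runs with the Python standard library only.

```python
import random
from collections import Counter

def sample_config(rng):
    """Draw one pipeline: a scaler, a classifier, and its hyperparameters (toy search space)."""
    clf = rng.choice(["knn", "threshold"])
    if clf == "knn":
        params = {"k": rng.choice([1, 3, 5])}  # odd k avoids ties for binary labels
    else:
        params = {"feature": rng.randrange(2), "cut": rng.uniform(0.0, 1.0)}
    return {"scaler": rng.choice(["none", "minmax"]), "clf": clf, "params": params}

def fit(config, X, y):
    """Return a predict(rows) function for the sampled pipeline."""
    if config["scaler"] == "minmax":
        lo = [min(col) for col in zip(*X)]
        hi = [max(col) for col in zip(*X)]
        scale = lambda r: [(v - l) / ((h - l) or 1.0) for v, l, h in zip(r, lo, hi)]
    else:
        scale = lambda r: list(r)
    Xs = [scale(r) for r in X]
    p = config["params"]
    if config["clf"] == "knn":
        def predict_one(r):
            # Majority label among the k nearest training points (squared Euclidean distance).
            order = sorted(range(len(Xs)),
                           key=lambda i: sum((a - b) ** 2 for a, b in zip(Xs[i], r)))
            return Counter(y[i] for i in order[:p["k"]]).most_common(1)[0][0]
    else:
        def predict_one(r):
            # Fixed cut on one feature; this toy classifier has nothing to train.
            return int(r[p["feature"]] > p["cut"])
    return lambda rows: [predict_one(scale(r)) for r in rows]

def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1); 0.0 when there are no true positives."""
    tp = sum(t == 1 and q == 1 for t, q in zip(y_true, y_pred))
    fp = sum(t == 0 and q == 1 for t, q in zip(y_true, y_pred))
    fn = sum(t == 1 and q == 0 for t, q in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def random_search(X_tr, y_tr, X_val, y_val, n_iter=30, top_k=5, seed=0):
    """Evaluate n_iter random pipelines; return the top_k as (val_f1, config, model)."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_iter):
        config = sample_config(rng)
        model = fit(config, X_tr, y_tr)
        scored.append((f1_score(y_val, model(X_val)), config, model))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]

def ensemble_predict(members, rows):
    """Majority vote over the ensemble members (use an odd number of members to avoid ties)."""
    votes = [model(rows) for _, _, model in members]
    return [int(2 * sum(col) > len(col)) for col in zip(*votes)]
```

Because the pipelines are selected on validation F1, that score is an optimistic estimate for the chosen models; a separate, untouched test set is needed to measure how well the ensemble generalizes.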