Relationships between geo-spatial features and COVID-19 hospitalisations revealed by machine learning models and SHAP values

Journal Article (2024)
Author(s)

Lixia Chu (Wageningen University & Research, Student TU Delft)

Jeroen Nelen (Student TU Delft)

Alessandro Crivellari (National Taiwan University)

Dainius Masiliunas (Wageningen University & Research)

Carola Hein (TU Delft - History, Form & Aesthetics)

Christoph Lofi (TU Delft - Web Information Systems)

Research Group
History, Form & Aesthetics
DOI related publication
https://doi.org/10.1080/17538947.2024.2358851
Publication Year
2024
Language
English
Issue number
1
Volume number
17
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Uncovering relationships between geospatial features and COVID-19 features is a broad, cross-disciplinary and challenging topic with many confounding factors, as the spread and effects of COVID-19 are related to many aspects of our lives, including socio-economic, cultural, and environmental features. Our research aims to provide an innovative data-driven method for uncovering the relationships between heterogeneous, cross-disciplinary geospatial features and COVID-19 features at the municipality scale in Germany. We explore these relationships using supervised machine learning, explainable AI and spatial analysis, covering the period from March 2020 to October 2021. First, we integrated multi-source data, including social, economic, cultural, air pollution and COVID-19 data, into one spatiotemporally harmonised dataset. Second, we trained three machine learning models (a Support Vector Regressor, a Random Forest, and a Light Gradient Boosting Machine) on the integrated dataset to learn the relationships between the spatial features and the COVID-19 features. Third, we used SHapley Additive exPlanations (SHAP) to rank the relevance of each feature. Finally, we illustrated the results by visualising the spatial differences within municipalities. The output delivers key information on the COVID-19 hospitalisation rate in relation to NO2 concentration and education level in Germany, and the methods are transferable.
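The feature-ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline and does not use the SHAP library or any of the three models named above: it computes exact Shapley values for a hand-written toy linear model and ranks features by mean absolute attribution, which is one common way such rankings are derived. The feature names, the data values and the model coefficients are all hypothetical, so the resulting order says nothing about the study's findings.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy data: each row stands for one municipality.
# Feature names and values are made up for illustration only.
FEATURES = ["no2", "education", "income"]
X = [
    [0.9, 0.2, 0.5],
    [0.4, 0.7, 0.1],
    [0.1, 0.9, 0.8],
    [0.6, 0.4, 0.3],
]


def model(row):
    # Stand-in for a trained regressor; the coefficients are arbitrary.
    no2, education, income = row
    return 2.0 * no2 - 1.0 * education + 0.5 * income


def value(row, subset, background):
    # Average model output when the features in `subset` take the row's
    # values and all other features take each background row's values.
    total = 0.0
    for b in background:
        mixed = [row[j] if j in subset else b[j] for j in range(len(row))]
        total += model(mixed)
    return total / len(background)


def shapley_values(row, background):
    # Exact Shapley values: weighted marginal contribution of feature i
    # over every subset of the remaining features.
    n = len(row)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                with_i = value(row, set(s) | {i}, background)
                without_i = value(row, set(s), background)
                phi[i] += weight * (with_i - without_i)
    return phi


def rank_features(rows):
    # Rank features by mean absolute Shapley value across all rows.
    all_phi = [shapley_values(row, rows) for row in rows]
    importance = {
        name: sum(abs(p[i]) for p in all_phi) / len(all_phi)
        for i, name in enumerate(FEATURES)
    }
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_features(X)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

Exact enumeration visits every feature subset, so it is only practical for a handful of features; for a dataset with many features, approximations such as those in the SHAP library would be needed.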