A Novel Machine Learning Framework for Advanced Driving Force Analysis of Individuals' Dietary Water Footprint

Journal Article (2025)
Author(s)

Kai Huang (Beijing Forestry University)

D. Wang (TU Delft - Sanitary Engineering)

Z. Kapelan (TU Delft - Water Systems Engineering)

Research Group
Water Systems Engineering
DOI related publication
https://doi.org/10.1029/2024EF005061
Publication Year
2025
Language
English
Issue number
12
Volume number
13
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Addressing water scarcity requires reducing the water footprint (WF) associated with food consumption. Because individuals' dietary behavior is largely influenced by their demographic and anthropometric attributes, it is crucial to identify individuals with a high dietary WF and prioritize them in policy. Several studies have analyzed the driving factors behind dietary WF, but they share multiple limitations: statistical models with rather modest performance, a lack of rigorous sensitivity or feature importance (FI) analysis, and limited generalization ability. Here, we developed a novel machine learning (ML)-based framework for analyzing the driving forces behind dietary WF. The framework incorporates three ML models, Extra-Trees (ET), Histogram-based Gradient Boosting (HGB), and eXtreme Gradient Boosting (XGB), together with an ML explanation approach, SHapley Additive exPlanations (SHAP). The framework was applied to a case study on Chinese inhabitants. The results validated the proposed framework and demonstrated ML's superiority over conventional statistical methods. XGB was identified as the optimal model because it effectively captured the variability in the data and showed good generalization performance. The FI analysis for XGB revealed the features with the greatest influence on dietary WF, with income level, urbanization level, education level, and gender emerging as the top four in descending order. The subsequent SHAP dependence analysis identified the priority groups for dietary WF reduction interventions as high-income residents, urban residents, highly educated residents, and male residents. In light of these findings and their underlying causes, the paper concludes with a set of policy recommendations.
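
To make the SHAP step in the abstract more concrete, the following is a minimal sketch of how Shapley-value attributions split a single prediction among its input features. It is not the authors' implementation: `toy_model` is a hypothetical stand-in for a fitted regressor, its coefficients are invented for illustration and not estimated from any data, and a single all-zero baseline replaces the background data set that the SHAP library would normally average over. The four feature names are taken from the abstract; everything else is an assumption.

```python
from itertools import combinations
from math import factorial

# Feature names follow the abstract; the encoding (0 = baseline, 1 = "high") is assumed.
FEATURES = ["income", "urbanization", "education", "gender"]


def toy_model(x):
    """Hypothetical stand-in for a fitted regressor (coefficients are invented)."""
    income, urban, edu, male = x
    return (2.0 + 0.8 * income + 0.5 * urban + 0.3 * edu + 0.2 * male
            + 0.1 * income * urban)  # one interaction term


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    Features outside a coalition are set to their baseline value.
    The cost grows as 2^n, so this is only practical for a few features.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # coalition size, excluding feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                coalition = set(s)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


x = [1.0, 1.0, 1.0, 1.0]          # hypothetical individual
baseline = [0.0, 0.0, 0.0, 0.0]   # hypothetical reference individual
phi = shapley_values(toy_model, x, baseline)

# Rank features by the magnitude of their contribution to this one prediction.
ranking = sorted(zip(FEATURES, phi), key=lambda pair: -abs(pair[1]))
for name, contribution in ranking:
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that gives SHAP its name. The interaction term is shared equally between income and urbanization. For tree ensembles such as ET, HGB, and XGB, practical tools rely on faster tree-specific algorithms rather than this brute-force enumeration.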

Plain Language Summary
Addressing water scarcity requires reducing the water footprint (WF) associated with food consumption. People's diets are largely influenced by attributes such as age, gender, and income level, so it is important to identify people who have a high dietary WF and focus on them when creating policies to reduce dietary WF. Some studies already examine the factors behind dietary WF, but they have several limitations: statistical models with rather modest performance, a lack of rigorous sensitivity or feature importance (FI) analysis, and limited generalization ability. Motivated by these gaps, a new machine learning-based framework was developed for analyzing the driving forces behind dietary WF and was then applied to a case study on Chinese inhabitants. The results show that the top four factors associated with dietary WF are income level, urbanization level, education level, and gender, in that order. The priority groups for dietary WF reduction interventions were identified as high-income, urban, highly educated, and male residents. In light of these findings and their underlying causes, the paper concludes with a set of policy recommendations.