Arsenic contamination in groundwater is a major public health concern in the Ganges-Brahmaputra Basin, where millions rely on shallow aquifers for drinking water. Naturally occurring arsenic is mobilised under specific sedimentological and geochemical conditions, particularly in Holocene alluvial deposits. Although extensively studied, arsenic distribution remains highly variable and difficult to predict. This study investigates how geomorphological features, specifically oxbow lakes and point bars, can be used to improve arsenic risk prediction and mapping using machine learning. The approach offers a targeted and scalable method for identifying high-risk zones, particularly in data-scarce environments. The divergence between theoretical assumptions and dataset trends illustrates the challenges of generalising risk models without high-precision, ground-validated input data. As a proof of concept, a two-stage workflow was implemented. In the first stage, a You Only Look Once (YOLO) object detection model was trained to locate oxbow lakes and point bars using satellite imagery. These landforms are key indicators of arsenic-prone zones due to their depositional history. The model performed well on clearly isolated oxbow lakes and their associated point bars but struggled with hydrologically connected oxbow lakes and heavily vegetated areas, highlighting the need for more diverse training data and the potential value of false-colour imagery. A case study was conducted using historical arsenic well measurements to evaluate model assumptions. A supervised classification with the eXtreme Gradient Boosting (XGBoost) algorithm confirmed the predictive value of geomorphological variables, with sand content, elevation, and soil organic carbon emerging as dominant predictors. Vegetation and precipitation data were excluded due to low relevance and poor temporal alignment. In the second stage, a Gaussian Mixture Model was applied to classify arsenic risk using the same geospatial variables.
The model produced spatially coherent and interpretable risk zones, with high class-membership probability for most predictions. Areas of low probability were primarily located at transition zones between risk classes, indicating regions where higher-resolution or more precise input data may be necessary to reduce uncertainty and improve model reliability. This study provides a practical and semi-automated framework for geospatial arsenic risk assessment. The risk classification is relative, so future work should incorporate population-weighted exposure metrics to better guide mitigation. The method developed here supports more efficient fieldwork planning and decision-making in complex fluvial environments.
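
The link between class-membership probability and transition zones described above can be illustrated with a minimal sketch. It is not the study's implementation: it assumes a mixture that has already been fitted, uses only two risk classes with diagonal covariances, and every parameter value, feature unit, function name, and the 0.8 confidence threshold is hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical, already-fitted mixture components. All numbers are
# illustrative assumptions, not values from the study.
# Feature order: sand fraction, elevation (m), soil organic carbon (%).
COMPONENTS = [
    {"label": "low", "weight": 0.5,
     "mean": [0.70, 12.0, 0.5], "var": [0.01, 4.0, 0.04]},
    {"label": "high", "weight": 0.5,
     "mean": [0.30, 6.0, 1.5], "var": [0.01, 4.0, 0.04]},
]


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def posterior(x, components=COMPONENTS):
    """Return {label: P(component | x)} via Bayes' rule."""
    log_joint = [
        math.log(c["weight"]) + log_gaussian_diag(x, c["mean"], c["var"])
        for c in components
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_joint)
    unnorm = [math.exp(v - top) for v in log_joint]
    total = sum(unnorm)
    return {c["label"]: u / total for c, u in zip(components, unnorm)}


def classify(x, min_confidence=0.8):
    """Return (label, probability, is_uncertain) for one grid cell."""
    probs = posterior(x)
    label = max(probs, key=probs.get)
    return label, probs[label], probs[label] < min_confidence


# A cell at the centre of the "low" component versus a cell halfway
# between the two component means (a transition zone).
core_cell = classify([0.70, 12.0, 0.5])
transition_cell = classify([0.50, 9.0, 1.0])
```

In this toy setup, a cell near a component mean is assigned to that class with probability close to 1, whereas a cell midway between the two means receives roughly equal probability for both classes and is flagged as uncertain. This mirrors the pattern of low-probability areas at class boundaries.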