Random Forest Modelling of Surface Ammonia Concentrations in the Netherlands

Abstract

The emission of reactive nitrogen species increased rapidly in the twentieth century, causing a significant perturbation of the nitrogen cycle. This perturbation has many detrimental effects on both humans and ecosystems, such as eutrophication, acidification and biodiversity loss. Ammonia, a reactive nitrogen species, is difficult to quantify because the compound is hard to observe with both ground-based measurement stations and remote sensing instruments such as CrIS. Therefore, this thesis aims to improve the monitoring of surface ammonia concentrations with a machine learning technique called random forests (RF). These types of models can detect complex, non-linear relationships between variables and are frequently used in other air pollution studies. In this study, an RF model was built to estimate the surface ammonia concentration from vertical column density (VCD) datasets, meteorological variables and land-use variables. Different combinations of VCD datasets (either modelled or CrIS VCD data) were used to train the model and to predict the surface ammonia concentration. The study finds that RF models are statistically more accurate at estimating the surface ammonia concentration than the LOTOS-EUROS model when validated against ground-based measurement stations from the MAN and LML networks. Trained RF models that are supplied with CrIS VCD data during the prediction phase perform especially well. Moreover, the ‘complete’ RF models outperform the RF model trained without VCD data, demonstrating the added benefit of incorporating satellite data in RF modelling.
Recommendations for further research include performing similar experiments with other input variables and other machine learning algorithms, validating the performance of the RF model in different years, and considering other datasets as the ground-truth variable for RF models, such as ground-based measurement data from MAN and LML.
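
The validation step described above, in which model estimates are compared against ground-based station measurements, can be sketched roughly as follows. The abstract does not state which statistics were used, so the root-mean-square error and Pearson correlation here are assumptions, and the station values and the two sets of estimates are invented purely for illustration; they are not results from the thesis.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between paired predictions and observations."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between predictions and observations."""
    if len(predicted) != len(observed) or len(observed) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


# Hypothetical surface concentrations (ug/m3) at five stations.
# These numbers are made up for the example, not taken from MAN or LML.
station_obs = [2.0, 4.0, 6.0, 8.0, 10.0]

# Hypothetical estimates from two competing models at the same stations.
estimates_a = [2.5, 3.5, 6.5, 7.5, 10.5]
estimates_b = [4.0, 4.0, 4.0, 10.0, 8.0]

for name, est in [("model A", estimates_a), ("model B", estimates_b)]:
    print(f"{name}: RMSE={rmse(est, station_obs):.2f}, "
          f"r={pearson_r(est, station_obs):.2f}")
```

A lower RMSE together with a higher correlation would count as better agreement with the stations under these assumed metrics; a real comparison would use co-located model output and measurements over the same period.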