Trace metal pollution in the Scheldt estuary - a statistical approach to estimate the metal partitioning coefficient for a suite of metals


Abstract

Trace metals appear in estuarine systems in two forms: dissolved and particulate.
The partitioning between these two forms is described by a partition coefficient, K_d, which depends on a number of environmental parameters, such as the salinity of the water and seasonally dependent biological activity. Although this coefficient is known to follow a log-normal distribution, models of estuarine metal dynamics usually just use average values.
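For concreteness, here is a minimal sketch of the quantities involved. It assumes the conventional definition of K_d as the particulate metal content per unit mass of suspended matter (mg/kg) divided by the dissolved concentration (mg/L), which gives K_d in L/kg. With that definition the total concentration is C_total = C_diss (1 + K_d * SPM), so the dissolved fraction follows from K_d and the SPM concentration alone. The K_d and SPM values are hypothetical. They only show why a single arithmetic average is a poor summary of a log-normal quantity: the arithmetic mean is pulled up by the largest values, whereas the geometric mean (the exponential of the mean of log K_d) sits at the centre of the distribution.

```python
import statistics

def kd(particulate_mg_per_kg: float, dissolved_mg_per_l: float) -> float:
    """Partition coefficient K_d in L/kg: metal content per kg of suspended
    matter divided by the dissolved metal concentration."""
    return particulate_mg_per_kg / dissolved_mg_per_l

def dissolved_fraction(kd_l_per_kg: float, spm_mg_per_l: float) -> float:
    """Fraction of the total metal in solution, from
    C_total = C_diss * (1 + K_d * SPM), with SPM converted from mg/L to kg/L."""
    spm_kg_per_l = spm_mg_per_l * 1e-6
    return 1.0 / (1.0 + kd_l_per_kg * spm_kg_per_l)

# Hypothetical K_d values (L/kg) spanning two orders of magnitude,
# as a log-normally distributed coefficient typically does.
kd_values = [1e3, 1e4, 1e5]
spm = 50.0  # hypothetical SPM concentration in mg/L

arith = statistics.fmean(kd_values)         # dominated by the largest value
geo = statistics.geometric_mean(kd_values)  # exp(mean(log K_d))

print(f"arithmetic mean K_d = {arith:.0f} L/kg, dissolved fraction = {dissolved_fraction(arith, spm):.2f}")
print(f"geometric mean K_d = {geo:.0f} L/kg, dissolved fraction = {dissolved_fraction(geo, spm):.2f}")
```

With these illustrative numbers, the arithmetic mean (37000 L/kg) predicts that about 35% of the metal is dissolved at 50 mg/L SPM. The geometric mean (10000 L/kg) predicts about 67%.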
This study attempts to build a statistical model of K_d in the Scheldt estuary for a number of trace metals, based on measurements of several environmental parameters. The model should be able to cover the whole range of K_d values. The predictors are salinity, the suspended particulate matter (SPM) concentration, the total metal concentration and the year of measurement. Two models based on linear regression, principal components regression (PCR) and partial least squares (PLS) regression, are compared with two decision-tree-based models, random forest and gradient boosting machine (GBM).
Although cross-validation performance on the training set is promising, with the decision-tree-based models clearly outperforming the linear-regression-based ones, predictions on an independent test set are very poor for all metals except cadmium. Cadmium is the exception because its estuarine dynamics are governed mainly by a single specific process driven by changes in salinity.
The dynamics of all other metals depend on both the biochemical and the physical characteristics of the estuary. In the proposed model, the parameter 'salinity' has to account for both of these characteristics simultaneously, which is inherently impossible.
Including pH and/or dissolved oxygen content seems a promising way to build an adequate model: pH because it is correlated with salinity, and dissolved oxygen because it is representative of the seasonal variation.