T. Deng
5 records found
Ozone exceedance forecasting with enhanced extreme instance augmentation
A case study in Germany
Accurately forecasting ozone levels that exceed specific thresholds is pivotal for mitigating adverse effects on both the environment and public health. However, predicting such ozone exceedances remains challenging because high-concentration ozone data occur infrequently. Using data from 57 German monitoring stations from 1999 to 2018, this research introduces an Enhanced Extreme Instance Augmentation Random Forest (EEIA-RF) approach that significantly improves the prediction of days on which the maximum daily 8-hour average ozone concentration exceeds 120 μg/m³. A pre-trained machine learning model is used to generate additional high-concentration data, which, combined with selectively reduced low-concentration data, forms a new dataset for training a refined model. This method improved the accuracy of predicting days with ozone exceedances by at least 8% across Germany. Our experiment underscores the approach's value for atmospheric modeling and for supporting public health advisories and environmental policy-making related to ozone pollution.
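The abstract does not say how the pre-trained model generates new high-concentration samples. As a minimal sketch, assuming new candidates are made by jittering the features of observed exceedance days (a hypothetical generator, not necessarily the authors' method) and then labelled by a pre-trained model, the rebalancing step could look like this. The names `rebalance`, `keep_low_fraction`, `n_synthetic_per_high` and `noise` are illustrative; only the 120 μg/m³ threshold comes from the text.

```python
import random
from typing import Callable, List, Sequence, Tuple

# (feature vector, observed daily ozone value in μg/m³)
Sample = Tuple[List[float], float]

THRESHOLD = 120.0  # exceedance threshold from the text, μg/m³


def rebalance(
    data: Sequence[Sample],
    pretrained: Callable[[List[float]], float],
    keep_low_fraction: float = 0.5,
    n_synthetic_per_high: int = 2,
    noise: float = 0.05,
    seed: int = 0,
) -> List[Sample]:
    """Build a rebalanced training set (hypothetical sketch, not the EEIA-RF code)."""
    if not 0.0 <= keep_low_fraction <= 1.0:
        raise ValueError("keep_low_fraction must be in [0, 1]")
    rng = random.Random(seed)
    high = [s for s in data if s[1] > THRESHOLD]
    low = [s for s in data if s[1] <= THRESHOLD]

    # Random undersampling of low-concentration days.
    kept_low = rng.sample(low, k=int(len(low) * keep_low_fraction))

    # Hypothetical generator: jitter each feature of an exceedance day by up
    # to +/- noise (relative), then let the pre-trained model label it.
    synthetic: List[Sample] = []
    for x, _ in high:
        for _ in range(n_synthetic_per_high):
            x_new = [v * (1.0 + rng.uniform(-noise, noise)) for v in x]
            y_new = pretrained(x_new)
            # Keep only candidates the model still predicts as exceedances.
            if y_new > THRESHOLD:
                synthetic.append((x_new, y_new))

    return high + kept_low + synthetic
```

For example, with a toy pre-trained model `lambda x: 100.0 * x[0]`, three exceedance days `([2.0], 200.0)` and ten low days `([0.5], 50.0)`, `rebalance` returns 14 samples: the 3 original exceedances, 5 of the 10 low days and 6 synthetic exceedances, so exceedances rise from about a quarter to about two thirds of the training set.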
With the explosive growth of atmospheric data, machine learning models have achieved great success in air pollution forecasting because they are more computationally efficient than traditional chemical transport models. However, previous studies have tested new prediction algorithms only at individual stations or within small regions; a large-scale air quality forecasting model is still lacking. High dimensionality also means that redundant input data can increase model complexity and thereby cause machine learning models to over-fit. Feature selection is a key topic in machine learning development, but it has not yet been explored in atmosphere-related applications. In this work, a regional feature selection-based machine learning (RFSML) system was developed that can predict short-term air quality with high accuracy at the national scale. Ensemble-Shapley additive global importance analysis is combined with the RFSML system to extract significant regional features and eliminate redundant variables at an affordable computational cost. The significance of the regional features is also explained physically. Compared with a standard machine learning system fed with related features, the RFSML system driven by the selected key features offers superior interpretability, shorter training time and more accurate predictions. This study also provides insights into differences in interpretability among machine learning models (namely random forest, gradient boosting and multi-layer perceptron models).
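Ensemble-SAGE itself is not reproduced here. As a simpler stand-in, the sketch below ranks features by permutation importance (the increase in mean squared error when one column is shuffled) and keeps those above a cut-off. Like SAGE it is a global, model-agnostic importance measure, but it is not Shapley-based. The least-squares model, the function names and the `min_importance` cut-off are all assumptions for illustration.

```python
import numpy as np


def fit_linear(X: np.ndarray, y: np.ndarray):
    """Ordinary least squares with intercept; stand-in for RF/GB/MLP."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([Z, np.ones(len(Z))]) @ coef


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when each column is shuffled independently."""
    rng = np.random.default_rng(seed)
    base = np.mean((predict(X) - y) ** 2)
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break the link between column j and y
            imp[j] += np.mean((predict(Xp) - y) ** 2) - base
    return imp / n_repeats


def select_features(X, y, min_importance):
    """Indices of features whose importance exceeds the cut-off, most important first."""
    predict = fit_linear(X, y)
    imp = permutation_importance(predict, X, y)
    return [int(j) for j in np.argsort(-imp) if imp[j] > min_importance]
```

On synthetic data where only columns 0 and 1 drive the target, `select_features` keeps `[0, 1]` and drops the three redundant columns. In practice the importances would be computed on held-out data and with the forecasting model actually used, not on the training set of a linear toy.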
Ground-level ozone is a critical atmospheric pollutant; high ozone concentrations can damage human health, impair plant growth and cause ecological harm. Traditional chemical transport models and popular machine learning models have difficulty predicting ozone concentrations, especially during high-concentration episodes. We propose a clustering-based spatial transfer learning multilayer perceptron (SPTL-MLP) to predict ozone concentrations at a target observation station for the next three days. We use the k-means clustering algorithm to find similar stations and train on them jointly to obtain a base model for spatial transfer learning. For practical applications, a weighted loss function has been designed with extra emphasis on reducing prediction errors at high ozone concentrations. Evaluation using historical data from stations in Germany shows that, compared with an MLP without spatial transfer, our SPTL-MLP model has a smaller error (reduced by 9.13%) and higher accuracy in predicting ozone exceedances (improved by 8.21% and 16.9%). The results demonstrate the effectiveness of SPTL-MLP for short-term ozone forecasting. It can be used for timely warnings of ozone exceedances and to help governments monitor air quality.
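The abstract does not give the form of the weighted loss. One minimal sketch, assuming a fixed extra weight `alpha` on samples whose observed value exceeds 120 μg/m³ (the EU target value for the maximum daily 8-hour mean; both the threshold and the weighting scheme are assumptions here), is a weighted mean squared error:

```python
from typing import Sequence


def weighted_mse(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    threshold: float = 120.0,  # assumed exceedance threshold, μg/m³
    alpha: float = 2.0,        # assumed extra weight on high-concentration samples
) -> float:
    """Mean squared error with weight 1 + alpha on observed exceedances, 1 otherwise."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        w = 1.0 + alpha if t > threshold else 1.0
        total += w * (t - p) ** 2
    return total / len(y_true)
```

Two forecasts with the same plain MSE are scored differently: `weighted_mse([100, 150], [110, 150])` is 50.0, while `weighted_mse([100, 150], [100, 140])` is 150.0, because the same 10 μg/m³ miss costs three times as much on an exceedance day. In an MLP framework the same weighting would be applied per sample inside the training loss.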
Air quality warning and forecasting systems are usually based on numerical chemical transport models (CTMs). These dynamic models make predictions by simulating the life cycles of atmospheric components, including emission, transport and removal. However, the accuracy of CTMs is still limited by many imperfections, e.g., uncertainties in input sources such as emission inventories, wind fields and boundary conditions, as well as insufficient knowledge of the atmospheric dynamics themselves. All of these can bias CTM predictions persistently or systematically. In this paper, a machine learning approach is applied to predict the bias of the CTM, and it is then combined with the CTM to form a hybrid forecast system. To our knowledge, this is the first time machine learning methods have been used in this way. The hybrid system is tested on fine particulate matter (PM2.5) prediction in Shanghai, China. The results show that machine learning can be an effective tool for improving the accuracy of CTM predictions. For short-term PM2.5 forecasts (forecast lengths under 12 h), the root mean square error, mean absolute error, mean absolute percentage error and accuracy of the predicted air quality rank all show that forecast skill is markedly improved; for longer-term prediction, improvement is not guaranteed.
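A minimal sketch of the hybrid idea, assuming the bias is defined as CTM forecast minus observation and modelled by ordinary least squares on some predictor features (the abstract does not name the machine learning method or its inputs, so both are placeholders). The learned bias is subtracted from the raw CTM output, and the corrected forecast is scored with the RMSE, MAE and MAPE metrics named in the text.

```python
import numpy as np


def _design(features: np.ndarray) -> np.ndarray:
    # Append an intercept column so a constant offset in the bias can be learned.
    return np.column_stack([features, np.ones(len(features))])


def fit_bias_model(features, ctm, obs):
    """Least-squares model of bias = ctm - obs (stand-in for the ML bias predictor)."""
    coef, *_ = np.linalg.lstsq(_design(features), ctm - obs, rcond=None)
    return coef


def corrected_forecast(coef, features, ctm):
    """Hybrid forecast: raw CTM output minus the predicted bias."""
    return ctm - _design(features) @ coef


def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def mae(pred, obs):
    return float(np.mean(np.abs(pred - obs)))


def mape(pred, obs):
    # Percentage error; assumes observations are strictly positive.
    return float(np.mean(np.abs((pred - obs) / obs)) * 100.0)
```

On synthetic data where the CTM carries a constant offset plus a bias proportional to one feature, fitting on the first 150 samples and correcting the remaining 50 reduces all three metrics. Whether a real CTM bias is this learnable, especially at longer lead times, is exactly what the abstract says is not guaranteed.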
Tropospheric ozone is a secondary pollutant that can affect human health and plant growth. In this paper, we investigate a transfer learning convolutional neural network long short-term memory (TL-CNN-LSTM) model to predict ozone concentrations. An hourly CNN-LSTM model is used to extract features and predict ozone for the next hour, and it outperforms models commonly used in previous studies. For daily ozone prediction, forecasting over a larger time scale requires more data; however, only limited data are available, which causes the CNN-LSTM model to predict inaccurately. Network-based transfer learning methods built on the hourly models can draw on information at a finer temporal resolution. This reduces prediction errors and shortens model training time. However, in extreme cases where data are severely insufficient, transfer learning from a finer time scale cannot improve prediction accuracy.
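The network-based transfer step can be illustrated, very loosely, with a linear model trained by gradient descent: parameters fitted on a large source dataset (standing in for the hourly model) initialise training on a small target dataset (standing in for the daily model). This is a hypothetical sketch, not the TL-CNN-LSTM architecture; `gd_fit`, the learning rate and the step counts are assumptions.

```python
import numpy as np


def gd_fit(X: np.ndarray, y: np.ndarray, w0: np.ndarray,
           lr: float = 0.05, steps: int = 200) -> np.ndarray:
    """Full-batch gradient descent on mean squared error, starting from w0."""
    w = w0.astype(float).copy()
    n = len(X)
    for _ in range(steps):
        grad = 2.0 / n * X.T @ (X @ w - y)  # gradient of mean((Xw - y)^2)
        w -= lr * grad
    return w


def mse(w: np.ndarray, X: np.ndarray, y: np.ndarray) -> float:
    return float(np.mean((X @ w - y) ** 2))


def pretrain_then_finetune(X_src, y_src, X_tgt, y_tgt, finetune_steps=20):
    """Transfer: pre-train on the source task, then fine-tune on the small target set."""
    w_pre = gd_fit(X_src, y_src, np.zeros(X_src.shape[1]))
    return gd_fit(X_tgt, y_tgt, w_pre, steps=finetune_steps)
```

With a source task of 1,000 samples and a related target task of only 30, twenty fine-tuning steps from the pre-trained weights reach a lower validation error than twenty steps from zero. The design choice mirrors the abstract's point: transfer helps when the tasks are related and target data are scarce, but it cannot substitute for target data entirely.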