Jianbing Jin | TU Delft Repository

Observational operator for fair model evaluation with ground NO2 measurements

Journal article (2024) - Li Fang, Jianbing Jin, Arjo Segers, Ke Li, Ji Xia, Wei Han, Baojie Li, H.X. Lin, Lei Zhu, More Authors...

Measurements collected from ground monitoring stations have become a popular data source for evaluating numerical models and for correcting model errors through data assimilation. Both model evaluation and assimilation are driven by the misfit between simulations and observations (simulation minus observation). However, this misfit is distorted by a spatial-scale disparity between model simulations and observations. Chemical transport models (CTMs) divide the atmosphere into grid cells, providing a structured way to simulate atmospheric processes. However, their spatial resolution often does not match the limited coverage of in situ measurements, especially for short-lived air pollutants. Within a broad grid cell, air pollutant concentrations can be highly heterogeneous because these pollutants are rapidly generated and dissipated. As a result, ground observations mapped to the model grid with traditional operators (including "nearest search" and "grid mean") are poorly representative of the simulated grid-cell values. This study develops a new land-use-based representative (LUBR) observational operator to generate spatially representative gridded observations for model evaluation. It incorporates high-resolution urban–rural land use data to address intra-grid variability. The LUBR operator has been shown to provide results consistent with satellite Ozone Monitoring Instrument (OMI) measurements. It offers an effective way to quantify these spatial-scale mismatches accurately and to resolve them further via assimilation. Model evaluations with 2015–2017 NO2 measurements in the study area show that the estimated biases and errors differ substantially depending on whether the LUBR operator or one of the other operators is used. The results highlight the importance of considering fine-scale urban–rural differences when comparing models and observations, especially for short-lived pollutants like NO2. ...
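
The difference between the two traditional operators named in this abstract, and the general idea of weighting stations by land use, can be illustrated with a minimal Python sketch. The functions nearest_search, grid_mean, and land_use_weighted_mean are hypothetical names. The weighting rule, which averages urban and rural stations separately and combines the two means using the cell's urban area fraction, is an assumption made for illustration and is not the published LUBR formulation.

```python
import numpy as np


def nearest_search(cell_center, station_xy, station_values):
    """'Nearest search': value of the station closest to the grid-cell centre."""
    dist = np.linalg.norm(station_xy - cell_center, axis=1)
    return float(station_values[np.argmin(dist)])


def grid_mean(station_values):
    """'Grid mean': unweighted average of all stations inside the cell."""
    return float(np.mean(station_values))


def land_use_weighted_mean(station_values, is_urban, urban_fraction):
    """Hypothetical land-use-weighted cell value (NOT the published LUBR).

    Urban and rural stations are averaged separately, and the two means are
    weighted by the urban area fraction of the grid cell.
    """
    urban = station_values[is_urban]
    rural = station_values[~is_urban]
    if urban.size == 0 or rural.size == 0:
        # Only one land-use class is sampled, so fall back to the plain mean.
        return grid_mean(station_values)
    return float(urban_fraction * urban.mean() + (1.0 - urban_fraction) * rural.mean())


# Toy cell: three urban NO2 stations and one rural station (made-up numbers).
values = np.array([40.0, 44.0, 42.0, 10.0])
is_urban = np.array([True, True, True, False])
print(grid_mean(values))                               # 34.0
print(land_use_weighted_mean(values, is_urban, 0.2))   # approximately 16.4
```

In this made-up cell, the grid mean (34.0) is dominated by the urban stations, whereas weighting by a 20% urban fraction pulls the representative value toward the rural level (about 16.4). This is the kind of urban–rural representativeness gap that the abstract describes.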

Valid time shifting ensemble Kalman filter (VTS-EnKF) for dust storm forecasting

Journal article (2024) - M. Pang, Jianbing Jin, Arjo Segers, Huiya Jiang, Wei Han, Batjargal Buyantogtokh, Ji Xia, Li Fang, Hai Xiang Lin, More authors...

Dust storms pose significant risks to health and property, necessitating accurate forecasting for preventive measures. Despite advancements, dust models still grapple with uncertainties arising from emission and transport processes. Data assimilation addresses these uncertainties by integrating observations to correct model errors, which enhances forecast precision. The ensemble Kalman filter (EnKF) is a widely used assimilation algorithm that effectively optimizes model states, particularly by adjusting their intensity. However, the EnKF's efficacy is challenged by position errors between modeled and observed dust features, especially when these errors are large. This study introduces the valid time shifting ensemble Kalman filter (VTS-EnKF), which combines the stochastic EnKF with a valid time shifting mechanism. By recruiting additional ensemble members from neighboring valid times, this method not only accommodates variations in dust load but also explicitly accounts for positional uncertainties. Consequently, the enlarged ensemble better represents both intensity and positional errors, making better use of the observational data. The proposed VTS-EnKF was evaluated on two severe dust storm cases from spring 2021. The experiments demonstrated that position errors notably deteriorated forecast performance in terms of root mean square error (RMSE) and normalized mean bias (NMB) and impeded effective assimilation by the EnKF. In contrast, the VTS-EnKF improved both analysis and forecast accuracy compared with the conventional EnKF. Additionally, to assess its performance more rigorously, experiments were conducted with fewer ensemble members and different time intervals. ...
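
The ensemble-enlargement idea can be sketched in Python, assuming a linear observation operator H and Gaussian observation errors with covariance R. Here, stochastic_enkf_update is a textbook perturbed-observation EnKF analysis, and time_shifted_ensemble is a hypothetical helper that pools members valid at neighboring times into one larger ensemble. The shift set, array layout, and toy data are assumptions for illustration, not the settings used in the study.

```python
import numpy as np


def stochastic_enkf_update(X, y, H, R, rng):
    """Stochastic (perturbed-observation) EnKF analysis step.

    X: (n_state, n_members) forecast ensemble
    y: (n_obs,) observations
    H: (n_obs, n_state) linear observation operator
    R: (n_obs, n_obs) observation-error covariance
    """
    n_members = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)            # ensemble anomalies
    HA = H @ A
    P_xy = A @ HA.T / (n_members - 1)                # state-observation covariance
    P_yy = HA @ HA.T / (n_members - 1) + R           # innovation covariance
    K = np.linalg.solve(P_yy, P_xy.T).T              # Kalman gain P_xy P_yy^-1
    # Each member assimilates its own randomly perturbed copy of y.
    eps = rng.multivariate_normal(np.zeros(len(y)), R, size=n_members).T
    return X + K @ (y[:, None] + eps - H @ X)


def time_shifted_ensemble(trajectory, t, shifts=(-1, 0, 1)):
    """Pool ensemble members valid at times t + s (hypothetical helper).

    trajectory: (n_times, n_state, n_members) ensemble forecast.
    Shifts that fall outside the forecast window are skipped.
    """
    times = [t + s for s in shifts if 0 <= t + s < trajectory.shape[0]]
    return np.concatenate([trajectory[k] for k in times], axis=1)


# Toy usage: 5 output times, 10 grid cells, 8 members per valid time.
rng = np.random.default_rng(0)
trajectory = rng.normal(size=(5, 10, 8))
X_aug = time_shifted_ensemble(trajectory, t=2)       # 3 x 8 = 24 members
H = np.eye(10)[:3]                                   # observe the first 3 cells
X_a = stochastic_enkf_update(X_aug, np.zeros(3), H, 0.01 * np.eye(3), rng)
```

Because the pooled members come from different valid times, a plume that arrives early or late in the forecast still contributes members whose dust features lie near the observed position. This is how the enlarged ensemble can represent positional errors as well as intensity errors.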

A gridded air quality forecast through fusing site-available machine learning predictions from RFSML v1.0 and chemical transport model results from GEOS-Chem v13.1.0 using the ensemble Kalman filter

Journal article (2023) - Li Fang, Jianbing Jin, Arjo Segers, Hong Liao, Ke Li, Bufan Xu, Wei Han, Mijie Pang, Hai Xiang Lin

Statistical methods, particularly machine learning models, have gained significant popularity in air quality prediction. These models are commonly trained on historical measurement datasets collected independently at environmental monitoring stations, and they make operational forecasts using real-time ambient pollutant observations as inputs. Therefore, these high-quality machine learning models only provide site-available predictions and cannot be used on their own as an operational forecast. In contrast, deterministic chemical transport models (CTMs), which simulate the full life cycles of air pollutants, provide predictions that are continuous in the 3D field. Despite their benefits, CTM predictions are typically biased, particularly at fine scales, owing to complex error sources in the emission, transport, and removal of pollutants. In this study, we proposed fusing site-available machine learning predictions from our regional feature selection-based machine learning model (RFSML v1.0) with CTM predictions. Compared with a pure machine learning model, the fusion system provides gridded predictions with relatively high accuracy. The prediction fusion was conducted using the Bayesian-theory-based ensemble Kalman filter (EnKF). The background error covariance is an essential part of the assimilation process. Ensemble CTM predictions driven by perturbed emission inventories were first used to represent the spatial covariance statistics of the CTM error, which can resolve the main part of that error. In addition, a covariance inflation algorithm was designed to amplify the ensemble perturbations to account for model errors beyond the uncertainty in emission inputs. Model evaluation tests were conducted using independent measurements. Our EnKF-based prediction fusion performed better than the pure CTM. Moreover, covariance inflation further improved the fused prediction, particularly in cases of severe underestimation. ...
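
The covariance inflation step can be illustrated with standard multiplicative inflation, sketched below in Python. The adaptive rule in innovation_based_factor matches the expected squared innovation to the inflated ensemble spread plus the observation error. It is one common choice, assumed here for illustration, because the abstract does not specify the algorithm the authors designed. Both function names are hypothetical.

```python
import numpy as np


def inflate(X, factor):
    """Multiplicative inflation: scale each member's departure from the mean.

    The ensemble mean is unchanged and the sample covariance grows by factor**2.
    """
    if factor < 1.0:
        raise ValueError("an inflation factor below 1 would deflate the ensemble")
    mean = X.mean(axis=1, keepdims=True)
    return mean + factor * (X - mean)


def innovation_based_factor(X, y, H, R):
    """Estimate the factor from the innovation d = y - H x_mean (assumed scheme).

    Solves d^T d = factor**2 * tr(H P H^T) + tr(R) for the factor and never
    returns a value below 1.
    """
    n_members = X.shape[1]
    mean = X.mean(axis=1)
    d = y - H @ mean
    HA = H @ (X - mean[:, None])
    tr_hph = np.trace(HA @ HA.T) / (n_members - 1)
    factor_sq = (d @ d - np.trace(R)) / tr_hph
    return float(np.sqrt(max(factor_sq, 1.0)))


# Toy usage: a 2-cell, 3-member ensemble that is too narrow for the misfit.
X = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 2.0]])
lam = innovation_based_factor(X, np.array([5.0, 4.0]), np.eye(2), np.eye(2))
X_inflated = inflate(X, lam)   # lam = sqrt(8), about 2.83
```

A large misfit between the observations and the ensemble mean, as in severe underestimation, yields a larger factor and therefore a wider spread. The wider spread lets the EnKF pull the fused prediction further toward the observations.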

Development of a regional feature selection-based machine learning system (RFSML v1.0) for air pollution forecasting over China

Journal article (2022) - Li Fang, Jianbing Jin, Arjo Segers, Hai Xiang Lin, Mijie Pang, Cong Xiao, Tuo Deng, Hong Liao

With the explosive growth of atmospheric data, machine learning models have achieved great success in air pollution forecasting because they are computationally more efficient than traditional chemical transport models. However, in previous studies, new prediction algorithms have only been tested at individual stations or in small regions, and a large-scale air quality forecasting model is still lacking. The high dimensionality of the data also means that redundant inputs may increase model complexity and therefore lead to over-fitting of machine learning models. Feature selection is a key topic in machine learning development, but it has not yet been explored in atmosphere-related applications. In this work, a regional feature selection-based machine learning (RFSML) system was developed that can predict short-term air quality with high accuracy at the national scale. Ensemble-Shapley additive global importance analysis is combined with the RFSML system to extract significant regional features and eliminate redundant variables at an affordable computational expense. The significance of the regional features is also explained physically. Compared with a standard machine learning system fed with relevant features, the RFSML system driven by the selected key features achieves better interpretability, shorter training time, and more accurate predictions. This study also provides insights into the differences in interpretability among machine learning models (i.e., random forest, gradient boosting, and multi-layer perceptron models). ...
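
Feature selection by global importance can be illustrated with a small Python sketch. The sketch deliberately swaps in permutation importance with a linear least-squares model instead of the Ensemble-Shapley additive global importance analysis and the tree-based or neural-network models used in RFSML. The cumulative-importance cut-off in select_features is a hypothetical rule, and all function names are illustrative.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares linear model with intercept (stand-in for RF/GBM/MLP)."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef


def permutation_importance(model, X, y, rng, n_repeats=5):
    """Mean increase in MSE when one feature column is randomly shuffled."""
    base = np.mean((model(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            scores[j] += np.mean((model(X_perm) - y) ** 2) - base
    return scores / n_repeats


def select_features(scores, keep_fraction=0.95):
    """Indices of the top features holding `keep_fraction` of total importance."""
    pos = np.clip(scores, 0.0, None)
    if pos.sum() == 0.0:
        return np.arange(len(scores))         # nothing informative: keep all
    order = np.argsort(pos)[::-1]             # most important first
    cum = np.cumsum(pos[order]) / pos.sum()
    n_keep = min(int(np.searchsorted(cum, keep_fraction)) + 1, len(scores))
    return np.sort(order[:n_keep])


# Synthetic data: only features 0 and 1 drive the target; 2-4 are redundant.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=500)
model = fit_linear(X, y)
selected = select_features(permutation_importance(model, X, y, rng))
```

Dropping the redundant columns before retraining is what reduces the input dimensionality, the training time, and the risk of over-fitting discussed in the abstract.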

Machine learning based bias correction for numerical chemical transport models

Journal article (2021) - Min Xu, Jianbing Jin, Guoqiang Wang, Arjo Segers, Tuo Deng, Hai Xiang Lin

Air quality warning and forecasting systems are usually based on numerical chemical transport models (CTMs). These dynamic models make predictions by simulating the life cycles of atmospheric components, including emission, transport, and removal. However, the accuracy of CTMs is still limited by many imperfections, e.g., uncertainties in input data such as emission inventories, wind fields, and boundary conditions, as well as insufficient knowledge of the atmospheric dynamics themselves. All of these can mislead CTM predictions persistently or systematically. In this paper, a machine learning approach is applied to predict the bias of the CTM. The predicted bias is then combined with the CTM output to form a hybrid forecast system. To our knowledge, this is the first time that machine learning methods have been used in this way. The hybrid system is tested on fine particulate matter (PM2.5) prediction in Shanghai, China. The results show that machine learning can be an effective tool for improving the accuracy of CTM predictions. For short-term PM2.5 forecasts (forecast lengths of less than 12 h), the root mean square error, mean absolute error, mean absolute percentage error, and accuracy of the predicted air quality rank all show that forecast skill is remarkably improved; for longer-term predictions, however, improvement is not guaranteed. ...
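
The structure of the hybrid system, which learns the CTM bias and adds it back to the CTM forecast, can be sketched in Python together with the error metrics the abstract reports. Linear least squares stands in for the machine learning regressor, the function names are hypothetical, and the synthetic data are made up. The sketch also does not model forecast lead time.

```python
import numpy as np


def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def mae(pred, obs):
    return float(np.mean(np.abs(pred - obs)))


def mape(pred, obs):
    """Mean absolute percentage error in %; observations must be non-zero."""
    return float(np.mean(np.abs((pred - obs) / obs)) * 100.0)


def fit_bias_model(features, ctm, obs):
    """Regress the CTM bias (obs - ctm) on predictors.

    Linear least squares stands in for the machine learning regressor.
    """
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, obs - ctm, rcond=None)
    return lambda f: np.column_stack([np.ones(len(f)), f]) @ coef


def hybrid_forecast(bias_model, features, ctm):
    """Hybrid forecast = CTM forecast + machine-learned bias estimate."""
    return ctm + bias_model(features)


# Synthetic PM2.5-like series (made-up numbers): the CTM underestimates by an
# offset plus a term that depends on the first predictor.
rng = np.random.default_rng(1)
features = rng.normal(size=(400, 2))
obs = 60.0 + 5.0 * rng.normal(size=400)
ctm = obs - 8.0 - 5.0 * features[:, 0] + rng.normal(size=400)

train, test = slice(0, 300), slice(300, 400)
bias_model = fit_bias_model(features[train], ctm[train], obs[train])
corrected = hybrid_forecast(bias_model, features[test], ctm[test])
for name, metric in [("RMSE", rmse), ("MAE", mae), ("MAPE", mape)]:
    print(name, metric(ctm[test], obs[test]), metric(corrected, obs[test]))
```

Keeping the CTM as the backbone and learning only its residual means the hybrid forecast still inherits the physical structure of the model. The machine learning component only has to capture the systematic part of the error.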