M. Pang | TU Delft Repository

Dust aerosol mass concentration reanalysis dataset for East Asia, 2014–2023

Journal article (2026) - Jianbing Jin, Dehao Li, Mijie Pang, Zheqi Cheng, Canjie Xu, Hong Liao

Dust storms are among the most severe hazardous weather phenomena affecting northern China and adjacent regions. The primary dust source areas, including the Alxa-Hexi Corridor, the Tengger Desert, and the southern Mongolian Gobi Desert, emit more than 800 Mt of dust annually. During spring, the interaction between the Siberian high and Mongolian cyclones generates strong near-surface winds and enhanced vertical convection, forming a three-dimensional “uplift-suspension-transport” structure that promotes dust storm development. Under ongoing global warming, declining spring precipitation over the Mongolian Plateau and extensive desertification (currently affecting over 75% of Mongolia) are expected to further intensify transboundary dust transport into China, with severe consequences for public health, agriculture, and transportation. These challenges underscore the urgent need for long-term, high-quality dust datasets to improve understanding of dust emission mechanisms and forecasting capabilities. Atmospheric models are essential tools for simulating dust emission, transport, and deposition, as well as for assessing impacts on climate, ecosystems, and human health. However, large uncertainties in emission parameterizations and long-range transport processes persist, often resulting in substantial biases in simulated dust concentrations, which in some cases differ from observations by up to two orders of magnitude. Recent advances in atmospheric observation systems provide valuable constraints, including China's nationwide hourly PM₁₀ monitoring network and satellite remote sensing products with broad spatial coverage and multi-dimensional aerosol information, such as MODIS aerosol optical depth (AOD). In this context, data assimilation methods grounded in Bayesian theory offer an effective framework for integrating observational data with model simulations to generate spatially continuous and more accurate dust reanalysis datasets. Despite this progress, existing studies have focused primarily on individual dust events, and long-term dust reanalysis efforts remain limited by observation biases, sparse data coverage over source regions, transport errors, and the strong spatiotemporal variability of dust emissions. Building upon a self-developed dust storm assimilation system, this study integrates ground-based PM₁₀ observations, bias-corrected satellite AOD data, and an effective valid time shifting ensemble Kalman filter (VTS-EnKF) designed to jointly correct errors in dust intensity and transport position. Using this framework, we construct a high-resolution (0.25°×0.25°, 3-hourly) three-dimensional dust aerosol mass concentration reanalysis dataset for East Asia during spring (March–May) over the period 2014–2023. This dataset provides a robust basis for investigating long-term dust variability, transboundary transport processes, and associated impacts on climate, the environment, and public health. Comparisons with the MERRA-2 dust reanalysis demonstrate clear advantages of the new dataset. While MERRA-2 agrees reasonably well with observations at low dust concentrations (<75 μg·m⁻³), it substantially underestimates dust levels and exhibits larger uncertainties under moderate to severe dust conditions, particularly in dust-affected regions. Analysis of springtime dust variability from 2014 to 2023 reveals pronounced interannual and spatial heterogeneity, with dominant dust activity over the Tarim Basin and the Gobi Desert in China and episodic contributions from the Mongolian Gobi. Relative to observations, prior simulations tend to overestimate dust concentrations, whereas data assimilation introduces widespread negative analysis increments, reducing the regional mean concentration from 65.24 to 39.99 μg·m⁻³. Notably, the reanalysis accurately captures both the intensity and timing of dust events in densely populated areas. Overall, the assimilation framework substantially improves dust representation, reducing RMSE by 76.9% and yielding a more reliable depiction of monthly and interannual dust variability.
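
The 76.9% figure is a relative RMSE reduction. A minimal sketch of how such a number can be computed, assuming the prior run and the reanalysis are both evaluated against the same set of observations; the function names and the four site values below are hypothetical and are not taken from the study:

```python
import numpy as np

def rmse(sim, obs):
    """Root mean square error between simulated and observed values."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def rmse_reduction(prior, analysis, obs):
    """Relative RMSE reduction (%) of the analysis with respect to the prior run."""
    r_prior = rmse(prior, obs)
    r_analysis = rmse(analysis, obs)
    return 100.0 * (r_prior - r_analysis) / r_prior

# Hypothetical dust concentrations (ug/m3) at four sites, for illustration only.
obs = [120.0, 80.0, 40.0, 200.0]
prior = [220.0, 150.0, 90.0, 260.0]      # prior overestimates, as described above
analysis = [140.0, 90.0, 45.0, 210.0]
print(round(rmse_reduction(prior, analysis, obs), 1))
```

Because the metric is normalized by the prior RMSE, the same percentage can arise from very different absolute error levels, so it is best read alongside the absolute RMSE values.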

Nationwide Overestimation of Black Carbon Emissions During Clean Air Action Identified by Assimilation Inversion

Journal article (2026) - Li Fang, Jianbing Jin, Jiandong Wang, Kang Hu, Nan Li, Mijie Pang, Hong Liao

An accurate estimate of black carbon (BC) emission is critical, as BC represents one of the most important short-lived climate forcers. The widely used BC emission inventories were developed using either bottom-up or top-down approaches, both of which have large uncertainties. The challenges of the bottom-up approach include uncertainties in emission factors for different fuel types and combustion technologies. Conversely, top-down BC emission inversion relies primarily on satellite-retrieved aerosol absorption optical depth, which has significant limitations in quantifying BC-specific contributions. The China Atmospheric Monitoring Network, established by the China Meteorological Administration, provides ground-based hourly BC observations and a valuable opportunity to constrain BC emissions. This study presents the first application of these nationwide BC observations in emission inversion during the Clean Air Action (2013–2017), achieved using the 4DEnVar assimilation technique. Validation against independent observations demonstrates significant improvements in posterior estimates, reducing the root mean square error by 36.7%. Compared to the posterior, widely used bottom-up inventories (e.g., MEIC) overestimate China's total BC emissions by 36.7%, with overestimations ranging up to 80.6% in the North China Plain (averaged between 2013 and 2017). In terms of climate impact, MEIC-based estimates yield an 18.7% higher direct radiative effect on average, while CMIP6 historical estimates further exaggerate BC-induced forcing by a factor of 1.7. Additionally, our inversion reveals that annual total BC emissions declined markedly by 28.1% during the Clean Air Action, from 1.24 to 0.89 Tg. These findings are critical for quantifying the role of BC in the regional and global climate. ...
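
The percentages above are relative changes, and the reference value matters. As a worked check, reading the MEIC figure as relative to the posterior (an interpretation of "compared to the posterior") and using the rounded totals quoted for the start and end of the Clean Air Action:

```latex
\begin{align*}
\text{overestimate} &= \frac{E_{\mathrm{MEIC}} - E_{\mathrm{post}}}{E_{\mathrm{post}}} = 0.367
  \quad\Rightarrow\quad E_{\mathrm{MEIC}} \approx 1.37\,E_{\mathrm{post}},\\
\text{decline} &= \frac{E_{\mathrm{start}} - E_{\mathrm{end}}}{E_{\mathrm{start}}}
  = \frac{1.24\ \mathrm{Tg} - 0.89\ \mathrm{Tg}}{1.24\ \mathrm{Tg}} \approx 0.282 .
\end{align*}
```

The rounded totals give about 28.2%, slightly above the reported 28.1%; the gap presumably comes from computing with unrounded totals. Note also that a 36.7% overestimate relative to the posterior means the posterior lies about 27% below MEIC (0.367/1.367 ≈ 0.268), so the direction of comparison should be kept in mind when quoting these numbers.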

Zeeman: A Deep Learning Framework for Regional Atmospheric Chemistry Forecasting

Journal article (2026) - Mijie Pang, Jianbing Jin, Arjo Segers, Hai Xiang Lin, Guoqiang Wang, Hong Liao, Wei Han

Atmospheric chemistry encompasses the emission of various pollutants, complex chemical reactions, and meteorology-dominated transport, which together form a dynamic system that governs air quality. While deep learning (DL) models have shown promise in capturing intricate patterns for forecasting individual atmospheric components such as PM2.5 and ozone, the critical interactions among multiple pollutants and the combined influence of emissions and meteorology are often overlooked. This study introduces Zeeman, a DL-based framework for atmospheric chemistry forecasting. Our model effectively captures the nuanced relationships among these constituents while achieving a 68.5-fold increase in computational speed compared with a traditional numerical model. Evaluations demonstrate that our approach rivals the numerical model, offering an efficient solution for atmospheric chemistry forecasting. In the future, this model could be further integrated with data assimilation techniques to enable efficient and accurate atmospheric emission estimation and concentration forecasting. ...

A Transformer-based agent model of GEOS-Chem v14.2.2 for informative prediction of PM2.5 and O3 levels to future emission scenarios: TGEOS v1.0

Journal article (2026) - Dehao Li, Jianbing Jin, Guoqiang Wang, Mijie Pang, Weihong Zhang, Hong Liao

Efficient and informative air quality modeling under future emission scenarios is vital for formulating effective emission reduction policies. Traditional chemical transport models (CTMs) struggle with the computational demands of timely prediction. While advanced emulator techniques greatly accelerate the CTM simulation process, they fall short in providing comprehensive estimates of future air quality because of their limited model structure. Additionally, these emulators often have difficulty simultaneously accounting for varying emission variables and the effects of regional transport, which limits their applicability and undermines prediction accuracy. In this study, an informative future air quality prediction model, “TGEOS v1.0”, based on the Transformer framework is developed as an efficient agent model of GEOS-Chem v14.2.2. TGEOS can efficiently estimate key statistical indicators of PM2.5 and O3 concentrations under future emission scenarios and capture potential extreme pollution events, requiring approximately 2.51 s to execute a one-year estimation. The model incorporates sectoral emissions of up to 26 distinct species as well as the impacts of regional emissions and meteorology on pollutant concentrations, enhancing its versatility and predictive accuracy. The spatial and probability distributions predicted by TGEOS agree well with those of GEOS-Chem, with correlation coefficients for PM2.5 and O3 exceeding 0.98 in high-pollution months. Compared with other machine learning models, the Transformer-based TGEOS shows superior performance, underscoring the potential of the Transformer framework in air quality modeling. ...
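
TGEOS builds on the Transformer framework, whose core operation is attention. As a generic illustration only (not the TGEOS architecture, which the abstract does not detail), a single-head scaled dot-product self-attention layer can be sketched in NumPy. Treating each emission sector, species, or upwind region as one input token is an assumption made here for illustration, and the projection matrices are random placeholders rather than trained parameters:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax (subtract the max before exponentiating)."""
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    tokens: (n_tokens, d_model) inputs; each token is imagined to be one
    emission sector/species or one upwind region (a hypothetical encoding).
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # pairwise token affinities
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 5, 8, 4
tokens = rng.normal(size=(n_tokens, d_model))
# Random placeholder projections; a trained model would learn these.
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out, weights = self_attention(tokens, w_q, w_k, w_v)
print(out.shape, np.round(weights.sum(axis=1), 6))
```

Because every token attends to every other token, interactions such as the influence of an upwind region's emissions on local PM2.5 can in principle be learned without being hand-crafted, which is one reason this architecture suits emulators that must handle many emission inputs at once.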

The sensitivity of aerosol data assimilation to vertical profiles: case study of dust storm assimilation with LOTOS-EUROS v2.2

Journal article (2025) - Mijie Pang, Jianbing Jin, Wei Han, Ting Yang, Xi Chen, Arjo Segers, Batjargal Buyantogtokh, Yixuan Gu, Jiandong Li, Hai Xiang Lin, Hong Liao

Modelling and observational techniques are pivotal in aerosol research, yet each approach has inherent limitations. Aerosol observations are constrained by their limited spatial and temporal coverage compared with models. Models, on the other hand, tend to have larger uncertainties and biases than observations. Aerosol data assimilation has gained popularity because it combines the advantages of both. Despite numerous studies in this domain, few have addressed the challenges of assimilating aerosol data when the model state and the observations differ substantially in magnitude and degrees of freedom, especially in the vertical direction. These challenges can preserve, or even exacerbate, structural inaccuracies during assimilation. This study investigates the sensitivity of dust aerosol data assimilation to the vertical structure of the aerosol profile. We assimilate a variety of dust observations, including ground-based particulate matter (PM10) measurements and satellite-derived dust optical depth (DOD) data, using the ensemble Kalman filter (EnKF). The assimilation process is described in detail, showing how raw ground-based and satellite-based observations are assimilated to obtain an optimized three-dimensional (3D) posterior state. To demonstrate the impact of accurate versus erroneous prior aerosol vertical profiles on the assimilation result, we analyze three super dust storm cases. Our findings reveal that assimilating ground observations generally improves the dust field near the surface. However, the vertical structure presents a more complex challenge. When the prior profile accurately reflects the true vertical structure, the assimilation can successfully preserve this structure. Conversely, if the prior profile has an incorrect structure, the assimilation can significantly degrade the integrity of the aerosol profile. The assimilation of DOD shows a comparable sensitivity to the accuracy of the prior aerosol profile.
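
Why the prior profile matters can be seen in a toy column. The sketch below is an illustration under assumed numbers, not a reproduction of the LOTOS-EUROS experiments: with a linear observation operator that samples only the surface level, the ensemble covariance between the surface and the upper levels alone decides how the upper levels are updated. The helper names, the three-level column, and all concentrations are hypothetical:

```python
import numpy as np

def ensemble_gain(ens, h, r_var):
    """Kalman gain for one scalar observation, estimated from the ensemble.

    ens:   (n_members, n_levels) prior dust profiles
    h:     (n_levels,) linear observation operator
    r_var: observation error variance
    """
    anomalies = ens - ens.mean(axis=0)
    n = ens.shape[0]
    p_ht = anomalies.T @ (anomalies @ h) / (n - 1)   # P H^T, one value per level
    hph = h @ p_ht                                   # H P H^T (scalar)
    return p_ht / (hph + r_var)

def analysis_mean(ens, h, y, r_var):
    """Ensemble-mean analysis for a single observation y."""
    xb = ens.mean(axis=0)
    return xb + ensemble_gain(ens, h, r_var) * (y - h @ xb)

# Levels: surface, ~1 km, ~3 km (hypothetical column, ug/m3).
h_surface = np.array([1.0, 0.0, 0.0])    # a PM10 site samples only the surface
# Prior A: every member shares one profile shape, scaled by dust load.
prior_a = np.outer([50.0, 100.0, 150.0], [1.0, 0.8, 0.3])
# Prior B: upper-level dust decreases as surface dust increases (wrong structure).
prior_b = np.array([[50.0, 40.0, 60.0],
                    [100.0, 80.0, 40.0],
                    [150.0, 120.0, 20.0]])

y_obs = 200.0   # observed surface concentration, above both prior means
for name, ens in [("A", prior_a), ("B", prior_b)]:
    print(name, ens.mean(axis=0).round(1), analysis_mean(ens, h_surface, y_obs, 1.0).round(1))
```

Both priors end up matching the surface observation almost exactly, but prior B drives the 3 km level toward zero purely through its (incorrect) ensemble correlation, mirroring the finding that assimilation preserves a correct prior structure and can degrade an incorrect one.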

Valid time shifting ensemble Kalman filter (VTS-EnKF) for dust storm forecasting

Journal article (2024) - M. Pang, Jianbing Jin, Arjo Segers, Huiya Jiang, Wei Han, Batjargal Buyantogtokh, Ji Xia, Li Fang, Hai Xiang Lin, More authors...

Dust storms pose significant risks to health and property, necessitating accurate forecasting for preventive measures. Despite advancements, dust models grapple with uncertainties arising from emission and transport processes. Data assimilation addresses these by integrating observations to correct model errors, enhancing forecast precision. The ensemble Kalman filter (EnKF) is a widely used assimilation algorithm that effectively optimizes model states, particularly in terms of intensity adjustment. However, the EnKF's efficacy is challenged by position errors between modeled and observed dust features, especially when these errors are large. This study introduces the valid time shifting ensemble Kalman filter (VTS-EnKF), which combines the stochastic EnKF with a valid time shifting mechanism. By recruiting additional ensemble members from neighboring valid times, this method not only accommodates variations in dust load but also explicitly accounts for positional uncertainties. Consequently, the enlarged ensemble better represents both intensity and position errors, thereby making better use of the observational data. The proposed VTS-EnKF was evaluated on two severe dust storm cases from spring 2021. The results show that position errors notably degraded forecast performance in terms of root mean square error (RMSE) and normalized mean bias (NMB) and impeded effective assimilation by the EnKF, whereas the VTS-EnKF improved both analysis and forecast accuracy compared with the conventional EnKF. Additionally, to provide a more rigorous assessment of its performance, experiments were conducted with fewer ensemble members and different time intervals. ...
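
The abstract describes the mechanism only at a high level. A minimal sketch of one way to realise it, assuming (as an illustration, not the authors' implementation) that forecasts valid at t - Δt and t + Δt are stacked with the members valid at t to estimate the background covariance, and that only the members valid at t are then updated with perturbed observations. The function `vts_enkf_update`, the 1-D `plume` field, the shift of 2 grid cells per interval, and all values are hypothetical:

```python
import numpy as np

def vts_enkf_update(ens_prev, ens_now, ens_next, obs, h_mat, r_var, rng):
    """Stochastic EnKF update using a valid-time-shifted (augmented) ensemble.

    ens_prev, ens_now, ens_next: (n_members, n_state) forecasts valid at
    t - dt, t and t + dt. In this sketch the augmented ensemble only feeds the
    covariance estimate; the members valid at t are the ones updated.
    """
    aug = np.vstack([ens_prev, ens_now, ens_next])     # 3N members
    n_aug = aug.shape[0]
    anom = aug - aug.mean(axis=0)
    hx_anom = anom @ h_mat.T                            # anomalies in obs space
    p_ht = anom.T @ hx_anom / (n_aug - 1)               # P H^T
    hph = hx_anom.T @ hx_anom / (n_aug - 1)             # H P H^T
    r_mat = r_var * np.eye(len(obs))
    gain = p_ht @ np.linalg.inv(hph + r_mat)            # Kalman gain
    # Perturbed observations: one noisy copy of y per updated member.
    y_pert = obs + rng.normal(0.0, np.sqrt(r_var), size=(ens_now.shape[0], len(obs)))
    innov = y_pert - ens_now @ h_mat.T
    return ens_now + innov @ gain.T

# Hypothetical 1-D grid with a dust plume moving +2 cells per shift interval.
x = np.arange(20.0)

def plume(center, amp=100.0, width=1.5):
    return amp * np.exp(-0.5 * ((x - center) / width) ** 2)

centers = np.linspace(7.5, 9.5, 20)          # forecast plume lags the truth
ens_now = np.array([plume(c) for c in centers])
ens_prev = np.array([plume(c - 2.0) for c in centers])
ens_next = np.array([plume(c + 2.0) for c in centers])

truth = plume(11.0)
obs_idx = [10, 11, 12]                       # observed grid cells
h_mat = np.zeros((len(obs_idx), x.size))
h_mat[np.arange(len(obs_idx)), obs_idx] = 1.0
obs = truth[obs_idx]

rng = np.random.default_rng(42)
ens_a = vts_enkf_update(ens_prev, ens_now, ens_next, obs, h_mat, 25.0, rng)
print(round(float(ens_now.mean(axis=0)[11]), 1), round(float(ens_a.mean(axis=0)[11]), 1))
```

The time-shifted members spread the plume position over a wider range, so the covariance carries spread where the observed, displaced plume actually is; a covariance estimated from the members valid at t alone tends to have less spread there, which limits how far a conventional EnKF can move the plume.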

Dust storm forecasting through coupling LOTOS-EUROS with localized ensemble Kalman filter

Journal article (2023) - Mijie Pang, Jianbing Jin, Arjo Segers, Huiya Jiang, Li Fang, Hai Xiang Lin, Hong Liao

Super dust storms recurred over East Asia in spring 2021, causing severe health damage and property losses. Accurate dust forecasts are essential for early warning to reduce such damage. A forecasting system fundamentally relies on a numerical model that can forecast the full evolution of dust storms. However, model forecasts carry large uncertainties. Meanwhile, various near-real-time observations containing valuable dust information are available. Here, a dust storm forecasting system is developed by coupling a chemical transport model, LOTOS-EUROS, with the localized EnKF (LEnKF) assimilation approach. The assimilations are carried out via an interface to our self-designed assimilation toolbox, PyFilter v1.0. Ground-based PM₁₀ measurements from an air quality monitoring network are assimilated. Sequential assimilation tests are carried out for the spring 2021 super dust storms. The results show that the assimilation-based forecasting system produces better dust forecasts than the model-only forecast, and the improvements are also validated through comparison with independent MODIS aerosol optical depth (AOD). Superior performance is obtained when the LEnKF is implemented, as localization helps the EnKF resolve PM₁₀ measurements with large spatial variability using a limited number of ensemble members. In addition, sensitivity experiments are conducted to explore the distance-dependent localization of the LEnKF. Considering both cases, the optimal localization distance was found to be around 500 km: a larger distance is less effective in removing spurious corrections, while a smaller one easily falls into a local optimum and causes the model to diverge rapidly. ...
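
Distance-dependent localization damps ensemble covariances with distance so that a PM₁₀ site does not spuriously update far-away grid cells. The abstract does not name the tapering function; the sketch below assumes the widely used Gaspari–Cohn fifth-order function, and assumes the reported ~500 km is the cutoff beyond which covariances are set to zero (it could instead be a half-width or another length scale). The gain values are hypothetical:

```python
import numpy as np

def gaspari_cohn(dist_km, cutoff_km=500.0):
    """Gaspari-Cohn fifth-order taper: 1 at zero distance, 0 beyond cutoff_km.

    Assumption of this sketch: the half-width c is cutoff_km / 2, so the
    support (where the taper is non-zero) ends at 2c = cutoff_km.
    """
    z = np.abs(np.asarray(dist_km, dtype=float)) / (cutoff_km / 2.0)
    rho = np.zeros_like(z)
    inner = z <= 1.0
    outer = (z > 1.0) & (z < 2.0)
    zi = z[inner]
    rho[inner] = (-0.25 * zi**5 + 0.5 * zi**4 + 0.625 * zi**3
                  - (5.0 / 3.0) * zi**2 + 1.0)
    zo = z[outer]
    rho[outer] = ((1.0 / 12.0) * zo**5 - 0.5 * zo**4 + 0.625 * zo**3
                  + (5.0 / 3.0) * zo**2 - 5.0 * zo + 4.0 - 2.0 / (3.0 * zo))
    return rho

# Localize the ensemble gain for one PM10 site (Schur product): grid cells at
# these distances from the site have their raw gain damped by rho.
dist = np.array([0.0, 100.0, 250.0, 400.0, 600.0])
raw_gain = np.array([0.8, 0.5, 0.3, 0.2, 0.15])   # hypothetical, incl. a spurious far value
print(np.round(gaspari_cohn(dist) * raw_gain, 4))
```

A smaller cutoff removes more spurious long-range corrections but also discards genuine ones, which is consistent with the trade-off the abstract reports when tuning the localization distance.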

A gridded air quality forecast through fusing site-available machine learning predictions from RFSML v1.0 and chemical transport model results from GEOS-Chem v13.1.0 using the ensemble Kalman filter

Journal article (2023) - Li Fang, Jianbing Jin, Arjo Segers, Hong Liao, Ke Li, Bufan Xu, Wei Han, Mijie Pang, Hai Xiang Lin

Statistical methods, particularly machine learning models, have gained significant popularity in air quality prediction. These models are commonly trained on historical measurement datasets collected independently at environmental monitoring stations, and they produce operational forecasts using real-time ambient pollutant observations as inputs. Consequently, these high-quality machine learning models provide predictions only at monitoring sites and cannot on their own serve as an operational forecast. In contrast, deterministic chemical transport models (CTMs), which simulate the full life cycles of air pollutants, provide predictions that are continuous in the 3D field. Despite these benefits, CTM predictions are typically biased, particularly at fine scales, owing to complex error sources in the emission, transport, and removal of pollutants. In this study, we propose fusing site-available machine learning predictions, produced by our regional feature selection-based machine learning model (RFSML v1.0), with CTM predictions. Compared with a pure machine learning model, the fusion system provides gridded predictions with relatively high accuracy. The fusion was conducted using the Bayesian-theory-based ensemble Kalman filter (EnKF). The background error covariance is an essential part of the assimilation process. Ensemble CTM predictions driven by perturbed emission inventories were first used to represent its spatial covariance statistics, which resolve the main part of the CTM error. In addition, a covariance inflation algorithm was designed to amplify the ensemble perturbations to account for model errors other than the uncertainty in emission inputs. Model evaluation tests were conducted against independent measurements. Our EnKF-based prediction fusion outperformed the pure CTM. Moreover, covariance inflation further improved the fused prediction, particularly in cases of severe underestimation.
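
The abstract does not specify the inflation algorithm. A minimal sketch using standard multiplicative inflation (a swapped-in, generic variant), treating the site-level machine learning prediction as a direct observation of the CTM state at that site (H = 1, an assumption). The helper names and all numbers are hypothetical:

```python
import numpy as np

def inflate(ens, factor):
    """Multiplicative inflation: scale anomalies about the ensemble mean.

    The mean is unchanged and the ensemble variance grows by factor**2.
    """
    mean = ens.mean(axis=0)
    return mean + factor * (ens - mean)

def scalar_update(ens, y, r_var):
    """Mean analysis for one directly observed scalar (H = 1)."""
    var = ens.var(ddof=1)
    gain = var / (var + r_var)
    return ens.mean() + gain * (y - ens.mean())

# Hypothetical CTM ensemble at one site (ug/m3); the CTM underestimates.
ctm_ens = np.array([30.0, 35.0, 40.0, 45.0, 50.0])
ml_pred, ml_err_var = 90.0, 100.0   # hypothetical ML prediction and its error variance

print(scalar_update(ctm_ens, ml_pred, ml_err_var))
print(scalar_update(inflate(ctm_ens, 2.0), ml_pred, ml_err_var))
```

Without inflation, the fused value moves only part of the way toward the ML prediction, because the narrow ensemble spread makes the CTM look over-confident; doubling the perturbations quadruples the variance and gives the ML prediction more weight, which is the kind of effect described for cases of severe underestimation.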

4DEnVar-based inversion system for ammonia emission estimation in China through assimilating IASI ammonia retrievals

Journal article (2023) - Jianbing Jin, Li Fang, Baojie Li, Hong Liao, Ye Wang, Wei Han, Ke Li, Mijie Pang, Xingyi Wu, Hai Xiang Lin

Atmospheric ammonia has been hazardous to the environment and human health for decades. Current inventories are usually constructed in a bottom-up manner; they are subject to uncertainties and cannot reproduce the spatiotemporal characteristics of ammonia emissions. Satellite measurements, for example from the Infrared Atmospheric Sounding Interferometer (IASI) and the Cross-Track Infrared Sounder, provide global coverage of the ammonia distribution and have gained popularity for estimating ammonia emissions through data assimilation. However, satellite-based emission inversion studies over China are limited. In this study, we propose a four-dimensional ensemble variational (4DEnVar) ammonia emission inversion system to optimize ammonia emissions in China. It was developed by assimilating IASI ammonia retrievals from the instruments onboard the Meteorological Operational satellites A and B into the chemical transport model GEOS-Chem (Goddard Earth Observing System Chemical model). Monthly inversion experiments were conducted for April, July, and October 2016 to test the performance. The inversion results indicated that the prior inventory from the MEIC model captured the general spatial distribution of ammonia but underestimated the emission intensity in a spatially heterogeneous way. The increments obtained in the assimilation were as high as 50% in North, East, and Northwest China. The posterior inventory yields regional emission fluxes consistent with related studies. Driven by the optimized emissions, GEOS-Chem performs better than with the prior inventory when evaluated against the assimilated IASI retrievals and against surface ammonia concentrations measured by the ground-based Ammonia Monitoring Network in China. ...
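
In 4DEnVar, the sensitivity of the observations to the control variables is taken from an ensemble of forward model runs over the assimilation window rather than from an adjoint model. A minimal ensemble-space sketch, assuming the observation response is linear over the window so the cost function has a closed-form minimizer (operational systems typically minimize iteratively). The matrix `g` is a hypothetical stand-in for running GEOS-Chem over the window, and the two-region emissions are invented for illustration:

```python
import numpy as np

def envar_analysis(x_ens, y_ens, y_obs, r_diag):
    """Ensemble-space analysis that minimizes
        J(w) = 0.5 w^T w + 0.5 (d - Y' w)^T R^-1 (d - Y' w),  x = x_b + X' w.
    Assumes the observation response is linear over the window, so the
    minimizer has a closed form instead of needing an iterative solver.

    x_ens:  (N, n_state) perturbed emission ensemble (its mean is x_b)
    y_ens:  (N, n_obs) simulated observations for each member over the window
    y_obs:  (n_obs,) observations
    r_diag: (n_obs,) observation error variances (diagonal R)
    """
    n = x_ens.shape[0]
    xb = x_ens.mean(axis=0)
    xp = (x_ens - xb).T / np.sqrt(n - 1)               # X', (n_state, N)
    yb = y_ens.mean(axis=0)
    yp = (y_ens - yb).T / np.sqrt(n - 1)               # Y', (n_obs, N)
    d = y_obs - yb                                     # innovation
    r_inv = 1.0 / r_diag
    hessian = np.eye(n) + yp.T @ (r_inv[:, None] * yp)
    w = np.linalg.solve(hessian, yp.T @ (r_inv * d))
    return xb + xp @ w

rng = np.random.default_rng(1)
g = np.array([[1.0, 0.2], [0.3, 1.0], [0.5, 0.5]])  # hypothetical stand-in for the CTM
x_prior = np.array([10.0, 20.0])                      # prior emissions, two regions
x_true = np.array([15.0, 26.0])                       # prior underestimates, as in the abstract
x_ens = x_prior + rng.normal(0.0, 3.0, size=(40, 2))
y_ens = x_ens @ g.T                                   # one "model run" per member
y_obs = g @ x_true
x_post = envar_analysis(x_ens, y_ens, y_obs, np.full(3, 0.25))
print(x_ens.mean(axis=0).round(2), x_post.round(2))
```

Working in the N-dimensional ensemble space keeps the linear system small even when the emission field has many grid cells, which is one practical reason this formulation is popular for emission inversion.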

Inverse modeling of the 2021 spring super dust storms in East Asia

Journal article (2022) - Jianbing Jin, Mijie Pang, Arjo Segers, Wei Han, Li Fang, Baojie Li, Haochuan Feng, Hai Xiang Lin, Hong Liao

In spring 2021, super dust storms reappeared in East Asia after being absent for one and a half decades. The events caused enormous losses in both Mongolia and China. Accurate simulation of such super sandstorms is valuable for quantifying health damage, aviation risks, and profound impacts on the Earth system, and also for revealing the climatic driving forces and the process of desertification. However, accurately simulating dust life cycles is challenging, mainly because of imperfect knowledge of emissions. In this study, the emissions that led to the 2021 spring dust storms are estimated by simultaneously assimilating MODIS AOD and ground-based PM10 concentration data. With this, the dust concentrations during these super storms could be reproduced and validated against concentration observations. The multi-observation assimilation is also compared with emission inversions that assimilate AOD or PM10 concentration measurements alone, and the added value is analyzed. The emission inversion results reveal that wind-blown dust emissions originated from both China and Mongolia during spring 2021. Specifically, 19.9×10⁶ t and 37.5×10⁶ t of particles were released in the Chinese and Mongolian Gobi, respectively, during these severe dust events. Source apportionment revealed that the Mongolian Gobi poses a more severe threat than the Chinese Gobi to the densely populated Fenwei Plain (FWP) and North China Plain (NCP) in northern China. An estimated 63 % of the dust deposited in the FWP was due to transnational transport from Mongolia; for the NCP, long-distance transport of dust from Mongolia contributes about 69 % of the dust deposition. ...
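
The abstract does not spell out the inversion algorithm used here. The sketch below uses a plain linear-Gaussian (best linear unbiased estimate) update, a swapped-in technique, to show why assimilating PM10 and AOD together can add value: the two data types are stacked into one observation vector with a block-diagonal error covariance R, and in this linear setting the posterior emission uncertainty can only shrink or stay the same as observations are added. The two-source setup (Chinese vs Mongolian Gobi) and all sensitivities and variances are hypothetical:

```python
import numpy as np

def posterior_cov(b, h, r):
    """Posterior covariance of a linear-Gaussian update: B - K H B."""
    s = h @ b @ h.T + r
    gain = b @ h.T @ np.linalg.inv(s)
    return b - gain @ h @ b

b = np.diag([4.0, 4.0])                       # prior error variances: [Chinese Gobi, Mongolian Gobi]
h_pm10 = np.array([[0.9, 0.1], [0.2, 0.6]])   # surface sites, mostly sensitive to nearby sources
r_pm10 = np.diag([1.0, 1.0])
h_aod = np.array([[0.5, 0.8]])                # column AOD over a downwind region
r_aod = np.array([[0.5]])

# Stack both observation types; R is block-diagonal (errors assumed independent).
h_both = np.vstack([h_pm10, h_aod])
r_both = np.block([[r_pm10, np.zeros((2, 1))], [np.zeros((1, 2)), r_aod]])

for name, h, r in [("PM10 only", h_pm10, r_pm10),
                   ("AOD only", h_aod, r_aod),
                   ("PM10 + AOD", h_both, r_both)]:
    print(name, np.round(np.diag(posterior_cov(b, h, r)), 3))
```

The stacked case gives the smallest posterior variances for both source regions, a simplified analogue of the "added value" comparison against single-observation inversions.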

Development of a regional feature selection-based machine learning system (RFSML v1.0) for air pollution forecasting over China

Journal article (2022) - Li Fang, Jianbing Jin, Arjo Segers, Hai Xiang Lin, Mijie Pang, Cong Xiao, Tuo Deng, Hong Liao

With the explosive growth of atmospheric data, machine learning models have achieved great success in air pollution forecasting because they are computationally more efficient than traditional chemical transport models. However, in previous studies, new prediction algorithms have been tested only at individual stations or in small regions; a large-scale air quality forecasting model is still lacking. High dimensionality also means that redundant input data may increase model complexity and therefore lead to over-fitting. Feature selection is a key topic in machine learning development, but it has not yet been explored in atmosphere-related applications. In this work, a regional feature selection-based machine learning (RFSML) system was developed that can predict air quality in the short term with high accuracy at the national scale. Ensemble-Shapley additive global importance analysis is combined with the RFSML system to extract significant regional features and eliminate redundant variables at an affordable computational expense. The significance of the regional features is also explained physically. Compared with a standard machine learning system fed with relative features, the RFSML system driven by the selected key features offers better interpretability, shorter training time, and more accurate predictions. This study also provides insights into differences in interpretability among machine learning models (i.e., random forest, gradient boosting, and multi-layer perceptron models). ...
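
The Ensemble-Shapley additive global importance analysis is named but not detailed in the abstract. The sketch below illustrates only the underlying idea, exact Shapley attribution of a model score to input features followed by dropping low-importance features, using a hypothetical value function in place of a trained model. Practical tools typically estimate v(S) by marginalizing features out of a trained model and approximate the sum by sampling rather than enumeration; the feature names, scores, and 0.01 threshold are invented for illustration:

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values by subset enumeration (feasible only for few features).

    value(frozenset) -> score of a model that uses only that feature subset.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):                    # size of the coalition f joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi

# Hypothetical per-feature score contributions (e.g. gain in R^2).
base = {"pm25_lag1": 0.50, "upwind_pm25": 0.15, "wind_speed": 0.10,
        "humidity": 0.05, "day_index": 0.0}
features = list(base)

def toy_value(subset):
    score = sum(base[f] for f in subset)
    if {"upwind_pm25", "wind_speed"} <= subset:
        score += 0.10                         # interaction: transport needs both
    return score

phi = shapley_values(features, toy_value)
selected = [f for f, v in sorted(phi.items(), key=lambda kv: -kv[1]) if v > 0.01]
print(selected)
```

The interaction bonus is split evenly between the two features that create it, and the attributions add up to the full-model score minus the empty-model score, which is what makes Shapley-based importance convenient for ranking and discarding redundant inputs.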