E. Johnsson

Hydroisomerization of alkane isomers is an important step in the manufacture of current kerosene and sustainable aviation fuels. Zeolites are used as acid catalysts in this process, so reliable predictions of the maximum loading of hydrocarbons in zeolites are important. Here, a cascade of machine learning models is used to predict the maximum loading of alkane isomers in zeolites. The cascade consists of a gradient-boosted tree classifier that predicts whether adsorption occurs, followed by a regressor that predicts the value of the maximum loading. The final dataset consists of 45 different molecules (both linear and branched alkanes up to C16) and 97 different zeolite structures, resulting in 4365 data points. Descriptors include information on the geometry and topology of zeolite channels, as well as the shape and size of the molecules. Additional composite descriptors are included to give the models a physical basis for their predictions. Several types of regressors are considered: Support Vector Regressors, Gradient-Boosted Trees, extreme Gradient-Boosted Trees, and the pretrained TabPFN model. Of all the models, TabPFN yields the highest generalization performance and the lowest error. An interpretability analysis is conducted to assess whether the models' decisions abide by the governing physics of adsorption. It confirms that the top-ranked descriptors abide by the necessary physical constraints, and that secondary properties such as shape-based selectivity are also accounted for. It is shown that although both the classifier and the regressor are insensitive to random splits of the data, the regressor is prone to overfitting when only a small fraction of the data is withheld for testing. The cascade model is compared with an Artificial Neural Network in terms of training cost and deployability. Although the neural network requires more resources to train, it is lighter in both memory and storage than the cascade.
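A minimal sketch of the two-stage cascade idea, written in plain Python so it runs without dependencies. In the work summarised above, stage 1 is a gradient-boosted tree classifier and stage 2 is one of the listed regressors; here both are replaced by hypothetical toy models (`ThresholdClassifier`, `MeanRegressor`) so that only the gating logic is visible. The abstract does not say how the stages are wired together, so two choices below are assumptions: "no adsorption" is encoded as a maximum loading of 0, and the regressor is trained on, and applied to, adsorbing pairs only. Any objects with the same `fit`/`predict` interface, such as scikit-learn's `GradientBoostingClassifier` and `SVR`, could be passed in instead.

```python
from typing import List, Sequence

Row = Sequence[float]


class CascadeModel:
    """Two-stage cascade: a classifier gates a regressor.

    Stage 1 predicts whether an adsorbate/zeolite pair adsorbs (label 1) or
    not (label 0). Stage 2 predicts the maximum loading q_max and, by
    assumption, is only trained on and applied to pairs that adsorb.
    """

    def __init__(self, classifier, regressor):
        self.classifier = classifier
        self.regressor = regressor

    def fit(self, X: Sequence[Row], q_max: Sequence[float]) -> "CascadeModel":
        # Assumption: "no adsorption" is encoded as q_max == 0 in the training data.
        labels = [1 if q > 0.0 else 0 for q in q_max]
        self.classifier.fit(X, labels)
        X_pos = [x for x, q in zip(X, q_max) if q > 0.0]
        y_pos = [q for q in q_max if q > 0.0]
        if X_pos:  # the regressor only ever sees adsorbing pairs
            self.regressor.fit(X_pos, y_pos)
        return self

    def predict(self, X: Sequence[Row]) -> List[float]:
        adsorbs = self.classifier.predict(X)
        out = [0.0] * len(X)  # pairs classified as non-adsorbing get q_max = 0
        idx = [i for i, a in enumerate(adsorbs) if a == 1]
        if idx:
            for i, q in zip(idx, self.regressor.predict([X[i] for i in idx])):
                out[i] = float(q)
        return out


class ThresholdClassifier:
    """Hypothetical toy stage 1: 'adsorbs' if descriptor 0 reaches the smallest
    value seen among adsorbing training rows (think of a pore-size descriptor)."""

    def fit(self, X, y):
        pos = [x[0] for x, label in zip(X, y) if label == 1]
        self.threshold = min(pos) if pos else float("inf")
        return self

    def predict(self, X):
        return [1 if x[0] >= self.threshold else 0 for x in X]


class MeanRegressor:
    """Hypothetical toy stage 2: always predicts the mean training q_max."""

    def fit(self, X, y):
        self.mean = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean] * len(X)


# Toy data: two descriptors per adsorbate/zeolite pair, q_max = 0 means no adsorption.
X_train = [[0.5, 1.0], [2.0, 1.0], [3.0, 2.0], [0.2, 3.0]]
y_train = [0.0, 1.5, 2.5, 0.0]
model = CascadeModel(ThresholdClassifier(), MeanRegressor()).fit(X_train, y_train)
print(model.predict([[0.1, 0.0], [2.5, 0.0]]))  # [0.0, 2.0]
```

One plausible motivation for this split is that pairs which cannot adsorb at all (for example, molecules too bulky for a channel) would otherwise add many exact zeros to the regression target.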
This work builds on previous research on predicting the Henry coefficient at zero loading. Combining that model with the findings of this work, one can construct the full adsorption isotherm for any alkane, enabling the analysis of the adsorption behaviour of alkane mixtures using Ideal Adsorbed Solution Theory (IAST). ...
Journal article (2026) - Eric Johnsson, Shrinjay Sharma, Arvind Gangoli Rao, David Dubbeldam, Sofia Calero, Thijs J.H. Vlugt
Hydroisomerization of alkane isomers is an important step in the manufacture of current kerosene and sustainable aviation fuels. Zeolites are used as acid catalysts in this process. It is therefore important to have predictions of the adsorption capacity or maximum loading of hydrocarbons in zeolites. Here, a cascade of machine learning models is used to predict the maximum loading of alkane isomers in zeolites. The cascade is composed of a gradient-boosted tree classifier stage that predicts whether adsorption occurs and a regressor that predicts the value of the maximum loading. The final data set consists of 45 different adsorbates (both linear and branched alkanes up to C16) and 97 different zeolite structures, resulting in 4365 data points. Descriptors include information on the geometry and topology of zeolite channels as well as the shape and size of the adsorbates. Additional composite descriptors are included to provide a physical basis for the predictions. Several types of regressors are considered: support vector regressors, gradient-boosted trees, extreme gradient-boosted trees, and the pretrained TabPFN model. TabPFN yields the highest generalization performance and the lowest error. An interpretability analysis using SHAP reveals that the most influential descriptors are physically meaningful, highlighting steric and volumetric constraints as the primary factors controlling the predicted maximum loading, qmax. It is shown that although both the classifier and the regressor are insensitive to random splits of the data, the regressor is prone to overfitting when only a small fraction of the data is withheld for testing. The cascade model is compared to an Artificial Neural Network in terms of training and resource efficiency. Although the neural network takes longer to train, the final network is lighter in both memory and storage than the cascade. This work builds on our previous research in predicting the Henry coefficients of long-chain alkanes in zeolites.
Using this previous model and the findings of this work, one could construct the adsorption isotherm for any alkane, thus enabling the analysis of adsorption behavior of alkane mixtures using IAST. ...
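Both abstracts note that the predicted maximum loading, combined with the earlier Henry-coefficient model, is enough to construct an isotherm and to treat mixtures with IAST. The abstracts do not state which isotherm form is used, so the sketch below assumes a single-site Langmuir isotherm. For that form the Henry coefficient K_H is the initial slope, K_H = q_max·b, so the affinity constant follows as b = K_H/q_max. The hypothetical `iast` function then solves the IAST equations for a gas mixture by bisection on the common reduced spreading pressure. All names and the Langmuir choice are assumptions, and units only need to be mutually consistent (for example mol/kg for loadings and Pa for pressures).

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Langmuir:
    """Assumed single-site Langmuir isotherm q(p) = q_max * b * p / (1 + b * p)."""

    q_max: float    # saturation (maximum) loading, e.g. mol/kg
    k_henry: float  # Henry coefficient K_H = dq/dp at p -> 0, e.g. mol/kg/Pa

    @property
    def b(self) -> float:
        # Initial slope of the Langmuir isotherm is q_max * b, so b = K_H / q_max.
        return self.k_henry / self.q_max

    def loading(self, p: float) -> float:
        return self.q_max * self.b * p / (1.0 + self.b * p)

    def spreading_pressure(self, p: float) -> float:
        # Reduced spreading pressure: integral_0^p q(t)/t dt = q_max * ln(1 + b*p).
        return self.q_max * math.log1p(self.b * p)

    def pressure_at(self, psi: float) -> float:
        # Inverse of spreading_pressure: pure-component pressure that gives psi.
        return math.expm1(psi / self.q_max) / self.b


def iast(isotherms: Sequence[Langmuir], y: Sequence[float],
         total_pressure: float, rel_tol: float = 1e-12) -> List[float]:
    """Mixture loadings from IAST for gas-phase mole fractions y (summing to 1)."""

    def excess(psi: float) -> float:
        # IAST relation x_i = y_i * P / P_i^0(psi); returns sum(x_i) - 1,
        # which decreases monotonically as psi grows.
        return sum(yi * total_pressure / iso.pressure_at(psi)
                   for iso, yi in zip(isotherms, y)) - 1.0

    lo, hi = 1e-12, 1.0
    while excess(hi) > 0.0:  # widen the bracket until it encloses the root
        hi *= 2.0
    for _ in range(200):     # bisection on the common spreading pressure
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo <= rel_tol * hi:
            break
    psi = 0.5 * (lo + hi)

    p0 = [iso.pressure_at(psi) for iso in isotherms]
    x = [yi * total_pressure / p for yi, p in zip(y, p0)]
    # Total loading from 1 / q_T = sum_i x_i / q_i^0(P_i^0).
    q_total = 1.0 / sum(xi / iso.loading(p) for xi, iso, p in zip(x, isotherms, p0))
    return [xi * q_total for xi in x]


# Hypothetical example: an equimolar binary mixture at 1000 Pa total pressure.
strong = Langmuir(q_max=2.0, k_henry=1e-2)
weak = Langmuir(q_max=2.0, k_henry=1e-3)
print(iast([strong, weak], [0.5, 0.5], 1000.0))
```

With equal saturation capacities, IAST applied to Langmuir isotherms reduces to the extended Langmuir model, which gives a quick check of the solver; when the components have different q_max, the two approaches give different mixture loadings.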