Model accuracy and data heterogeneity shape uncertainty quantification in machine learning interatomic potentials

Journal Article (2026)

Author(s)

F.S. Shuang (TU Delft - Team Poulumi Dey)

Z. Wei (TU Delft - Team Poulumi Dey)

K. Liu (TU Delft - Team Marcel Sluiter)

Wei Gao (Texas A&M University)

P. Dey (TU Delft - Team Poulumi Dey)

Research Group

Team Poulumi Dey

DOI related publication

https://doi.org/10.1088/2632-2153/ae3d80

Uncertainty quantification; Ensemble learning; Machine learning interatomic potentials; D-optimality; Atomic cluster expansion

To reference this document use:

https://resolver.tudelft.nl/uuid:e92da027-b089-437c-8d68-f2e2e267bd8c

Publication Year

2026

Language

English

Issue number

2

Volume number

7

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Machine learning interatomic potentials (MLIPs) enable accurate atomistic modeling, but reliable uncertainty quantification (UQ) remains elusive. In this study, we investigate two UQ strategies, ensemble learning and D-optimality, within the atomic cluster expansion framework. We find that higher model accuracy strengthens the correlation between predicted uncertainties and actual errors and improves novelty detection, with D-optimality yielding more conservative estimates. Both methods deliver well-calibrated uncertainties on homogeneous training sets, yet they underpredict errors and exhibit reduced novelty sensitivity on heterogeneous datasets. To address this limitation, we introduce clustering-enhanced local D-optimality, which partitions configuration space into clusters during training and applies D-optimality within each cluster. This approach substantially improves the detection of novel atomic environments in heterogeneous datasets. Our findings clarify the roles of model fidelity and data heterogeneity in UQ performance and provide a practical route to robust active learning and adaptive sampling strategies for MLIP development.
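The ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the ensemble uncertainty is taken as the standard deviation across model predictions, and a leverage-type score x^T (X^T X)^-1 x stands in for a D-optimality (MaxVol) extrapolation grade. The function names, the nearest-centroid cluster assignment, the small ridge term, and the toy features are all assumptions made for illustration only.

```python
import numpy as np


def ensemble_uncertainty(predictions):
    """Standard deviation across ensemble members.

    predictions: array of shape (n_models, n_samples).
    """
    return np.std(np.asarray(predictions, dtype=float), axis=0)


def leverage_score(train_features, query_features, ridge=1e-8):
    """Leverage-type novelty score x^T (X^T X + ridge*I)^-1 x.

    Assumption: a simplified stand-in for a D-optimality extrapolation
    grade, not the MaxVol algorithm itself.
    """
    X = np.asarray(train_features, dtype=float)
    Q = np.atleast_2d(np.asarray(query_features, dtype=float))
    info = X.T @ X + ridge * np.eye(X.shape[1])
    solved = np.linalg.solve(info, Q.T)  # shape (n_features, n_queries)
    return np.einsum("ij,ji->i", Q, solved)


def local_leverage_score(train_features, labels, centroids,
                         query_features, ridge=1e-8):
    """Score each query against only the training cluster with the nearest centroid.

    Assumption: cluster labels and centroids come from any clustering
    of the training features (hypothetical inputs here).
    """
    X = np.asarray(train_features, dtype=float)
    labels = np.asarray(labels)
    C = np.asarray(centroids, dtype=float)
    Q = np.atleast_2d(np.asarray(query_features, dtype=float))
    # Distance from every query to every centroid, then pick the closest.
    dists = np.linalg.norm(Q[:, None, :] - C[None, :, :], axis=2)
    nearest = np.argmin(dists, axis=1)
    scores = np.empty(len(Q))
    for k in np.unique(nearest):
        mask = nearest == k
        scores[mask] = leverage_score(X[labels == k], Q[mask], ridge)
    return scores


# Toy heterogeneous training set: two clusters along different axes.
cluster_a = np.array([[1.0, 0.01], [2.0, -0.01], [3.0, 0.01], [2.0, 0.0]])
cluster_b = np.array([[0.01, 1.0], [-0.01, 2.0], [0.01, 3.0], [0.0, 2.0]])
features = np.vstack([cluster_a, cluster_b])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
centroids = np.array([cluster_a.mean(axis=0), cluster_b.mean(axis=0)])

query = np.array([[2.0, 0.5]])  # near cluster a, but off its axis
global_score = leverage_score(features, query)
local_score = local_leverage_score(features, labels, centroids, query)
print("global:", global_score, "local:", local_score)
```

In this toy case the pooled training set spans both feature directions, so the global score treats the query as ordinary, whereas the score computed within the nearest cluster flags it as novel. That is one way to picture why a heterogeneous dataset can reduce novelty sensitivity and why a per-cluster criterion can restore it.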

Files

Shuang_2026_Mach._Learn._Sci._... (pdf)

(pdf | 2.52 Mb)