Rethinking and recomputing the value of machine learning models

Journal Article (2025)

Author(s)

Burcu Sayin (Università degli Studi di Trento)

Jie Yang (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Xinyue Chen (Student TU Delft)

Andrea Passerini (Università degli Studi di Trento)

Fabio Casati (Università degli Studi di Trento, ServiceNow)

Research Group

Web Information Systems

Keywords

Machine learning, Hybrid intelligence, Cost-sensitive learning, Selective classification

DOI related publication

https://doi.org/10.1007/s10462-025-11242-6 (final published version)

To reference this document use

https://resolver.tudelft.nl/uuid:383ae36c-b783-4d62-b748-60c1bf2c79b5

Publication Year

2025

Language

English

Journal title

Artificial Intelligence Review

Issue number

8

Volume number

58

Article number

238

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In this paper, we argue that the prevailing approach to training and evaluating machine learning models often fails to consider their real-world application within organizational or societal contexts, where they are intended to create beneficial value for people. We propose a shift in perspective, redefining model assessment and selection to emphasize integration into workflows that combine machine predictions with human expertise, particularly in scenarios requiring human intervention for low-confidence predictions. Traditional metrics like accuracy and F-score fail to capture the beneficial value of models in such hybrid settings. To address this, we introduce a simple yet theoretically sound “value” metric that incorporates task-specific costs for correct predictions, errors, and rejections, offering a practical framework for real-world evaluation. Through extensive experiments, we show that existing metrics fail to capture real-world needs, often leading to suboptimal choices in terms of value when used to rank classifiers. Furthermore, we emphasize the critical role of calibration in determining model value, showing that simple, well-calibrated models can often outperform more complex models that are challenging to calibrate.
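
As an illustration of the idea described in the abstract, the following is a minimal sketch of a cost-aware "value" score for a classifier that hands low-confidence predictions to a human. It is not the formula from the paper, which this record does not reproduce. The function name `value_per_item`, the parameters `gain_correct`, `cost_error`, and `cost_reject`, their default values, and the fixed confidence threshold are all assumptions made for this example.

```python
from typing import Sequence


def value_per_item(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    confidence: Sequence[float],
    threshold: float,
    gain_correct: float = 1.0,   # hypothetical reward for an accepted, correct prediction
    cost_error: float = 5.0,     # hypothetical cost of an accepted, wrong prediction
    cost_reject: float = 0.5,    # hypothetical cost of deferring the item to a human
) -> float:
    """Average value per item under an assumed reward/cost scheme.

    A prediction whose confidence is below `threshold` is rejected
    (sent to a human); otherwise it is accepted and scored as
    correct or wrong.
    """
    if not (len(y_true) == len(y_pred) == len(confidence)):
        raise ValueError("all inputs must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")

    total = 0.0
    for truth, pred, conf in zip(y_true, y_pred, confidence):
        if conf < threshold:
            total -= cost_reject      # rejected: pay the deferral cost
        elif pred == truth:
            total += gain_correct     # accepted and correct
        else:
            total -= cost_error       # accepted and wrong
    return total / len(y_true)


# Toy data: two correct high-confidence predictions, two wrong low-confidence ones.
y_true = [1, 0, 1, 1]
y_pred = [1, 1, 1, 0]
conf = [0.9, 0.6, 0.8, 0.55]

accept_all = value_per_item(y_true, y_pred, conf, threshold=0.0)
with_rejection = value_per_item(y_true, y_pred, conf, threshold=0.7)
```

On this toy data the predictions are 50% accurate in both cases, yet under the assumed costs the score is negative when every prediction is accepted and positive when the two low-confidence predictions are deferred. This also hints at why calibration matters: the threshold only helps when confidence scores track the actual chance of being correct.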

Files

S10462-025-11242-6.pdf (PDF, 2.49 MB)