BLEU it All Away!

Refocussing SE ML on the Homo Sapiens

Abstract (2022)
Author(s)

L.H. Applis (TU Delft - Software Engineering)

Research Group
Software Engineering
Publication Year
2022
Language
English
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Many tasks in machine learning for software engineering
rely on prominent NLP metrics, such as the BLEU or
ROUGE score. These metrics are themselves heavily criticized
within the NLP community, but the SE community adopted them
for lack of better alternatives. In this paper, we summarize
some of the problems with these common metrics, illustrated
with code examples, and look for alternatives. We argue that
our only hope is the worst of all possible options: humans.