BLEU it All Away!
Refocussing SE ML on the Homo Sapiens
L.H. Applis (TU Delft - Software Engineering)
Abstract
Many tasks in machine learning for software engineering
rely on prominent NLP metrics, such as the BLEU or
ROUGE score. These metrics are themselves heavily criticized
within the NLP community, but the SE community adopted them
for lack of better alternatives. In this paper, we summarize
some of the problems with common metrics, using code
examples, and look for alternatives. We argue that our only hope is
the worst of all possible options: Humans.