Learning Off-By-One Mistakes: An Empirical Study on Different Deep Learning Models

Abstract

Mistakes in binary conditions are a source of error in many software systems. They happen when developers use < or > instead of <= or >=. These boundary mistakes are hard to find for developers and pose a manual labor-intensive work. While researches have been proposing solutions to identify errors in boundary conditions, the problem remains a challenge. In this thesis, we propose deep learning models to learn mistakes in boundary conditions and train our model on approximately 1.6M examples with faults in different boundary conditions. We achieve an accuracy of 85.06%, a precision of 85.23% and a recall of 84.82% on a controlled dataset. Additionally, we perform tests on 41 real-world boundary condition bugs found from GitHub and try to find bugs from the Java project of Adyen. However, the false-positive rate of the model remains an issue. We hope that this work paves the way for future developments in using deep learning models for defect prediction.