A search-based training algorithm for cost-aware defect prediction



Conference Paper (2016)

Author(s)

Annibale Panichella (TU Delft - Software Engineering)

Carol V. Alexandru (Universitat Zurich)

Sebastiano Panichella (Universitat Zurich)

A. Bacchelli (TU Delft - Software Engineering)

Harald C. Gall (Universitat Zurich)

Research Group

Software Engineering

Copyright

DOI related publication

https://doi.org/10.1145/2908812.2908938

Keywords: Machine learning; Genetic algorithm; Defect prediction

To reference this document use:

https://resolver.tudelft.nl/uuid:67912115-0188-47d9-9923-67b794b83f8a


Publication Year

2016

Language

English


Pages (from-to)

1077-1084

ISBN (electronic)

978-1-4503-4206-3

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Research has yielded approaches that predict future defects in software artifacts from historical information, thus helping companies allocate limited development resources effectively and helping developers review each other's code changes. Developers are unlikely to devote the same effort to inspecting each software artifact predicted to contain defects, since the effort varies with an artifact's size (cost) and the number of defects it exhibits (effectiveness). We propose using Genetic Algorithms (GAs) to train prediction models so as to maximize their cost-effectiveness. We evaluate the approach on two well-known models, Regression Tree and Generalized Linear Model, and predict defects across multiple releases of six open source projects. Our results show that regression models trained by GAs significantly outperform their traditional counterparts, improving cost-effectiveness by up to 240%. Often, the top 10% of predicted lines of code contain up to twice as many defects.
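The abstract describes training a regression model with a GA whose fitness is cost-effectiveness rather than a conventional error measure. The Python sketch below illustrates that idea under stated assumptions. It evolves only the weights of a linear predictor (a Generalized Linear Model with identity link; the Regression Tree variant is not covered). It ranks artifacts by predicted defects per line of code and uses the normalized area under the "share of defects found vs. share of LOC inspected" curve as fitness. The function names (`predict`, `cost_effectiveness`, `train_ga`), the GA operators (tournament selection, blend crossover, Gaussian mutation, elitism), and all parameter values are hypothetical choices for illustration, not the authors' implementation.

```python
import random


def predict(weights, features):
    """Linear predictor: intercept plus weighted sum of an artifact's metrics."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))


def cost_effectiveness(preds, loc, defects):
    """Assumed fitness for this sketch: normalized area under the
    'share of defects found' vs. 'share of LOC inspected' curve when artifacts
    are inspected by decreasing predicted defects per LOC. Result is in [0, 1]."""
    total_loc = sum(loc)
    total_def = sum(defects)
    if total_def == 0:
        return 0.0
    order = sorted(range(len(loc)), key=lambda i: preds[i] / loc[i], reverse=True)
    area, found = 0.0, 0
    for i in order:
        prev = found
        found += defects[i]
        # Trapezoid spanning this artifact's share of the inspected LOC.
        area += (loc[i] / total_loc) * (prev + found) / 2 / total_def
    return area


def train_ga(X, loc, defects, pop_size=30, generations=50,
             tournament=3, mut_sigma=0.3, seed=0):
    """Evolve linear-model weights that maximize cost_effectiveness (hypothetical GA)."""
    rng = random.Random(seed)
    n_weights = len(X[0]) + 1  # intercept + one weight per metric

    def fitness(w):
        return cost_effectiveness([predict(w, x) for x in X], loc, defects)

    pop = [[rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((fitness(w), w) for w in pop),
                        key=lambda t: t[0], reverse=True)
        next_pop = [scored[0][1], scored[1][1]]  # elitism: keep the two best
        while len(next_pop) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(scored, tournament), key=lambda t: t[0])[1]
            p2 = max(rng.sample(scored, tournament), key=lambda t: t[0])[1]
            # Arithmetic (blend) crossover.
            a = rng.random()
            child = [a * g1 + (1 - a) * g2 for g1, g2 in zip(p1, p2)]
            # Gaussian mutation, each gene with probability 1 / n_weights.
            child = [g + rng.gauss(0.0, mut_sigma) if rng.random() < 1.0 / n_weights
                     else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical synthetic data: one metric loosely tied to defect density.
    loc = [rng.randint(20, 500) for _ in range(40)]
    X = [[rng.random()] for _ in range(40)]
    defects = [int(x[0] * n / 50) for x, n in zip(X, loc)]
    w = train_ga(X, loc, defects)
    print("weights:", w)
    print("cost-effectiveness:",
          cost_effectiveness([predict(w, x) for x in X], loc, defects))
```

In this sketch, replacing `fitness` with an error measure such as mean squared error would yield a conventionally trained linear model; the search itself is unchanged, and only the objective being optimized differs.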

Files

Gecco2016.pdf (PDF, 0.46 MB)

License info not available