On the use of machine learning for failure prediction after collective changes in automated continuous integration testing
Ömer Özdemir (Vestel Elektronik Sanayi ve Ticaret A.Ş)
Reyhan Aydoğan (TU Delft - Interactive Intelligence, Özyeğin University)
Hasan Sözer (Özyeğin University)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Continuous Integration (CI) is a development practice where developers regularly merge their code changes into a central repository, enabling simultaneous collaboration across a shared codebase. Frequent integration and automated builds help to detect and resolve conflicts or errors early in development. However, in large-scale systems, the build process can be costly. Each build incurs expenses, while skipping builds can increase the risk of undetected failures. Accurate predictions can help to identify builds that can be safely skipped to reduce CI costs. This paper presents an empirical study within an industrial setting, investigating the use of machine learning techniques to predict build failures after a set of collective changes. Many existing works apply random data splitting; in contrast, our results show that chronological (time-based) splitting offers a more realistic and reliable assessment of model performance in CI environments. We evaluate various models and feature combinations on a dataset derived from real-world industrial projects. We observe high precision but low recall in predicting failed builds, allowing hundreds of successful builds to be correctly skipped, with around a dozen failures potentially being missed. Our analysis shows that this yields substantial time savings of approximately 2.5 h per build on average, while missed failures necessarily result in delayed failure detection, whose practical impact depends on application criticality and operational context.
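To make the distinction between random and chronological splitting concrete, the following is a minimal illustrative sketch, not the authors' pipeline. The `Build` record, the `chronological_split` and `precision_recall` helpers, the 80/20 split ratio, and the synthetic build history are all assumptions introduced here for illustration. The idea is that the model is trained only on older builds and evaluated on newer ones, so no information from the future leaks into training, and that precision and recall are computed for the "failed" class.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Build:
    # Hypothetical record: when the build started and whether it failed.
    started_at: datetime
    failed: bool


def chronological_split(builds, train_fraction=0.8):
    """Train on the oldest builds and test on the most recent ones."""
    ordered = sorted(builds, key=lambda b: b.started_at)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def precision_recall(actual_failed, predicted_failed):
    """Precision and recall for the 'failed' class."""
    pairs = list(zip(actual_failed, predicted_failed))
    tp = sum(1 for a, p in pairs if a and p)        # failures correctly flagged
    fp = sum(1 for a, p in pairs if not a and p)    # passing builds flagged as failing
    fn = sum(1 for a, p in pairs if a and not p)    # failures that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Synthetic history (an assumption): one build per day, every fifth build fails.
start = datetime(2024, 1, 1)
history = [Build(start + timedelta(days=i), failed=(i % 5 == 0)) for i in range(20)]

train, test = chronological_split(history)

# A high-precision, low-recall predictor flags few builds as failing, so most
# builds are skipped; every false negative is a failure detected late.
actual = [True, True, False, False]
predicted = [True, False, False, False]
precision, recall = precision_recall(actual, predicted)
skipped = sum(1 for p in predicted if not p)
```

In this toy case, three of the four builds would be skipped and one of them hides a failure, which mirrors the trade-off described in the abstract: time saved on skipped builds against delayed detection of the failures that were missed.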
Files
File under embargo until 03-08-2026