Using GitHub Copilot for Test Generation in Python

An Empirical Study

Conference Paper (2024)

Author(s)

Khalid El Haji (Student TU Delft)

Carolin Brandt (TU Delft - Software Engineering)

Andy Zaidman (TU Delft - Software Technology)

Research Group

Software Engineering

DOI related publication

https://doi.org/10.1145/3644032.3644443

To reference this document use:

https://resolver.tudelft.nl/uuid:b45665fa-00ad-4a7c-8643-873a700b685c

More Info

Publication Year

2024

Language

English

Pages (from-to)

45-55

ISBN (print)

979-8-4007-0588-5

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Writing unit tests is a crucial task in software development, but it is also recognized as a time-consuming and tedious task. As such, numerous test generation approaches have been proposed and investigated. However, most of these test generation tools produce tests that are typically difficult to understand. Recently, Large Language Models (LLMs) have shown promising results in generating source code and supporting software engineering tasks. We therefore investigate the usability of tests generated by GitHub Copilot, a proprietary, closed-source code generation tool that uses an LLM. We evaluate GitHub Copilot's test generation abilities both with and without an existing test suite, and we study the impact of different code commenting strategies on the generated tests. Our investigation evaluates the usability of 290 tests generated by GitHub Copilot for 53 sampled tests from open source projects. Our findings highlight that within an existing test suite, approximately 45.28% of the tests generated by Copilot are passing tests, while the remaining 54.72% are failing, broken, or empty tests. Furthermore, if we generate tests using Copilot without an existing test suite in place, we observe that 92.45% of the tests are failing, broken, or empty. Additionally, we study how test method comments influence the usability of the generated tests.
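The percentages reported in the abstract can be sanity-checked with a little arithmetic. The sketch below assumes (the abstract does not state the denominator explicitly) that each percentage is computed over the 53 sampled tests within a single generation setting; `percentage` and `count_for` are hypothetical helper names introduced only for this check, not part of the study's tooling.

```python
from typing import Optional

# Number of sampled tests stated in the abstract.
# ASSUMPTION: each reported percentage uses this as its denominator.
SAMPLED_TESTS = 53


def percentage(count: int, total: int = SAMPLED_TESTS) -> float:
    """Return count/total as a percentage rounded to two decimal places."""
    return round(100 * count / total, 2)


def count_for(pct: float, total: int = SAMPLED_TESTS) -> Optional[int]:
    """Return the whole count whose rounded share of `total` equals `pct`, or None."""
    for count in range(total + 1):
        if percentage(count, total) == pct:
            return count
    return None


# Shares reported in the abstract, checked against the assumed denominator.
for reported in (45.28, 54.72, 92.45):
    print(f"{reported}% -> {count_for(reported)} of {SAMPLED_TESTS}")
```

Under that assumption, the script prints that 45.28%, 54.72%, and 92.45% correspond to 24, 29, and 49 of the 53 sampled tests; the first two add up to 53, which fits their description as complementary shares of the same setting.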

Files

3644032.3644443.pdf

(pdf | 0.914 MB)

- Embargo expired on 10-12-2024

License info not available