Completing Function Documentation Comments Using Structural Information

Journal Article (2023)

Author(s)

Adelina Ciurumelea (Universität Zürich)

Carol V. Alexandru (Universität Zürich)

Harald C. Gall (Universität Zürich)

Sebastian Proksch (TU Delft - Software Engineering)

Research Group

Software Engineering

DOI related publication

https://doi.org/10.1007/s10664-022-10284-6

Comment completion; Javadocs; Neural language models; Python documentation strings

To reference this document use:

https://resolver.tudelft.nl/uuid:61091ff8-e0b9-4d39-80b3-31c9074d4181

Publication Year

2023

Language

English

Issue number

4

Volume number

28

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Source code comments are a cornerstone of software documentation, facilitating feature development and maintenance. Well-defined documentation formats, like Javadoc, make it easy to include structural metadata that is used, for example, to generate documentation manuals. However, the actual usage of structural elements in source code comments has not yet been studied. We investigate to what extent these structural elements are used in practice and whether the added information can be leveraged to improve tools that assist developers in writing comments. Existing research on comment generation has traditionally focused on the automatic generation of summaries. However, recent work has shown promising results in supporting comment authoring through next-word prediction. In this paper, we present an in-depth analysis of commenting practice in more than 18K open-source projects written in Python and Java, showing that many structural elements, particularly parameter and return value descriptions, are indeed widely used. We discover that while a majority are rather short, at about 6 to 9 words, many are several hundred words in length. We further find that Python comments tend to be significantly longer than Java comments, possibly due to the weakly-typed nature of the former. Following the empirical analysis, we extend an existing language model with support for structural information, substantially improving the Top-1 accuracy of predicted words (Python 9.6%, Java 7.8%).
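To make the notion of "structural elements" concrete, the following is a minimal sketch of how a Javadoc comment might be split into its summary and its tagged descriptions, with a word count for each. It is an illustration only and not the authors' tooling: the helper names (`clean_javadoc`, `structural_elements`, `word_counts`), the small set of tags handled, and the example comment are all assumptions made for this sketch.

```python
import re

# Hypothetical helpers for illustration; not the tooling used in the article.
# Only three common Javadoc block tags are recognised in this sketch.
TAG_PATTERN = re.compile(r"@(param|return|throws)\b")


def clean_javadoc(comment: str) -> str:
    """Strip the /** ... */ delimiters and the leading '*' of each line."""
    body = comment.strip()
    body = re.sub(r"^/\*\*", "", body)
    body = re.sub(r"\*/$", "", body)
    lines = [re.sub(r"^\s*\*\s?", "", line) for line in body.splitlines()]
    return " ".join(line.strip() for line in lines if line.strip())


def structural_elements(comment: str) -> list[tuple[str, str]]:
    """Return (element, text) pairs: the summary first, then each tagged part."""
    parts = TAG_PATTERN.split(clean_javadoc(comment))
    # With one capture group, split() yields [summary, tag, text, tag, text, ...].
    elements = [("summary", parts[0].strip())]
    for tag, text in zip(parts[1::2], parts[2::2]):
        elements.append((tag, text.strip()))
    return elements


def word_counts(comment: str) -> list[tuple[str, int]]:
    """Count whitespace-separated words per element.

    For @param, the parameter name is counted as part of the description.
    """
    return [(tag, len(text.split())) for tag, text in structural_elements(comment)]


# Made-up example comment, used only to exercise the helpers above.
EXAMPLE = """/**
 * Computes the area of a rectangle.
 *
 * @param width the width of the rectangle in pixels
 * @param height the height of the rectangle in pixels
 * @return the area in square pixels
 */"""

if __name__ == "__main__":
    for tag, count in word_counts(EXAMPLE):
        print(f"{tag}: {count} words")
```

Separating the summary from the tagged descriptions in this way is what allows length statistics to be reported per element, and it is the kind of structural signal that a next-word prediction model could, in principle, be conditioned on.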

Files

S10664_022_10284_6.pdf

(pdf | 5.34 Mb)