Testing the Performance of Automated Documentation Generation with Included Inline Comments

Bachelor Thesis (2022)
Author(s)

B. Morkūnas (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. Panichella – Mentor (TU Delft - Software Engineering)

L.H. Applis – Mentor (TU Delft - Software Engineering)

BHM Gerritsen – Graduation committee member (TU Delft - Computer Science & Engineering-Teaching Team)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2022 Balys Morkūnas
Publication Year
2022
Language
English
Graduation Date
24-06-2022
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

A number of machine learning models use source code as training data to automate software development tasks. A common practice is to omit inline comments from the source code in order to unify and standardize the examples, even though these comments can capture important aspects of the code and help explain the algorithms it implements. We claim that models using this supplementary data can produce more fluent translations for the Automatic Documentation Generation task. We test this claim by creating two datasets, one with and one without inline comments, and measuring the difference in performance. The results show a slight improvement in translation accuracy when the dataset contains inline comments with stop words removed. Further research is needed to optimize data preprocessing and to detect the scope of inline comments more accurately.
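The two dataset variants described in the abstract can be illustrated with a small sketch. This is not the thesis's actual pipeline: it assumes Java-like source code with `//` inline comments, and the stop-word list and the function names `strip_inline_comments` and `keep_comments_without_stop_words` are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical, minimal stop-word list for illustration only; the thesis
# does not specify which list was used.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for", "this", "it"}

# Matches "// ..." up to the end of each line (assumes Java-like syntax).
# Naive: it also matches "//" inside string literals and ignores /* */ blocks.
INLINE_COMMENT = re.compile(r"//(.*)$", re.MULTILINE)


def strip_inline_comments(code: str) -> str:
    """Variant 1: drop inline comments entirely (the common baseline)."""
    without = INLINE_COMMENT.sub("", code)
    # Trim whitespace left behind by removed comments and drop empty lines.
    lines = [line.rstrip() for line in without.splitlines()]
    return "\n".join(line for line in lines if line)


def keep_comments_without_stop_words(code: str) -> str:
    """Variant 2: keep inline comments, but remove stop words from their text."""

    def filter_comment(match: re.Match) -> str:
        words = re.findall(r"[A-Za-z0-9_]+", match.group(1))
        kept = [w for w in words if w.lower() not in STOP_WORDS]
        return "// " + " ".join(kept) if kept else ""

    filtered = INLINE_COMMENT.sub(filter_comment, code)
    return "\n".join(line.rstrip() for line in filtered.splitlines())


snippet = """int sum(int[] xs) {
    int total = 0; // accumulate the sum of all elements
    for (int x : xs) total += x;
    return total;
}"""

print(strip_inline_comments(snippet))
print(keep_comments_without_stop_words(snippet))
```

In the second variant the comment line becomes `int total = 0; // accumulate sum all elements`. A real pipeline would likely need a proper tokenizer or parser rather than a regular expression, both to avoid false matches inside string literals and to determine which statements a comment actually refers to.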

Files

BM_RP_final.pdf
(pdf | 0.228 Mb)
License info not available