Type validation of Type4Py using Mypy

Bachelor Thesis (2022)
Author(s)

M.A.P. Mac Gillavry (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

S.A.M. Mir – Mentor (TU Delft - Software Engineering)

Sebastian Proksch – Mentor (TU Delft - Software Engineering)

J.A. Pouwelse – Graduation committee member (TU Delft - Data-Intensive Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2022 Merlijn Mac Gillavry
Publication Year
2022
Language
English
Graduation Date
24-06-2022
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Researchers at Delft University of Technology have developed Type4Py: a tool that uses machine learning to predict types for Python code. Developers can apply these predictions to their Python code to increase readability, and a type-checker can then test them for possible type errors. If a prediction does not produce a type error, it is called type-correct. Type4Py has been evaluated by matching its predictions against earlier annotations, also called the ground truth, and achieved an MRR of 71.7%. However, Type4Py's predictions have not been evaluated on their type-correctness. Therefore, I set out to answer the following research question: how well does Type4Py perform when validated by the static type-checker Mypy? I answered this research question by answering two sub-questions: how many of Type4Py's predictions are type-correct, and how many of Type4Py's predictions are both type-correct and match the ground truth? I tested a cleaned subset of the ManyTypes4Py dataset with Mypy using a greedy strategy, always picking Type4Py's prediction with the highest confidence, at three confidence thresholds: 0.25, 0.5 and 0.75. This reached accuracies in terms of type-correctness of 88%, 91% and 95%, respectively. For the case where Type4Py's predictions matched the ground truth, the predictions at those same thresholds reached accuracies in terms of type-correctness of 95%, 97% and 98%. Compared with a similar type predictor, Typilus, Type4Py's predictions are more type-correct at confidence levels of at most 50%.
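
The abstract describes a greedy strategy: for each prediction slot, take Type4Py's highest-confidence suggestion, keep it only if it clears a confidence threshold, and then validate the annotated code with Mypy. The sketch below illustrates that idea only at a high level; the function names, data structures and file handling are assumptions for illustration, not the code used in the thesis.

import subprocess
from typing import Optional


def pick_greedy(predictions: list[tuple[str, float]], threshold: float) -> Optional[str]:
    """Return the highest-confidence predicted type if it clears the threshold."""
    if not predictions:
        return None
    best_type, confidence = max(predictions, key=lambda p: p[1])
    return best_type if confidence >= threshold else None


def is_type_correct(annotated_file: str) -> bool:
    """Run Mypy on a file that already contains the applied type annotations."""
    result = subprocess.run(
        ["mypy", "--ignore-missing-imports", annotated_file],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0  # Mypy exits with 0 when it reports no type errors


if __name__ == "__main__":
    # Hypothetical (type, confidence) predictions for a single slot.
    slot_predictions = [("str", 0.62), ("Optional[str]", 0.21), ("Any", 0.05)]
    for threshold in (0.25, 0.5, 0.75):
        print(threshold, pick_greedy(slot_predictions, threshold))

At thresholds 0.25 and 0.5 this example keeps the "str" prediction; at 0.75 it keeps nothing, mirroring how stricter thresholds trade coverage for the higher type-correctness rates reported above.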

Files

License info not available