Type validation of Type4Py using Mypy


Abstract

Researchers at the Delft University of Technology have developed Type4Py: a tool that uses machine learning to predict types for Python code. Developers can apply these predictions to their Python code to increase readability, and a type checker can later test the annotated code for possible type errors. If a prediction does not produce a type error, that prediction is called type-correct. Type4Py has been evaluated by matching its predictions against earlier annotations, also called ground truth, and achieved an MRR (mean reciprocal rank) of 71.7%. However, Type4Py’s predictions have not been evaluated on their type-correctness. Therefore, I set out to answer the following research question: how well does Type4Py perform when validated by the static type checker Mypy? I answered this research question by answering two sub-questions: how many of Type4Py’s predictions are type-correct, and how many of Type4Py’s predictions are both type-correct and match ground truth? I tested a cleaned subset of the ManyTypes4Py dataset with Mypy by running a greedy strategy that always picks Type4Py’s highest-confidence prediction, at three confidence thresholds: 0.25, 0.5 and 0.75. This reached accuracies in terms of type-correctness of 88%, 91% and 95%, respectively. For predictions that also match ground truth, the same thresholds reached type-correctness accuracies of 95%, 97% and 98%. Compared with a similar type predictor, Typilus, Type4Py’s predictions are more type-correct at confidence levels of at most 50%.
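
To make the evaluation procedure concrete, the sketch below illustrates the greedy, threshold-gated selection described above. It is an illustration, not the thesis code: it assumes Type4Py returns, for each annotation slot, a ranked list of (predicted type, confidence) pairs, and the callback `mypy_accepts` is a hypothetical stand-in for applying an annotation to the source file and running Mypy on it.

```python
from typing import Callable, Optional

# Assumed (hypothetical) output shape per annotation slot:
# a list of (predicted_type, confidence) pairs from Type4Py.
Prediction = tuple[str, float]

def pick_greedy(candidates: list[Prediction], threshold: float) -> Optional[str]:
    """Greedy strategy: always take the highest-confidence prediction,
    but only apply it if its confidence clears the threshold."""
    if not candidates:
        return None
    best_type, best_conf = max(candidates, key=lambda p: p[1])
    return best_type if best_conf >= threshold else None

def type_correctness(slots: list[list[Prediction]],
                     threshold: float,
                     mypy_accepts: Callable[[str], bool]) -> float:
    """Fraction of applied predictions that the type checker accepts
    without a type error, i.e. that are type-correct.
    `mypy_accepts` stands in for annotating the code and running Mypy."""
    applied = [t for slot in slots
               if (t := pick_greedy(slot, threshold)) is not None]
    return sum(map(mypy_accepts, applied)) / len(applied) if applied else 0.0
```

Under this setup, raising the threshold applies fewer predictions but filters out low-confidence ones, which is consistent with the trend reported above: type-correctness rises from 88% at 0.25 to 95% at 0.75.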