Type4Py

Practical Deep Similarity Learning-Based Type Inference for Python

Conference Paper (2022)
Authors

S.A.M. Mir (TU Delft - Software Engineering)

Evaldas Latoskinas (Student TU Delft)

Sebastian Proksch (TU Delft - Software Engineering)

Georgios Gousios (TU Delft - Software Engineering, TU Delft - Software Technology)

Research Group
Software Engineering
Copyright
© 2022 S.A.M. Mir, Evaldas Latoskinas, S. Proksch, G. Gousios
To reference this document use:
https://doi.org/10.1145/3510003.3510124
More Info
Publication Year
2022
Language
English
Pages (from-to)
2241-2252
ISBN (electronic)
978-1-4503-9221-1
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Dynamic languages, such as Python and JavaScript, trade static typing for developer flexibility and productivity. The lack of static typing can cause run-time exceptions and is a major factor in weak IDE support. To alleviate these issues, PEP 484 introduced optional type annotations for Python. As retrofitting types to existing codebases is error-prone and laborious, machine learning (ML)-based approaches have been proposed to enable automatic type inference based on existing, partially annotated codebases. However, previous ML-based approaches are trained and evaluated on human-provided type annotations, which might not always be sound; this may limit their practicality for real-world usage. In this paper, we present Type4Py, a deep similarity learning-based hierarchical neural network model. It learns to discriminate between similar and dissimilar types in a high-dimensional space, which results in clusters of types. Likely types for arguments, variables, and return values can then be inferred through nearest-neighbor search. Unlike previous work, we trained and evaluated our model on a type-checked dataset and used mean reciprocal rank (MRR) to reflect the performance perceived by users. The obtained results show that Type4Py achieves an MRR of 77.1%, a substantial improvement of 8.1% and 16.7% over the state-of-the-art approaches Typilus and TypeWriter, respectively. Finally, to aid developers with retrofitting types, we released a Visual Studio Code extension, which uses Type4Py to provide ML-based type auto-completion for Python.
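The pipeline described in the abstract (learn an embedding space in which same-type examples cluster, infer types by nearest-neighbor search, and score the ranked suggestions with MRR) can be illustrated with a minimal, dependency-free Python sketch. It is not Type4Py's implementation. The 2-D vectors stand in for learned embeddings. `triplet_loss` is the standard margin-based triplet loss, one common similarity-learning objective, and is not necessarily the paper's exact formulation. `rank_types` ranks types by a simple majority vote among the k nearest neighbors, which is an assumed simplification. All function names, the margin, k, and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector, margin: float = 1.0) -> float:
    """Margin-based triplet loss (hypothetical helper): zero once the same-type
    example (positive) is at least `margin` closer to the anchor than the
    different-type example (negative); positive otherwise."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def rank_types(query: Vector, embeddings: Sequence[Vector], labels: Sequence[str], k: int = 3) -> list[str]:
    """Rank candidate types by majority vote among the k nearest training
    embeddings (an assumed voting scheme, found by a linear scan)."""
    nearest = sorted(range(len(embeddings)), key=lambda i: euclidean(query, embeddings[i]))[:k]
    votes = Counter(labels[i] for i in nearest)
    # most_common() sorts by vote count; ties keep first-seen order,
    # so the type of the closer neighbor wins a tie.
    return [type_name for type_name, _ in votes.most_common()]


def mean_reciprocal_rank(predictions: Sequence[Sequence[str]], ground_truth: Sequence[str]) -> float:
    """Average of 1/rank of the correct type over all queries; a miss contributes 0."""
    total = 0.0
    for ranked, expected in zip(predictions, ground_truth):
        if expected in ranked:
            total += 1.0 / (list(ranked).index(expected) + 1)
    return total / len(ground_truth)


# Toy 2-D "embedding space" (hypothetical data, not from the paper).
train_vecs = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.1, 0.3)]
train_types = ["int", "int", "str", "str", "float"]

queries = [(0.1, 0.1), (5.0, 4.8)]
predicted = [rank_types(q, train_vecs, train_types) for q in queries]
print(predicted)                                          # [['int', 'float'], ['str', 'float']]
print(mean_reciprocal_rank(predicted, ["float", "str"]))  # 0.75
print(triplet_loss((0.0, 0.0), (3.0, 4.0), (0.0, 1.0)))   # 5.0
```

In the toy run, the first query's true type `float` is ranked second (reciprocal rank 1/2) and the second query's `str` is ranked first (reciprocal rank 1), giving an MRR of 0.75. A linear scan costs O(n) per query; with many training embeddings, an approximate nearest-neighbor index is the usual replacement.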

Files

3510003.3510124.pdf
(pdf | 0.654 Mb)
License info not available