PyHintSearch: Validated Combinatorial Search for Probabilistic Type Inference Models

None, None

PyHintSearch: Validated Combinatorial Search for Probabilistic Type Inference Models

Master Thesis (2024)

Author(s)

M.J. Bekooy (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Georgios Gousios – Mentor (TU Delft - Software Technology)

C.B. Bach – Graduation committee member (TU Delft - Programming Languages)

S. Proksch – Graduation committee member (TU Delft - Software Engineering)

Faculty

Electrical Engineering, Mathematics and Computer Science

Copyright

Python Type annotations Static analysis Validated combinatorial search Probabilistic type inference Call graphs Machine learning type inference

To reference this document use:

https://resolver.tudelft.nl/uuid:49649809-5cd3-49a2-89d0-943708211660

More Info

expand_more

Publication Year

2024

Language

English

Copyright

Graduation Date

05-03-2024

Awarding Institution

Delft University of Technology

Programme

['Computer Science']

Faculty

Electrical Engineering, Mathematics and Computer Science

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Type annotations in Python are an integral part of static analysis. They can be used for code documentation, error detection and the development of cleaner architectures. By enhancing code quality, they contribute to the robustness, maintainability and comprehensibility of codebases. Tools like static type checkers use type annotations to detect bugs early, with some type checkers like Pyright being capable of inferring annotations statically from source code.

This thesis uses an innovative approach to further enhance type annotation coverage in Python codebases by using a combination of machine learning predictions and combinatorial search. To do this, PyHintSearch was developed. PyHintSearch constructs a search tree to which a depth-first search is applied to systematically explore potential combinations of predicted type annotations and validate them using feedback from the Pyright static type checker. Ultimately, the goal is to identify a branch containing a valid combination of type annotations. These annotations can then be integrated into Python code, thereby enhancing the type annotation coverage, which leads to improved static analysis and ultimately better code quality.

PyHintSearch's effectiveness is evaluated based on type annotation coverage and correctness, performance, and practical usability. Experimental results demonstrate different improvements in type annotation coverage, depending on the machine learning model used for type inference. Type4Py showed an improvement of 62.45% and TypeT5 of 79.93%. The precision of type annotations from these models are 0.36 and 0.51, respectively. Performance-wise, PyHintSearch can efficiently explore the exponential search space, annotating 16 diverse projects, ranging from small to large, in approximately 13.75 hours when using the Type4Py model. Regarding practical usability, the impact of type annotations on downstream program analysis is examined through the generation of call graphs. The additional information that type annotations provide can be used to refine the call graph by eliminating irrelevant calls to make it more precise.

Files

Mark_Bekooy_-_MSc_Thesis_Final... (pdf)

(pdf | 1.01 Mb)

License info not available