Combining Type4Py’s Deep Similarity Learning-based Type Inference with Static Type Inference for Python

Bachelor thesis (2022)

Authors

A. Al Haydar Electrical Engineering, Mathematics and Computer Science

Contributors

S. Proksch Software Engineering - (supervisor 1)

S.A.M. Mir Software Engineering - (supervisor 1)

J.A. Pouwelse Data-Intensive Systems - (supervisor 2)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

http://resolver.tudelft.nl/uuid:d3a0a22b-1694-4cb8-abbc-d6c534a3768c

Published Date

24-06-2022

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Dynamic programming languages (DPLs), such as Python and Ruby, are often used for their flexibility and fast development. However, the absence of static typing can lead to runtime exceptions and reduced program understandability. To overcome these problems, some DPLs have introduced optional static typing. Because adding type annotations to existing projects is tedious, different approaches have been employed to generate type annotations automatically. Static type inference methods are sound in their suggestions, but the dynamic nature of DPLs, combined with insufficiently satisfied static dependencies, can cause imprecision. Other proposed approaches use machine learning (ML)-based type inference to predict type annotations. ML-based methods do not have the limitations of static type inference. However, their performance depends on the quality of the training set, and they cannot guarantee type correctness because their techniques are probabilistic. One such ML-based inference approach is the state-of-the-art Type4Py model. Type4Py shares some of the limitations of other learning-based approaches, e.g. it cannot predict types outside of its pre-defined type clusters. To this end, this paper presents hpredict, a tool that combines the type predictions of Type4Py’s pre-trained model with static type inference. hpredict runs Type4Py’s learning-based inference and static type inference on different copies of type slots and combines the predictions from both methods. Experiments on the test set of the ManyTypes4Py dataset show that hpredict significantly outperforms Type4Py, by 11% in Top-10 prediction. The findings of this research lend evidence that hpredict can increase Type4Py’s general type prediction performance by employing static type inference as well.
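
The abstract does not say how hpredict merges the two sources of predictions, so the following is only a minimal sketch of one plausible strategy for a single type slot, not the thesis's implementation. It assumes that a statically inferred type, being sound, is ranked first and that the ML candidates follow in descending confidence. The names `combine_predictions` and `top_k_hit` are hypothetical and do not come from hpredict or Type4Py.

```python
from typing import Iterable, List, Optional, Tuple


def combine_predictions(
    static_type: Optional[str],
    ml_predictions: Iterable[Tuple[str, float]],
    k: int = 10,
) -> List[str]:
    """Merge one static inference with scored ML candidates for one type slot.

    Hypothetical helper: the ordering policy below is an assumption.
    """
    ranked: List[str] = []
    # Assumption: a statically inferred type is trusted most, so it goes first.
    if static_type is not None:
        ranked.append(static_type)
    # Append ML candidates in descending confidence, skipping duplicates.
    for type_name, _score in sorted(
        ml_predictions, key=lambda pair: pair[1], reverse=True
    ):
        if type_name not in ranked:
            ranked.append(type_name)
    # Keep only the k best candidates, matching a Top-k evaluation.
    return ranked[:k]


def top_k_hit(ranked: List[str], expected: str, k: int = 10) -> bool:
    """Return True if the expected type is among the first k candidates."""
    return expected in ranked[:k]


if __name__ == "__main__":
    ml_candidates = [("str", 0.6), ("bytes", 0.3), ("int", 0.1)]
    merged = combine_predictions("pathlib.Path", ml_candidates)
    print(merged)  # ['pathlib.Path', 'str', 'bytes', 'int']
    print(top_k_hit(merged, "pathlib.Path"))  # True
```

In this example the static result supplies a type (`pathlib.Path`) that is absent from the ML candidates, which illustrates how static inference could cover types outside a learned model's pre-defined clusters.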

Files

Research_paper_with_title_page... (.pdf)

(.pdf | 0.475 Mb)