Combining Type4Py’s Deep Similarity Learning-based Type Inference with Static Type Inference for Python


Abstract

Dynamic programming languages (DPLs), such as Python and Ruby, are often chosen for their flexibility and speed of development. However, the absence of static typing can lead to runtime exceptions and reduced program understandability. To address these problems, some DPLs have introduced optional static typing. Because adding type annotations to existing projects by hand is tedious, various approaches have been proposed to generate them automatically. Static type inference methods are sound in their suggestions, but the dynamic nature of DPLs, combined with unsatisfied static dependencies, can make them imprecise. Other approaches use machine learning (ML)-based type inference to predict type annotations. ML-based methods do not share the limitations of static type inference; however, their performance depends on the quality of the training set, and because they are probabilistic, they cannot guarantee type correctness. One such ML-based approach is the state-of-the-art Type4Py model. Type4Py shares some of the limitations of other learning-based approaches; for example, it cannot predict types outside of its pre-defined type clusters. To address this, this paper presents hpredict, a tool that combines the type predictions of Type4Py's pre-trained model with static type inference. hpredict runs Type4Py's learning-based inference and static type inference on separate copies of the type slots and combines the predictions from both methods. Experiments on the test set of the ManyTypes4Py dataset show that hpredict significantly outperforms Type4Py, by 11% in Top-10 prediction. These findings provide evidence that hpredict can improve Type4Py's overall type prediction performance by also employing static type inference.
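The abstract does not say how hpredict merges the two sources of predictions. The Python below is only a minimal illustrative sketch under an explicit assumption. It assumes a simple "static first" rule: because static inference is sound, a statically inferred type takes the top slot. The ranked ML predictions (for example, Type4Py's output) then fill the remaining Top-k slots, with duplicates skipped. The function names `combine_predictions` and `top_k_accuracy` are hypothetical and do not come from hpredict or Type4Py. The sketch also shows how a Top-k metric such as the reported Top-10 is typically computed. It also shows why a static result can help: a type such as a project-specific class, which falls outside the ML model's type clusters, can still reach the candidate list.

```python
from typing import List, Optional


def combine_predictions(
    static_type: Optional[str],
    ml_predictions: List[str],
    k: int = 10,
) -> List[str]:
    """Merge one statically inferred type with ranked ML predictions.

    Hypothetical strategy (assumption, not hpredict's documented method):
    the static type, if any, is ranked first; ML predictions fill the
    remaining slots in their original order, skipping duplicates.
    """
    combined: List[str] = []
    if static_type is not None:
        combined.append(static_type)
    for candidate in ml_predictions:
        if len(combined) >= k:
            break  # Top-k list is full
        if candidate not in combined:
            combined.append(candidate)
    return combined[:k]


def top_k_accuracy(
    predictions: List[List[str]],
    ground_truth: List[str],
    k: int = 10,
) -> float:
    """Fraction of type slots whose true type appears in the first k predictions."""
    if not ground_truth:
        return 0.0
    hits = sum(
        1 for preds, truth in zip(predictions, ground_truth) if truth in preds[:k]
    )
    return hits / len(ground_truth)


if __name__ == "__main__":
    # "MyCustomClass" stands in for a type the ML model might rank low or miss.
    merged = combine_predictions("MyCustomClass", ["str", "int", "MyCustomClass"], k=3)
    print(merged)  # ['MyCustomClass', 'str', 'int']
    print(top_k_accuracy([merged, ["str", "int"]], ["MyCustomClass", "bool"], k=3))  # 0.5
```

Putting the static type first is just one plausible design choice. A real combiner could also weight the two sources or use static results only for slots where the ML model is not confident.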