Code smell detection by deep direct-learning and transfer-learning

More Info
expand_more

Abstract

Context: An excessive number of code smells make a software system hard to evolve and maintain. Machine learning methods, in addition to metric-based and heuristic-based methods, have been recently applied to detect code smells; however, current methods are considered far from mature. Objective: First, explore the feasibility of applying deep learning models to detect smells without extensive feature engineering. Second, investigate the possibility of applying transfer-learning in the context of detecting code smells. Methods: We train smell detection models based on Convolution Neural Networks and Recurrent Neural Networks as their principal hidden layers along with autoencoder models. For the first objective, we perform training and evaluation on C# samples, whereas for the second objective, we train the models from C# code and evaluate the models over Java code samples and vice-versa. Results: We find it feasible to detect smells using deep learning methods though the models’ performance is smell-specific. Our experiments show that transfer-learning is definitely feasible for implementation smells with performance comparable to that of direct-learning. This work opens up a new paradigm to detect code smells by transfer-learning especially for the programming languages where the comprehensive code smell detection tools are not available.