Circular Image

V.V. Kovalenko

info

Please Note

7 records found

Conference paper (2024) - Arkadii Sapozhnikov, Mitchell Olsthoorn, A. Panichella, V.V. Kovalenko, P. Derakhshanfar
Writing software tests is laborious and time-consuming. To address this, prior studies introduced various automated test-generation techniques. A well-explored research direction in this field is unit test generation, wherein artificial intelligence (AI) techniques create tests for a method/class under test. While many of these techniques have primarily found applications in a research context, existing tools (e.g., EvoSuite, Randoop, and AthenaTest) are not user-friendly and are tailored to a single technique. This paper introduces TestSpark, a plugin for IntelliJ IDEA that enables users to generate unit tests with only a few clicks directly within their Integrated Development Environment (IDE). Furthermore, TestSpark also allows users to easily modify and run each generated test and integrate them into the project workflow. TestSpark leverages the advances of search-based test generation tools, and it introduces a technique to generate unit tests using Large Language Models (LLMs) by creating a feedback cycle between the IDE and the LLM. Since TestSpark is an open-source (https://github.com/JetBrains-Research/TestSpark), extendable, and well-documented tool, it is possible to add new test generation methods into the plugin with the minimum effort. This paper also explains our future studies related to TestSpark and our preliminary results. Demo video: https://youtu.be/0F4PrxWfiXo ...
Doctoral thesis (2021) - V.V. Kovalenko, A. van Deursen, A. Bacchelli
Specialized tools, such as IDEs, issue trackers, and code review tools, are an indispensable part of the modern software engineering process. These tools are constantly evolving. Besides enabling tools to support a wider range of technologies and frameworks, we are learning to provide additional features in completely new ways. One prominent stream of innovation in software engineering tools is dedicated to utilizing historical data to enable data-driven features, such as defect prediction engines and recommender systems, which leverage records of prior activity to assist with decision making. Many data-driven features in software engineering tools initially get born out of the context of real-world tools as techniques devised and evaluated in synthetic settings by researchers. While convenient, synthetic evaluation of approaches that are ultimately aimed at bringing improvement to real world problems involves a number of simplifications and assumptions. In this dissertation, we highlight several aspects that, while vital for bringing innovative methods to software engineering tools, are often discarded in existing research. We closely explore several topics specific to artificial evaluation environments, such as simplifications in mining file modification histories, use of synthetic datasets for source code authorship attribution, and a gap between accuracy of reviewer recommendation models and their perception by users. Moreover, we make a case for sharing technical artifacts by converting data mining pipelines into reusable tools, and propose a novel approach to modeling expertise transfer from code modification by capturing individual contribution style of developers. Key contributions of this dissertation include a high-level model of the lifecycle of a data-driven software engineering technique, a discussion of dangerous assumptions and simplifications that are made on every step in this lifecycle, a demonstration of importance of a careful approach to mining software repositories, and a demonstration of serious misalignment between artificial evaluation and realistic environments for the problems of code reviewer recommendation and code authorship attribution. We conclude the dissertation by discussing underlying reasons for misalignment between research environments and real-world tools, and propose potential steps to narrow it down and ultimately accelerate innovation in software engineering tooling. ...
Journal article (2020) - Paul Ralph, Sebastian Baltes, Minghui Zhou, Burak Turhan, Rashina Hoda, Hideaki Hata, Gregorio Robles, Amin Milani Fard, Rana Alkadhi, Gianisa Adisaputri, Richard Torkar, Vladimir Kovalenko, Marcos Kalinowski, Nicole Novielli, Shin Yoo, Xavier Devroey, Xin Tan
Context: As a novel coronavirus swept the world in early 2020, thousands of software developers began working from home. Many did so on short notice, under difficult and stressful conditions.

Objective: This study investigates the effects of the pandemic on developers’ wellbeing and productivity.

Method: A questionnaire survey was created mainly from existing, validated scales and translated into 12 languages. The data was analyzed using non-parametric inferential statistics and structural equation modeling.

Results: The questionnaire received 2225 usable responses from 53 countries. Factor analysis supported the validity of the scales and the structural model achieved a good fit (CFI = 0.961, RMSEA = 0.051, SRMR = 0.067). Confirmatory results include: (1) the pandemic has had a negative effect on developers’ wellbeing and productivity; (2) productivity and wellbeing are closely related; (3) disaster preparedness, fear related to the pandemic and home office ergonomics all affect wellbeing or productivity. Exploratory analysis suggests that: (1) women, parents and people with disabilities may be disproportionately affected; (2) different people need different kinds of support.

Conclusions: To improve employee productivity, software companies should focus on maximizing employee wellbeing and improving the ergonomics of employees’ home offices. Women, parents and disabled persons may require extra support. ...

A library for mining of path-based representations of code

Conference paper (2019) - Vladimir Kovalenko, Egor Bogomolov, Timofey Bryksin, Alberto Bacchelli
One recent, significant advance in modeling source code for machine learning algorithms has been the introduction of path-based representation - an approach consisting in representing a snippet of code as a collection of paths from its syntax tree. Such representation efficiently captures the structure of code, which, in turn, carries its semantics and other information. Building the path-based representation involves parsing the code and extracting the paths from its syntax tree; these steps build up to a substantial technical job. With no common reusable toolkit existing for this task, the burden of mining diverts the focus of researchers from the essential work and hinders newcomers in the field of machine learning on code. In this paper, we present PathMiner - an open-source library for mining path-based representations of code. PathMiner is fast, flexible, well-tested, and easily extensible to support input code in any common programming language. Preprint [https://doi.org/10.5281/zenodo.2595271]; released tool [https://doi.org/10.5281/zenodo.2595257]. ...
Journal article (2019) - Vladimir Kovalenko, Nava Tintarev, Evgeny Pasynkov, Christian Bird, Alberto Bacchelli
Selecting reviewers for code changes is a critical step for an efficient code review process. Recent studies propose automated reviewer recommendation algorithms to support developers in this task. However, the evaluation of recommendation algorithms, when done apart from their target systems and users (i.e., code review tools and change authors), leaves out important aspects: perception of recommendations, influence of recommendations on human choices, and their effect on user experience. This study is the first to evaluate a reviewer recommender in vivo. We compare historical reviewers and recommendations for over 21,000 code reviews performed with a deployed recommender in a company environment and set out to measure the influence of recommendations on users' choices, along with other performance metrics. Having found no evidence of influence, we turn to the users of the recommender. Through interviews and a survey we find that, though perceived as relevant, reviewer recommendations rarely provide additional value for the respondents. We confirm this finding with a larger study at another company. The confirmation of this finding brings up a case for more user-centric approaches to designing and evaluating the recommenders. Finally, we investigate information needs of developers during reviewer selection and discuss promising directions for the next generation of reviewer recommendation tools. Preprint: https://doi.org/10.5281/zenodo.1404814. ...
Conference paper (2018) - Vladimir Kovalenko, Alberto Bacchelli
Onboarding is a critical stage in the tenure of software developers with a project, because meaningful contribution requires familiarity with the codebase. Some software teams employ practices, such as mentoring, to help new developers get accustomed faster. Code review, i.e., the manual inspection of code changes, is an opportunity for sharing knowledge and helping with onboarding. In this study, we investigate whether and how contributions from developers with low experience in a project do receive a different treatment during code review. We compare reviewers' experience, metrics of reviewers' attention, and change merge rate between changes from newcomers and from more experienced authors in 60 active open source projects. We find that the only phenomenon that is consistent across the vast majority of projects is a lower merge rate for newcomers' changes. ...

Should we consider branches?

Conference paper (2018) - Vladimir Kovalenko, Fabio Palomba, Alberto Bacchelli
Modern distributed version control systems, such as Git, offer support for branching — the possibility to develop parts of software outside the master trunk. Consideration of the repository structure in Mining Software Repository (MSR) studies requires a thorough approach to mining, but there is no well-documented, widespread methodology regarding the handling of merge commits and branches. Moreover, there is still a lack of knowledge of the extent to which considering branches during MSR studies impacts the results of the studies. In this study, we set out to evaluate the importance of proper handling of branches when calculating file modification histories. We analyze over 1,400 Git repositories of four open source ecosystems and compute modification histories for over two million files, using two different algorithms. One algorithm only follows the first parent of each commit when traversing the repository, the other returns the full modification history of a file across all branches. We show that the two algorithms consistently deliver different results, but the scale of the difference varies across projects and ecosystems. Further, we evaluate the importance of accurate mining of file histories by comparing the performance of common techniques that rely on file modification history — reviewer recommendation, change recommendation, and defect prediction — for two algorithms of file history retrieval. We find that considering full file histories leads to an increase in the techniques’ performance that is rather modest. ...