S. Proksch
Please Note
9 records found
1
MaRCo
Compatible Version Ranges in Maven
Managing dependencies in Java projects is challenging: undeclared, implicit dependencies and conflicting version declarations can lead to breaking changes and unpredictable resolution. We present MARCO, a tool to improve resolution reliability in Maven. It injects missing direct dependencies and replaces pinned versions with client-agnostic compatible version ranges, which can be safely reused across clients. The ranges are obtained by combining bytecode differencing and cross-version testing to detect API and behaviorally compatible dependency versions. We demonstrate how MARCO can be used to retrieve compatible versions for specific dependencies, replace pinned versions using compatibility mappings, and execute the full pipeline to enable compatibility-aware resolution. Our preliminary evaluation shows MARCo recovers all missing dependencies for 91% of affected projects, and replaces pinned versions with stable, compatible version ranges for 13 % of dependencies on average across 78 % of projects. MARCO demonstrates the feasibility of scalable, compatibility-driven dependency management. The demo is available at https://youtu.be/2faDG8Cmmh0.
Frankenstein
Fast and lightweight call graph generation for software builds
Call Graphs are a rich data source and form the foundation for advanced static analyses that can, for example, detect security vulnerabilities or dead code. This information is invaluable when it is immediately available, such as in the output of a build system. Call Graph generation is a whole-program analysis: not just the application, but also all its dependencies are processed together. Recent work has shown that even advanced static analyses can use summarization techniques to substantially improve runtime; however, existing analyses focus on soundness, and as such remain very expensive. When executed in the build system, which typically has limited resources, even powerful servers suffer from slow build times, rendering these analyses impractical in today’s fast-paced development. In this paper, we aim to strike a balance between improving static analyses while remaining practical for use cases that require quick results in low-resource environments. We propose a summarization-based implementation of a Class-Hierarchy Analysis algorithm for call graph generation of Java programs. Our approach leverages the fact that dependency sets often do not change between builds: we can generate call graphs for these dependencies, cache their generation for subsequent builds, and using a novel stitching algorithm, Frankenstein, merge all partial results into a complete call graph for the whole program. Our evaluation results show that this lightweight approach can substantially outperform existing frameworks. In terms of speed improvements, Frankenstein surpasses the baselines by up to 38%, requiring an average of just 388 Megabytes of memory. This makes the proposed approach practical for build systems with limited memory resources. Despite these optimizations, our generated call graphs maintain a near-identical set of edges when compared to the baselines, achieving an F 1 score of up to 0.98. This summarization-based approach for call graph generation paves the way for using extended static analyses in build processes.
“App stores” are online software stores where end users may browse, purchase, download, and install software applications. By far, the best known app stores are associated with mobile platforms, such as Google Play for Android and Apple’s App Store for iOS. The ubiquity of smartphones has led to mobile app stores becoming a touchstone experience of modern living. App stores have been the subject of many empirical studies. However, most of this research has concentrated on properties of the apps rather than the stores themselves. Today, there is a rich diversity of app stores and these stores have largely been overlooked by researchers: app stores exist on many distinctive platforms, are aimed at different classes of users, and have different end-goals beyond simply selling a standalone app to a smartphone user. The goal of this paper is to survey and characterize the broader dimensionality of app stores, and to explore how and why they influence software development practices, such as system design and release management. We begin by collecting a set of app store examples from web search queries. By analyzing and curating the results, we derive a set of features common to app stores. We then build a dimensional model of app stores based on these features, and we fit each app store from our web search result set into this model. Next, we performed unsupervised clustering to the app stores to find their natural groupings. Our results suggest that app stores have become an essential stakeholder in modern software development. They control the distribution channel to end users and ensure that the applications are of suitable quality; in turn, this leads to developers adhering to various store guidelines when creating their applications. However, we found the app stores operational model could vary widely between stores, and this variability could in turn affect the generalizability of existing understanding of app stores.
Software reuse is a common practice in modern software engineering to save time and energy while accelerating software delivery. Dependency managers like MAVEN offer a large ecosystem of reusable libraries that build the backbone of software reuse. Breaking changes, i.e., when an update to a library introduces incompatible changes that break existing client programs, are troublesome barriers to this library reuse. Semantic Versioning has been proposed as a practice to make it easier for the users to find safe updates by encoding the change impact in the version number. While this practice is widely studied from the framework perspective, no detailed insights exist yet into the ecosystem perspective. In this work, we study violations of semantic versioning in the MAVEN ecosystem for 13,876 versions of 384 artifacts to better understand the impact these violations have on the 7,190 dependent versioned packages. We found that 67% of the artifacts introduce at least one type of semantic versioning violation, either a breaking change or an illegal API extension in their history. An impact analysis on breaking methods that (direct or transitive) dependents reference, revealed strong centralization: 87% of publicly accessible methods are never used by dependents and among methods with at least one usage, half of the unique calls from dependents concentrate on only 35% of the defined methods. We also studied method popularity and could not find an indication that popularity affects stability: even popular methods break frequently. Overall, we confirm the previous result that Semantic Versioning is violated repeatedly in practice. Our results suggest that the frequency of breaking changes might be a sign of insufficient change-impact awareness on the ecosystem and we believe that developers require more adequate information, like method popularity, to improve their update strategies.
Source code comments are a cornerstone of software documentation facilitating feature development and maintenance. Well-defined documentation formats, like Javadoc, make it easy to include structural metadata used to, for example, generate documentation manuals. However, the actual usage of structural elements in source code comments has not been studied yet. We investigate to which extent these structural elements are used in practice and whether the added information can be leveraged to improve tools assisting developers when writing comments. Existing research on comment generation traditionally focuses on automatic generation of summaries. However, recent works have shown promising results when supporting comment authoring through a next-word prediction. In this paper, we present an in-depth analysis of commenting practice in more than 18K open-source projects written in Python and Java showing that many structural elements, particularly parameter and return value descriptions are indeed widely used. We discover that while a majority are rather short at about 6 to 9 words, many are several hundred words in length. We further find that Python comments tend to be significantly longer than Java comments, possibly due to the weakly-typed nature of the former. Following the empirical analysis, we extend an existing language model with support for structural information, substantially improving the Top-1 accuracy of predicted words (Python 9.6%, Java 7.8%).
Type4Py
Practical Deep Similarity Learning-Based Type Inference for Python
Dynamic languages, such as Python and Javascript, trade static typing for developer flexibility and productivity. Lack of static typing can cause run-time exceptions and is a major factor for weak IDE support. To alleviate these issues, PEP 484 introduced optional type annotations for Python. As retrofitting types to existing code-bases is error-prone and laborious, machine learning (ML)-based approaches have been proposed to enable automatic type infer-ence based on existing, partially annotated codebases. However, previous ML-based approaches are trained and evaluated on human-provided type annotations, which might not always be sound, and hence this may limit the practicality for real-world usage. In this paper, we present TYPE4Py, a deep similarity learning-based hier-archical neural network model. It learns to discriminate between similar and dissimilar types in a high-dimensional space, which results in clusters of types. Likely types for arguments, variables, and return values can then be inferred through the nearest neigh-bor search. Unlike previous work, we trained and evaluated our model on a type-checked dataset and used mean reciprocal rank (MRR) to reflect the performance perceived by users. The obtained results show that TYPE4Py achieves an MRR of 77.1 %, which is a substantial improvement of 8.1% and 16.7% over the state-of-the-art approaches Typilus and Typewriter, respectively. Finally, to aid developers with retrofitting types, we released a Visual Stu-dio Code extension, which uses TYPE4Py to provide ML-based type auto-completion for Python.
Configuration smells in continuous delivery pipelines
A linter and a six-month study on GitLab
An effective and efficient application of Continuous Integration (CI) and Delivery (CD) requires software projects to follow certain principles and good practices. Configuring such a CI/CD pipeline is challenging and error-prone. Therefore, automated linters have been proposed to detect errors in the pipeline. While existing linters identify syntactic errors, detect security vulnerabilities or misuse of the features provided by build servers, they do not support developers that want to prevent common misconfigurations of a CD pipeline that potentially violate CD principles ("CD smells"). To this end, we propose CD-Linter, a semantic linter that can automatically identify four different smells in pipeline configuration files. We have evaluated our approach through a large-scale and long-term study that consists of (i) monitoring 145 issues (opened in as many open-source projects) over a period of 6 months, (ii) manually validating the detection precision and recall on a representative sample of issues, and (iii) assessing the magnitude of the observed smells on 5,312 open-source projects on GitLab. Our results show that CD smells are accepted and fixed by most of the developers and our linter achieves a precision of 87% and a recall of 94%. Those smells can be frequently observed in the wild, as 31% of projects with long configurations are affected by at least one smell.