JB

J. Barros Cândido

info

Please Note

3 records found

Conference paper (2021) - Jeanderson Cândido, Jan Haesen, Maurício Aniche, Arie van Deursen
Logging is a development practice that plays an important role in the operations and monitoring of complex systems. Developers place log statements in the source code and use log data to understand how the system behaves in production. Unfortunately, anticipating where to log during development is challenging. Previous studies show the feasibility of leveraging machine learning to recommend log placement despite the data imbalance since logging is a fraction of the overall code base. However, it remains unknown how those techniques apply to an industry setting, and little is known about the effect of imbalanced data and sampling techniques. In this paper, we study the log placement problem in the code base of Adyen, a large-scale payment company. We analyze 34,526 Java files and 309,527 methods that sum up +2M SLOC. We systematically measure the effectiveness of five models based on code metrics, explore the effect of sampling techniques, understand which features models consider to be relevant for the prediction, and evaluate whether we can exploit 388,086 methods from 29 Apache projects to learn where to log in an industry setting. Our best performing model achieves 79% of balanced accuracy, 81% of precision, 60% of recall. While sampling techniques improve recall, they penalize precision at a prohibitive cost. Experiments with open-source data yield under-performing models over Adyen's test set; nevertheless, they are useful due to their low rate of false positives. Our supporting scripts and tools are available to the community. ...

A systematic mapping study

Modern software development and operations rely on monitoring to understand how systems behave in production. The data provided by application logs and runtime environment are essential to detect and diagnose undesired behavior and improve system reliability. However, despite the rich ecosystem around industry- ready log solutions, monitoring complex systems and getting insights from log data remains a challenge. Researchers and practitioners have been actively working to address several challenges related to logs, e.g., how to effectively provide better tooling support for logging decisions to developers, how to effectively process and store log data, and how to extract insights from log data. A holistic view of the research effort on logging practices and automated log analysis is key to provide directions and disseminate the state-of-the-art for technology transfer. In this paper, we study 108 papers (72 research track papers, 24 journals, and 12 industry track papers) from different communities (e.g., machine learning, software engineering, and systems) and structure the research field in light of the life-cycle of log data. Our analysis shows that (1) logging is challenging not only in open-source projects but also in industry, (2) machine learning is a promising approach to enable a contextual analysis of source code for log recommendation but further investigation is required to assess the usability of those tools in practice, (3) few studies approached efficient persistence of log data, and (4) there are open opportunities to analyze application logs and to evaluate state-of-the-art log analysis techniques in a DevOps context. ...
Journal article (2020) - Igor Lima, Jeanderson Cândido, Marcelo d'Amorim
Context: Content Management Systems (CMS), such as WordPress, are a very popular category of software for creating web sites and blogs. These systems typically build on top of plugin architectures. Unfortunately, it is not uncommon that the combined activation of multiple plugins in a CMS web site will produce unexpected behavior. Conflict-detection techniques exist but they do not scale. Objective: This paper proposes PENA, a technique to detect conflicts in large sets of plugins as those present in plugin market places. Method: PENA takes on input a configuration, consisting of a potentially large set of plugins, and reports on output the offending plugin combinations. PENA uses an iterative divide-and-conquer search to explore the large space of plugin combinations and a staged filtering process to eliminate false alarms. Results: We evaluated PENA with plugins selected from the WordPress official repository and compared its efficiency and accuracy against the technique that checks conflicts in all pairs of plugins. Results show that PENA is 12.4x to 19.6x more efficient than the comparison baseline and can find as many conflicts as it. ...