173 records found

Authored

Clonos

Consistent Causal Recovery for Highly-Available Streaming Dataflows

Stream processing lies in the backbone of modern businesses, being employed for mission critical applications such as real-time fraud detection, car-trip fare calculations, traffic management, and stock trading. Large-scale applications are executed by scale-out stream process ...

Well-typed programs can go wrong

A study of typing-related bugs in JVM compilers

Despite the substantial progress in compiler testing, research endeavors have mainly focused on detecting compiler crashes and subtle miscompilations caused by bugs in the implementation of compiler optimizations. Surprisingly, this growing body of work neglects other compiler ...

Unix has evolved for almost five decades, shaping modern operating systems, key software technologies, and development practices. Studying the evolution of this remarkable system from an architectural perspective can provide insights on how to manage the growth of large, compl ...

Software evolution

The lifetime of fine-grained elements

A model regarding the lifetime of individual source code lines or tokens can estimate maintenance effort, guide preventive maintenance, and, more broadly, identify factors that can improve the efficiency of software development. We present methods and tools that allow tracking ...

Software reuse cuts both ways

An empirical analysis of its relationship with security vulnerabilities

Software reuse is a widely adopted practice among both researchers and practitioners. The relation between security and reuse can go both ways: a system can become more secure by relying on mature dependencies, or more insecure by exposing a larger attack surface via exploitab ...

The software heritage graph dataset

Public software development under one roof

Software Heritage is the largest existing public archive of software source code and accompanying development history: it currently spans more than five billion unique source code files and one billion unique commits, coming from more than 80 million software projects. This pa ...

Code review comments

Language matters

Recent research provides evidence that effective communication in collaborative software development has significant impact on the software development lifecycle. Although related qualitative and quantitative studies point out textual characteristics of well-formed messages, the ...

Smelly relations

Measuring and understanding database schema quality

Context: Databases are an integral element of enterprise applications. Similarly to code, database schemas are also prone to smells - best practice violations. Objective: We aim to explore database schema quality, associated characteristics and their relationships with other s ...

VulinOSS

A dataset of security vulnerabilities in open-source systems

Examining the different characteristics of open-source software in relation to security vulnerabilities can provide the research community with findings that can lead to the development of more secure systems. We present a dataset where the reported vulnerabilities of 8694 open- ...

The Exception Handling Riddle

An Empirical Study on the Android API

We examine the use of the Java exception types in the Android platform’s Application Programming Interface (API) reference documentation and their impact on the stability of Android applications. We develop a method that automatically assesses an API’s quality regarding the excep ...

Echoes from space

Grouping commands with large-scale telemetry data

Background: As evolving desktop applications continuously accrue new features and grow more complex with denser user interfaces and deeply-nested commands, it becomes inefficient to use simple heuristic processes for grouping GUI commands in multi-level menus. Existing search-bas ...

Fatal injection

A survey of modern code injection attack countermeasures

With a code injection attack (CIA) an attacker can introduce malicious code into a computer program or system that fails to properly encode data that comes from an untrusted source. A CIA can have different forms depending on the execution context of the application and the lo ...

House of Cards

Code Smells in Open-Source C# Repositories

Background: Code smells are indicators of quality problems that make a software hard to maintain and evolve. Given the importance of smells in the source code's maintainability, many studies have explored the characteristics of smells and analyzed their effects on the software's ...

The long-term growth rate of evolving software

Empirical results and implications

The amount of code in evolving software-intensive systems appears to be growing relentlessly, affecting products and entire businesses. Objective figures quantifying the software code growth rate bounds in systems over a large time scale can be used as a reliable predictive ba ...

How to train your browser

Preventing XSS attacks using contextual script fingerprints

Cross-Site Scripting (XSS) is one of the most common web application vulnerabilities. It is therefore sometimes referred to as the “buffer overflow of the web.” Drawing a parallel from the current state of practice in preventing unauthorized native code execution (the typical goa ...

PiCO QL

A software library for runtime interactive queries on program data

PiCO QL is an open source C/C++ software whose scientific scope is real-time interactive analysis of in-memory data through SQL queries. It exposes a relational view of a system's or application's data structures, which is queryable through SQL. While the application or system is ...

TRACER

A platform for securing legacy code

A security vulnerability is a programming error that introduces a potentially exploitable weakness into a computer system. Such a vulnerability can severely affect an organization's infrastructure and cause significant financial damage to it. Hence, one of the basic pursuits in e ...

Contributed

Towards More Effective Querying of Medical Literature in Alexandria3K

How useful can Alexandria3K be for performing literature reviews

The Alexandria3K library, a versatile Python-based tool, has been expanded to include the integration of the PubMed dataset, enriching its capabilities in the analysis of scientific papers. Originally supporting major datasets like Crossref and US patents, and smaller yet s ...

Topic Classification of Publications

Identifying publication topics based on existing journals

Accurate topic classification is crucial in the scientific community when it comes to finding relevant journals. However, the efficiency and accuracy of topic classification of publications do not seem to be at its best performance, especially with the fast-paced rise in the quan ...

Author Name Disambiguation using Large Language Models

Contributions to a system for open reproducible publication research

Author name disambiguation, otherwise described as (publication) record linking, is a problem that has had considerable research dedicated to its solving. Author attributions, calculating research metrics and conducting literature reviews are amongst processes that experience ...