G. Gousios

24 records found

Late deliveries have been a common problem in the software industry for decades. They often result from deficiencies in effort estimation and project planning. These deficiencies arise due to the complexity of software development, where various social and technical factors affec ...
The increasing number of malicious packages being deployed in open source package repositories like PyPI or npm prompted numerous works aiming to secure open source ecosystems. The increased availability and deployment of safeguards raises the question of whether and how attackers ...
Type annotations in Python are an integral part of static analysis. They can be used for code documentation, error detection and the development of cleaner architectures. By enhancing code quality, they contribute to the robustness, maintainability and comprehensibility of codeba ...

Topic Classification of Publications

Identifying publication topics based on existing journals

Accurate topic classification is crucial in the scientific community when it comes to finding relevant journals. However, the efficiency and accuracy of topic classification of publications do not seem to be at their best, especially with the fast-paced rise in the quan ...
The growth of academic publications, heterogeneity of datasets and the absence of a globally accepted organization identifier introduce the challenge of affiliation disambiguation in bibliographic databases. In this paper, we create a baseline using the currently implemented algo ...

Towards More Effective Querying of Medical Literature in Alexandria3K

How useful can Alexandria3K be for performing literature reviews

The Alexandria3K library, a versatile Python-based tool, has been expanded to include the integration of the PubMed dataset, enriching its capabilities in the analysis of scientific papers. Originally supporting major datasets like Crossref and US patents, and smaller yet s ...

Author Name Disambiguation using Large Language Models

Contributions to a system for open reproducible publication research

Author name disambiguation, otherwise described as (publication) record linking, is a problem that has had considerable research dedicated to solving it. Author attributions, calculating research metrics and conducting literature reviews are among the processes that experience ...
Type inference plays a pivotal role in modern software development as it aids in understanding code, detecting errors, and facilitating code completion. Two main approaches, static analysis and machine learning, contribute to this process. Each approach has its own benefits and ...
The adoption of the serverless architecture and the Function-as-a-Service model has significantly increased in recent years, with more enterprises migrating their software and hardware to the cloud. However, most applications require state management, leading to the use of extern ...
The use of open-source packages is a common practice among developers. It decreases the development time and improves maintainability. But adding a dependency to a project comes with inherent risks such as introducing vulnerabilities. A few solutions that help visualize all of the ...
In (open-source) development, developers routinely rely on other libraries to improve their coding efficiency by reusing code. This reliance on other packages could cause issues when critical dependencies suddenly have a vulnerability introduced to them. This work analyzes t ...
Using open-source packages when developing software applications is the general practice among a vast number of software developers. However, importing open-source code that may depend on other existing technologies may lead to the appearance of a transitive dependency chain. As ...
The main principle of Open Source development is that developers can reuse different libraries over and over again to make their lives easier. That is why this practice has gained a lot of popularity. However, libraries usually depend on other already existing pieces of code. Thi ...
Developers rely on different software to improve their efficiency, to reuse parts of code, and to maintain it with ease, which is why open source software libraries have gained much popularity over the past years. This paper analyzes what are the most used packages fro ...

Releasing Fast and Slow

Characterizing Rapid Releases in a Large Software-Driven Organization

The appeal of delivering new features faster has led many software projects to change their development processes towards rapid release models. Even though rapid releases are increasingly being adopted in open-source and commercial software, it is not well understood what the eff ...

Data Driven Decisions

Validating and Supporting a Continuous Experimentation Development Environment

The number of conducted A/B tests is growing throughout companies in software development. Many of these companies develop their own in-house Experimentation Platform to support these experiments. In this thesis we identify factors that influence the trustworthiness and soundness ...
Training machine learning (ML) models for natural language processing usually requires lots of data that is often acquired through crowdsourcing. In crowdsourcing, crowd workers annotate data samples according to one or more properties, such as the sentiment of a sentence, the vi ...
With the increase of online education, a good description of learning resources has become vital for educational resource sharing and reuse. Resource description has been under the spotlight in recent years. Educational platforms can benefit from good resource organisation and de ...
In this project we aimed to create a post-trading day safeguard system that allows for the identification of bugs in the primary and secondary risk control systems at Optiver. These systems are needed to prevent undesirable exposure to the market from happening, and to ensure tha ...
Reactive Programming is a way of programming designed to provide developers with the right abstractions for creating systems that use streams of data. Traditional debug tools lack support for the abstractions provided, causing developers to fall back to the most rudimentary debug ...