Daniel M. Germán | TU Delft Repository

What is an app store? The software engineering perspective

Journal article (2024) - Wenhan Zhu, Sebastian Proksch, Daniel M. German, Michael W. Godfrey, Li Li, Shane McIntosh

“App stores” are online software stores where end users may browse, purchase, download, and install software applications. By far, the best known app stores are associated with mobile platforms, such as Google Play for Android and Apple’s App Store for iOS. The ubiquity of smartphones has led to mobile app stores becoming a touchstone experience of modern living. App stores have been the subject of many empirical studies. However, most of this research has concentrated on properties of the apps rather than the stores themselves. Today, there is a rich diversity of app stores and these stores have largely been overlooked by researchers: app stores exist on many distinctive platforms, are aimed at different classes of users, and have different end-goals beyond simply selling a standalone app to a smartphone user. The goal of this paper is to survey and characterize the broader dimensionality of app stores, and to explore how and why they influence software development practices, such as system design and release management. We begin by collecting a set of app store examples from web search queries. By analyzing and curating the results, we derive a set of features common to app stores. We then build a dimensional model of app stores based on these features, and we fit each app store from our web search result set into this model. Next, we performed unsupervised clustering to the app stores to find their natural groupings. Our results suggest that app stores have become an essential stakeholder in modern software development. They control the distribution channel to end users and ensure that the applications are of suitable quality; in turn, this leads to developers adhering to various store guidelines when creating their applications. However, we found the app stores operational model could vary widely between stores, and this variability could in turn affect the generalizability of existing understanding of app stores. ...

“App stores” are online software stores where end users may browse, purchase, download, and install software applications. By far, the best known app stores are associated with mobile platforms, such as Google Play for Android and Apple’s App Store for iOS. The ubiquity of smartphones has led to mobile app stores becoming a touchstone experience of modern living. App stores have been the subject of many empirical studies. However, most of this research has concentrated on properties of the apps rather than the stores themselves. Today, there is a rich diversity of app stores and these stores have largely been overlooked by researchers: app stores exist on many distinctive platforms, are aimed at different classes of users, and have different end-goals beyond simply selling a standalone app to a smartphone user. The goal of this paper is to survey and characterize the broader dimensionality of app stores, and to explore how and why they influence software development practices, such as system design and release management. We begin by collecting a set of app store examples from web search queries. By analyzing and curating the results, we derive a set of features common to app stores. We then build a dimensional model of app stores based on these features, and we fit each app store from our web search result set into this model. Next, we performed unsupervised clustering to the app stores to find their natural groupings. Our results suggest that app stores have become an essential stakeholder in modern software development. They control the distribution channel to end users and ensure that the applications are of suitable quality; in turn, this leads to developers adhering to various store guidelines when creating their applications. However, we found the app stores operational model could vary widely between stores, and this variability could in turn affect the generalizability of existing understanding of app stores.

How bugs are born

A model to identify how bugs are introduced in software components

Journal article (2020) - Gema Rodríguez-Pérez, Gregorio Robles, Alexander Serebrenik, Andy Zaidman, Daniel M. Germán, Jesus M. Gonzalez-Barahona

When identifying the origin of software bugs, many studies assume that “a bug was introduced by the lines of code that were modified to fix it”. However, this assumption does not always hold and at least in some cases, these modified lines are not responsible for introducing the bug. For example, when the bug was caused by a change in an external API. The lack of empirical evidence makes it impossible to assess how important these cases are and therefore, to which extent the assumption is valid. To advance in this direction, and better understand how bugs “are born”, we propose a model for defining criteria to identify the first snapshot of an evolving software system that exhibits a bug. This model, based on the perfect test idea, decides whether a bug is observed after a change to the software. Furthermore, we studied the model’s criteria by carefully analyzing how 116 bugs were introduced in two different open source software projects. The manual analysis helped classify the root cause of those bugs and created manually curated datasets with bug-introducing changes and with bugs that were not introduced by any change in the source code. Finally, we used these datasets to evaluate the performance of four existing SZZ-based algorithms for detecting bug-introducing changes. We found that SZZ-based algorithms are not very accurate, especially when multiple commits are found; the F-Score varies from 0.44 to 0.77, while the percentage of true positives does not exceed 63%. Our results show empirical evidence that the prevalent assumption, “a bug was introduced by the lines of code that were modified to fix it”, is just one case of how bugs are introduced in a software system. Finding what introduced a bug is not trivial: bugs can be introduced by the developers and be in the code, or be created irrespective of the code. Thus, further research towards a better understanding of the origin of bugs in software projects could help to improve design integration tests and to design other procedures to make software development more robust. ...

When identifying the origin of software bugs, many studies assume that “a bug was introduced by the lines of code that were modified to fix it”. However, this assumption does not always hold and at least in some cases, these modified lines are not responsible for introducing the bug. For example, when the bug was caused by a change in an external API. The lack of empirical evidence makes it impossible to assess how important these cases are and therefore, to which extent the assumption is valid. To advance in this direction, and better understand how bugs “are born”, we propose a model for defining criteria to identify the first snapshot of an evolving software system that exhibits a bug. This model, based on the perfect test idea, decides whether a bug is observed after a change to the software. Furthermore, we studied the model’s criteria by carefully analyzing how 116 bugs were introduced in two different open source software projects. The manual analysis helped classify the root cause of those bugs and created manually curated datasets with bug-introducing changes and with bugs that were not introduced by any change in the source code. Finally, we used these datasets to evaluate the performance of four existing SZZ-based algorithms for detecting bug-introducing changes. We found that SZZ-based algorithms are not very accurate, especially when multiple commits are found; the F-Score varies from 0.44 to 0.77, while the percentage of true positives does not exceed 63%. Our results show empirical evidence that the prevalent assumption, “a bug was introduced by the lines of code that were modified to fix it”, is just one case of how bugs are introduced in a software system. Finding what introduced a bug is not trivial: bugs can be introduced by the developers and be in the code, or be created irrespective of the code. Thus, further research towards a better understanding of the origin of bugs in software projects could help to improve design integration tests and to design other procedures to make software development more robust.

The promises and perils of mining GitHub

Conference paper (2014) - Eirini Kalliamvakou, Leif Singer, Georgios Gousios, Daniel M. German, Kelly Blincoe, Daniela Damian

With over 10 million git repositories, GitHub is becoming one of the most important source of software artifacts on the Internet. Researchers are starting to mine the information stored in GitHub's event logs, trying to understand how its users employ the site to collaborate on software. However, so far there have been no studies describing the quality and properties of the data available from GitHub. We document the results of an empirical study aimed at understanding the characteristics of the repositories in GitHub and how users take advantage of GitHub's main features-namely commits, pull requests, and issues. Our results indicate that, while GitHub is a rich source of data on software development, mining GitHub for research purposes should take various potential perils into consideration. We show, for example, that the majority of the projects are personal and inactive; that GitHub is also being used for free storage and as a Web hosting service; and that almost 40% of all pull requests do not appear as merged, even though they were. We provide a set of recommendations for software engineering researchers on how to approach the data in GitHub. ...