S. Huang
Using a dataset of 565 GitHub repositories, we analyze both short-term trends and the long-term evolution of development strategies. We find that while feature branching is strongly associated with higher delivery frequency and lower defect counts, trunk-based workflows (though rare) sometimes outperform it in lead time and recovery. Similarly, frequent merges correlate with faster delivery and shorter lead times, regardless of merge size. A longitudinal subset reveals that projects shift toward feature-based development over time, but do not consistently adopt smaller or more frequent merges.
We also highlight methodological limitations in mining GitHub. Future research should incorporate longitudinal repository tracking and developer surveys to capture workflows that are invisible to snapshot-based analysis. This study contributes to a nuanced understanding of how code management practices shape CI outcomes in collaborative OSS projects.
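The delivery-frequency and lead-time comparisons above can be approximated from merge timestamps. The sketch below is a minimal illustration, not the study's method: it assumes a hypothetical `Merge` record (open and merge times of a change landing on the default branch), treats merges per week as a proxy for delivery frequency, and takes the median open-to-merge time as lead time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Merge:
    """One change merged into the default branch (hypothetical record shape)."""
    opened_at: datetime  # when the branch or pull request was opened
    merged_at: datetime  # when it landed on the default branch


def delivery_metrics(merges: list[Merge], window_days: int) -> dict:
    """Proxy metrics: merges per week and median open-to-merge lead time in hours."""
    if not merges or window_days <= 0:
        return {"merges_per_week": 0.0, "median_lead_time_h": None}
    lead_times_h = [
        (m.merged_at - m.opened_at).total_seconds() / 3600 for m in merges
    ]
    return {
        "merges_per_week": len(merges) / (window_days / 7),
        "median_lead_time_h": median(lead_times_h),
    }


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    # Three merges over a two-week window with lead times of 2, 4 and 30 hours.
    sample = [Merge(start, start + timedelta(hours=h)) for h in (2, 4, 30)]
    print(delivery_metrics(sample, window_days=14))
```

For the sample above this reports 1.5 merges per week and a median lead time of 4.0 hours. The median is used rather than the mean so that a single long-lived branch does not dominate the lead-time figure.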
Navigating Repositories
Assessing the Impact of External Repositories on Packages in Maven Central
external repository are caused by one of their external repositories. Moreover, we found that 69.58% of the repository URLs were unreachable and that 19.31% of the unique IDs have two or more different repository URLs associated with them. Based on these findings, we urge developers to thoroughly evaluate their use of external repositories and to check their settings.xml and pom.xml files to ensure that no URL or ID collisions are preventing intended behaviour or causing unintended behaviour.
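The recommended check for ID collisions can be partly automated. The following is a minimal sketch under stated assumptions: it uses only Python's standard `xml.etree.ElementTree`, strips XML namespaces so that it accepts both `pom.xml` and `settings.xml` content, and treats every `<repository>` or `<pluginRepository>` element with an `<id>` and `<url>` as a declaration. The helper names are hypothetical, and it does not resolve Maven properties, parent POMs, or mirror settings.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def _local(tag: str) -> str:
    """Strip an XML namespace: '{http://maven.apache.org/POM/4.0.0}id' -> 'id'."""
    return tag.rsplit("}", 1)[-1]


def collect_repositories(xml_text: str) -> list[tuple[str, str]]:
    """Return (id, url) pairs for every <repository>/<pluginRepository> element.

    Note: this also picks up <distributionManagement><repository> entries.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    for elem in root.iter():
        if _local(elem.tag) not in ("repository", "pluginRepository"):
            continue
        fields = {_local(child.tag): (child.text or "").strip() for child in elem}
        if fields.get("id") and fields.get("url"):
            pairs.append((fields["id"], fields["url"]))
    return pairs


def find_id_collisions(*xml_texts: str) -> dict[str, set[str]]:
    """Map each repository id to its URLs, keeping only ids with 2+ distinct URLs."""
    urls_by_id: dict[str, set[str]] = defaultdict(set)
    for text in xml_texts:
        for repo_id, url in collect_repositories(text):
            # Ignore a trailing slash so 'https://x/repo/' and 'https://x/repo' match.
            urls_by_id[repo_id].add(url.rstrip("/"))
    return {rid: urls for rid, urls in urls_by_id.items() if len(urls) > 1}
```

Passing the text of a project's `pom.xml` together with the user's `settings.xml` returns only the IDs that point to more than one URL, which is the kind of collision the findings above warn about.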
Discovering Digital Siblings
Quantifying Inter-Repository Similarity Through GitHub Dependency Structures
Finding your digital sibling
Grouping GitHub projects that share certain attributes based on interactions and activities
Finding your digital sibling: which other GitHub projects are similar to yours?
Finding similar repositories based on the available documentation
Contribution of source code identifiers to GitHub project similarity
Which other GitHub projects are similar to yours?
Our research seeks to determine how much source code identifiers contribute to overall project similarity. We first define project similarity and each type of identifier we evaluate. We then extract these identifier types from a list of projects, use twenty of those projects as queries, and analyze all identifiers using techniques such as TF-IDF and LSA (latent semantic analysis). We find that combining all types of identifiers gives the highest chance that the most similar project shares the query's topic. We also find that splitting each identifier on its casing and combining all of the split identifiers gives the highest chance that the project ranked most similar is indeed similar to the query. We therefore conclude that source code identifiers make a reasonable contribution to overall project similarity.
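The splitting and similarity steps described above can be sketched in plain Python. This is an illustrative approximation, not the authors' pipeline: the casing regex, the smoothed IDF formula, and the helper names are assumptions, and the LSA step (a truncated SVD of the TF-IDF matrix) is deliberately omitted, so similarity here is plain TF-IDF cosine similarity.

```python
import math
import re
from collections import Counter

# Assumed splitter: camelCase, PascalCase, snake_case, acronyms and digit runs.
_PART = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_identifier(identifier: str) -> list[str]:
    """'parseHTTPResponse_v2' -> ['parse', 'http', 'response', 'v', '2']"""
    return [part.lower() for part in _PART.findall(identifier)]


def project_terms(identifiers: list[str]) -> list[str]:
    """Combine the split parts of all of a project's identifiers into one bag of terms."""
    return [term for ident in identifiers for term in split_identifier(ident)]


def tfidf(docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """TF-IDF weights per project, using smoothed idf = ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for terms in docs.values() for term in set(terms))
    return {
        name: {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(terms).items()}
        for name, terms in docs.items()
    }


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def most_similar(query: str, vectors: dict[str, dict[str, float]]) -> str:
    """Return the other project whose vector has the highest cosine similarity to the query."""
    candidates = [name for name in vectors if name != query]
    return max(candidates, key=lambda name: cosine(vectors[query], vectors[name]))


if __name__ == "__main__":
    projects = {  # hypothetical toy projects
        "http-client": project_terms(["sendHttpRequest", "parseHttpResponse", "RequestHeaders"]),
        "web-server": project_terms(["handleHttpRequest", "writeHttpResponse", "ResponseHeaders"]),
        "matrix-lib": project_terms(["multiplyMatrix", "transposeMatrix", "MatrixRow"]),
    }
    print(most_similar("http-client", tfidf(projects)))
```

In this toy example the HTTP client and web server share split terms such as `http`, `request`, and `response`, so the server is ranked most similar even though none of their full identifiers match exactly; that is the effect the casing-split finding describes.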
Exploring Descriptive Metrics of Build Performance
A Study of GitHub Actions in Continuous Integration Projects
We conduct a small case study on repositories that use GitHub Actions, a CI tool that remains relatively unexplored. Within this context, we classify projects using two performance indicators: build breakages and build durations. We examine two distinct sets of metrics: build-level metrics, which are closely linked to the build stage, and project-level metrics.
Our findings suggest that patterns traditionally associated with few breakages and short durations also apply to repositories that use GitHub Actions. However, understanding how project-level metrics relate to build performance demands a more comprehensive approach, one that analyzes the project context thoroughly to obtain a holistic understanding of build performance.
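The classification by the two performance indicators can be illustrated with a small sketch. The abstract does not state the classification criterion, so this assumes a hypothetical `Build` record (pass/fail plus duration), defines breakage as the share of failed builds, and labels each project "low" or "high" on each indicator relative to the cross-project median.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Build:
    """One CI run (hypothetical record shape)."""
    passed: bool
    duration_s: float


def build_indicators(builds: list[Build]) -> tuple[float, float]:
    """Return (breakage rate, median build duration in seconds) for one project."""
    if not builds:
        raise ValueError("project has no builds")
    failures = sum(1 for b in builds if not b.passed)
    return failures / len(builds), median(b.duration_s for b in builds)


def classify(projects: dict[str, list[Build]]) -> dict[str, tuple[str, str]]:
    """Label each project low/high on breakage and duration, split at the median (assumed)."""
    indicators = {name: build_indicators(bs) for name, bs in projects.items()}
    rate_cut = median(rate for rate, _ in indicators.values())
    duration_cut = median(dur for _, dur in indicators.values())
    return {
        name: ("low" if rate <= rate_cut else "high",
               "low" if dur <= duration_cut else "high")
        for name, (rate, dur) in indicators.items()
    }
```

A median split keeps the two groups balanced without choosing absolute thresholds; a study could equally use fixed cut-offs or quantiles, and the build-level and project-level metrics would then be compared across the resulting groups.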
GitHub Mining
Discover the Descriptive Metrics of the Context in Continuous Integration (CI) Project
GitHub Mining
The Implementation of Continuous Integration Pipelines
Discovering the metrics for assessing a project’s maturity
An analysis of key indicators of maturity
Our findings indicate that project maturity cannot be captured by a single metric, but rather by a combination of metrics reflecting different aspects of the project's lifecycle. The primary indicators of maturity are activity levels (commits and pull requests), popularity (stargazers, forks, and contributors), and repository size and age. By combining these metrics, a unified framework for categorizing mature projects can be established and developed further.
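One way to combine these indicators is a composite score. The sketch below is an assumption rather than the framework proposed above: the metric names are hypothetical, each metric is log-scaled (GitHub counts tend to be heavy-tailed) and min-max normalized across the projects being compared, and all metrics are weighted equally.

```python
import math

# Hypothetical metric names mirroring the indicators listed above.
METRICS = ("commits", "pull_requests", "stargazers", "forks",
           "contributors", "size_kb", "age_days")


def maturity_scores(projects: dict[str, dict[str, float]]) -> dict[str, float]:
    """Equal-weight composite in [0, 1]: log-scale each metric, min-max normalize, average."""
    logged = {name: {m: math.log1p(p[m]) for m in METRICS} for name, p in projects.items()}
    lo = {m: min(v[m] for v in logged.values()) for m in METRICS}
    hi = {m: max(v[m] for v in logged.values()) for m in METRICS}
    scores = {}
    for name, v in logged.items():
        # A metric on which all projects are equal contributes 0 rather than dividing by zero.
        parts = [
            (v[m] - lo[m]) / (hi[m] - lo[m]) if hi[m] > lo[m] else 0.0
            for m in METRICS
        ]
        scores[name] = sum(parts) / len(parts)
    return scores
```

Because the score is relative to the compared set, it ranks projects rather than certifying any one of them as mature; a fuller framework would likely learn or justify the weights instead of assuming equal ones.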