Modern software depends heavily on third-party open-source libraries, with the vast majority of applications incorporating external components. While this dependency-driven development accelerates innovation, it creates significant security risks through complex, deeply nested dependency graphs where vulnerabilities can propagate across thousands of downstream systems.
Software Composition Analysis (SCA) tools effectively identify known vulnerabilities, but in large organizations they generate overwhelming volumes of alerts. Our analysis shows that over 8% of dependencies have known vulnerabilities, and each vulnerable version appears multiple times across projects. The result is dozens of alerts per project, which makes manual triage infeasible.
This thesis presents a data-driven approach to prioritizing dependency risk, addressing the challenge of identifying the most critical security threats within the large volumes of alerts that SCA tools produce. The methodology integrates multiple risk indicators, including severity scores, exploit prediction metrics, known exploitation evidence, dependency freshness measures, and license compliance risks, into a unified feature set. To capture transitive risk propagation while keeping the focus on actionable components, the framework applies a depth-weighted aggregation technique that assigns exponentially decreasing weights to deeper dependencies. Prioritization is performed with an autoencoder-based model, which uses reconstruction error to rank dependencies by risk.
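The two mechanisms above, depth-weighted aggregation and ranking by reconstruction error, can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration, not the thesis's actual configuration: the dependency graph is a plain dict, each package carries a single hypothetical scalar risk score, a dependency first reached at depth d gets weight `decay ** d` with an assumed `decay=0.5`, and the autoencoder is a deliberately tiny one-unit linear model trained by full-batch gradient descent. The function names are hypothetical.

```python
import random
from collections import deque


def depth_weighted_risk(graph, risk, root, decay=0.5):
    """Sum risk over `root` and its transitive dependencies, weighting a
    package first reached at depth d by decay**d (decay is an assumption)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:  # breadth-first search records the shallowest depth
        pkg = queue.popleft()
        for dep in graph.get(pkg, []):
            if dep not in depth:
                depth[dep] = depth[pkg] + 1
                queue.append(dep)
    return sum(decay ** d * risk.get(pkg, 0.0) for pkg, d in depth.items())


def train_linear_autoencoder(rows, epochs=3000, lr=0.05, seed=0):
    """Fit a one-unit linear autoencoder x -> h = e.x -> d * h by full-batch
    gradient descent on the mean squared reconstruction error."""
    n, m = len(rows[0]), len(rows)
    rng = random.Random(seed)
    e = [rng.uniform(-0.5, 0.5) for _ in range(n)]  # encoder weights
    d = [rng.uniform(-0.5, 0.5) for _ in range(n)]  # decoder weights
    for _ in range(epochs):
        grad_e, grad_d = [0.0] * n, [0.0] * n
        for x in rows:
            h = sum(ei * xi for ei, xi in zip(e, x))
            r = [di * h - xi for di, xi in zip(d, x)]  # residuals
            back = sum(ri * di for ri, di in zip(r, d))  # half of dLoss/dh
            for i in range(n):
                grad_d[i] += 2 * r[i] * h / m
                grad_e[i] += 2 * back * x[i] / m
        e = [ei - lr * g for ei, g in zip(e, grad_e)]
        d = [di - lr * g for di, g in zip(d, grad_d)]
    return e, d


def reconstruction_error(x, e, d):
    """Squared error between a feature vector and its reconstruction."""
    h = sum(ei * xi for ei, xi in zip(e, x))
    return sum((di * h - xi) ** 2 for di, xi in zip(d, x))


def rank_by_reconstruction_error(rows, e, d):
    """Row indices ordered from highest to lowest reconstruction error."""
    return sorted(range(len(rows)),
                  key=lambda i: reconstruction_error(rows[i], e, d),
                  reverse=True)


if __name__ == "__main__":
    graph = {"lib-a": ["lib-c"], "lib-b": ["lib-c"], "lib-c": []}
    risk = {"lib-a": 1.0, "lib-b": 0.0, "lib-c": 4.0}
    # lib-a at depth 0 plus lib-c at depth 1: 1.0 + 0.5 * 4.0
    print(depth_weighted_risk(graph, risk, "lib-a"))  # 3.0
    # Ten feature rows that follow one trend, plus one unusual combination
    rows = [[t / 10] * 3 for t in range(1, 11)] + [[1.0, 0.0, 1.0]]
    e, d = train_linear_autoencoder(rows)
    print(rank_by_reconstruction_error(rows, e, d))
```

In this toy example the last row is not extreme in any single feature, yet it breaks the pattern shared by the other rows, so a trained autoencoder should reconstruct it worst and rank it first. This mirrors the idea of surfacing unusual combinations of risk signals rather than only maximal values.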
The framework was evaluated on thousands of real-world dependencies and showed promise in ranking components based on complex, multi-dimensional risk signals. It prioritized not only dependencies with extreme values in individual indicators but also those with unusual combinations across dimensions, including risks buried in transitive relationships. In a preliminary validation study, expert reviewers agreed with the model’s prioritizations in 96.7% of cases, highlighting its practical relevance and alignment with expert opinion.
By integrating diverse risk indicators, modeling transitive influence, and leveraging autoencoders, this work provides a practical framework for identifying high-risk dependencies in complex software ecosystems. It reduces noise in vulnerability alerts, highlights truly critical components, and supports more focused remediation. While not a replacement for expert judgment, the framework complements existing practices and represents a step toward more adaptive, risk-aware dependency management.