Beyond obfuscation: Signature-based and relocation-resistant vulnerability detection in Uber JARs
D. Plămădeală (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Thomas Durieux – Mentor (TU Delft - Software Engineering)
A. Panichella – Graduation committee member (TU Delft - Software Engineering)
J.E.A.P. Decouchant – Graduation committee member (TU Delft - Data-Intensive Systems)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Software development often relies on dependencies managed by package managers to simplify the integration of external libraries and frameworks, reducing development time. However, developers sometimes choose to bundle dependencies directly within their software packages. Bundling dependencies means including all necessary third-party frameworks directly within the application's distributable archive, such as a JAR file, so that all components are present without external installation. This practice, which results in Uber JARs (or fat JARs), presents both challenges and advantages within the Maven ecosystem. This project examines the prevalence, risks, and impact of Uber JARs by analyzing over 9 million POM files and 12 million JAR artifacts from Maven Central, identifying artifacts with previously undetected vulnerabilities. Notably, 10.48% of the analyzed artifacts (915,089 in total) are Uber JARs, indicating a significant prevalence within the Maven repository. Central to this work is JarSift, a tool that detects the contents of Uber JARs, including the bundled libraries, their versions, and their vulnerabilities. JarSift achieves an F1 score ranging from 0.474 to 0.857, depending on the Uber JAR configuration. The analysis reveals that about 17.13% of the Uber JARs in a small-scale dataset contained undisclosed vulnerabilities, and that 0.63% of all libraries in our dataset fully matched known vulnerable libraries. These findings highlight the need for better detection and mitigation strategies in the Maven ecosystem and inform developers of potential risks, helping them implement more robust security measures.
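To make the idea of signature-based matching concrete, the following is a minimal Python sketch. It is not JarSift's actual algorithm, which the abstract does not describe. It assumes that a signature is the SHA-256 digest of a class file's raw bytes. The function names `class_signatures` and `match_libraries`, and the shape of the signature index, are hypothetical. Hashing the bytes ignores the path of the entry inside the archive, but relocation tools typically also rewrite package references inside the class file itself, so a relocation-resistant signature would have to be computed over a normalized form of the bytecode rather than over the raw bytes used here.

```python
import hashlib
import io
import zipfile


def class_signatures(jar_bytes: bytes) -> set[str]:
    """Return the SHA-256 digest of every .class entry in a JAR (a ZIP archive).

    Assumption: raw-byte hashing stands in for a real class signature.
    """
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return {
            hashlib.sha256(jar.read(name)).hexdigest()
            for name in jar.namelist()
            if name.endswith(".class")
        }


def match_libraries(
    uber_signatures: set[str], index: dict[str, set[str]]
) -> dict[str, float]:
    """Map each indexed library to the fraction of its classes found in the Uber JAR.

    `index` is a hypothetical mapping from a library identifier
    (for example "group:artifact:version") to that library's class signatures.
    Libraries with no matching class are left out of the result.
    """
    return {
        library: len(signatures & uber_signatures) / len(signatures)
        for library, signatures in index.items()
        if signatures & uber_signatures
    }
```

Under these assumptions, a ratio of 1.0 means that every class of the indexed library is present in the Uber JAR, which corresponds to a full match; lower ratios indicate partial inclusion, for example after unused classes have been stripped.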