Analyzing Linux on a Supercomputer
D. Spinellis (Athens University of Economics and Business, TU Delft - Software Engineering)
More Info
expand_more
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
The C preprocessor, a key element of the language, has become a liability due to its lack of integration with modern language semantics. This column describes the analysis of the C preprocessor usage in the Linux kernel, comprising 20 million lines of code, using the CScout refactoring browser. Processing limitations led to a solution leveraging a supercomputer’s parallel processing capabilities. The analysis divided the kernel’s source files across 32 supercomputer nodes and implemented a binary tournament database merging strategy. Initial efforts revealed multiple difficulties. Resolving them involved several false starts involving recursive SQL statements, an SQLite extension, and the GraphViz connected components tool. After a number of redesigns guided by stress-testing, the analysis finished in just 32 hours rather than a week, using 374 CPU hours and 640 GiB RAM on the supercomputer’s nodes.
Files
File under embargo until 25-08-2025