Assessing Formal Verification in SPARK
A Case-Study Evaluation of Formal Verification Tooling
D. Blanovschi (TU Delft - Electrical Engineering, Mathematics and Computer Science)
K.F. Wullaert – Mentor (TU Delft - Programming Languages)
Benedikt Ahrens – Mentor (TU Delft - Programming Languages)
M. Izadi – Graduation committee member (TU Delft - Software Engineering)
More Info
expand_more
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Formal verification promises stronger correctness guarantees than conventional testing, yet it is often perceived as too costly or specialised for everyday software development. This thesis investigates whether SPARK — Ada’s provable subset — can deliver industrial- strength verification for representative algorithms and concurrent systems. Two separate questions guided the work: how well does SPARK work with (RQ1) sequential algorithms; and (RQ2) concurrent systems.
We implemented and formally proved insertion sort, quick-sort, and a generic statically-allocated hash-map to SPARK’s highest assurance level, achieving full functional correctness, as well as two concurrent case studies: a Publisher / Subscriber channel and a novel IO multi-reactor pattern runtime task scheduler, where we combined SPARK with TLA+ model checking to capture liveness and safety properties that SPARK cannot express directly. Across all sequential examples, proof overhead averaged ratios of 6–10 lines of specification, while for the concurrent case-studies, the TLA+ models averaged ratios of 1.7 lines, per every line of executable code.
The study shows that (i) SPARK is practical for non-trivial sequential algorithms, unless they make use of the heap; (ii) concurrency still requires external formalisms, but the combination remains tractable; and (iii) careful specification design, not solver performance, is the dominant cost driver. These findings confirm that correctness-by-construction is attainable within undergraduate project scope and provide quantitative benchmarks for future work.