Assessing Formal Verification in SPARK

A Case-Study Evaluation of Formal Verification Tooling

Bachelor Thesis (2025)
Author(s)

D. Blanovschi (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

K.F. Wullaert – Mentor (TU Delft - Programming Languages)

Benedikt Ahrens – Mentor (TU Delft - Programming Languages)

M. Izadi – Graduation committee member (TU Delft - Software Engineering)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
expand_more
Publication Year
2025
Language
English
Graduation Date
27-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Faculty
Electrical Engineering, Mathematics and Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Formal verification promises stronger correctness guarantees than conventional testing, yet it is often perceived as too costly or specialised for everyday software development. This thesis investigates whether SPARK — Ada’s provable subset — can deliver industrial- strength verification for representative algorithms and concurrent systems. Two separate questions guided the work: how well does SPARK work with (RQ1) sequential algorithms; and (RQ2) concurrent systems.
We implemented and formally proved insertion sort, quick-sort, and a generic statically-allocated hash-map to SPARK’s highest assurance level, achieving full functional correctness, as well as two concurrent case studies: a Publisher / Subscriber channel and a novel IO multi-reactor pattern runtime task scheduler, where we combined SPARK with TLA+ model checking to capture liveness and safety properties that SPARK cannot express directly. Across all sequential examples, proof overhead averaged ratios of 6–10 lines of specification, while for the concurrent case-studies, the TLA+ models averaged ratios of 1.7 lines, per every line of executable code.
The study shows that (i) SPARK is practical for non-trivial sequential algorithms, unless they make use of the heap; (ii) concurrency still requires external formalisms, but the combination remains tractable; and (iii) careful specification design, not solver performance, is the dominant cost driver. These findings confirm that correctness-by-construction is attainable within undergraduate project scope and provide quantitative benchmarks for future work.

Files

Research_Paper_38_.pdf
(pdf | 0.617 Mb)
License info not available