Property-Based Testing in Practice using Hypothesis
In-depth study on how developers use Property-Based Testing in Python using Hypothesis
D. de Koning (TU Delft - Electrical Engineering, Mathematics and Computer Science)
S. Juhošová – Mentor (TU Delft - Programming Languages)
M.A. Costea – Mentor (TU Delft - Programming Languages)
Marco Zúñiga Zamalloa – Graduation committee member (TU Delft - Networked Systems)
Abstract
Property-based testing (PBT) allows developers to specify high-level properties that should hold for a range of inputs, which are then automatically generated by the testing framework. Despite its theoretical appeal, PBT is not widely used in the real world. To better understand how PBT is used in practice, we present a qualitative and quantitative study of 87 property-based tests written with the Hypothesis framework in seven widely used Python projects, including cpython, pandas, and jax.
Our analysis reveals that although PBTs are relatively rare in these repositories, they are often simple in structure and highly effective at expressing functional properties. The most common patterns include round-trip checks and comparisons against test oracles, which account for a significant portion of the studied tests. We also observed a high rate of custom generator usage (39.1%) but no use of custom shrinking, which suggests that the Hypothesis defaults are strong. The tests varied widely in style and structure, ranging from single-assertion tests to complex hardware-dependent cases.
This study provides new insights into the practical use of PBT in Python, expands on prior quantitative work with qualitative findings, and identifies concrete implications for improving testing frameworks and educational resources. We conclude with recommendations for supporting broader adoption of PBT and outline directions for future research, including cross-language comparison, automated annotation, and temporal analysis of test evolution.
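For readers unfamiliar with the patterns named in the abstract, the following is a minimal sketch of a round-trip check, a comparison against a test oracle, and a custom generator in Hypothesis. It is not taken from the studied projects: the function `insertion_sort`, the strategy `ordered_pairs`, and all test names are hypothetical and exist only for illustration.

```python
import json

from hypothesis import given, strategies as st


# Round-trip check: encoding and then decoding should return the input.
@given(st.lists(st.integers()))
def test_json_round_trip(xs):
    assert json.loads(json.dumps(xs)) == xs


# Hypothetical function under test (illustrative only).
def insertion_sort(xs):
    out = []
    for x in xs:
        i = len(out)
        # Walk left past every element greater than x.
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


# Test oracle: compare the function under test against a trusted
# reference implementation, here Python's built-in sorted().
@given(st.lists(st.integers()))
def test_sort_matches_oracle(xs):
    assert insertion_sort(xs) == sorted(xs)


# Custom generator: a composite strategy that builds pairs (lo, hi)
# with lo <= hi from the built-in integer strategy.
@st.composite
def ordered_pairs(draw):
    lo = draw(st.integers())
    hi = draw(st.integers(min_value=lo))
    return lo, hi


@given(ordered_pairs())
def test_pair_is_ordered(pair):
    lo, hi = pair
    assert lo <= hi
```

Each test can be run with pytest or called directly with no arguments. None of them defines custom shrinking: when a property fails, Hypothesis minimizes the failing input using its built-in behaviour.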