Property-Based Testing in Practice using Hypothesis

In-depth study on how developers use Property-Based Testing in Python using Hypothesis

Bachelor Thesis (2025)

Author(s)

D. de Koning (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

S. Juhošová – Mentor (TU Delft - Programming Languages)

M.A. Costea – Mentor (TU Delft - Programming Languages)

Marco Zúñiga Zamalloa – Graduation committee member (TU Delft - Networked Systems)

Faculty

Electrical Engineering, Mathematics and Computer Science

Python, Property-based Testing, Software testing, Hypothesis

To reference this document use:

https://resolver.tudelft.nl/uuid:aa9cc98d-032f-4544-9447-d6e24bb8ebd2

Publication Year

2025

Language

English

Graduation Date

27-06-2025

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Property-based testing (PBT) lets developers specify high-level properties that should hold across a range of inputs; the testing framework then generates those inputs automatically. Despite its theoretical appeal, PBT is not widely used in practice. To better understand how developers use it, we present a qualitative and quantitative study of 87 property-based tests written with the Hypothesis framework in seven widely used Python projects, including cpython, pandas, and jax.
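To make the idea concrete, here is a minimal sketch of a property-based test in Hypothesis. It is an illustration only and is not taken from any of the studied projects; the test name and the choice of `sorted` as the function under test are assumptions made for this example.

```python
from hypothesis import given, strategies as st


# Hypothetical example: the property is stated once, and Hypothesis
# generates many lists of integers to check it against.
@given(st.lists(st.integers()))
def test_sorted_is_ordered_and_idempotent(xs):
    once = sorted(xs)
    # Sorting must not add or drop elements.
    assert len(once) == len(xs)
    # Every adjacent pair must be in non-decreasing order.
    assert all(a <= b for a, b in zip(once, once[1:]))
    # Sorting an already sorted list must change nothing.
    assert sorted(once) == once
```

Calling the decorated function with no arguments, or running it under a test runner such as pytest, makes Hypothesis generate the inputs and report a minimized counterexample if any assertion fails.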

Our analysis shows that PBTs are relatively rare in these repositories, but those that exist are often simple in structure and highly effective at expressing functional properties. The most common patterns include round-trip checks and comparisons against test oracles, which together account for a significant portion of the tests studied. We also observed a high rate of custom generator usage (39.1%) but no use of custom shrinking, which suggests that the defaults in Hypothesis are strong. The tests varied widely in style and structure, ranging from single-assertion tests to complex hardware-dependent cases.
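The patterns named above can be sketched as follows. This is a hedged illustration rather than code from the dataset: `binary_search` is a hypothetical function under test, `sorted_list_and_target` is a hypothetical custom generator, and the standard-library `base64` and `bisect` modules stand in for the code and oracle a real project would use.

```python
import base64
import bisect

from hypothesis import given, strategies as st


# Round-trip check: decoding an encoded value must return the original.
@given(st.binary())
def test_base64_round_trip(data):
    assert base64.b64decode(base64.b64encode(data)) == data


def binary_search(xs, target):
    """Hypothetical function under test: leftmost insertion index of target."""
    lo, hi = 0, len(xs)
    while lo < hi:
        mid = (lo + hi) // 2
        if xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo


# Custom generator: builds a sorted list together with a value to search for.
@st.composite
def sorted_list_and_target(draw):
    xs = sorted(draw(st.lists(st.integers())))
    target = draw(st.integers())
    return xs, target


# Test oracle: compare against a trusted reference implementation.
@given(sorted_list_and_target())
def test_binary_search_matches_oracle(case):
    xs, target = case
    assert binary_search(xs, target) == bisect.bisect_left(xs, target)
```

Neither test defines a custom shrinker. If an assertion fails, Hypothesis shrinks the failing input with its built-in logic, which also applies to values drawn inside an `st.composite` strategy.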

This study provides new insights into the practical use of PBT in Python, expands on prior quantitative work with qualitative findings, and identifies concrete implications for improving testing frameworks and educational resources. We conclude with recommendations for supporting broader adoption of PBT and outline directions for future research, including cross-language comparison, automated annotation, and temporal analysis of test evolution.

Files

Research_Project_Property_base... (pdf)

(pdf | 0.506 Mb)

License info not available