Who Needs Real Data Anyway? Exploring the Use of Synthetic Data in Economic Evaluations of Health Interventions
N. Van Der Linden (TU Delft - Policy Analysis)
X. G.L.V. Pouwels (University of Twente)
B. Jahn (ONCOTYROL-Center for Personalized Cancer Medicine, UMIT TIROL-University for Health Sciences and Technology)
Uwe Siebert (Harvard Medical School, Harvard T.H. Chan School of Public Health)
H. Koffijberg (University of Twente)
Abstract
Objectives: Data needed for economic evaluations in healthcare are often subject to privacy regulations and confidentiality requirements, which limits their accessibility. This poses challenges for conducting, reviewing, and validating health economic evaluations. The use of “synthetic data” may solve this problem.

Methods: An economic evaluation compared “shamectomy” with “usual care” for the prevention of a fictitious disease called shame. A data set (Dorg) was created, consisting of 1000 patients in the base case. Next, synthetic data (Dsyn) were created from Dorg. Dorg and Dsyn were each used separately to inform a model-based economic evaluation, and the similarity of the results was assessed across scenarios that varied the size of Dorg, the order of synthetization, the method of synthetization, the number of synthesized data sets, and the amount of missing data.

Results: With standard settings, the incremental cost-effectiveness ratio (ICER) for shamectomy was €25 848/quality-adjusted life-year in Dorg and on average €25 857 in 500 Dsyns, 95% CI (€16 776; €60 021). In the base case, 15% of the generated Dsyns resulted in an ICER leading to a positive reimbursement decision, as opposed to a negative decision when using Dorg. With smaller Dorg data sets (n = 50 and n = 500), ICER ranges increased to 95% CI (negative; €151 542) and 95% CI (negative; €669 717), respectively.

Conclusions: Outcomes and conclusions of economic analyses based on synthetic data may deviate from those obtained by using the original data. For data sets of fewer than 1000 patients, which are common, deviations may be substantial and lead to suboptimal policy decisions. Based on our results, we propose a stepwise approach to using synthetic data for model-based health economic evaluations, using a large number of synthetic data sets (ie, >100) with the same size as the original data.
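The comparison described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the data-generating values, the €20 000 willingness-to-pay threshold, the function names, and the toy synthesizer (independent normal distributions fitted per arm) are all assumptions made for illustration, and the abstract does not state which synthetization methods or threshold were used. The sketch builds a stand-in for Dorg, draws many synthetic data sets of the same size, and reports how the synthetic ICERs spread and how often the reimbursement decision differs from the one based on Dorg.

```python
import random
import statistics

THRESHOLD = 20_000  # hypothetical willingness-to-pay per QALY (assumption)


def increments(rows):
    """Return (incremental cost, incremental QALYs) of arm 1 versus arm 0."""
    def arm_mean(arm, key):
        return statistics.mean(r[key] for r in rows if r["arm"] == arm)
    return (arm_mean(1, "cost") - arm_mean(0, "cost"),
            arm_mean(1, "qaly") - arm_mean(0, "qaly"))


def icer(rows):
    """Incremental cost-effectiveness ratio: cost difference per QALY gained."""
    d_cost, d_qaly = increments(rows)
    return d_cost / d_qaly


def reimburse(rows, threshold=THRESHOLD):
    """Positive decision when the net monetary benefit is above zero."""
    d_cost, d_qaly = increments(rows)
    return threshold * d_qaly - d_cost > 0


def make_original(n, rng):
    """Hypothetical stand-in for Dorg: n patients alternating between arms."""
    rows = []
    for i in range(n):
        arm = i % 2
        rows.append({
            "arm": arm,
            "cost": rng.gauss(12_000 if arm else 7_000, 2_000),  # assumed values
            "qaly": rng.gauss(8.2 if arm else 8.0, 0.5),         # assumed values
        })
    return rows


def synthesize(rows, rng):
    """Toy synthesizer (assumption): fit a normal distribution to each variable
    within each arm and resample. Correlations between variables are ignored."""
    fitted = {}
    for arm in (0, 1):
        for key in ("cost", "qaly"):
            values = [r[key] for r in rows if r["arm"] == arm]
            fitted[arm, key] = (statistics.mean(values), statistics.stdev(values))
    return [{"arm": r["arm"],
             "cost": rng.gauss(*fitted[r["arm"], "cost"]),
             "qaly": rng.gauss(*fitted[r["arm"], "qaly"])} for r in rows]


def compare(d_org, n_syn, rng, threshold=THRESHOLD):
    """Summarize ICERs and decisions across n_syn synthetic copies of d_org."""
    syn_sets = [synthesize(d_org, rng) for _ in range(n_syn)]
    icers = sorted(icer(s) for s in syn_sets)
    org_decision = reimburse(d_org, threshold)
    flips = sum(reimburse(s, threshold) != org_decision for s in syn_sets)
    return {
        "icer_org": icer(d_org),
        "icer_syn_mean": statistics.mean(icers),
        "icer_syn_low": icers[int(0.025 * (n_syn - 1))],   # approx. 2.5th percentile
        "icer_syn_high": icers[int(0.975 * (n_syn - 1))],  # approx. 97.5th percentile
        "share_flipped": flips / n_syn,
    }


if __name__ == "__main__":
    rng = random.Random(2024)
    d_org = make_original(1000, rng)
    for name, value in compare(d_org, 500, rng).items():
        print(f"{name}: {value:.3f}")
```

The decision rule uses net monetary benefit rather than comparing the ICER with the threshold directly, because a ratio becomes hard to interpret when the QALY difference is close to zero or negative, which is more likely with small data sets. For the same reason the mean of the synthetic ICERs can be unstable, so the percentile range and the share of flipped decisions are usually the more informative outputs of such a sketch.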