NLICE: Synthetic Medical Record Generation for Effective Primary Healthcare Differential Diagnosis

Conference Paper (2023)

Author(s)

Zaid Al-Ars (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Obinna Agba (Student TU Delft)

Zhuoran Guo (Student TU Delft)

Christiaan Boerkamp (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Ziyaad Jaber (Medvice Digital Health)

Tareq Jaber (Medvice Digital Health)

Research Group

Computer Engineering

Machine learning, Synthetic data, Medical records, Differential diagnosis

DOI related publication

https://doi.org/10.1109/BIBE60311.2023.00071 (final published version)

To reference this document use

https://resolver.tudelft.nl/uuid:eade6327-fdca-45cd-9abf-f81a111e9bb1

Publication Year

2023

Language

English

Bibliographical Note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Pages (from-to)

397-402

ISBN (print)

979-8-3503-9312-5

ISBN (electronic)

979-8-3503-9311-8

Event

2023 IEEE 23rd International Conference on Bioinformatics and Bioengineering (BIBE) (2023-12-04 - 2023-12-06), Dayton, United States

Collections

Institutional Repository

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper presents a systematic method for generating patient records grounded in medical knowledge for use in differential diagnosis tasks. It also evaluates machine learning models that distinguish between conditions based on the given symptoms. We combine SymCat, a public disease-symptom data source, with Synthea to construct the patient records. To make the synthetic data more expressive, we use NLICE, a medically standardized symptom modeling method, to augment the data with additional contextual information for each condition. Naive Bayes and Random Forest models are then evaluated and compared on the synthetic data. The paper shows how to construct both SymCat-based and NLICE-based datasets and reports how effective each dataset is for training predictive disease models. Trained on the SymCat-based dataset, the Naive Bayes and Random Forest models reach Top-1 accuracy scores of 58.8% and 57.1%, respectively. The NLICE-based dataset improves on these results, with a Top-1 accuracy of 82.0% and Top-5 accuracy values of more than 90% for both models. Our proposed data generation approach removes a major barrier to the application of artificial intelligence methods in the healthcare domain. Our novel NLICE symptom modeling approach addresses the problem of incomplete and insufficient information in the current binary symptom representation.
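The Top-1 and Top-5 figures quoted in the abstract are top-k accuracy scores: the share of patient records for which the true condition appears among the k conditions a model ranks highest. The following is a minimal sketch of that metric, not the authors' evaluation code. The function name `top_k_accuracy`, the condition names, and the scores are all hypothetical and chosen only for illustration.

```python
from typing import Mapping, Sequence


def top_k_accuracy(
    scores: Sequence[Mapping[str, float]],
    true_labels: Sequence[str],
    k: int,
) -> float:
    """Return the fraction of cases whose true condition is among the
    k highest-scored conditions for that case."""
    if len(scores) != len(true_labels):
        raise ValueError("scores and true_labels must have the same length")
    if not scores:
        raise ValueError("at least one case is required")
    if k < 1:
        raise ValueError("k must be at least 1")

    hits = 0
    for case_scores, truth in zip(scores, true_labels):
        # Rank condition names from highest to lowest score.
        ranked = sorted(case_scores, key=case_scores.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Hypothetical per-condition scores for four synthetic records
# (illustrative values only, not data from the paper).
example_scores = [
    {"flu": 0.60, "cold": 0.30, "asthma": 0.10},
    {"flu": 0.50, "cold": 0.40, "asthma": 0.10},
    {"flu": 0.20, "cold": 0.30, "asthma": 0.50},
    {"flu": 0.45, "cold": 0.35, "asthma": 0.20},
]
example_truth = ["flu", "cold", "asthma", "asthma"]

top1 = top_k_accuracy(example_scores, example_truth, k=1)
top2 = top_k_accuracy(example_scores, example_truth, k=2)
top3 = top_k_accuracy(example_scores, example_truth, k=3)
print(f"Top-1: {top1:.2f}, Top-2: {top2:.2f}, Top-3: {top3:.2f}")
```

In this toy example the second record is a Top-1 miss but a Top-2 hit, which is the kind of case that makes Top-5 accuracy higher than Top-1 accuracy in a differential diagnosis setting, where a ranked shortlist of candidate conditions is the expected output.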

Files

NLICE_Synthetic_Medical_Record... (pdf)

(pdf | 0.697 Mb)

- Embargo expired on 19-08-2024

License info not available