Careful Generation
An Exploration of Open-Source Large Language Model Support for Advance Care Planning in Paediatric Palliative Care
A.E. Vernhout (TU Delft - Mechanical Engineering)
Carine van Capelle – Mentor (Erasmus MC)
Megha Khosla – Mentor (TU Delft - Multimedia Computing)
Liselotte Mahieu – Mentor (Erasmus MC)
F.J.H. Gijsen – Graduation committee member (TU Delft - Medical Instruments & Bio-Inspired Technology)
Agnes van der Heide – Graduation committee member (Erasmus MC)
Abstract
Introduction
Paediatric palliative care (PPC) aims to optimise the quality of life of children with life-limiting or life-threatening conditions by addressing the physical, psychosocial, emotional, and spiritual needs of both the children and their family members. Advance care planning (ACP) is a central element of PPC, as it helps children and family members formulate values, needs, and goals for future care. However, ACP documentation is time-consuming and burdensome for healthcare professionals (HCPs). Large Language Models (LLMs) may support this process by automatically extracting and structuring ACP outcomes. This study explored how open-source LLMs can support the summarisation of ACP outcomes from Individual Care Plans (ICPs) in a Dutch PPC setting.
Methods
We constructed a pseudonymised dataset of 38 ICPs, with reference ACP summaries structured around three guiding questions: (1) Who are you?, (2) What is important to you?, and (3) What are your goals and wishes for future care and treatment? Two open-source decoder-only LLMs were selected: Llama-3.1-8B-instruct (Llama-3.1) and Fietje-2-instruct (Fietje-2). We evaluated their performance under zero-shot prompting, in-context learning (ICL) with up to eight examples, and QLoRA fine-tuning on 30 training samples. Outputs were assessed with automatic metrics (BLEU, ROUGE-L, BERTScore, MEDCON), complemented by textual analysis and a human reader study.
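To illustrate the QLoRA fine-tuning setup mentioned above, the sketch below shows a minimal configuration using the Hugging Face transformers, peft, and bitsandbytes stack. The Llama-3.1 model identifier follows the abstract; all hyperparameters (rank, alpha, target modules, dropout) are illustrative assumptions and not necessarily the settings used in the study.

```python
# Minimal QLoRA sketch: 4-bit quantised base model + low-rank adapters.
# Hyperparameters below are illustrative assumptions, not the study's settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"  # one of the two models named in the abstract

# 4-bit NF4 quantisation of the frozen base weights (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Trainable low-rank adapters on the attention projections; rank/alpha are assumed values.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are updated during fine-tuning
```

Because the base weights stay frozen in 4-bit precision and only the small adapter matrices are trained, this kind of setup makes fine-tuning an 8B-parameter model practical even with a training set as small as the 30 ICP samples described above.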
Results