Evaluating Neural Text Simplification in the Medical Domain

Abstract

Health literacy, i.e. the ability to read and understand medical text, is a relevant component of public health. Unfortunately, many medical texts are hard for the general population to grasp, as they are targeted at highly skilled health professionals and use complex language and domain-specific terms. Automatic text simplification, which rewrites text to be broadly understandable, would therefore be very beneficial. In this thesis we evaluate the state of the art in automatic text simplification in the medical domain. We train a Neural Machine Translation (NMT) system on aligned complex and simple sentences from Wikipedia and Simple Wikipedia. As there are no publicly available aligned medical text simplification corpora, we create one semi-automatically with the help of a domain expert, and one fully automatically using a novel monolingual alignment method introduced in this thesis. We analyse the effect of in-domain data when training an NMT system. Furthermore, we describe two strategies for medical term simplification in combination with NMT: 1) an extra pre-processing step that boosts medical term simplification, and 2) a post-processing dictionary approach using the Open-Access and Collaborative Consumer Health Vocabulary (CHV). We analyse the effect of each strategy separately. We have human judges evaluate the output on grammaticality, meaning preservation (relative to the complex sentence), and simplicity (compared to the complex sentence).

Results show that an NMT system trained on general aligned complex and simple sentences is able to simplify medical sentences to the level of Simple Wikipedia. An NMT system trained on medical sentences (in addition to general sentences), combined with the boosting strategy for medical term simplification, is able to translate more medical concepts, but its output is not simpler than that of the NMT system trained on general sentences only. Interestingly, NMT in combination with the CHV did not increase simplicity, but had the opposite effect.