Unsupervised Domain Adaptation for Question Generation with Domain Data Selection and Self-training

Conference Paper (2022)
Author(s)

P. Zhu (TU Delft - Web Information Systems)

C. Hauff (TU Delft - Web Information Systems)

Research Group
Web Information Systems
Copyright
© 2022 P. Zhu, C. Hauff
Publication Year
2022
Language
English
Pages (from-to)
2388-2401
ISBN (electronic)
9781955917766
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Question generation (QG) approaches based on large neural models require (i) large-scale and (ii) high-quality training data. These two requirements pose difficulties for specific application domains where training data is expensive and difficult to obtain. The effectiveness of trained QG models can degrade significantly when they are applied to a different domain due to domain shift. In this paper, we explore an unsupervised domain adaptation approach, based on domain data selection and self-training, to combat the lack of training data and the domain shift issue. We first present a novel answer-aware strategy for domain data selection that selects the source data most similar to a new domain. The selected data are then used as pseudo in-domain data to retrain the QG model. We then present generation-confidence-guided self-training with two generation confidence modeling methods: (i) the perplexity of the generated questions and (ii) their fluency score. We test our approaches on three large public datasets with different domain similarities, using a transformer-based pre-trained QG model. The results show that our proposed approaches outperform the baselines and demonstrate the viability of unsupervised domain adaptation with answer-aware data selection and self-training for the QG task. The code is available at https://github.com/zpeide/transfer_qg.
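
To illustrate the confidence-guided self-training idea mentioned in the abstract, the following is a minimal sketch (not the authors' implementation; see the linked repository for that): generated questions are scored by their perplexity under a pretrained language model, and only low-perplexity questions are kept as pseudo-labels for retraining. The choice of GPT-2 as the scoring model and the fixed threshold are assumptions made for the example, not the paper's exact setup.

```python
# Minimal sketch of perplexity-based confidence filtering for self-training.
# Assumptions: GPT-2 as the scoring LM and a fixed threshold are illustrative
# choices only, not the configuration used in the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
lm.eval()


def perplexity(text: str) -> float:
    """Perplexity of `text` under the scoring LM (lower = more fluent)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = lm(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()


def select_pseudo_labels(generated_questions, threshold=50.0):
    """Keep generated questions whose perplexity is below `threshold`;
    the survivors serve as pseudo in-domain data for retraining the QG model."""
    return [q for q in generated_questions if perplexity(q) < threshold]


# Example: filter questions produced by a QG model on unlabeled target-domain passages.
candidates = [
    "What year was the company founded?",
    "Founded the company what year in was?",  # disfluent -> high perplexity, dropped
]
print(select_pseudo_labels(candidates))
```

The same filtering loop could use a learned fluency score in place of perplexity, which is the second confidence modeling method the abstract mentions.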

Files

2022.findings_naacl.183.pdf
(pdf | 0.982 Mb)
License info not available