Towards Robust Automatic Question Generation For Learning

Zhu, P.

doi:10.4233/uuid:4e23fb2f-6539-44b7-bab2-6c6b2fd7ce8d

Title

Towards Robust Automatic Question Generation For Learning

Author

Zhu, P. (TU Delft Web Information Systems)

Contributor

Houben, G.J.P.M. (promotor)
Hauff, C. (promotor)

Degree granting institution

Delft University of Technology

Date

2024-04-08

Abstract

Questions are critical for information seeking and learning. Automatic Question Generation (AQG) draws on Information Retrieval (IR) and Natural Language Processing (NLP), two fields that have been studied for decades, and focuses on automatically creating questions for various applications. In this thesis, we study how to build a robust AQG system from several perspectives: data creation, evaluation, and the effects of the generated questions.

First, we contribute to the quality evaluation of generated questions. Specifically, we introduce three new evaluation metrics and compare the effects of using automatic evaluation metrics as rewards when training reinforcement learning (RL)-based question generation systems. Question quality evaluation is an essential part of AQG systems; in this thesis it is further used for dataset creation, for selecting questions for self-training, and for filtering the automatically generated questions shown to learners.
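The abstract does not specify how a metric is turned into a reward. As a rough, hypothetical illustration of the general idea of using an automatic evaluation metric as an RL reward, the sketch below assumes a self-critical setup (the score of a greedily decoded question serves as the baseline), which is a common choice but not necessarily the one used in the thesis. The toy `unigram_f1` metric stands in for the thesis's question-quality metrics, and all function names are hypothetical. In a real trainer the log-probabilities would be framework tensors so the loss can be back-propagated; plain floats are used here only to show the arithmetic.

```python
from collections import Counter
from typing import Callable, Sequence


def unigram_f1(candidate: str, reference: str) -> float:
    """Toy question-quality metric (hypothetical stand-in): unigram F1."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def self_critical_loss(
    sampled_token_log_probs: Sequence[float],
    sampled: str,
    greedy: str,
    reference: str,
    metric: Callable[[str, str], float] = unigram_f1,
) -> float:
    """REINFORCE-style loss with a greedy-decoding baseline (assumed setup).

    The reward is the metric score of the sampled question; subtracting the
    greedy question's score (the baseline) gives the advantage.
    """
    advantage = metric(sampled, reference) - metric(greedy, reference)
    # Minimizing -advantage * log p(sample) raises the probability of samples
    # that beat the greedy baseline and lowers it for samples that do worse.
    return -advantage * sum(sampled_token_log_probs)


if __name__ == "__main__":
    loss = self_critical_loss(
        [-0.5, -0.2, -0.3, -1.0],
        sampled="what is a ranker",
        greedy="what is it",
        reference="what is a neural ranker",
    )
    print(round(loss, 4))
```

Swapping `metric` for a different scoring function is how one would compare several metrics as rewards under this assumed setup.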

Data are essential for building AQG systems. In Chapters 3 and 4, we focus on data quality control in the two main methods of dataset creation: collecting user-generated resources from online platforms, and crowdsourcing. Specifically, we start by investigating the information overload problem in MOOC forum discussions, which is caused by unhelpful, unlabeled, and unstructured data. We propose a framework for clip recommendation that combines useful-question classification with a neural ranker, and we further investigate training the neural ranker with both labeled and weakly labeled data. We then study how to automatically infer the true answer span from multiple crowdsourced annotations. We propose an answer aggregation approach that effectively exploits both the quality of each answer annotation and its relation to the other annotations. Although various methods for collecting labeled data are available, in many application domains labeled data are hard or expensive to obtain. In Chapter 5, we therefore turn to automatically adapting AQG models trained on domains with abundant labeled data to unfamiliar domains with few labeled examples.
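The abstract does not describe the aggregation model itself. Purely as a hypothetical baseline, not the thesis's approach, the sketch below selects one span among crowdsourced annotations by combining an assumed per-annotation quality score (for example, a worker reliability estimate) with that annotation's average token-level F1 agreement with the other annotations. Spans are assumed to be inclusive `(start, end)` token offsets, and `alpha` is a hypothetical weighting parameter.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive token offsets (start, end); an assumption


def span_f1(a: Span, b: Span) -> float:
    """Token-level F1 between two inclusive spans."""
    overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    if overlap == 0:
        return 0.0
    precision = overlap / (a[1] - a[0] + 1)
    recall = overlap / (b[1] - b[0] + 1)
    return 2 * precision * recall / (precision + recall)


def aggregate_spans(
    spans: List[Span], qualities: List[float], alpha: float = 0.5
) -> Span:
    """Pick the span with the best mix of own quality and peer agreement."""
    if not spans or len(spans) != len(qualities):
        raise ValueError("need a non-empty list with one quality per span")
    best_span, best_score = spans[0], float("-inf")
    for i, span in enumerate(spans):
        # Relation to other annotations: mean F1 against every other span.
        peers = [span_f1(span, other) for j, other in enumerate(spans) if j != i]
        agreement = sum(peers) / len(peers) if peers else 0.0
        score = alpha * qualities[i] + (1 - alpha) * agreement
        if score > best_score:  # ties keep the earliest span
            best_span, best_score = span, score
    return best_span
```

For spans `[(3, 7), (4, 7), (10, 12)]` with qualities `[0.6, 0.9, 0.9]`, the overlapping pair agrees equally, so the higher-quality `(4, 7)` is chosen; the isolated `(10, 12)` loses despite its quality score because no other annotator supports it.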

Given the impressive capabilities of automatic question generation methods, it is critical to understand how the generated questions affect humans. Finally, in Chapter 6, we study the effects of automatically generated questions on learners’ behaviour and learning outcomes when they serve as adjunct questions in an informal search-as-learning scenario. We conduct an extensive user study to shed light on this topic.

Subject

question generation
domain adaptation
adjunct questions

To reference this document use:

https://doi.org/10.4233/uuid:4e23fb2f-6539-44b7-bab2-6c6b2fd7ce8d

Series

SIKS Dissertation Series (2024-12)

Part of collection

Institutional Repository

Document type

doctoral thesis

Files

PDF

Thesis_Peide_Full.pdf

5.5 MB