LearningQ: A Large-scale Dataset for Educational Question Generation

Conference Paper (2018)
Author(s)

G. Chen (TU Delft - Web Information Systems)

Jie Yang (University of Fribourg)

C. Hauff (TU Delft - Web Information Systems)

Geert-Jan Houben (TU Delft - Web Information Systems)

Research Group
Web Information Systems
Copyright
© 2018 G. Chen, J. Yang, C. Hauff, G.J.P.M. Houben
More Info
Publication Year
2018
Language
English
Pages (from-to)
481-490
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

We present LearningQ, a challenging educational question generation dataset containing over 230K document-question pairs. It includes 7K instructor-designed questions assessing knowledge concepts being taught and 223K learner-generated questions seeking in-depth understanding of the taught concepts. We show that, compared to existing datasets that can be used to generate educational questions, LearningQ (i) covers a wide range of educational topics and (ii) contains long and cognitively demanding documents for which question generation requires reasoning over the relationships between sentences and paragraphs. As a result, a significant percentage of LearningQ questions (~30%) require higher-order cognitive skills to solve (such as applying and analyzing), in contrast to existing question generation datasets, which are designed mostly for the lowest cognitive skill level (i.e., remembering). To understand the effectiveness of existing question generation methods in producing educational questions, we evaluate both rule-based and deep neural network-based methods on LearningQ. Extensive experiments show that state-of-the-art methods that perform well on existing datasets cannot generate useful educational questions. This implies that LearningQ is a challenging test bed for the generation of high-quality educational questions and is worth further investigation. We open-source the dataset and our code at https://dataverse.mpi-sws.org/dataverse/icwsm18.

Files

17857_77947_1_PB.pdf
(pdf | 0.579 MB)
License info not available