Incorporating Word-level Phonemic Decoding into Readability Assessment

Conference Paper (2024)
Author(s)

Christine Pinney (Boise State University)

Casey Kennington (Boise State University)

Katherine Landau Wright (Boise State University)

Maria Soledad Pera (TU Delft - Web Information Systems)

Jerry Alan Fails (Boise State University)

Research Group
Web Information Systems
Publication Year
2024
Language
English
Pages (from-to)
8998-9009
ISBN (electronic)
9782493814104
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Current approaches to automatic readability assessment have found success with large language models and transformer architectures. These techniques improve accuracy, but they do not offer the interpretability required by the audience that most often uses readability assessment tools: teachers and educators. Recent work that employs more traditional machine learning methods has highlighted the linguistic importance of considering semantic and syntactic characteristics of text in readability assessment by utilizing handcrafted feature sets. Research in education suggests that, in addition to semantics and syntax, phonetic and orthographic instruction are necessary for children to progress through the stages of reading and spelling development; children must first learn to decode the letters and symbols on a page to recognize words and phonemes and their connection to speech sounds. Here, we incorporate this word-level phonemic decoding process into readability assessment by crafting a phonetically based feature set for grade-level classification for English. Our resulting feature set performs comparably to much larger semantically and syntactically based feature sets, supporting the linguistic value of orthographic and phonetic considerations in readability assessment.
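
To make the idea of a word-level, phonetically based feature concrete, the following is a minimal sketch and not the authors' feature set, which the abstract does not enumerate. It assumes a hypothetical toy lexicon (`TOY_LEXICON`) that maps words to ARPAbet-style phoneme lists; in practice a pronouncing dictionary would presumably fill that role. The feature names (`coverage`, `mean_phonemes`, `mean_letters_per_phoneme`) are illustrative assumptions only.

```python
from statistics import mean

# Hypothetical toy lexicon (assumption): word -> ARPAbet-style phonemes.
# A real system would load a full pronouncing dictionary instead.
TOY_LEXICON = {
    "the": ["DH", "AH"],
    "cat": ["K", "AE", "T"],
    "sat": ["S", "AE", "T"],
    "on": ["AA", "N"],
    "mat": ["M", "AE", "T"],
    "knight": ["N", "AY", "T"],
    "thought": ["TH", "AO", "T"],
}


def tokenize(text):
    """Lowercase the text and keep only the alphabetic characters of each token."""
    words = []
    for raw in text.lower().split():
        word = "".join(ch for ch in raw if ch.isalpha())
        if word:
            words.append(word)
    return words


def phonemic_features(text, lexicon=TOY_LEXICON):
    """Return illustrative word-level phonemic features for a text.

    Words missing from the lexicon are skipped and reflected in `coverage`.
    """
    words = tokenize(text)
    known = [w for w in words if w in lexicon]
    if not known:
        return {
            "coverage": 0.0,
            "mean_phonemes": 0.0,
            "mean_letters_per_phoneme": 0.0,
        }
    phoneme_counts = [len(lexicon[w]) for w in known]
    # Letters per phoneme is a rough proxy for how indirect the
    # spelling-to-sound mapping is (e.g. "knight" has 6 letters, 3 phonemes).
    ratios = [len(w) / len(lexicon[w]) for w in known]
    return {
        "coverage": len(known) / len(words),
        "mean_phonemes": mean(phoneme_counts),
        "mean_letters_per_phoneme": mean(ratios),
    }


if __name__ == "__main__":
    print(phonemic_features("The cat sat on the mat."))
    print(phonemic_features("The knight thought."))
```

Under these assumptions, a sentence built from words with silent letters or multi-letter graphemes yields a higher letters-per-phoneme value than one built from regularly spelled words. Features of this kind could then be passed to any interpretable classifier for grade-level prediction.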