AS-ASR

A Lightweight and Real-Time Aphasia-Specific Auto Speech Recognition System

Master Thesis (2025)
Author(s)

C. Bao (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

C. Gao – Mentor (TU Delft - Electronics)

Sijun Du – Graduation committee member (TU Delft - Electronic Instrumentation)

Qinwen Fan – Mentor (TU Delft - Electronic Components, Technology and Materials)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
21-07-2025
Awarding Institution
Delft University of Technology
Programme
Aerospace Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Aphasia is a language disorder caused by brain damage. It often manifests as disfluent speech, fragmented grammar, and irregular word usage. These characteristics make it difficult for current automatic speech recognition (ASR) systems to produce accurate transcriptions. While models like Whisper perform well on fluent speech, they often struggle with aphasic speech, especially in low-resource clinical settings. This thesis presents AS-ASR, an aphasia-specific ASR system built on the lightweight Whisper-tiny model and optimized for real-time use on edge devices. To improve transcription accuracy, we created a mixed dataset of fluent and aphasic speech and used the GPT-4 large language model (LLM) to clean and refine transcripts of disfluent recordings. We tested different ratios of aphasic to typical speech data during training to find a balance that supports both accuracy and generalization. In addition, we applied mixed-precision quantization and a simple energy-based voice activity detection (VAD) method to reduce model size and inference time. The fine-tuned model achieved up to 40% lower word error rates on aphasic speech while maintaining accuracy on fluent speech. Our findings suggest that targeted fine-tuning and lightweight deployment strategies can make ASR systems more accessible and effective for clinical and assistive use.
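
The abstract mentions a simple energy-based voice activity detection method but does not describe it. The following is a minimal sketch of how such a method generally works, not the thesis implementation: the audio is cut into fixed-length frames, the RMS energy of each frame is compared with a threshold, and consecutive speech frames are merged into segments. The function names, the 160-sample frame length, the 0.02 threshold, and the 16 kHz sampling rate are all assumptions chosen for illustration.

```python
import math

def frame_energies(samples, frame_len):
    """Return the RMS energy of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies

def detect_speech(samples, frame_len=160, threshold=0.02):
    """Flag a frame as speech (True) when its RMS energy exceeds the threshold."""
    # frame_len and threshold are illustrative assumptions, not values from the thesis.
    return [e > threshold for e in frame_energies(samples, frame_len)]

def speech_segments(flags, frame_len=160):
    """Merge consecutive speech frames into (start_sample, end_sample) spans."""
    segments = []
    start = None
    for i, is_speech in enumerate(flags):
        if is_speech and start is None:
            start = i * frame_len          # a speech run begins
        elif not is_speech and start is not None:
            segments.append((start, i * frame_len))  # the run ends
            start = None
    if start is not None:                  # run continues to the end of the signal
        segments.append((start, len(flags) * frame_len))
    return segments

# Synthetic input: silence, a 440 Hz tone, then silence (16 kHz sampling rate assumed).
sr = 16000
silence = [0.0] * 1600
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(3200)]
signal = silence + tone + silence

flags = detect_speech(signal)
segments = speech_segments(flags)
print(segments)
```

In a pipeline of this kind, only the samples inside the detected segments would be passed to the recognizer, which is one way silence skipping can cut inference time.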

Files

License info not available

File under embargo until 01-08-2026