AS-ASR

A Lightweight and Real-Time Aphasia-Specific Auto Speech Recognition System

Master Thesis (2025)

Author(s)

C. Bao (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

C. Gao – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

S. Du – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Q. Fan – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use

https://resolver.tudelft.nl/uuid:6f73843e-fd97-4092-a42a-5cdb649d6407

More Info

Publication Year

2025

Language

English

Graduation Date

21-07-2025

Awarding Institution

Delft University of Technology

Programme

Aerospace Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Aphasia is a language disorder caused by brain damage. It often manifests as disfluent speech, fragmented grammar, and irregular word usage. These characteristics make it difficult for current automatic speech recognition (ASR) systems to produce accurate transcriptions. While models like Whisper perform well on fluent speech, they often struggle with aphasic speech, especially in low-resource clinical settings. This thesis presents AS-ASR, an aphasia-specific ASR system built on the lightweight Whisper-tiny model and optimized for real-time use on edge devices. To improve transcription accuracy, we created a mixed dataset of fluent and aphasic speech and used the GPT-4 large language model (LLM) to clean and refine transcripts of disfluent recordings. We tested different ratios of aphasic to typical data during training to find a balance that supports both accuracy and generalization. In addition, we applied mixed-precision quantization and a simple energy-based voice activity detection method to reduce model size and inference time. The fine-tuned model achieved up to 40% lower word error rates on aphasic speech while maintaining accuracy on fluent speech. Our findings suggest that targeted fine-tuning and lightweight deployment strategies can make ASR systems more accessible and effective for clinical and assistive use.
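
The abstract mentions a simple energy-based voice activity detection step but gives no implementation details. As a rough illustration of the general idea only, the sketch below marks a frame as speech when its mean squared amplitude exceeds a fixed threshold. The function names, the 160-sample frame length, the threshold value, and the synthetic test signal are all assumptions made for this example and are not taken from the thesis.

```python
import math


def frame_energies(samples, frame_len):
    """Return the mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def energy_vad(samples, frame_len=160, threshold=1e-3):
    """Flag each frame as speech (True) when its energy exceeds the threshold.

    frame_len and threshold are illustrative defaults, not values from the thesis.
    """
    return [e > threshold for e in frame_energies(samples, frame_len)]


# Synthetic input (assumed): three silent frames, then two frames of a 440 Hz tone at 16 kHz.
sample_rate = 16000
silence = [0.0] * (3 * 160)
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(2 * 160)]

flags = energy_vad(silence + tone)
print(flags)  # [False, False, False, True, True]
```

In a setup like this, only the frames flagged as speech would be passed on to the recognizer, which is one plausible way such a step could cut inference time on silent audio.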

Files

Thesis_Report_Chen_Bao.pdf

(pdf | 0 Mb)

License info not available

File under embargo until 01-08-2026