Aphasia is a language disorder caused by brain damage. It often manifests as disfluent speech, fragmented grammar, and irregular word usage. These characteristics make it difficult for current automatic speech recognition (ASR) systems to produce accurate transcriptions. While models such as Whisper perform well on fluent speech, they often struggle with aphasic speech, especially in low-resource clinical settings. This thesis presents AS-ASR, an aphasia-specific ASR system built on the lightweight Whisper-tiny model and optimized for real-time use on edge devices. To improve transcription accuracy, we constructed a mixed dataset of fluent and aphasic speech and used the GPT-4 large language model (LLM) to clean and refine transcripts of disfluent recordings. We tested different ratios of aphasic to typical data during training to find a balance that supports both accuracy and generalization. In addition, we applied mixed-precision quantization and a simple energy-based voice activity detection (VAD) method to reduce model size and inference time. The fine-tuned model achieved up to 40% lower word error rates on aphasic speech while maintaining accuracy on fluent speech. Our findings suggest that targeted fine-tuning and lightweight deployment strategies can make ASR systems more accessible and effective for clinical and assistive use.
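The energy-based voice activity detection mentioned above can be illustrated with a minimal sketch: frames whose short-time energy falls below a threshold (relative to the signal peak) are treated as silence and skipped before inference. The frame length, hop size, and threshold below are illustrative assumptions, not the thesis's actual parameters.

```python
import numpy as np

def energy_vad(signal, sample_rate=16000, frame_ms=25, hop_ms=10, threshold_db=-35.0):
    """Flag frames whose RMS energy (in dB relative to the signal peak)
    exceeds a fixed threshold. Parameter values here are hypothetical."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    peak = np.max(np.abs(signal)) + 1e-10  # avoid divide-by-zero on silence
    flags = []
    for start in range(0, len(signal) - frame_len + 1, hop_len):
        frame = signal[start:start + frame_len]
        rms = np.sqrt(np.mean(frame ** 2))
        db = 20.0 * np.log10(rms / peak + 1e-10)
        flags.append(db > threshold_db)
    return np.array(flags)
```

Because the decision is a single threshold on frame energy, the method adds negligible latency on edge hardware, at the cost of being less robust to noisy backgrounds than learned VAD models.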
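The evaluation metric cited above, word error rate (WER), is the word-level edit distance between reference and hypothesis transcripts, normalized by reference length. A minimal sketch of the standard dynamic-programming computation (libraries such as jiwer provide production versions):

```python
def wer(reference, hypothesis):
    """Word error rate: Levenshtein distance over word tokens,
    divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # match / substitution
    return d[-1][-1] / max(len(ref), 1)
```

A "40% lower WER" on this metric means, for example, a drop from 0.50 to 0.30 errors per reference word.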