Z. Yue
21 records found
Authored
State-of-the-art ASRs show suboptimal performance for child speech. The scarcity of child speech limits the development of child speech recognition (CSR). Therefore, we studied child-to-child voice conversion (VC) from existing child speakers in the dataset and additional (new) c
...
In this paper, we explore the effectiveness of deploying the raw phase and magnitude spectra for dysarthric speech recognition, detection and classification. In particular, we scrutinise the usefulness of various raw phase-based representations along with their combinations wi ...
Automatic speech recognition (ASR) should serve every speaker, not only the majority “standard” speakers of a language. In order to build inclusive ASR, mitigating the bias against speaker groups who speak in a “non-standard” or “diverse” way is crucial. We aim to mitigate the bi
...
Contributed
Dysarthric speech, characterized by articulation problems and a slower speech rate, shows lower automatic speech recognition (ASR) performance compared to normal speech. To improve performance, researchers often try to enhance dysarthric speech to be more like normal speech befor
...
Smoking and vaping cessation remains a significant public health challenge despite the availability of numerous aids and eHealth applications. This study explores the reasons behind users' preference for human feedback when preparing to quit smoking or vaping, aiming to address a
...
Evaluating Alternative Metrics for Dysarthric Speech Recognition
Assessing the Effectiveness of Different Evaluation Metrics in Dysarthric Speech Recognition Systems Across Various Severities
Dysarthria is a motor speech disorder resulting in slurred or slow speech that can be difficult to understand. This research paper evaluates the effectiveness of various metrics for automatic speech recognition (ASR), such as character error rate (CER), Jaro-Winkler distance, a
...
Background. Quitting smoking remains a challenge. Virtual coaches offer autonomous, personalized guidance for smoking cessation. However, such systems cannot replace human coaches completely. In situations where human coaches cannot provide help to everyone, a virtual coach
...
Smoking remains one of the largest health concerns worldwide, which is why eHealth applications with virtual coaches have been developed to assist smokers with quitting. Providing additional feedback from human coaches during such smoking cessation programs can further improve th
...
Reducing Bias in State-of-the-Art ASR Systems for Child Speech
Addressing Age and Gender Disparities through Transfer Learning Strategies
Automatic Speech Recognition (ASR) systems have transformed human-machine interaction, yet they often struggle with child speech due to the unique vocal characteristics. This thesis investigates age and gender biases, focusing on enhancing the performance of state-of-the-art ASR
...
Improving State-of-the-Art ASR Systems for Speakers with Dysarthria
Applying Low-Rank Adaptation Transfer Learning to Whisper
Dysarthria is a speech disorder that limits an individual’s ability to clearly articulate, due to the weakening of the muscles involved in speech. Despite recent advances in Automatic Speech Recognition (ASR), the recognition of dysarthric speech remains a significant challenge b
...
Automatic Dysarthria Severity Assessment using Whisper-extracted Features
Evaluating ML architectures for dysarthria severity assessment on TORGO and MSDM
Dysarthria is a speech disorder commonly caused by neurological disorders such as strokes, cerebral palsy and Amyotrophic Lateral Sclerosis (ALS). The severity level of dysarthria greatly influences the appropriate treatment for a patient. However, assessing the severity of dysar
...
How Does OpenAI’s Whisper Interpret Dysarthric Speech?
An Analysis of Acoustic Feature Probing and Representation Layers for Dysarthric Speech
This paper investigates how OpenAI’s Whisper model processes dysarthric speech by probing its internal acoustic feature representations. Utilizing the TORGO database, we analyzed Whisper’s capability to encode significant acoustic features specific to dysarthric speech across its
...
Deep learning models are now widely deployed on edge IoT devices. However, most of these models are trained under supervised conditions and can only recognize seen classes learned from the training stage. Zero-shot learning (ZSL) is a popular method for identifying unseen classes
...
Empirical Investigation of Learning Curves
Assessing Convexity Characteristics
Nonconvexity in learning curves is almost always undesirable. A machine learning model with a non-convex learning curve either requires a larger quantity of data to observe progress in its accuracy or experiences an exponential decrease of accuracy at low sample sizes, with no im
...
Learning curves in machine learning are graphical representations that depict the relationship between a model's performance and the amount of training data it has been exposed to. They play a fundamental role in obtaining the knowledge and skills across a range of domains. Altho
...
Watermarks are historical motifs present in the texture of paper that are commonly used to identify the paper manufacturers. They only become visible when viewed under certain light conditions. Under ideal circumstances, researchers may use watermarks to determine a historical do
...
“How Much Data is Enough?” Learning curves for machine learning
Investigating alternatives to the Levenberg-Marquardt algorithm for learning curve extrapolation
The conducted research explores fitting algorithms for learning curves. Learning curves describe how the performance of a machine learning model changes with the size of the training input. Therefore, fitting these learning curves and extrapolating them can help determine the req
...
Non-Monotonicity in Empirical Learning Curves
Identifying non-monotonicity through slope approximations on discrete points
Learning curves are used to shape the performance of a Machine Learning (ML) model with respect to the size of the set used for training it. It was commonly thought that adding more training samples would increase the model's accuracy (i.e., they are monotone), but recent works s
...
A Comparative Analysis of Learning Curve Models and their Applicability in Different Scenarios
Finding datasets patterns which lead to certain parametric curve model
Learning curves display predictions of the chosen model’s performance for different training set sizes. They can help estimate the amount of data required to achieve a minimal error rate, thus aiding in reducing the cost of data collection. However, our understanding and knowledg
...
Targeted and successful cellular therapies for disease treatment require an extensive mapping of the complex structure and dynamics of molecular mechanisms which determine the behaviour and function of cells. CELL-seq is a genome-wide screening procedure measuring specific and tar
...