ZY

Authored

2 records found

Automatic speech recognition (ASR) should serve every speaker, not only the majority “standard” speakers of a language. In order to build inclusive ASR, mitigating the bias against speaker groups who speak in a “non-standard” or “diverse” way is crucial. We aim to mitigate the bi ...
In this paper, we explore the effectiveness of deploying the raw phase and magnitude spectra for dysarthric speech recognition, detection and classification. In particular, we scrutinise the usefulness of various raw phase-based representations along with their combinations with ...

Contributed

18 records found

“How Much Data is Enough?” Learning curves for machine learning

Investigating alternatives to the Levenberg-Marquardt algorithm for learning curve extrapolation

The conducted research explores fitting algorithms for learning curves. Learning curves describe how the performance of a machine learning model changes with the size of the training input. Therefore, fitting these learning curves and extrapolating them can help determine the req ...

Non-Monotonicity in Empirical Learning Curves

Identifying non-monotonicity through slope approximations on discrete points

Learning curves are used to shape the performance of a Machine Learning (ML) model with respect to the size of the set used for training it. It was commonly thought that adding more training samples would increase the model's accuracy (i.e., they are monotone), but recent works s ...

Empirical Investigation of Learning Curves

Assessing Convexity Characteristics

Nonconvexity in learning curves is almost always undesirable. A machine learning model with a non-convex learning curve either requires a larger quantity of data to observe progress in its accuracy or experiences an exponential decrease of accuracy at low sample sizes, with no im ...

A Comparative Analysis of Learning Curve Models and their Applicability in Different Scenarios

Finding datasets patterns which lead to certain parametric curve model

Learning curves display predictions of the chosen model’s performance for different training set sizes. They can help estimate the amount of data required to achieve a minimal error rate, thus aiding in reducing the cost of data collection. However, our understanding and knowledg ...

How Does OpenAI’s Whisper Interpret Dysarthric Speech?

An Analysis of Acoustic Feature Probing and Representation Layers for Dysarthric Speech

This paper investigates how OpenAI’s Whisper model processes dysarthric speech by probing its internal acoustic feature representations. Utilizing the TORGO database, we analyzed Whisper’s capability to encode significant acoustic features specific to dysarthric speech across its ...

Improving State-of-the-Art ASR Systems for Speakers with Dysarthria

Applying Low-Rank Adaptation Transfer Learning to Whisper

Dysarthria is a speech disorder that limits an individual’s ability to clearly articulate, due to the weakening of the muscles involved in speech. Despite recent advances in Automatic Speech Recognition (ASR), the recognition of dysarthric speech remains a significant challenge b ...

Improving State-of-the-Art ASR Systems for Speakers with Dysarthria

Applying Low-Rank Adaptation Transfer Learning to Whisper

Dysarthria is a speech disorder that limits an individual’s ability to clearly articulate, due to the weakening of the muscles involved in speech. Despite recent advances in Automatic Speech Recognition (ASR), the recognition of dysarthric speech remains a significant challenge b ...

Reducing Bias in State-of-the-Art ASR Systems for Child Speech

Addressing Age and Gender Disparities through Transfer Learning Strategies

Automatic Speech Recognition (ASR) systems have transformed human-machine interaction, yet they often struggle with child speech due to the unique vocal characteristics. This thesis investigates age and gender biases, focusing on enhancing the performance of state-of-the-art ASR ...

Reducing Bias in State-of-the-Art ASR Systems for Child Speech

Addressing Age and Gender Disparities through Transfer Learning Strategies

Automatic Speech Recognition (ASR) systems have transformed human-machine interaction, yet they often struggle with child speech due to the unique vocal characteristics. This thesis investigates age and gender biases, focusing on enhancing the performance of state-of-the-art ASR ...

Automatic Dysarthria Severity Assessment using Whisper-extracted Features

Evaluating ML architectures for dysarthria severity assessment on TORGO and MSDM

Dysarthria is a speech disorder commonly caused by neurological disorders such as strokes, cerebral palsy and Amyotrophic Lateral Sclerosis (ALS). The severity level of dysarthria greatly influences the appropriate treatment for a patient. However, assessing the severity of dysar ...

Automatic Dysarthria Severity Assessment using Whisper-extracted Features

Evaluating ML architectures for dysarthria severity assessment on TORGO and MSDM

Dysarthria is a speech disorder commonly caused by neurological disorders such as strokes, cerebral palsy and Amyotrophic Lateral Sclerosis (ALS). The severity level of dysarthria greatly influences the appropriate treatment for a patient. However, assessing the severity of dysar ...

Evaluating Alternative Metrics for Dysarthric Speech Recognition

Assessing the Effectiveness of Different Evaluation Metrics in Dysarthric Speech Recognition Systems Across Various Severities

Dysarthria is a motor speech disorder resulting in slurred or slow speech that can be difficult to understand. This research paper evaluates the effectiveness of various metrics for automatic speech recognition (ASR), such as character error rate (CER), Jaro-Winkler distance, a ...

Evaluating Alternative Metrics for Dysarthric Speech Recognition

Assessing the Effectiveness of Different Evaluation Metrics in Dysarthric Speech Recognition Systems Across Various Severities

Dysarthria is a motor speech disorder resulting in slurred or slow speech that can be difficult to understand. This research paper evaluates the effectiveness of various metrics for automatic speech recognition (ASR), such as character error rate (CER), Jaro-Winkler distance, a ...
Learning curves in machine learning are graphical representations that depict the relationship between a model's performance and the amount of training data it has been exposed to. They play a fundamental role in obtaining the knowledge and skills across a range of domains. Altho ...
Targeted and successful cellular therapies for disease treatment require an extensive mapping of the complex structure and dynamics of molecular mechanisms which determine the behaviour and function of cell. CELL-seq is a genome-wide screening procedure measuring specific and tar ...
Watermarks are historical motifs present in the texture of paper that are commonly used to identify the paper manufacturers. They only become visible when viewed under certain light conditions. Under ideal circumstances, researchers may use watermarks to determine a historical do ...
Watermarks are historical motifs present in the texture of paper that are commonly used to identify the paper manufacturers. They only become visible when viewed under certain light conditions. Under ideal circumstances, researchers may use watermarks to determine a historical do ...
Background. Quitting smoking is a challenge nowadays. Virtual coaches offer autonomous, personalized guidance for smoking cessation. However, such systems cannot replace human coaches completely. In situations, when human coaches cannot provide help to everyone - a virtual coach ...