B. Koc
Please Note
This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.
2 records found
Automatic speech recognition systems achieve near-human performance under standard conditions but perform poorly on dysarthric speech due to high acoustic variability resulting from neuromotor impairment. While speaker-specific adaptation can improve performance, limited training data restricts conventional learning approaches. Contrastive learning offers a promising alternative by encouraging more discriminative phoneme representations from limited data, but its effectiveness depends strongly on how negative examples are selected. This thesis investigates whether personalized contrastive learning can improve Dutch dysarthric phoneme recognition.
A Whisper-based encoder-DNN-CTC model is extended with a triplet-loss objective to improve phoneme-level discrimination. Four negative sampling strategies are compared: randomly selected, phonologically motivated, and two empirically derived from the model's own prediction errors, one estimated on the training set and one via cross-validation. Each is evaluated under two training regimes: contrastive fine-tuning of a pretrained model and training from scratch.
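To make the triplet-loss objective concrete, the following is a minimal sketch of the standard margin-based triplet loss in plain Python. The abstract does not state the distance metric, the margin, or how the triplet term is weighted against the CTC loss, so the Euclidean distance, the `margin=1.0` default, and the toy 2-D embeddings are all assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The margin value and the distance metric are assumptions; the source
    does not specify either.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-D phoneme embeddings, for illustration only.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]       # same phoneme as the anchor
far_negative = [3.0, 4.0]   # different phoneme, already well separated
near_negative = [0.0, 1.5]  # different phoneme, close to the anchor

easy = triplet_loss(anchor, positive, far_negative)   # 1.0 - 5.0 + 1.0 < 0, clipped to 0
hard = triplet_loss(anchor, positive, near_negative)  # 1.0 - 1.5 + 1.0 = 0.5
print(easy, hard)
```

The far negative contributes no loss while the near negative does, which is one way to see why the choice of negatives matters: triplets whose negative is already far from the anchor carry no training signal.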
All contrastive approaches significantly outperform a CTC-only baseline. The strongest results are obtained with phonologically motivated and cross-validation-based empirical negatives when training from scratch, yielding up to a 10.7% relative reduction in phoneme error rate. Under fine-tuning, differences between sampling strategies are negligible. In contrast, when trained from scratch, the phonological and cross-validation-based empirical strategies significantly outperform randomly selected and training-set-based empirical negatives.
These findings suggest that, for this speaker, contrastive learning for dysarthric speech benefits from phonologically informed or empirically derived negative pairs rather than random selection. A practical trade-off emerges between the two strongest strategies: phonologically motivated sampling requires no speaker-specific preprocessing and is immediately applicable to new speakers, but generates a large number of triplets and is computationally expensive at training time. Cross-validation-based empirical sampling requires building a speaker-specific confusion matrix upfront, but produces fewer, more targeted triplets and trains more efficiently. Given comparable performance, the choice between them reduces to whether preprocessing overhead or training-time resources are the limiting constraint.
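As a rough illustration of how negatives might be derived from a speaker-specific confusion matrix, the sketch below counts how often each reference phoneme is mispredicted as another and returns the most frequent confusions as negatives. The function names, the `k` cutoff, the toy phoneme sequences, and the assumption that reference and predicted phonemes are already aligned one-to-one are all hypothetical; the abstract does not describe the actual procedure. In a cross-validation variant, the predictions would presumably come from held-out folds rather than from the training data itself.

```python
from collections import Counter, defaultdict


def build_confusions(reference, predicted):
    """Count, per reference phoneme, which other phonemes it was predicted as.

    Assumes the two sequences are already aligned one-to-one (an assumption
    made for this sketch).
    """
    confusions = defaultdict(Counter)
    for ref, hyp in zip(reference, predicted):
        if ref != hyp:
            confusions[ref][hyp] += 1
    return confusions


def empirical_negatives(confusions, anchor, k=2):
    """Return up to k phonemes most often confused with the anchor phoneme."""
    counts = confusions.get(anchor, Counter())
    return [phoneme for phoneme, _ in counts.most_common(k)]


# Toy aligned sequences, invented for illustration only.
reference = ["p", "p", "p", "p", "t", "t", "s"]
predicted = ["b", "b", "m", "p", "d", "t", "s"]

confusions = build_confusions(reference, predicted)
print(empirical_negatives(confusions, "p", k=1))
print(empirical_negatives(confusions, "p", k=2))
print(empirical_negatives(confusions, "s"))
```

Restricting negatives to observed confusions keeps the triplet set small, which matches the abstract's remark that the empirical strategy produces fewer, more targeted triplets at the cost of an upfront pass over the speaker's data.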
Implications of LLMs4Code on Copyright Infringement
An Exploratory Study Through Red Teaming
Large Language Models (LLMs) have experienced a rapid increase in usage across numerous sectors in recent years. However, this growth brings a greater risk of misuse. This paper explores the issue of copyright infringement facilitated by LLMs in the domain of software engineering. Through the creation of a taxonomy and prompt engineering, we investigate how model alignment and the structure and language of prompts affect the way LLMs respond to copyright-infringing prompts, assessing their willingness to engage in copyright violation. Our findings underscore the critical role of model alignment in identifying potentially infringing inputs, irrespective of model complexity or modality. Notably, prompts that are crafted to avoid overtly malicious language, especially those that instruct the model to complete the given input, tend to yield more responses that could facilitate malicious activities. This research provides a preliminary understanding of copyright infringement by LLMs in software engineering and suggests avenues for future research.