Danny Merkx

Authored

2 records found

We investigated word recognition in a Visually Grounded Speech model. The model has been trained on pairs of images and spoken captions to create visually grounded embeddings which can be used for speech-to-image retrieval and vice versa. We investigate whether such a model can b ...

Fine-Tracker is a speech-based model of human speech recognition. While previous work has shown that Fine-Tracker is successful at modelling aspects of human spoken-word recognition, its speech recognition performance is not comparable to that of human performance, possibly due t ...

Contributed

1 record found

Word recognition in a model of visually grounded speech

An analysis using techniques inspired by human speech processing research

A Visually Grounded Speech model is a neural model which is trained to embed image caption pairs closely together in a common embedding space. As a result, such a model can retrieve semantically related images given a speech caption and vice versa. The purpose of this research is ...