TT

Authored

1 records found

Visually grounded speech representation learning has shown to be useful in the field of speech representation learning. Studies of learning visually grounded speech embedding adopted speech-image cross-modal retrieval task to evaluate the models, since the cross-modal retrieval t ...