- document
-
Wang, X. (author), Qiao, T. (author), Zhu, Jihua (author), Hanjalic, A. (author), Scharenborg, O.E. (author)An estimated half of the world’s languages do not have a written form, making it impossible for these languages to benefit from any existing text-based technologies. In this paper, a speech-to-image generation (S2IG) framework is proposed which translates speech descriptions to photo-realistic images without using any text information, thus...conference paper 2020
- document
-
Wang, Bokun (author), Yang, Yang (author), Xing, Xu (author), Hanjalic, A. (author), Shen, Heng Tao (author)Cross-modal retrieval aims to enable flexible retrieval experience across different modalities (e.g., texts vs. images). The core of crossmodal retrieval research is to learn a common subspace where the items of different modalities can be directly compared to each other. In this paper, we present a novel Adversarial Cross-Modal Retrieval ...conference paper 2017