S
Shen
10 records found
1
Joint Feature Synthesis and Embedding
Adversarial Cross-Modal Retrieval Revisited
Recently, generative adversarial network (GAN) has shown its strong ability on modeling data distribution via adversarial learning. Cross-modal GAN, which attempts to utilize the power of GAN to model the cross-modal joint distribution and to learn compatible cross-modal features
...
Image-sentence matching is a challenging task in the field of language and vision, which aims at measuring the similarities between images and sentence descriptions. Most existing methods independently map the global features of images and sentences into a common space to calcula
...
Binary codes have often been deployed to facilitate large-scale retrieval tasks, but not that often for image compression. In this paper, we propose a unified framework, BGAN+, that restricts the input noise variable of generative adversarial networks to be binary and conditioned
...
We present a scheme to entangle two microwave fields by using the nonlinear magnetostrictive interaction in a ferrimagnet. The magnetostrictive interaction enables the coupling between a magnon mode (spin wave) and a mechanical mode in the ferrimagnet, and the magnon mode simulta
...
In this article, we address the problem of visual question generation (VQG), a challenge in which a computer is required to generate meaningful questions about an image targeting a given answer. The existing approaches typically treat the VQG task as a reversed visual question an
...
A major challenge in matching images and text is that they have intrinsically different data distributions and feature representations. Most existing approaches are based either on embedding or classification, the first one mapping image and text instances into a common embedding
...
In this paper, we propose a novel approach to video captioning based on adversarial learning and long short-term memory (LSTM). With this solution concept, we aim at compensating for the deficiencies of LSTM-based video captioning methods that generally show potential to effectiv
...
The most striking successes in image retrieval using deep hashing have mostly involved discriminative models, which require labels. In this paper, we use binary generative adversarial networks (BGAN) to embed images to binary codes in an unsupervised way. By restricting the input
...
From Deterministic to Generative
Multimodal Stochastic RNNs for Video Captioning
Video captioning, in essential, is a complex natural process, which is affected by various uncertainties stemming from video content, subjective judgment, and so on. In this paper, we build on the recent progress in using encoder-decoder framework for video captioning and address
...
Cross-modal retrieval aims to enable flexible retrieval experience across different modalities (e.g., texts vs. images). The core of crossmodal retrieval research is to learn a common subspace where the items of different modalities can be directly compared to each other. In this
...