Search results | TU Delft Repositories

Bayesian OMA of offshore rock lighthouses: Surprises with close modes, symmetry and alignment

Brownjohn, James (author), Raby, Alison (author), Au, Siu Kui (author), Zhu, Zuo (author), Wang, Xinrui (author), Antonini, A. (author)

A set of seven rock lighthouses around the British Isles was studied by a combination of forced and ambient vibration tests executed under some extreme logistical constraints. Forced vibration testing of the circular-section masonry towers combined with experimental modal analysis identified modes with alignment assumed the same as the shaker...

conference paper 2019

S2IGAN: Speech-to-Image Generation via Adversarial Learning

Wang, X. (author), Qiao, T. (author), Zhu, Jihua (author), Hanjalic, A. (author), Scharenborg, O.E. (author)

An estimated half of the world’s languages do not have a written form, making it impossible for these languages to benefit from any existing text-based technologies. In this paper, a speech-to-image generation (S2IG) framework is proposed which translates speech descriptions to photo-realistic images without using any text information, thus...

conference paper 2020

Reinforcement Learning in Railway Timetable Rescheduling

Zhu, Y. (author), Wang, H. (author), Goverde, R.M.P. (author)

Real-time railway traffic management is important for the daily operations of railway systems. It predicts and resolves operational conflicts caused by events like excessive passenger boardings/alightings. Traditional optimization methods for this problem are restricted by the size of the problem instances. Therefore, this paper proposes a...

conference paper 2020

Show and speak: Directly synthesize spoken description of images

Wang, X. (author), Feng, S. (author), Zhu, Jihua (author), Hasegawa-Johnson, Mark (author), Scharenborg, O.E. (author)

This paper proposes a new model, referred to as the show and speak (SAS) model, which for the first time is able to directly synthesize spoken descriptions of images, bypassing the need for any text or phonemes. The basic structure of SAS is an encoder-decoder architecture that takes an image as input and predicts the spectrogram of speech that...

conference paper 2021

Learning fine-grained semantics in spoken language using visual grounding

Wang, X. (author), Tian, Tian (author), Zhu, Jihua (author), Scharenborg, O.E. (author)

In the case of unwritten languages, acoustic models cannot be trained in the standard way, i.e., using speech and textual transcriptions. Recently, several methods have been proposed to learn speech representations using images, i.e., using visual grounding. Existing studies have focused on scene images. Here, we investigate whether fine...

conference paper 2021

Answer Quality Aware Aggregation for Extractive QA Crowdsourcing

Zhu, P. (author), Wang, Z. (author), Yang, J. (author), Hauff, C. (author), Anand, A. (author)

Quality control is essential for creating extractive question answering (EQA) datasets via crowdsourcing. Aggregating the answers (i.e., word spans within passages) annotated by different crowd workers is one major focus for ensuring dataset quality. However, crowd workers cannot reach a consensus on a considerable portion of questions. We...

conference paper 2022

WordMarkov: A New Password Probability Model of Semantics

Xie, Jiahong (author), Cheng, Haibo (author), Zhu, Rong (author), Wang, Ping (author), Liang, K. (author)

To date there has been little research on the semantic information in passwords, which leaves a gap that prevents us from fully understanding password characteristics and security. We propose a new password probability model for semantic information, called WordMarkov, which is based on a Markov chain, offers both generalization and accuracy, and can capture the...

conference paper 2022

FedNaWi: Selecting the Befitting Clients for Robust Federated Learning in IoT Applications

Zhu, R. (author), Yang, M. (author), Yang, J. (author), Wang, Q. (author)

Federated Learning (FL) is an important privacy-preserving learning paradigm that is expected to play an essential role in the future Intelligent Internet of Things (IoT). However, model training in FL is vulnerable to noise and the statistical heterogeneity of local data across IoT clients. In this paper, we propose FedNaWi, a “Go Narrow, Then...

conference paper 2023

Towards Cross-Modal Point Cloud Retrieval for Indoor Scenes

Yu, Fuyang (author), Wang, Zhen (author), Li, Dongyuan (author), Zhu, P. (author), Liang, Xiaohui (author), Wang, Xiaochuan (author), Okumura, Manabu (author)

Cross-modal retrieval, as an important emerging foundational information retrieval task, benefits from recent advances in multimodal technologies. However, current cross-modal retrieval methods mainly focus on the interaction between textual information and 2D images, lacking research on 3D data, especially point clouds at scene level,...

conference paper 2024

MRHF: Multi-stage Retrieval and Hierarchical Fusion for Textbook Question Answering

Zhu, P. (author), Wang, Zhen (author), Okumura, Manabu (author), Yang, J. (author)

Textbook question answering is challenging as it aims to automatically answer various questions on textbook lessons with long text and complex diagrams, requiring reasoning across modalities. In this work, we propose MRHF, a novel framework that incorporates dense passage re-ranking and the mixture-of-experts architecture for TQA. MRHF...

conference paper 2024