OS

138 records found

Authored

AnyoneNet

Synchronized Speech and Talking Head Generation for Arbitrary Persons

Automatically generating videos in which synthesized speech is synchronized with lip movements in a talking head has great potential in many human-computer interaction scenarios. In this paper, we present an automatic method to generate synchronized speech and talking-head vid ...

In this paper, we present the updated Audio-Visual Speech Recognition (AVSR) corpus of the MISP2021 challenge, a large-scale audio-visual Chinese conversational corpus consisting of 141 h of audio and video data collected by far/middle/near microphones and far/middle cameras in 34 rea ...

There is ample evidence that recognising words in a non-native language is more difficult than in a native language, even for those with a high proficiency in the non-native language involved, and particularly in the presence of background noise. Why is this the case? To answe ...

Contributed

Classification of Covert Vowels in Spanish and Dutch

What do brain signals say about inner speech?

Patients with neuromuscular diseases who are unable to speak, but whose cognitive ability has been maintained, can benefit from Brain-Computer Interfaces (BCIs). The decoding of inner (covert) speech from EEGs is one of the state-of-the-art methods that aim to tack ...

Perceptions of Artificial Social Agents

The cultural similarities and differences between Dutch and Chinese speakers in their perception of artificial social agents

Artificial social agents (ASAs) are systems designed to interact with humans in a socially intelligent manner. As the field of robotics is rapidly advancing, some studies have focused on creating more effective agents by analysing how people perceive them. However, culture affects peo ...

Low complexity crosstalk cancellation algorithm for consumer audio systems

Optimizing crosstalk cancellation from a human sound perception perspective

Over the past decade, spatial audio awareness has evolved into an in-demand feature in audio entertainment. The addition of sound source locations to, for instance, movies or music adds a level of auditory envelopment and spatial awareness to the audio experience. Expensive setups pr ...

Attention-based deep learning for DNA repair outcome prediction

Learning how the cell repairs DNA breaks using local sequence context

Recent advancements in quantification of repair outcomes of CRISPR-Cas9 mediated double-stranded DNA breaks (DSBs) have allowed for the use of machine learning for predicting the frequencies of these repair outcomes. Local DNA sequence context influences the frequencies of mutati ...

Decoding Covert Speech from EEG

Development of a novel database containing EEG and audio signals during Dutch covert and overt speech

To enable communication for patients who have lost the ability to speak due to severe neuromuscular diseases, covert-speech-based brain-computer interfaces (BCIs) might be used. These systems use neural signals arising from covert speech and translate them into text or synthesised ...

Everyday Locations as Cues to Smoke

Personalized Environments in Virtual Reality to Elicit Smoking Cravings

Smoking is a leading risk factor negatively impacting the health of people, not only those partaking in it first-hand, but also those around them. Different methods are available to assist people with quitting smoking, with varying degrees of effectiveness. Researchers develop ...

For your voice only

Exploiting side channels in voice messaging for environment detection

Voice messages are an increasingly well-known method of communication, accounting for more than 200 million messages a day. Sending audio messages requires less effort from the user than texting, while enhancing the meaning of the message by adding an emotional context ...

Security Evaluation of GoQuorum-based Smart Contracts

A Case Study of Malfunctioning Access Control and Double-Spending

GoQuorum is an enterprise blockchain platform that supports smart contracts and allows for private transactions. Smart contracts enable automated payment while eliminating the need for third-party involvement. Previous attacks on smart contracts have already shown that existing v ...

Phon Times

Improving Dutch phoneme recognition

In this research, Dutch phoneme recognition (PR) is studied and improved. The last research on Dutch PR dates back to 1995. This research brings Dutch PR up to date by investigating state-of-the-art techniques found in research on other languages and implementing them ...

Parkinson’s Disease (PD), Essential tremor (ET), and dystonia are movement disorders often misdiagnosed as one another and commonly present tremor as one of their motor symptoms. Rates of misdiagnosis between 30 and 50% of ET patients have been reported, where dystonia and PD are ...

Secure Proximity Detection and Verification

Addressing vulnerabilities in IEEE 802.15.4z UWB

We live in a world where many of our interactions with the environment around us depend on us being physically close to it. For instance, we have proximity-based tokens (e.g., keys and smartcards) for access systems installed at various places such as in cars, at contactless pa ...

Word recognition in a model of visually grounded speech

An analysis using techniques inspired by human speech processing research

A Visually Grounded Speech model is a neural model which is trained to embed image-caption pairs closely together in a common embedding space. As a result, such a model can retrieve semantically related images given a speech caption and vice versa. The purpose of this research is ...

Evaluating Image2Speech

The evaluation of automatically generated phoneme captions for images

Image2Speech is the relatively new task of generating a spoken description of an image. Similar to Automatic Image Captioning, it is a task focused on describing images; however, it avoids the usage of textual resources. An Image2Speech system produces a sequence of phonemes inst ...

Graph-Time Convolutional Neural Network

Learning from Time-Varying Signals defined on Graphs

Time-varying network data are essential in several real-world applications, such as temperature forecasting and earthquake classification. Spatial and temporal dependencies characterize these data and, therefore, conventional machine learning tools often fail to learn these joint ...

Project Delphi

What is Innovation?

In a company with over 1500 employees, a lot of data is available about people and their day-to-day pursuits, including the projects they are working on. Company X has asked for more insight into this data, as it is currently scattered over multiple systems. Specifically, ...

Retrieval-Based Open-Domain Question Answering

Exploring The Impact on The Retrieval Component across Datasets

Open-domain question answering (QA) is an important step in Artificial Intelligence, and its ultimate goal is to build a QA system that can answer any question posed by humans. The majority of open-domain QA systems are retrieval-based open-domain QA systems, which enables th ...

Rotation invariant filters in CNNs

Applied to segmentation of aerial images for land-use classification

Convolutional neural networks have shown incredible performance in image classification, segmentation, object detection and other computer vision applications in recent years. But they lack understanding of affine transformations of the input data. In this work, we introduce rotatio ...