G. Lan

15 records found

A Multi-faceted Eye Tracking Dataset for Emotion Recognition in Virtual Reality

Journal article (2025) - Tongyun Yang, Bishwas Regmi, Lingyu Du, Andreas Bulling, Xucong Zhang, Guohao Lan
Virtual Reality (VR) is transforming cognitive and psychological research by enabling immersive simulations that elicit authentic emotional responses. The high demand for VR-based emotion recognition is also evident in fields such as mental healthcare, education, and entertainment, where understanding users' emotional states can enhance user experience and system effectiveness. However, the lack of comprehensive datasets hinders progress in VR-based emotion recognition. In this paper, we present a comprehensive, multi-faceted eye-tracking dataset collected from 26 participants using 28 emotional video stimuli rendered in a custom virtual environment. Our dataset is the first to incorporate high-frame-rate periocular videos, capturing subtle motions, such as micro-expressions and eyebrow shifts, which are critical for emotion analysis. Additionally, it includes high-frequency eye-tracking data, offering gaze direction and pupil dynamics at four times the frequency of existing datasets. Our dataset is also unique in providing emotion annotations according to Ekman's emotion model and, as such, offering experiments impossible using existing datasets. Our benchmark evaluations show that fusing the multi-faceted eye-tracking signals in our dataset significantly improves emotion recognition accuracy. As such, our work has the potential to significantly accelerate and enable entirely new research on emotion-aware VR applications. ...
Conference paper (2025) - L. Du, Yupei Liu, Jinyuan Jia, G. Lan
Gaze estimation models are widely used in applications such as driver attention monitoring and human-computer interaction. While many methods for gaze estimation exist, they rely heavily on data-hungry deep learning to achieve high performance. This reliance often forces practitioners to harvest training data from unverified public datasets, outsource model training, or rely on pre-trained models. However, such practices expose gaze estimation models to backdoor attacks. In such attacks, adversaries inject backdoor triggers by poisoning the training data, creating a backdoor vulnerability: the model performs normally with benign inputs, but produces manipulated gaze directions when a specific trigger is present. This compromises the security of many gaze-based applications, such as causing the model to fail in tracking the driver's attention. To date, there is no defense that addresses backdoor attacks on gaze estimation models. In response, we introduce SecureGaze, the first solution designed to protect gaze estimation models from such attacks. Unlike classification models, defending gaze estimation poses unique challenges due to its continuous output space and globally activated backdoor behavior. By identifying distinctive characteristics of backdoored gaze estimation models, we develop a novel and effective approach to reverse-engineer the trigger function for reliable backdoor detection. Extensive evaluations in both digital and physical worlds demonstrate that SecureGaze effectively counters a range of backdoor attacks and outperforms seven state-of-the-art defenses adapted from classification models. ...

Online Pose Error Estimation System for Visual SLAM

Conference paper (2024) - Tianyi Hu, Tim Scargill, Fan Yang, Ying Chen, Guohao Lan, Maria Gorlatova
In this work, we introduce SEESys, the first system to provide online pose error estimation for Simultaneous Localization and Mapping (SLAM). Unlike prior offline error estimation approaches, the SEESys framework efficiently collects real-time system features and delivers accurate pose error magnitude estimates with low latency. This enables real-time quality-of-service information for downstream applications. To achieve this goal, we develop a SLAM system run-time status monitor (RTS monitor) that performs feature collection with minimal overhead, along with a multi-modality attention-based Deep SLAM Error Estimator (DeepSEE) for error estimation. We train and evaluate SEESys using both public SLAM benchmarks and a diverse set of synthetic datasets, achieving an RMSE of 0.235 cm of pose error estimation, which is 15.8% lower than the baseline. Additionally, we conduct a case study showcasing SEESys in a real-world scenario, where it is applied to a real-time audio error advisory system for human operators of a SLAM-enabled device. The results demonstrate that SEESys provides error estimates with an average end-to-end latency of 37.3 ms, and the audio error advisory reduces pose tracking error by 25%. ...
Conference paper (2024) - Tao Ni, Zehua Sun, Mingda Han, Guohao Lan, Yaxiong Xie, Zhenjiang Li, Tao Gu, Weitao Xu
Diverse Wi-Fi-based wireless applications have been proposed, ranging from daily activity recognition to vital sign monitoring. Despite their remarkable sensing accuracy, the high energy consumption and the requirement for customized hardware modification hinder the wide deployment of the existing sensing solutions. In this paper, we propose REHSense, an energy-efficient wireless sensing solution based on Radio-Frequency (RF) energy harvesting. Instead of relying on a power-hungry Wi-Fi receiver, REHSense leverages an RF energy harvester as the sensor and utilizes the voltage signals harvested from the ambient Wi-Fi signals to enable simultaneous context sensing and energy harvesting. We design and implement REHSense using a commercial-off-the-shelf (COTS) RF energy harvester. Extensive evaluation of three fine-grained wireless sensing tasks (i.e., respiration monitoring, human activity recognition, and hand gesture recognition) shows that REHSense can achieve comparable sensing accuracy with conventional Wi-Fi-based solutions while adapting to different sensing environments, reducing the power consumption of sensing by 98.7% and harvesting up to 4.5 mW of power from RF energy. ...

Preserving User Privacy in Black-box Mobile Gaze Tracking Services

Journal article (2024) - Lingyu Du, Jinyuan Jia, Xucong Zhang, Guohao Lan
Eye gaze contains rich information about human attention and cognitive processes. This capability makes the underlying technology, known as gaze tracking, a critical enabler for many ubiquitous applications and has triggered the development of easy-to-use gaze estimation services. Indeed, by utilizing the ubiquitous cameras on tablets and smartphones, users can readily access many gaze estimation services. In using these services, users must provide their full-face images to the gaze estimator, which is often a black box. This poses significant privacy threats to the users, especially when a malicious service provider gathers a large collection of face images to classify sensitive user attributes. In this work, we present PrivateGaze, the first approach that can effectively preserve users’ privacy in black-box gaze tracking services without compromising gaze estimation performance. Specifically, we propose a novel framework to train a privacy preserver that converts full-face images into obfuscated counterparts, which are effective for gaze estimation while containing no privacy information. Evaluation on four datasets shows that the obfuscated images can protect users’ private information, such as identity and gender, against unauthorized attribute classification. Meanwhile, when used directly by the black-box gaze estimator as inputs, the obfuscated images lead to comparable tracking performance to the conventional, unprotected full-face images. ...
Conference paper (2024) - Dinghao Xue, Xiaoran Fan, Tao Chen, Guohao Lan, Qun Song
Deep learning models are increasingly deployed on edge Internet of Things (IoT) devices. However, these models typically operate under supervised conditions and fail to recognize unseen classes different from training. To address this, zero-shot learning (ZSL) aims to classify data of unseen classes with the help of semantic information. Foundation models (FMs) trained on web-scale data have shown impressive ZSL capability in natural language processing and visual understanding. However, leveraging FMs’ generalized knowledge for zero-shot IoT sensing using signals such as mmWave, IMU, and Wi-Fi has not been fully investigated. In this work, we align the IoT data embeddings with the semantic embeddings generated by an FM’s text encoder for zero-shot IoT sensing. To utilize the physics principles governing the generation of IoT sensor signals to derive more effective prompts for semantic embedding extraction, we propose to use cross-attention to combine a learnable soft prompt that is optimized automatically on training data and an auxiliary hard prompt that encodes domain knowledge of the IoT sensing task. To address the problem of IoT embeddings biasing to seen classes due to the lack of unseen class data during training, we propose using data augmentation to synthesize unseen class IoT data for fine-tuning the IoT feature extractor and embedding projector. We evaluate our approach on multiple IoT sensing tasks. Results show that our approach achieves superior open-set detection and generalized zero-shot learning performance compared with various baselines. ...

Adversarial Attack and Defense on Under-Screen Camera

Book chapter (2023) - Hanting Ye, Guohao Lan, Jinyuan Jia, Qing Wang
Smartphones are moving towards the fullscreen design for better user experience. This trend forces front cameras to be placed under screen, leading to Under-Screen Cameras (USC). Accordingly, a small area of the screen is made translucent to allow light to reach the USC. In this paper, we utilize the translucent screen's features to inconspicuously modify its pixels, imperceptible to human eyes but inducing perturbations on USC images. These screen perturbations affect deep learning models in image classification and face recognition. They can be employed to protect user privacy, or disrupt the front camera's functionality in the malicious case. We design two methods, one-pixel perturbation and multiple-pixel perturbation, that can add screen perturbations to images captured by USC and successfully fool various deep learning models. Our evaluations, with three commercial full-screen smartphones on testbed datasets and synthesized datasets, show that screen perturbations significantly decrease the average image classification accuracy, dropping from 85% to only 14% for one-pixel perturbation and 5.5% for multiple-pixel perturbation. For face recognition, the average accuracy drops from 91% to merely 1.8% and 0.25%, respectively. ...

Battery-free Key Generation Using Solar Cells

Journal article (2023) - Bo Wei, Weitao Xu, Mingcen Gao, Guohao Lan, Kai Li, Chengwen Luo, Jin Zhang
Solar cells have been widely used for offering energy for Internet of Things (IoT) devices. Recently, solar cells have also been used as sensors for context awareness sensing due to their sensitivity to varying lighting conditions. In this article, we are the first to use solar cells for symmetric key generation. To generate symmetric keys, we take advantage of photovoltage measurements generated from solar cells equipped with a pair of IoT devices. Symmetric keys are essential for pairing IoT devices and further securing wireless communication. Despite the sensitivity to varying lighting conditions, challenges still remain for the use of solar cells for key generation, such as time unsynchronisation and noisy measurements. To solve these challenges, we design a novel key generation framework, SolarKey, which includes the starting point detection and a compressed sensing-based two-tier key reconciliation method. Extensive experiments have been conducted to evaluate the performance of our proposed key generation method in various environments, which shows the proposed method can improve the key matching rate by up to 25%. We also conduct security analysis and the randomness test, which shows that SolarKey is resilient to common attacks such as the eavesdropping attack and the imitating attack and sufficiently random. ...
Conference paper (2023) - Tao Ni, Guohao Lan, Jia Wang, Qingchuan Zhao, Weitao Xu
Radio-frequency (RF) energy harvesting is a promising technology for Internet-of-Things (IoT) devices to power sensors and prolong battery life. In this paper, we present a novel side-channel attack that leverages RF energy harvesting signals to eavesdrop on mobile app activities. To demonstrate this novel attack, we propose AppListener, an automated attack framework that recognizes fine-grained mobile app activities from harvested RF energy. The RF energy is harvested from a custom-built RF energy harvester which generates voltage signals from ambient Wi-Fi transmissions, and app activities are recognized by a three-tier classification algorithm. We evaluate AppListener with four mobile devices running 40 common mobile apps (e.g., YouTube, Facebook, and WhatsApp) belonging to five categories (i.e., video, music, social media, communication, and game); each category contains five application-specific activities. Experiment results show that AppListener achieves over 99% accuracy in differentiating four different mobile devices, over 98% accuracy in classifying 40 different apps, and 86.7% accuracy in recognizing five sets of application-specific activities. Moreover, a comprehensive study is conducted to show AppListener is robust to a number of impact factors, such as distance, environment, and non-target connected devices. Practices of integrating AppListener into commercial IoT devices also demonstrate that it is easy to deploy. Finally, countermeasures are presented as the first step to defend against this novel attack. ...
Journal article (2023) - Dong Ma, Guohao Lan, Changshuo Hu, Mahbub Hassan, Wen Hu, Upama Mushfika, Ashraf Uddin, Moustafa Youssef
We design a system, SolarGest, which can recognize hand gestures near a solar-powered device by analyzing the patterns of the photocurrent. SolarGest is based on the observation that each gesture interferes with incident light rays on the solar panel in a unique way, leaving its discernible signature in harvested photocurrent. Using solar energy harvesting laws, we develop a model to optimize design and usage of SolarGest. To further improve the robustness of SolarGest under non-deterministic operating conditions, we combine dynamic time warping with Z-score transformation in a signal processing pipeline to pre-process each gesture waveform before it is analyzed for classification. We evaluate SolarGest with both conventional opaque solar cells as well as emerging see-through transparent cells. Our experiments demonstrate that SolarGest achieves 99% accuracy for six gestures with a single cell and 95% for fifteen gestures with a 2 × 2 solar cell array. The power measurement study suggests that SolarGest consumes 44% less power compared to light-sensor-based systems. ...
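As a rough illustration of the pre-processing idea named in the SolarGest abstract (Z-score transformation combined with dynamic time warping), the sketch below normalizes two waveforms and computes a textbook DTW distance between them. It is a generic example, not the authors' pipeline: the function names `z_score` and `dtw_distance` and the sample values are hypothetical, and the abstract does not specify how the two steps are combined.

```python
import math


def z_score(signal):
    """Rescale a waveform to zero mean and unit variance (assumed helper)."""
    mean = sum(signal) / len(signal)
    variance = sum((x - mean) ** 2 for x in signal) / len(signal)
    std = math.sqrt(variance)
    if std == 0:
        # A flat waveform carries no shape information; map it to zeros.
        return [0.0 for _ in signal]
    return [(x - mean) / std for x in signal]


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched again (stretch b)
                cost[i][j - 1],      # b[j-1] is matched again (stretch a)
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Hypothetical photocurrent traces: same shape, different amplitude and speed.
slow_gesture = [2.0, 4.0, 4.0, 6.0]
fast_bright_gesture = [20.0, 40.0, 60.0]
raw = dtw_distance(slow_gesture, fast_bright_gesture)
normalized = dtw_distance(z_score(slow_gesture), z_score(fast_bright_gesture))
print(raw > normalized)
```

In this sketch the Z-score step removes amplitude differences (for example, from changing light intensity) and DTW absorbs differences in gesture speed, which is one plausible reading of why the two are paired.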

Rethinking High-frequency Eye Tracking through the Lenses of Event Cameras

Conference paper (2023) - Guangrong Zhao, Yurun Yang, Jingwei Liu, Ning Chen, Yiran Shen, Hongkai Wen, Guohao Lan
In this paper, we present EV-Eye, a first-of-its-kind large-scale multimodal eye tracking dataset aimed at inspiring research on high-frequency eye/gaze tracking. EV-Eye utilizes the emerging bio-inspired event camera to capture independent pixel-level intensity changes induced by eye movements, achieving sub-microsecond latency. Our dataset was curated over two weeks and collected from 48 participants encompassing diverse genders and age groups. It comprises over 1.5 million near-eye grayscale images and 2.7 billion event samples generated by two DAVIS346 event cameras. Additionally, the dataset contains 675 thousand scene images and 2.7 million gaze references captured by a Tobii Pro Glasses 3 eye tracker for cross-modality validation. Compared with existing event-based high-frequency eye tracking datasets, our dataset is significantly larger in size, and the gaze references involve more natural and diverse eye movement patterns, i.e., fixation, saccade, and smooth pursuit. Alongside the event data, we also present a hybrid eye tracking method as a benchmark, which leverages both the near-eye grayscale images and event data for robust and high-frequency eye tracking. We show that our method achieves higher accuracy for both pupil and gaze estimation tasks compared to the existing solution. ...

A Low-Effort Self-Supervised Domain Adaptation Framework for EMG Sensing

Conference paper (2023) - Di Duan, Huanqi Yang, Guohao Lan, Tianxing Li, Xiaohua Jia, Weitao Xu
This paper presents EMGSense, a low-effort self-supervised domain adaptation framework for sensing applications based on Electromyography (EMG). EMGSense addresses one of the fundamental challenges in EMG cross-user sensing—the significant performance degradation caused by time-varying biological heterogeneity—in a low-effort (data-efficient and label-free) manner. To alleviate the burden of data collection and avoid labor-intensive data annotation, we propose two EMG-specific data augmentation methods to simulate the EMG signals generated in various conditions and scope the exploration in label-free scenarios. We model combating biological heterogeneity-caused performance degradation as a multi-source domain adaptation problem that can learn from the diversity among source users to eliminate EMG heterogeneous biological features. To relearn the target-user-specific biological features from the unlabeled data, we integrate advanced self-supervised techniques into a carefully designed deep neural network (DNN) structure. The DNN structure can seamlessly perform two training stages that complement each other to adapt to a new user with satisfactory performance. Comprehensive evaluations on two sizable datasets collected from 13 participants indicate that EMGSense achieves an average accuracy of 91.9% and 81.2% in gesture recognition and activity recognition, respectively. EMGSense outperforms the state-of-the-art EMG-oriented domain adaptation approaches by 12.5%-17.4% and achieves a comparable performance with the one trained in a supervised learning manner. ...

An Energy Harvesting-based Privacy-Preserving User Identification System by Gait Analysis

Journal article (2022) - Weitao Xu, Wanli Xue, Qi Lin, Guohao Lan, Xingyu Feng, Bo Wei, Chengwen Luo, Wei Li, Albert Y. Zomaya
Smart space has emerged as a new paradigm that combines sensing, communication, and artificial intelligence technologies to offer various customized services. A fundamental requirement of these services is person identification. Although a variety of person-identification approaches have been proposed, they suffer from several limitations in practical applications, such as low energy efficiency, accuracy degradation, and privacy issues. This article proposes an energy-harvesting-based privacy-preserving gait recognition scheme for smart space, which is named PrivGait. In PrivGait, we extract discriminative features from 1-D gait signal and design an attention-based long short-term memory (LSTM) network to classify different people. Moreover, we leverage a novel Bloom filter-based privacy-preserving technique to address the privacy leakage problem. To demonstrate the feasibility of PrivGait, we design a proof-of-concept prototype using off-the-shelf energy-harvesting hardware. Extensive evaluation results show that the proposed scheme outperforms the state of the art by 6%-10% and incurs low system cost while preserving users' privacy. ...
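For readers unfamiliar with the data structure mentioned in the PrivGait abstract, the following is a minimal sketch of a standard Bloom filter. It shows only the generic structure (set membership without storing the items themselves); the abstract does not describe how PrivGait applies it, so the class name, the parameters `num_bits` and `num_hashes`, and the sample tokens are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, possible false positives."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical usage: store an opaque feature token instead of raw gait data.
enrolled = BloomFilter()
enrolled.add("user-17:feature-token")
print(enrolled.might_contain("user-17:feature-token"))
```

Because only hashed bit positions are kept, the original items cannot be read back from the filter, which is the general property that makes the structure attractive for privacy-preserving matching.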

Gaze-Based Activity Recognition in an Augmented Reality Art Gallery

Conference paper (2022) - Tim Scargill, Guohao Lan, Maria Gorlatova
The personalization of augmented reality (AR) experiences based on environmental and user context is key to unlocking their full potential. The recent addition of eye tracking to AR headsets provides a convenient method for detecting user context, but complex analysis of raw gaze data is required to detect where a user's attention and thoughts truly lie. In this demo we present Catch My Eye, the first system to incorporate deep neural network (DNN)-based activity recognition from user gaze into a realistic mobile AR app. We develop an edge computing-based architecture to offload context computation from resource-constrained AR devices, and present a working example of content adaptation based on user context, for the scenario of a virtual art gallery. It shows that user activities can be accurately recognized and employed with sufficiently low latency for practical AR applications. ...

Psychology-inspired Eye Movement Synthesis for Gaze-based Activity Recognition

Conference paper (2022) - Guohao Lan, Tim Scargill, Maria Gorlatova
Recent advances in eye tracking have given birth to a new genre of gaze-based context sensing applications, ranging from cognitive load estimation to emotion recognition. To achieve state-of-the-art recognition accuracy, a large-scale, labeled eye movement dataset is needed to train deep learning-based classifiers. However, due to the heterogeneity in human visual behavior, as well as the labor-intensive and privacy-compromising data collection process, datasets for gaze-based activity recognition are scarce and hard to collect. To alleviate the sparse gaze data problem, we present EyeSyn, a novel suite of psychology-inspired generative models that leverages only publicly available images and videos to synthesize a realistic and arbitrarily large eye movement dataset. Taking gaze-based museum activity recognition as a case study, our evaluation demonstrates that EyeSyn can not only replicate the distinct patterns in the actual gaze signals that are captured by an eye tracking device, but also simulate the signal diversity that results from different measurement setups and subject heterogeneity. Moreover, in the few-shot learning scenario, EyeSyn can be readily incorporated with either transfer learning or meta-learning to achieve 90% accuracy, without the need for a large-scale dataset for training. ...