M. Jiang
Please Note
<p>This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.</p>
2 records found
Event cameras are a novel sensing modality, but the lack of densely annotated datasets remains a major limitation for tasks such as monocular depth estimation. To address this, we investigate how Vision Foundation Models (VFMs), trained on large-scale RGB datasets, can be leveraged for event-based depth estimation. Previous work combines handcrafted event representations with fine-tuning of VFMs to adapt them to the event domain. In contrast, we learn an event representation while keeping the VFM frozen. We evaluate two representation learners, a U-Net and a Fully Convolutional (FullyConv) model, on DSEC and MVSEC. The results show that learned event representations are highly effective in-domain: both models outperform all baselines on DSEC, including Depth AnyEvent (DAE) and direct RGB input to Depth Anything V2 (DAv2), while the FullyConv model remains competitive on MVSEC. Cross-dataset experiments show that this improvement does not consistently transfer under domain shift. These findings indicate that learning the input representation is a strong strategy for in-domain event-based depth estimation, but that representation learning alone is not sufficient to guarantee robust cross-dataset generalization.
Human activity recognition plays an important role in a variety of use cases. It is used in health monitoring, in the development of human-computer interaction systems, and in security monitoring. However, current methods rely on privacy-sensitive data and on sensors that are impractical for everyday use. To tackle this problem, we aim to answer the research question "How to maximize the capabilities of in-mouth sensors for human activity recognition?". The main contributions of this paper are the classification of different gestures using an in-mouth device, the implementation of a classifier directly on a microcontroller, and an evaluation of whether the models generalize to multiple people. To investigate this, we experimented with popular classical machine learning classifiers: Decision Tree, K-Nearest Neighbors, Support Vector Machine, Logistic Regression, and Random Forest. The results show that the F1-scores of all classification problems are above 80% using the various classifiers along with different parameters.