Hesam Araghi
Please Note
This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.
6 records found
Event cameras are a novel sensing modality, but the lack of densely annotated datasets remains a major limitation for tasks such as monocular depth estimation. To address this, we investigate how Vision Foundation Models (VFMs), trained on large-scale RGB datasets, can be leveraged for event-based depth estimation. Previous work combines handcrafted event representations with fine-tuning of VFMs to adapt them to the event domain. In contrast, we learn an event representation while keeping the VFM frozen. We evaluate two representation learners, a U-Net and a Fully Convolutional (FullyConv) model, on DSEC and MVSEC. The results show that learned event representations are highly effective in-domain: both models outperform all baselines on DSEC, including Depth AnyEvent (DAE) and direct RGB input to Depth Anything V2 (DAv2), while the FullyConv model remains competitive on MVSEC. Cross-dataset experiments show that this improvement does not consistently transfer under domain shift. These findings indicate that learning the input representation is a strong strategy for in-domain event-based depth estimation, but that representation learning alone is not sufficient to guarantee robust cross-dataset generalization.
Optical Flow Estimation Using Event-Based Cameras
Improving Optical Flow Estimation Accuracy Using Space-Aware De-Flickering
Event cameras are novel sensors whose high temporal resolution and bandwidth motivate their use for optical flow estimation. However, the same properties also make event cameras vulnerable to flickering. Flickering hurts the perceptibility of motion by overwhelming the event data with unrelated information. The only existing event de-flickering method (EFR) is built for scenarios where the relative position of the camera and the flickering object is constant, which is uncommon in motion-heavy optical flow estimation scenarios. Our contribution is a new de-flickering method that incorporates spatial awareness of nearby pixels. We hypothesize that this feature increases robustness to movement and therefore improves optical flow accuracy more than EFR does. Compared to EFR, our method is weaker at filtering intensely flickering surfaces, but it better preserves the spatial coherence of edges. However, we observe that both de-flickering methods remove a substantial amount of geometric information, especially under slow motion or weak ambient illumination. Our benchmarking shows that neither our method nor EFR significantly affects optical flow estimation accuracy, despite reducing event counts by 50-65%. Overall, we conclude that the niche benefits of spatial filtering are nullified by the finding that filtering hardly affects optical flow estimation.
Computer vision tasks have been shown to benefit greatly from both developments in deep neural networks and the emergence of event cameras. Deep networks can require large amounts of training data, which is not readily available for event cameras, particularly for optical flow estimation. Simulating such data in a realistic, physics-driven manner is therefore crucial. This paper compares state-of-the-art event camera simulators on several criteria, including event timestamp modeling, performance under low illumination, bandwidth simulation, computation speed, and the simulation of various types of noise. We also summarize the shortcomings of some commonly used optical flow event datasets. For generating high-quality, realistic events, the V2E and DVS-Voltmeter simulators were found to produce the most accurate data.
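Many event camera simulators start from an idealized pixel model: a pixel emits an event whenever its log intensity has changed by a contrast threshold C since its last event, and the polarity records the direction of the change. Simulators such as V2E and DVS-Voltmeter add noise, bandwidth, and timestamp effects on top of this principle. The minimal Python sketch below implements only the noise-free threshold-crossing model for one pixel. The function name, the default threshold of 0.2, the `eps` offset (which avoids log(0)), and the linear interpolation of timestamps are illustrative assumptions. The sketch does not reproduce any simulator compared in the paper.

```python
import math
from typing import List, Tuple


def simulate_pixel_events(
    times: List[float],
    intensities: List[float],
    threshold: float = 0.2,
    eps: float = 1e-3,
) -> List[Tuple[float, int]]:
    """Idealized, noise-free event generation for a single pixel (hypothetical helper).

    Emits (timestamp, polarity) pairs whenever the log intensity moves by
    `threshold` away from the reference level set by the previous event.
    """
    events: List[Tuple[float, int]] = []
    ref = math.log(intensities[0] + eps)  # reference log intensity at times[0]
    for k in range(1, len(times)):
        t0, t1 = times[k - 1], times[k]
        l0 = math.log(intensities[k - 1] + eps)
        l1 = math.log(intensities[k] + eps)
        # One event per threshold crossing between consecutive samples.
        while abs(l1 - ref) >= threshold:
            polarity = 1 if l1 > ref else -1
            ref += polarity * threshold  # new reference = level just crossed
            # Linearly interpolate when the crossed level was reached.
            frac = (ref - l0) / (l1 - l0) if l1 != l0 else 1.0
            frac = min(max(frac, 0.0), 1.0)
            events.append((t0 + frac * (t1 - t0), polarity))
    return events


if __name__ == "__main__":
    # Brightness doubles, then halves again: ON events followed by OFF events.
    for ts, pol in simulate_pixel_events([0.0, 1.0, 2.0], [1.0, 2.0, 1.0]):
        print(f"t={ts:.3f} polarity={pol:+d}")
```

Real sensors add threshold mismatch, shot and leak noise, limited photoreceptor bandwidth, and refractory periods. These are the effects on which the simulators above differ.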
Optical flow estimation with event cameras encompasses two primary classes of algorithms: model-based and learning-based methods. Model-based approaches do not require any training data, whereas learning-based approaches use datasets of events to train neural networks. To apply these algorithms effectively, it is essential to understand their respective strengths and weaknesses.
This study compares model-based and learning-based optical flow estimation methods using event cameras, aiming to provide guidance for real-world applications. We evaluated these methods on the MVSEC and DSEC datasets, focusing on their accuracy and runtime. Our findings indicate that model-based methods excel on the MVSEC dataset, which is characterized by small motions, while learning-based approaches perform better on the more dynamic DSEC dataset. To investigate potential overfitting of learning-based methods to DSEC, we retrained the IDNet and TMA models on the BlinkFlow dataset. The retrained models achieved competitive accuracy and still surpassed model-based methods. This indicates that learning-based models perform better on datasets like DSEC even when they cannot overfit to them. Finally, our runtime analysis showed that model-based methods achieve real-time performance on CPUs, whereas learning-based methods require a GPU to run in real time.
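The abstract does not name its accuracy metric. Optical flow accuracy on MVSEC and DSEC is commonly reported as the average endpoint error (EPE), sometimes together with the percentage of pixels whose error exceeds n pixels. The minimal sketch below assumes those metrics. Flow fields are flattened into lists of (u, v) vectors, and an optional validity mask covers sparse ground truth. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

FlowVec = Tuple[float, float]  # (u, v) displacement in pixels


def endpoint_errors(
    pred: Sequence[FlowVec],
    gt: Sequence[FlowVec],
    valid: Optional[Sequence[bool]] = None,
) -> List[float]:
    """Per-pixel Euclidean distance between predicted and ground-truth flow."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of pixels")
    if valid is None:
        valid = [True] * len(gt)  # dense ground truth: every pixel counts
    return [
        math.hypot(pu - gu, pv - gv)
        for (pu, pv), (gu, gv), ok in zip(pred, gt, valid)
        if ok  # skip pixels without ground truth
    ]


def average_epe(pred, gt, valid=None) -> float:
    """Mean endpoint error over valid pixels."""
    errs = endpoint_errors(pred, gt, valid)
    if not errs:
        raise ValueError("no valid pixels")
    return sum(errs) / len(errs)


def n_pixel_error(pred, gt, n: float = 3.0, valid=None) -> float:
    """Percentage of valid pixels whose endpoint error exceeds n pixels."""
    errs = endpoint_errors(pred, gt, valid)
    if not errs:
        raise ValueError("no valid pixels")
    return 100.0 * sum(e > n for e in errs) / len(errs)


if __name__ == "__main__":
    pred = [(3.0, 4.0), (0.0, 0.0), (1.0, 1.0)]
    gt = [(0.0, 0.0), (0.0, 0.0), (9.0, 9.0)]
    mask = [True, True, False]  # third pixel has no ground truth
    print(average_epe(pred, gt, mask))       # 2.5
    print(n_pixel_error(pred, gt, 3.0, mask))  # 50.0
```

Averaging only over valid pixels matters for sparse ground truth. Otherwise, pixels without labels would distort the comparison between model-based and learning-based methods.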
Event cameras are bio-inspired sensors with high dynamic range, high temporal resolution, and low power consumption. These features enable precise motion detection even in challenging lighting conditions and fast-changing scenes, making them well-suited for optical flow estimation. However, event camera output is sparse and unstructured, which makes it challenging to process. Transformer architectures have been shown to be effective at capturing long-term temporal dependencies and processing sparse input. They may therefore be better suited to processing this output, because they can leverage the fine time granularity inherent to event camera data.
We introduce E-GMFlow, an approach to event-based optical flow inspired by the recent accuracy gains of transformer-based models for frame-based optical flow. We explore the effect of temporal detail on the accuracy of this transformer architecture by varying the number of temporal bins into which events are discretized. We observe that increasing the number of temporal bins generally yields higher accuracy, and we comment on the limitations of this study.
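The abstract does not specify how E-GMFlow discretizes events into temporal bins. A common choice in event-based optical flow is a voxel grid. In a voxel grid, each event's timestamp is normalized to the range [0, B-1] and its polarity is split between the two nearest bins by linear interpolation. The sketch below assumes that representation and uses a hypothetical function name. It uses plain nested lists so that it runs with the standard library only.

```python
from typing import List, Tuple

Event = Tuple[int, int, float, int]  # (x, y, t, polarity in {-1, +1})


def events_to_voxel_grid(
    events: List[Event], num_bins: int, height: int, width: int
) -> List[List[List[float]]]:
    """Accumulate events into num_bins temporal bins (hypothetical helper).

    Returns grid[bin][y][x]; each event contributes its polarity, split
    linearly between the two temporally nearest bins.
    """
    grid = [[[0.0] * width for _ in range(height)] for _ in range(num_bins)]
    if not events:
        return grid
    t_first = min(e[2] for e in events)
    t_last = max(e[2] for e in events)
    span = t_last - t_first
    for x, y, t, p in events:
        # Normalized timestamp in [0, num_bins - 1].
        t_norm = (num_bins - 1) * (t - t_first) / span if span > 0 else 0.0
        left = int(t_norm)  # floor, since t_norm >= 0
        for b in (left, left + 1):
            if 0 <= b < num_bins:
                weight = max(0.0, 1.0 - abs(b - t_norm))
                grid[b][y][x] += p * weight
    return grid


if __name__ == "__main__":
    evs = [(0, 0, 0.0, 1), (0, 0, 0.25, 1), (0, 0, 1.0, 1)]
    for num_bins in (2, 3, 5):
        g = events_to_voxel_grid(evs, num_bins, height=1, width=1)
        print(num_bins, [g[b][0][0] for b in range(num_bins)])
```

More bins preserve finer timing at the cost of a larger input tensor. This is the trade-off the study varies.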
Unsupervised optical flow estimation of event cameras
The influence of training sets on model performance
Event cameras capture events asynchronously, based on changes in lighting. They offer multiple benefits, but pose challenges in computer vision due to their asynchronous nature and the difficulty of capturing ground-truth values to compare against. This paper examines the effect of training a state-of-the-art unsupervised optical flow algorithm, Taming Contrast Maximisation, on BlinkFlow, a new dataset that promises improved performance for supervised algorithms. The aim is to determine whether these performance improvements also occur for unsupervised models. The results were inconclusive regarding the effectiveness of training unsupervised models on BlinkFlow, but they showed that models pretrained on the DSEC and MVSEC datasets did not perform well on this new dataset.
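Taming Contrast Maximisation builds on the contrast maximization idea. Events are warped by a candidate flow to a reference time, and the flow that produces the sharpest image of warped events (IWE) is preferred, so no ground truth is needed. The sketch below shows only the classic form of this idea. It uses a single constant flow, nearest-pixel accumulation, the IWE variance as the contrast measure, and a brute-force search over candidate flows. It is not the loss or network used by Taming Contrast Maximisation, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Event = Tuple[float, float, float, int]  # (x, y, t, polarity)
Flow = Tuple[float, float]  # (u, v) in pixels per unit time


def image_of_warped_events(
    events: Sequence[Event], flow: Flow, t_ref: float, height: int, width: int
) -> List[List[float]]:
    """Warp events to t_ref along a constant flow and count them per pixel."""
    u, v = flow
    iwe = [[0.0] * width for _ in range(height)]
    for x, y, t, _p in events:  # polarity ignored: plain event counts
        xw = x - (t - t_ref) * u
        yw = y - (t - t_ref) * v
        xi, yi = math.floor(xw + 0.5), math.floor(yw + 0.5)  # nearest pixel
        if 0 <= xi < width and 0 <= yi < height:
            iwe[yi][xi] += 1.0
    return iwe


def contrast(iwe: List[List[float]]) -> float:
    """Variance of the IWE; sharper (better aligned) images score higher."""
    values = [val for row in iwe for val in row]
    mean = sum(values) / len(values)
    return sum((val - mean) ** 2 for val in values) / len(values)


def best_flow(events, candidates: Sequence[Flow], t_ref, height, width) -> Flow:
    """Pick the candidate flow that maximizes IWE contrast."""
    return max(
        candidates,
        key=lambda f: contrast(image_of_warped_events(events, f, t_ref, height, width)),
    )


if __name__ == "__main__":
    # A point moving right at 2 px per time unit.
    evs = [(1.0 + 2.0 * t, 3.0, t, 1) for t in (0.0, 0.5, 1.0, 1.5, 2.0)]
    print(best_flow(evs, [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)], 0.0, 8, 8))
```

Learning-based variants replace the grid search with a network that predicts per-pixel flow and use a differentiable contrast-style loss for training.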