A.A.F. Verdiesen

Master thesis (2026) - A.A.F. Verdiesen, H.P. Hofstee, Wim Bos, M. Weinmann, Z. Al-Ars
Monitoring the lower airspace for small drones, and distinguishing them from birds, helicopters, and airplanes, is a growing security need that radar, radio-frequency, and acoustic sensors meet only at considerable cost. This thesis asks whether a ground-based network of synchronized, overlapping RGB cameras can instead reconstruct and classify flying objects directly in 3D, recovering range through multi-view geometry rather than a long-range sensor. The central hypothesis is that the temporal evolution of a 3D Gaussian Splatting representation carries motion cues that are more discriminative than per-frame 2D or static 3D appearance.
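To make "recovering range through multi-view geometry" concrete, the following is a minimal sketch of two-view ray triangulation by the midpoint method. It is an illustration only, not the reconstruction method of the thesis (which uses feed-forward Gaussian splatting); the function name `triangulate_midpoint`, the camera positions, and the target position are all hypothetical.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(a, s):
    return tuple(x * s for x in a)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triangulate_midpoint(c1, d1, c2, d2):
    """Return the midpoint of the shortest segment between two viewing rays.

    Each ray is c + t * d, where c is a camera centre and d the direction
    towards the object (hypothetical inputs; d need not be normalised).
    """
    w = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; range is unobservable")
    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    p1 = add(c1, scale(d1, t1))
    p2 = add(c2, scale(d2, t2))
    return scale(add(p1, p2), 0.5)

# Two ground cameras 100 m apart observing the same object (made-up numbers).
cam1, cam2 = (0.0, 0.0, 0.0), (100.0, 0.0, 0.0)
target = (40.0, 300.0, 120.0)
point = triangulate_midpoint(cam1, sub(target, cam1), cam2, sub(target, cam2))
range_from_cam1 = math.sqrt(dot(sub(point, cam1), sub(point, cam1)))
print(point, range_from_cam1)
```

With noise-free rays the two lines intersect and the midpoint coincides with the object position; with real detections the rays are skew, and the midpoint is one simple estimate among several.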

Four contributions support this investigation, which, to our knowledge, is the first to classify flying objects using temporal 4D Gaussian features. AeroSplat-4D is a synthetic multi-camera dataset and NVIDIA Isaac Sim pipeline emitting synchronized RGB, instance masks, depth, 3D trajectories, and exact calibration across the four classes, with class-balanced, identity-disjoint splits. DepthSplat-OC adapts feed-forward Gaussian splatting to thin, distant targets against a texture-less sky via a mask-gated photometric loss. MambaSplat-4D, the main contribution, classifies the temporal Gaussian sequences by pairing a rotation-equivariant Vector-Neuron Transformer with a linear-time Mamba temporal encoder, enforcing SO(3) invariance architecturally rather than through augmentation.
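The idea of enforcing SO(3) invariance architecturally can be illustrated with a small sketch. It does not reproduce the Vector-Neuron Transformer of the thesis; it only demonstrates the underlying principle that inner products between 3D vector features are unchanged when all vectors are rotated together, so a classifier fed with such products cannot change its output under rotation. The helper names (`rotation_x`, `rotation_z`, `gram`) and the feature values are hypothetical.

```python
import math

def rotation_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rotation_x(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(r, v):
    return tuple(sum(r[i][k] * v[k] for k in range(3)) for i in range(3))

def gram(vectors):
    """Pairwise inner products: a rotation-invariant summary of vector features."""
    return [[sum(x * y for x, y in zip(u, v)) for v in vectors] for u in vectors]

# Hypothetical per-object vector features (e.g. offsets or learned 3D channels).
features = [(1.0, 2.0, 0.5), (-0.3, 0.8, 1.7), (2.2, -1.1, 0.4)]

# An arbitrary rotation composed from two elementary rotations.
rot = matmul(rotation_z(0.9), rotation_x(-1.3))
rotated = [apply(rot, v) for v in features]

g_clean, g_rot = gram(features), gram(rotated)
max_diff = max(abs(g_clean[i][j] - g_rot[i][j])
               for i in range(3) for j in range(3))
print(max_diff)  # differs from zero only by floating-point rounding
```

Augmentation-based training can only approximate this property over the rotations it samples, whereas an invariant feature construction holds for every rotation by design.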

In an augmentation-free ablation, aggregating a 24-frame clip rather than classifying a single frame raises accuracy from 59.1 % to 78.8 %, confirming that motion, not single-frame appearance, drives discrimination. Because SO(3) invariance is enforced architecturally, the full-attribute model attains the same 70.2 % four-class accuracy on clean and arbitrarily rotated data, about eight percentage points above a position-only baseline; it trails the strongest temporal baseline by roughly five points on clean data but is uniquely robust under rotation, with zero classification changes across 9600 rotated forward passes. DepthSplat-OC surpasses the closest-protocol baseline (24.65 versus 21.44 PSNR) despite roughly two orders of magnitude less training compute, and the compact 1.9 M-parameter classifier runs in under a millisecond per frame. On the out-of-distribution probe the pipeline does not yet surpass the 2D baselines, a gap that likely reflects their ImageNet-pretrained (∼1.2M-image) backbones rather than a limit of the 3D representation. Real-camera transfer remains open. The core of the pipeline is released as open-source software at github.com/lumiad-bv/MambaSplat-4D.

This work thereby points toward multi-view 3D reconstruction and temporal reasoning as an effective alternative to the per-frame 2D detection that currently dominates aerial object classification. ...