This work investigates the feasibility of performing monocular depth estimation on highly resource-constrained hardware, specifically the Raspberry Pi Pico Zero microcontroller. In contrast to existing approaches that rely on large convolutional networks and high-performance devices, this study explores a set of custom lightweight encoder-decoder architectures designed to operate within strict memory limits, including one inspired by L-ENet, L-EfficientUNet, μPyD-Net, and an LSTM-μPyD-Net combination. The models were trained on a preprocessed KITTI dataset, using either LiDAR depth maps or dense depth maps produced by Semi-Global Matching (SGM) as ground truth, and were evaluated in terms of accuracy, model size, and real-time inference performance. The results demonstrate that meaningful depth prediction is achievable on microcontrollers, paving the way for low-cost autonomous navigation systems and broader applications of TinyML in embedded robotics. SGM proved to be the best preprocessing technique, and the LSTM-μPyD-Net achieved the best accuracy when trained on the full Train split of the KITTI dataset.