Gunshot detection plays a critical role in protecting African wildlife and reducing illegal poaching activities. The decline of keystone species disrupts ecosystems and inhibits forest CO₂ absorption, contributing to climate change. To support conservation efforts, this thesis contributes to the development of an embedded acoustic surveillance system that detects gunshot sounds and locates their sources, enabling rangers to respond in real time. However, realizing such a system is challenging due to limitations in data, resources, and infrastructure.
This thesis proposes a lightweight convolutional recurrent neural network, combining depthwise separable convolutions (DSConv) and a gated recurrent unit (GRU), designed to detect the onset of gunshot sounds for trilaterating the shooter's position. The model was trained on a real-world gunshot dataset and optimized for deployment on the ultra-low-power STM32U5 microcontroller. A series of incremental experiments on architectural design, data manipulation, and feature selection aimed to improve performance and efficiency.
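To illustrate why depthwise separable convolutions suit a microcontroller target, the following minimal sketch compares multiply-accumulate (MAC) counts for a single layer. It uses the standard textbook cost formulas and assumes stride 1 with "same" padding. The function names and the layer dimensions in the usage note are hypothetical and are not taken from the thesis model.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs of a standard k x k convolution producing an h x w output map."""
    return h * w * c_in * c_out * k * k


def dsconv_macs(h, w, c_in, c_out, k):
    """MACs of a depthwise separable convolution with the same output shape.

    The depthwise step filters each input channel separately with a k x k
    kernel; the pointwise step mixes channels with a 1 x 1 convolution.
    """
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


def dsconv_cost_ratio(h, w, c_in, c_out, k):
    """Fraction of the standard-convolution cost that DSConv requires."""
    return dsconv_macs(h, w, c_in, c_out, k) / standard_conv_macs(h, w, c_in, c_out, k)
```

For an illustrative layer with a 32 x 32 output, 16 input channels, 32 output channels, and a 3 x 3 kernel, the ratio equals 1/c_out + 1/k², or roughly 14% of the standard cost. The savings reported for the full model also depend on the other design choices described in this thesis.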
To evaluate detection performance under class imbalance independently of any confidence threshold, this thesis proposes a novel onset-based area under the precision–recall curve (AUPRC) metric. Computational cost was evaluated through STM32U5 inference benchmarks. Experimental results showed that time-shift augmentation provided the largest performance gain, followed by modest improvements from regularization techniques. Class rebalancing and background noise augmentation had minor effects. Replacing standard convolutions with DSConv substantially improved efficiency. Finally, using delta mel-frequency cepstral coefficients (∆MFCCs) as input features further improved both performance and efficiency.
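As a rough illustration of what delta features capture, the sketch below computes frame-wise deltas with the commonly used regression formula, repeating the first and last frames at the edges. This is an assumption made for illustration: the exact delta definition, window width, and edge handling used in the thesis are not specified here, and the function name `delta` and the `width` parameter are hypothetical.

```python
def delta(frames, width=2):
    """Regression-based delta of a sequence of feature frames.

    frames: list of frames, each a list of coefficients (e.g. MFCCs).
    width:  number of frames on each side used for the slope estimate
            (assumed value, not taken from the thesis).
    Returns a list with the same shape as `frames`.
    """
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(n_frames):
        row = []
        for c in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                # Clamp indices so edge frames are repeated (edge padding).
                ahead = frames[min(t + n, n_frames - 1)][c]
                behind = frames[max(t - n, 0)][c]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out
```

A constant coefficient track yields zero deltas, while a sudden change such as a gunshot onset produces a large value. This may help explain why delta features suit onset detection, although the abstract itself does not state the reason.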
Overall, the final ∆MFCC DSConv-GRU model outperformed the quasi-DenseNet baseline in F1-score (+2.6%) while reducing multiply-accumulate operations (-96.9%), RAM usage (-95.8%), and runtime (-97.3%). These improvements enabled real-time inference on microcontrollers, demonstrating that a lightweight deep learning model can perform effectively under strict resource constraints. Hence, this work provides a foundational step toward future embedded gunshot detection systems for wildlife monitoring and anti-poaching applications.