Gunshot Sound Onset Detection on MCUs with Tiny Conv-GRU

Master Thesis (2025)

Author(s)

T.Y. Huang (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Qing Wang – Graduation committee member (TU Delft - Embedded Systems)

G. Gaydadjiev – Graduation committee member (TU Delft - Computer Engineering)

D. Danaei – Mentor (Alten)

Faculty

Electrical Engineering, Mathematics and Computer Science

Embedded Systems, Sound Event Detection, Gunshot Detection, Convolution Recurrent Neural Network (CRNN), Poaching Prevention, STM32U5 Microcontroller

To reference this document use:

https://resolver.tudelft.nl/uuid:ada6fcca-7fa9-43c3-82dc-44f43d83b449

More Info

Publication Year

2025

Language

English

Coordinates

51.91544787872669, 4.543618297323665

Graduation Date

10-10-2025

Awarding Institution

Delft University of Technology

Programme

Electrical Engineering | Embedded Systems

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Gunshot detection plays a critical role in protecting African wildlife and reducing illegal poaching activities. The decline of keystone species disrupts ecosystems and inhibits forest CO₂ absorption, contributing to climate change. To support conservation efforts, this thesis contributes to the development of an embedded acoustic surveillance system that detects gunshot sounds and locates their sources, enabling rangers to respond in real time. However, realizing such a system is challenging due to limitations in data, resources, and infrastructure.

This thesis proposes a lightweight convolutional recurrent neural network, combining depthwise separable convolutions (DSConv) and a gated recurrent unit (GRU), designed to detect the onset of gunshot sounds for trilaterating the shooter's position. The model was trained on a real-world gunshot dataset and optimized for deployment on the ultra-low-power STM32U5 microcontroller. A series of incremental experiments on architectural design, data manipulation, and feature selection aimed to improve performance and efficiency.
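As a rough illustration of why depthwise separable convolutions suit a microcontroller, the sketch below compares multiply-accumulate (MAC) counts for a standard convolution and its depthwise separable counterpart. The layer dimensions are hypothetical and chosen only for the arithmetic; they are not taken from the thesis, and the function names are illustrative.

```python
def standard_conv_macs(out_h, out_w, c_in, c_out, k):
    """MACs for a standard k x k convolution producing an out_h x out_w map."""
    return out_h * out_w * c_in * c_out * k * k


def dsconv_macs(out_h, out_w, c_in, c_out, k):
    """MACs for a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = out_h * out_w * c_in * k * k   # one k x k filter per input channel
    pointwise = out_h * out_w * c_in * c_out   # 1 x 1 convolution mixes channels
    return depthwise + pointwise


# Hypothetical layer: 32 x 32 output, 16 input channels, 32 output channels, 3 x 3 kernel.
std = standard_conv_macs(32, 32, 16, 32, 3)
ds = dsconv_macs(32, 32, 16, 32, 3)
ratio = ds / std  # equals 1/c_out + 1/k**2 for any spatial size

print(f"standard: {std} MACs, depthwise separable: {ds} MACs, ratio: {ratio:.3f}")
```

For this example layer the separable form needs roughly one seventh of the MACs. The saving for a whole network depends on how many layers are replaced and on their channel counts.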

To evaluate detection performance under class imbalance, independent of confidence thresholds, a novel onset-based area under the precision–recall curve (AUPRC) metric was proposed. Computational cost was evaluated through STM32U5 inference benchmarks. Experimental results showed that time-shift augmentation provided the largest performance gain, followed by modest improvements from regularization techniques. Class rebalancing and background noise augmentation had minor effects. Replacing standard convolutions with DSConv substantially improved efficiency. Finally, using delta mel-frequency cepstral coefficients (∆MFCCs) as input features further improved both performance and efficiency.
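Delta features describe how each cepstral coefficient changes over time. The sketch below uses the common regression-based delta formula with edge clamping. It is an assumption for illustration only: the abstract does not state which delta formulation, window width, or edge handling the thesis uses, and the function name `delta` and the sample input are hypothetical.

```python
def delta(frames, width=2):
    """Regression-based delta over time.

    frames: list of frames, each a list of coefficients (e.g. MFCCs).
    width:  number of frames on each side used in the regression (assumed value).
    Returns a list with the same shape as `frames`.
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(num_frames):
        row = []
        for c in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                # Clamp indices so the first and last frames are repeated at the edges.
                ahead = frames[min(t + n, num_frames - 1)][c]
                behind = frames[max(t - n, 0)][c]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


# A coefficient that rises by 1 per frame has a delta of 1 away from the edges.
ramp = [[float(t)] for t in range(6)]
ramp_delta = delta(ramp)

# A constant coefficient has a delta of 0 everywhere.
flat_delta = delta([[3.0, -1.0]] * 5)
```

Because a delta depends only on differences between frames, a constant offset in a coefficient cancels out, which is one commonly cited reason such features can hold up under slowly varying background conditions.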

Overall, the final ∆MFCC DSConv-GRU model outperformed the quasi-DenseNet baseline in F1-score (+2.6%) while reducing multiply-accumulate operations (-96.9%), RAM usage (-95.8%), and runtime (-97.3%). These improvements enabled real-time inference on microcontrollers, demonstrating that a lightweight deep learning model can perform effectively under strict resource constraints. Hence, this work provides a foundational step toward future embedded gunshot detection systems for wildlife monitoring and anti-poaching applications.

Files

MSc_Thesis_Tsin_Yu_Huang.pdf

(pdf | 0 Mb)

License info not available

File under embargo until 10-10-2027