AI on low-cost hardware

Microcontroller subgroup

Abstract

The creation of effective computational models that function within the power limitations of edge devices is an important research problem in the field of Artificial Intelligence (AI). While cutting-edge deep learning algorithms show promising results, they frequently require computing resources that exceed the power and memory budgets of these devices by several orders of magnitude. In this thesis, two distinct learning algorithms (backpropagation and forward-forward) were developed and compared on the Teensy 4.1, a low-cost microcontroller board. This work seeks to bridge the gap between the computational demands of such algorithms and the hardware's restricted resources.
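To illustrate how the forward-forward algorithm differs from backpropagation, the C++ sketch below shows its core idea: each layer is trained locally by pushing a scalar "goodness" (the sum of squared activations) above a threshold for positive samples and below it for negative ones, so no backward pass through the network is needed. The layer width, function names, and threshold handling are illustrative assumptions, not the thesis configuration.

```cpp
#include <cmath>

constexpr int kLayerSize = 64;  // hypothetical hidden-layer width

// Goodness of one layer's activations: G = sum_i a_i^2.
float goodness(const float activations[kLayerSize]) {
    float g = 0.0f;
    for (int i = 0; i < kLayerSize; ++i) {
        g += activations[i] * activations[i];
    }
    return g;
}

// Local training objective: sigma(G - theta) should approach 1 for
// positive (real) samples and 0 for negative samples. Each layer is
// optimized independently; no gradients flow between layers.
float positiveProbability(float g, float theta) {
    return 1.0f / (1.0f + std::exp(-(g - theta)));
}
```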

By implementing and analyzing these algorithms on the Fashion MNIST dataset, this thesis establishes a baseline for AI efficiency on microcontrollers, with a performance target of at least 80% test accuracy. The microcontroller software, implemented in C++, is limited to less than 512 kB of RAM for all online training methods. In addition, the potential of transfer learning was explored.
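One practical consequence of the 512 kB budget is that model buffers can be sized and checked at compile time. The following is a minimal sketch of such static budgeting for a small fully connected network on 784-dimensional Fashion MNIST inputs; the 784-32-10 topology and struct layout are assumptions for illustration, not the configuration used in the thesis.

```cpp
constexpr int kIn = 784, kHidden = 32, kOut = 10;  // assumed topology

struct Mlp {
    // Weights and biases.
    float w1[kIn * kHidden];
    float b1[kHidden];
    float w2[kHidden * kOut];
    float b2[kOut];
    // Matching gradient buffers needed for online training.
    float gw1[kIn * kHidden];
    float gb1[kHidden];
    float gw2[kHidden * kOut];
    float gb2[kOut];
};

// Statically allocated, so the RAM cost is fixed and known at link time.
static Mlp net;

// Compile-time guard against the 512 kB online-training budget.
static_assert(sizeof(Mlp) < 512 * 1024, "model exceeds the RAM budget");
```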

Key performance parameters, including memory utilization, training and inference times, and accuracy, were analyzed in a comparative study of the backpropagation and forward-forward algorithms. For each learning algorithm, several configurations (such as topologies and optimizers) were explored to determine the most effective and efficient approach to AI implementation on low-cost hardware. The key conclusions of this study reveal that backpropagation demonstrates superior performance in terms of both accuracy and computational efficiency. However, it requires more memory for storing variables, which may be a constraint in on-edge environments. Conversely, the forward-forward algorithm, while achieving lower accuracy, is more memory-efficient, making it a potential choice for less complex tasks or systems with severe RAM limitations.
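For reference, one common way to obtain per-step timing figures on a Teensy 4.1 is the Arduino microsecond clock, as in the hypothetical sketch below; trainStep() and infer() are placeholders for the actual training update and forward pass, which the abstract does not detail.

```cpp
#include <Arduino.h>

extern void trainStep();  // placeholder: one online training update
extern void infer();      // placeholder: one forward pass

void benchmarkOnce() {
    const uint32_t t0 = micros();
    trainStep();
    const uint32_t trainUs = micros() - t0;

    const uint32_t t1 = micros();
    infer();
    const uint32_t inferUs = micros() - t1;

    Serial.printf("train: %lu us, inference: %lu us\n",
                  (unsigned long)trainUs, (unsigned long)inferUs);
}
```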

The application of transfer learning showed potential to accelerate the learning process and to improve the final accuracy, hinting at an effective strategy for deploying advanced AI models on resource-limited edge devices.
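A minimal sketch of what such a transfer-learning setup can look like in C++ follows: layers pre-trained off-device are marked frozen and skipped by the optimizer, so only the final classifier is updated on the microcontroller. The Layer struct, freezing flag, and learning rate are illustrative assumptions, not the thesis's implementation.

```cpp
constexpr float kLearningRate = 0.01f;  // assumed value

struct Layer {
    float* weights;   // parameter buffer
    float* grads;     // gradient buffer of the same length
    int    numParams;
    bool   frozen;    // true for pre-trained feature layers
};

// Plain SGD update that leaves frozen (transferred) layers untouched.
void sgdStep(Layer& layer) {
    if (layer.frozen) return;
    for (int i = 0; i < layer.numParams; ++i) {
        layer.weights[i] -= kLearningRate * layer.grads[i];
    }
}
```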