An Area and Energy Efficient Arithmetic Unit for Stacked Machine Learning Models

Mo Model Mo Problems Like... Hardware Design Problems


Abstract

Machine learning on edge devices performs crucial identification or prediction tasks while limiting the amount of data that needs to be transmitted to more centralized computing nodes. However, strict area and energy requirements necessitate specialized hardware developed for the requirements of the device and model. This thesis is concerned with developing an area- and energy-efficient arithmetic unit as part of the implementation of a stacked machine learning model in embedded automotive devices. The model in question was previously designed to perform lifetime prediction with the goal of improving the reliability of semiconductor devices used in various automotive applications.

This thesis aims to achieve area and energy efficiency by exploiting the commonalities in the arithmetic operations of several of the internal learners of the stacked machine learning model. The use of a weighted figure of merit, taking into account area, energy and delay, allows for simple comparisons of designs at any operating frequency and gives easy insight into how the merit of designs would change if device requirements were to change. A sweep of the percentage of multiplications in the workload also gave insight into how design choices may change due to future redesigns of the stacked machine learning model.
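The abstract does not state the exact form of the weighted figure of merit; a common choice for combining area, energy and delay is a weighted product, where the weight exponents encode the relative importance of each metric. The function below is an illustrative sketch under that assumption, with hypothetical units and default weights; it is not the thesis's actual formula.

```python
def figure_of_merit(area_um2, energy_pj, delay_ns,
                    w_area=1.0, w_energy=1.0, w_delay=1.0):
    """Weighted-product figure of merit: lower is better.

    The product form, units, and default weights are illustrative
    assumptions, not taken from the thesis. Raising a weight makes
    the corresponding metric dominate the comparison.
    """
    return (area_um2 ** w_area) * (energy_pj ** w_energy) * (delay_ns ** w_delay)
```

With such a form, re-weighting (e.g. lowering `w_delay` for a low-frequency device) re-ranks candidate designs without re-running any synthesis, which matches the stated goal of easy insight into changing device requirements.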

It was found that the MAC (multiply-accumulate), multiply, divide and accumulate operations of the internal learners can best be supported by a single arithmetic unit containing a "Reduced Area" parallel multiplier (which still occupies most of the area), a small dedicated accumulator, and invariant integer division performed via the multiplier. It was also found that the ability to reconfigure the multiplier for different levels of bit-precision does not yield a performance improvement for the expected precision distribution.
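"Invariant integer division using the multiplier" typically refers to replacing division by a fixed (invariant) divisor with a multiply and a shift, in the style of Granlund and Montgomery: precompute a magic multiplier m and shift s once per divisor, then each division costs only one multiplication. The sketch below illustrates that general technique for unsigned 16-bit operands; the bit widths, constant-selection rule, and function names are assumptions for illustration, not the thesis's exact hardware design.

```python
def invariant_div_constants(d, n_bits=16):
    """Precompute (m, s) so that (x * m) >> s == x // d
    for all 0 <= x < 2**n_bits. d >= 1 is the invariant divisor.

    Uses s = n + ceil(log2 d) and m = floor(2**s / d) + 1, which
    satisfies the Granlund-Montgomery correctness condition
    2**s <= m*d <= 2**s + 2**(s - n_bits).
    """
    s = n_bits + (d - 1).bit_length()  # (d-1).bit_length() == ceil(log2 d)
    m = (1 << s) // d + 1
    return m, s

def invariant_div(x, m, s):
    # One multiply plus a shift replaces the divide; in hardware this
    # reuses the existing parallel multiplier instead of a divider.
    return (x * m) >> s
```

Because the constants depend only on the divisor, this maps naturally onto a unit whose area is dominated by the multiplier: divisions by model constants reuse that multiplier rather than requiring a dedicated divider.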