Hardware Accelerator Design Based on Hierarchical Temporal Memory

Master Thesis (2025)
Author(s)

R. Wu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

C. Frenkel – Mentor (TU Delft - Electronic Instrumentation)

Rajendra Bishnoi – Mentor (TU Delft - Computer Engineering)

D. Casnici – Mentor (TU Delft - Electronic Instrumentation)

D. Layh – Mentor (TU Delft - Electronic Instrumentation)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
26-08-2025
Awarding Institution
Delft University of Technology
Programme
Electrical Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The rapid growth in the energy consumption of artificial intelligence (AI) models has made low-power, high-efficiency, brain-inspired computing hardware a central research focus. Hierarchical temporal memory (HTM) offers robustness and energy efficiency via its spatial pooler (SP) and the use of sparse distributed representations (SDRs), but its temporal memory (TM) is highly sensitive to input perturbations. Moreover, most existing hardware efforts target isolated modules rather than a complete, efficient inference loop. To address these issues, we employ the HTM SP as a sparse encoder and replace TM with deep neural networks, proposing a dual-path SP+FCNN/RNN framework. On sequential MNIST, this scheme achieves 96.5% classification accuracy and markedly improves generalization.
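A minimal sketch of the front end described above, assuming illustrative sizes (784-bit binary input, 1024 SP columns, k = 20 winners) rather than the thesis configuration: binary overlap dot products followed by k-WTA yield the SDR that the downstream FCNN/RNN path would classify. The function name sp_encode and all parameter values are placeholders, not the thesis implementation.

import numpy as np

def sp_encode(input_bits, synapse_matrix, k):
    # Overlap: one binary dot product per SP column.
    overlaps = synapse_matrix.astype(int) @ input_bits.astype(int)
    # k-WTA: keep only the k columns with the highest overlap.
    winners = np.argsort(overlaps)[-k:]
    sdr = np.zeros(synapse_matrix.shape[0], dtype=np.uint8)
    sdr[winners] = 1
    return sdr

rng = np.random.default_rng(0)
x = (rng.random(784) < 0.1).astype(np.uint8)          # sparse binary input
W = (rng.random((1024, 784)) < 0.2).astype(np.uint8)  # binary SP connectivity
sdr = sp_encode(x, W, k=20)                           # SDR fed to the FCNN/RNN path
print(int(sdr.sum()))                                 # -> 20 active columns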

Targeting the three computational bottlenecks revealed by the algorithmic analysis (SP overlap dot products, k-WTA sorting, and neural-network multiply-accumulate operations), this work designs a 25-lane parallel binary dot-product accelerator (MAC-SP), a hierarchical min-heap sorting accelerator (Heap-Sort), and a 16 × 16 systolic-array accelerator (MAC-NN). All three accelerators are integrated, through a unified memory-mapped input/output (MMIO) interface, into the open-source CROC SoC built around an Ibex core. Logic synthesis in a 40 nm process shows that the complete SoC occupies only 0.112 mm², consumes 1.078 mW, and achieves a maximum clock frequency of 400 MHz. Relative to the pure software baseline, the three core operations achieve speedups of 28.7×, 23.5×, and 38.1×, respectively. End-to-end inference for a single MNIST image drops from 3.78 × 10⁷ to 1.43 × 10⁶ clock cycles, an overall 26.4× speedup.
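For context on the operation MAC-SP parallelizes: in software, the SP overlap reduces to a population count over the bitwise AND of bit-packed words, as in the sketch below. The word count and values are illustrative assumptions, not the accelerator's 25-lane datapath, and binary_dot is a placeholder name (requires Python 3.10+ for int.bit_count()).

def binary_dot(a_words, b_words):
    # Overlap of two bit-packed binary vectors: popcount of the bitwise AND,
    # accumulated word by word.
    return sum((a & b).bit_count() for a, b in zip(a_words, b_words))

a = [0b1011, 0b0110]     # packed words of vector a
b = [0b1001, 0b0111]     # packed words of vector b
print(binary_dot(a, b))  # -> 4 overlapping active bits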

Files

Master_s_Thesis.pdf
License info not available

File under embargo until 26-08-2026