Hardware Accelerator Design Based on Hierarchical Temporal Memory

Master Thesis (2025)
Author(s)

R. Wu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

C. Frenkel – Mentor (TU Delft - Electronic Instrumentation)

Rajendra Bishnoi – Mentor (TU Delft - Computer Engineering)

D. Casnici – Mentor (TU Delft - Electronic Instrumentation)

D. Layh – Mentor (TU Delft - Electronic Instrumentation)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
26-08-2025
Awarding Institution
Delft University of Technology
Programme
Electrical Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The rapid growth in the energy consumption of artificial intelligence (AI) models has made low-power, high-efficiency, brain-inspired computing hardware a central research focus. Hierarchical temporal memory (HTM) offers robustness and energy efficiency via its spatial pooler (SP) and the use of sparse distributed representations (SDRs), but its temporal memory (TM) is highly sensitive to input perturbations. Moreover, most existing hardware efforts target isolated modules rather than a complete, efficient inference loop. To address these issues, we employ the HTM SP as a sparse encoder and replace TM with deep neural networks, proposing a dual-path SP+FCNN/RNN framework. On sequential MNIST, this scheme achieves 96.5% classification accuracy and markedly improves generalization.
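A minimal sketch of the front end described above, assuming illustrative sizes (784-bit binary input, 1024 SP columns, k = 20 winners) rather than the thesis configuration: binary overlap dot products followed by k-WTA yield the SDR that the downstream FCNN/RNN path would classify. The function name sp_encode and all parameter values are placeholders, not the thesis implementation.

import numpy as np

def sp_encode(input_bits, synapse_matrix, k):
    # Overlap: one binary dot product per SP column.
    overlaps = synapse_matrix.astype(int) @ input_bits.astype(int)
    # k-WTA: keep only the k columns with the highest overlap.
    winners = np.argsort(overlaps)[-k:]
    sdr = np.zeros(synapse_matrix.shape[0], dtype=np.uint8)
    sdr[winners] = 1
    return sdr

rng = np.random.default_rng(0)
x = (rng.random(784) < 0.1).astype(np.uint8)          # sparse binary input
W = (rng.random((1024, 784)) < 0.2).astype(np.uint8)  # binary SP connectivity
sdr = sp_encode(x, W, k=20)                           # SDR fed to the FCNN/RNN path
print(int(sdr.sum()))                                 # -> 20 active columns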

Targeting the three computational bottlenecks revealed by the algorithmic analysis (SP overlap dot products, k-WTA sorting, and neural-network multiply-accumulate operations), this work designs a 25-lane parallel binary dot-product accelerator (MAC-SP), a hierarchical min-heap sorting accelerator (Heap-Sort), and a 16 × 16 systolic-array accelerator (MAC-NN). All three accelerators are integrated, through a unified memory-mapped input/output (MMIO) interface, into the open-source CROC SoC built around an Ibex core. Logic synthesis in a 40 nm process shows that the complete SoC occupies only 0.112 mm², consumes 1.078 mW, and achieves a maximum clock frequency of 400 MHz. Relative to the pure software baseline, the three core operations achieve speedups of 28.7×, 23.5×, and 38.1×, respectively. End-to-end inference for a single MNIST image drops from 3.78 × 10⁷ to 1.43 × 10⁶ clock cycles, an overall 26.4× speedup.
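For context on the operation MAC-SP parallelizes: in software, the SP overlap reduces to a population count over the bitwise AND of bit-packed words, as in the sketch below. The word count and values are illustrative assumptions, not the accelerator's 25-lane datapath, and binary_dot is a placeholder name (requires Python 3.10+ for int.bit_count()).

def binary_dot(a_words, b_words):
    # Overlap of two bit-packed binary vectors: popcount of the bitwise AND,
    # accumulated word by word.
    return sum((a & b).bit_count() for a, b in zip(a_words, b_words))

a = [0b1011, 0b0110]     # packed words of vector a
b = [0b1001, 0b0111]     # packed words of vector b
print(binary_dot(a, b))  # -> 4 overlapping active bits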

Files

Master_s_Thesis.pdf
License info not available

File under embargo until 26-08-2026