The rapid growth in the energy consumption of artificial intelligence (AI) models has made low-power, high-efficiency, brain-inspired computing hardware a central research focus. Hierarchical temporal memory (HTM) offers robustness and energy efficiency through its spatial pooler (SP) and its use of sparse distributed representations (SDRs), but its temporal memory (TM) is highly sensitive to input perturbations. Moreover, most existing hardware efforts target isolated modules rather than a complete, efficient inference loop. To address these issues, we retain the HTM SP as a sparse encoder and replace the TM with deep neural networks, yielding a dual-path SP+FCNN/RNN framework. On sequential MNIST, this scheme achieves 96.5% classification accuracy and markedly improves generalization.
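For concreteness, the SP encoding stage reduces to a binary overlap computation followed by k-winners-take-all (k-WTA) selection. The sketch below is a minimal software illustration of that stage, not the implementation used in this work; the column count, input width, and sparsity level are hypothetical placeholders, and the O(N·k) selection loop stands in for the hardware sorter described later.

```c
#include <stdint.h>
#include <string.h>

#define N_COLS   1024   /* hypothetical number of SP columns        */
#define IN_WORDS 8      /* hypothetical input width: 8 * 32 = 256 b */
#define K_ACTIVE 20     /* hypothetical number of winning columns   */

/* Overlap of one column: popcount of (input AND connected-synapse mask).
 * __builtin_popcount is the GCC/Clang intrinsic for counting set bits. */
static int overlap(const uint32_t in[IN_WORDS], const uint32_t syn[IN_WORDS])
{
    int ov = 0;
    for (int w = 0; w < IN_WORDS; w++)
        ov += __builtin_popcount(in[w] & syn[w]);
    return ov;
}

/* k-WTA: mark the K_ACTIVE columns with the highest overlap as active,
 * yielding the sparse distributed representation (SDR) that the
 * FCNN/RNN classifier consumes in place of TM. */
void sp_encode(const uint32_t in[IN_WORDS],
               const uint32_t syn[N_COLS][IN_WORDS],
               uint8_t sdr[N_COLS])
{
    int ov[N_COLS];
    for (int c = 0; c < N_COLS; c++)
        ov[c] = overlap(in, syn[c]);

    memset(sdr, 0, N_COLS);
    for (int k = 0; k < K_ACTIVE; k++) {
        int best = -1, best_ov = -1;
        for (int c = 0; c < N_COLS; c++)
            if (!sdr[c] && ov[c] > best_ov) { best_ov = ov[c]; best = c; }
        if (best >= 0) sdr[best] = 1;
    }
}
```

The overlap loop and the selection loop are exactly the two SP operations identified as bottlenecks below, which motivates offloading them to dedicated accelerators.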
The algorithmic analysis reveals three computational bottlenecks: SP overlap dot products, k-WTA sorting, and neural-network multiply-accumulate operations. To address them, this work designs a 25-lane parallel binary dot-product accelerator (MAC-SP), a hierarchical min-heap sorting accelerator (Heap-Sort), and a 16 × 16 systolic-array accelerator (MAC-NN). All three accelerators are integrated through a unified memory-mapped input/output (MMIO) interface into the open-source CROC SoC built around an Ibex core. Logic synthesis in a 40 nm process shows that the complete SoC occupies only 0.112 mm², consumes 1.078 mW, and reaches a maximum clock frequency of 400 MHz. Relative to the pure-software baseline, the three core operations achieve speedups of 28.7×, 23.5×, and 38.1×, respectively. End-to-end inference for a single MNIST image drops from 3.78 × 10⁷ to 1.43 × 10⁶ clock cycles, an overall 26.4× speedup.
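The unified MMIO interface means each accelerator appears to the Ibex core as a small bank of memory-mapped registers. The fragment below sketches what driving MAC-SP might look like from software, assuming a register map invented purely for illustration; the base address, offsets, and register names are hypothetical, since the actual CROC integration details are not given here.

```c
#include <stdint.h>

/* Hypothetical register map for the MAC-SP accelerator; the real CROC
 * integration may differ. All accesses are volatile MMIO reads/writes. */
#define MAC_SP_BASE   0x1A200000u             /* hypothetical base address */
#define REG(off)      (*(volatile uint32_t *)(MAC_SP_BASE + (off)))
#define MAC_SP_IN     REG(0x00)  /* input word, broadcast to all 25 lanes */
#define MAC_SP_SYN(i) REG(0x04 + 4u * (i))    /* synapse word for lane i   */
#define MAC_SP_CTRL   REG(0x68)  /* write 1 to start the dot products     */
#define MAC_SP_STAT   REG(0x6C)  /* bit 0 set when all lanes are done     */
#define MAC_SP_OV(i)  REG(0x70 + 4u * (i))    /* overlap result of lane i  */

/* Compute 25 column overlaps against one input word in parallel:
 * load operands, kick off the lanes, poll for completion, read back. */
void mac_sp_overlap25(uint32_t in_word, const uint32_t syn[25], int ov[25])
{
    MAC_SP_IN = in_word;
    for (int i = 0; i < 25; i++)
        MAC_SP_SYN(i) = syn[i];
    MAC_SP_CTRL = 1u;                 /* start all 25 lanes        */
    while ((MAC_SP_STAT & 1u) == 0)   /* busy-wait for completion  */
        ;
    for (int i = 0; i < 25; i++)
        ov[i] = (int)MAC_SP_OV(i);
}
```

A polling loop keeps the sketch simple; an interrupt-driven driver would follow the same load/start/read-back pattern against the same kind of register bank.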