The rapid advancement of artificial intelligence (AI) and deep learning has intensified computational demands, exposing inefficiencies in traditional von Neumann architectures due to the "memory wall" problem. Computation-in-Memory (CIM) emerges as a promising paradigm, performing computations directly within memory arrays to minimize data movement and enhance energy efficiency. However, current CIM design methodologies face significant challenges in configuration generation, performance characterization, and application-specific optimization. This thesis addresses these gaps by proposing a simulation framework that enables architectural exploration of multi-tile CIM-based NN accelerators with a focus on network traffic modeling, resource utilization, and communication efficiency.
The framework abstracts NN operations into hardware-executable patterns, encompassing both fully connected and convolutional layers, and incorporates key CIM components such as crossbars, accumulators, and activation units. Specialized mapping strategies and tiling techniques are introduced to capture realistic execution, while a bandwidth-constrained interconnect model quantifies communication bottlenecks. The framework supports both major convolution-to-MVM conversion schemes, Im2Col and K2M, allowing systematic evaluation of trade-offs between latency, area, and energy efficiency. Additionally, the framework integrates a topology visualization module that automatically generates Graphviz-based diagrams of component interconnections and data flows, enabling intuitive inspection of communication patterns and design bottlenecks.
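To illustrate the Im2Col scheme mentioned above: a convolution is lowered to a matrix-vector multiply by flattening each kernel-sized input patch into a row, so that one dot product per patch reproduces the convolution output. The sketch below is a minimal illustration of the idea, not the framework's actual implementation; the function name and pure-Python layout are chosen here for clarity.

```python
def im2col(image, k):
    """Flatten every k x k patch of a 2-D input into a row.

    The resulting matrix, multiplied by the flattened kernel vector,
    yields the same values as sliding the kernel over the input --
    which is what lets a CIM crossbar execute a convolution as an MVM.
    """
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - k + 1):          # top-left corner of each patch
        for j in range(w - k + 1):
            rows.append([image[i + di][j + dj]
                         for di in range(k) for dj in range(k)])
    return rows


# Example: 3x3 input, 2x2 kernel -> four patches, four output values.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [1, 0,
          0, 1]                          # flattened 2x2 kernel
patches = im2col(image, 2)
output = [sum(a * b for a, b in zip(row, kernel)) for row in patches]
```

The trade-off evaluated in this thesis follows directly from this construction: Im2Col reuses one weight matrix across all patches (few crossbars, more sequential input traffic), whereas K2M-style unrolling spends more crossbar area to expose parallelism.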
Experimental evaluation on MNIST demonstrates the framework’s capability to reveal fundamental design insights. Results show that while K2M maximizes crossbar utilization (0.8–1.0) and reduces latency by up to 100×, Im2Col achieves up to 1000× reductions in crossbar count and data transfer energy, making it favorable for area- and energy-constrained systems. Bandwidth analysis further highlights the interaction between communication capacity and optimal crossbar sizing, establishing design guidelines that balance delay, efficiency, and scalability.
Overall, this thesis contributes a flexible and extensible simulation environment for multi-tile CIM accelerators that bridges the gap between neural network workloads and CIM hardware design, providing a foundation for future architectural exploration and hybrid optimization strategies in NN accelerators.