A.B. Gebregiorgis | TU Delft Repository

FeFET-based On-chip Learning for Convolutional Neural Networks

Master thesis (2025) - L.E. Hoogland, S. Hamdioui, A.B. Gebregiorgis, Theofilos Spyrou, D.G. Muratore, A.N.N. Mahmoud

Modern Artificial Intelligence (AI) applications, such as Deep Neural Networks (DNNs), require substantial amounts of data to carry out classification or recognition tasks. This data must be retrieved from memory and supplied to the processor, and the results must finally be stored back in memory. In Von Neumann architectures, this data movement incurs significant performance costs, leaving the CPU with many idle cycles while waiting for data to arrive. One way of addressing this issue is by investigating alternative computing paradigms, such as Computation in Memory (CIM). In CIM architectures, the processor and the memory are integrated into one physical location. As such, computations are performed in the memory core directly, without the data needing to be transferred to a central processor. A promising technology to efficiently implement CIM crossbar arrays is the emerging Ferroelectric Field Effect Transistor (FeFET), in which data can be stored in a non-volatile manner in the polarization state of a ferroelectric layer.

In existing literature, CIM crossbar arrays are optimized for the inference task, but do not perform the learning task locally. This means the neural network is trained externally, for example using cloud computing. Only once the training is finished are the weights written to the physical crossbar array. For medical applications, such as ECG classification, sending sensitive medical data off to the cloud for training leads to privacy concerns. A solution to this problem is On-Chip learning: training the network locally in the crossbar itself.

This thesis focuses on integrating FeFET technology into a CIM architecture to design a crossbar array that supports On-Chip learning for Convolutional Neural Networks. The accelerator overcomes the memory wall inherent to Von Neumann machines by embracing the CIM framework and uses FeFET devices to overcome the scaling walls associated with CMOS technology. The result is a novel accelerator which leverages the parallelism of Analog Crossbars to optimize the inference task and forward propagation, while leveraging the accuracy of Digital Crossbars to optimize the backpropagation task.
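
As a rough illustration of the analog crossbar parallelism mentioned above, the following sketch models an ideal crossbar as a conductance matrix: input voltages drive the rows, and each column current is the sum of voltage times conductance, so all columns compute their multiply-accumulate result in one step. The function name and the example values are hypothetical, device non-idealities are ignored, and this is not the design presented in the thesis.

```python
def crossbar_mac(voltages, conductances):
    """Ideal crossbar read: column current I_j = sum_i V_i * G[i][j].

    voltages     -- one input voltage per row (hypothetical units)
    conductances -- row-major matrix of cell conductances (the stored weights)
    """
    n_rows = len(conductances)
    if len(voltages) != n_rows:
        raise ValueError("need exactly one input voltage per crossbar row")
    n_cols = len(conductances[0])
    # Every column accumulates its current independently of the others,
    # which is where the parallelism of the analog array comes from.
    return [
        sum(voltages[i] * conductances[i][j] for i in range(n_rows))
        for j in range(n_cols)
    ]


if __name__ == "__main__":
    print(crossbar_mac([1.0, 0.5], [[2.0, 0.0], [4.0, 2.0]]))
```

In a physical array the column currents would still have to be digitized, a step this sketch leaves out.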

RdaCIM: A Read-Disturb-Aware Computation-In-Memory Simulator

Master thesis (2025) - Z. Huang, G. Gaydadjiev, A.B. Gebregiorgis, Theofilos Spyrou, S. Feld

Memristor-based Computation-In-Memory (CIM) architectures are a class of emerging computing designs with the potential to provide power-efficient computation for artificial intelligence (AI). However, current memristor-based CIM designs face challenges from non-idealities, such as read disturb. The mitigation of non-idealities in CIM architectures is an area of active research.

Simulation tools are important design tools for CIM. Conventionally, SPICE simulations are used for CIM architectures. Modern high-level simulation frameworks for CIM are faster than SPICE, and it is therefore desirable to investigate the feasibility of non-ideality analysis, such as read disturb analysis, in high-level simulations. However, current high-level simulators do not include a model for read disturb, so they are not suitable for read disturb analysis.

To fill this gap, this thesis presents RdaCIM, a read-disturb-aware CIM simulator. RdaCIM is a high-level simulation tool that exploits the parallelism provided by multi-core CPUs and AVX instructions. More importantly, RdaCIM incorporates the non-trivial non-ideality of read disturb into the simulations, which makes it possible for the user to perform read-disturb-related investigations on the simulator.

Experiments have been carried out with RdaCIM to show the feasibility of read disturb analysis with this tool. The effectiveness of a rewriting scheme as a countermeasure to read disturb is verified with RdaCIM. Furthermore, an effort to reduce the overhead of rewriting by dynamic voltage adjustment is presented and verified with RdaCIM. Performance benchmarks have been run to demonstrate the benefits of the parallelised implementation of the simulation tool. ...
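
To make the idea of a rewriting countermeasure concrete, here is a deliberately simplified toy model; it is not RdaCIM's device model. It assumes that every read shifts a cell's stored value by a fixed amount and that a rewrite restores the value completely. The function name, the linear drift and all parameter values are hypothetical.

```python
def worst_deviation(n_reads, drift_per_read=0.01, rewrite_every=None):
    """Largest deviation from the programmed value over n_reads reads.

    Toy assumption: each read adds drift_per_read of deviation, and a
    rewrite (issued after every rewrite_every reads) resets it to zero.
    """
    reads_since_write = 0
    worst = 0.0
    for read_index in range(1, n_reads + 1):
        reads_since_write += 1
        worst = max(worst, reads_since_write * drift_per_read)
        # Periodic rewrite: restore the cell to its programmed state.
        if rewrite_every is not None and read_index % rewrite_every == 0:
            reads_since_write = 0
    return worst


if __name__ == "__main__":
    print(worst_deviation(100))                    # no rewriting
    print(worst_deviation(100, rewrite_every=10))  # rewrite every 10 reads
```

Under these assumptions, rewriting more often bounds the deviation more tightly at the cost of more write operations, which is the kind of overhead the dynamic voltage adjustment described above aims to reduce.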

Brain-inspired feature extraction for near sensor extreme edge processing with Spiking Neural Networks

Master thesis (2024) - A.F. Dobriţa, S. Hamdioui, Manolis Sifalakis, Amirreza Yousefzadeh, A.B. Gebregiorgis, Simon Thorpe, C. Frenkel

Motivated by the desire to bring intelligent processing to the Edge, enabling online learning on resource- and latency-constrained embedded devices has become increasingly appealing, as it has the potential to tackle a wide range of challenges. On the one hand, it can deal with on-the-fly adaptation to fast sensor-generated streams of data under changing environments. On the other hand, it can address a variety of challenges associated with offline training in the cloud, such as the energy consumption incurred by sensor data transfers and the extra memory storage for the training samples, as well as data privacy and security concerns. Concurrently, maintaining low-latency and power-efficient inference is paramount for edge AI computing systems, and thus learning/adapting online with minimal incurred overhead is crucial.

In this work, we propose EON-1, an Edge ONline Learning SCNN (Spiking Convolutional Neural Network) processor with 1-bit synaptic weights, 1 spike per neuron and 1 neuron updated per input, which we have benchmarked for both ASIC and FPGA platforms. Our key contribution is a binary and stochastic STDP rule which, benchmarked in an ASIC node, achieves less than 1% energy overhead for inference. To our knowledge, our solution incurs the least energy overhead for inference compared to state-of-the-art solutions, showing a better efficiency by at least a factor of 10x. We also report 94% and 77.65% accuracy on the MNIST and Fashion-MNIST classification tasks, and we achieve 0.09 pJ/SOP and 1.5 pJ/SOP energy efficiency during inference and learning, respectively. We extend our solution to demonstrate a practical use case of performing real-time inference on UHD videos while coping with streaming data, and we showcase 60 FPS UHD video processing. ...

Evaluation of computation-in-memory using traditional (SRAM) and emerging non-volatile devices (memristors)

Master thesis (2024) - W.W.W. Sewnarain, A.B. Gebregiorgis, S. Hamdioui, C. Gao

Modern computer applications require large amounts of data processing. Traditional computing models involve constant data transfer between memory and processor. This data transfer is a major contributor to high energy consumption. As these applications scale, the energy demand increases. This poses challenges in terms of sustainability and operational costs. Computation In Memory (CIM) integrates processing within the memory. This reduces the need for data transfer between memory and processor, and therefore has the potential to drastically lower energy consumption.

CIM macros are often implemented using modified SRAM cells, though recent literature explores memristor-based CIM designs due to the memristor’s low-energy, non-volatile characteristics. However, no comprehensive comparisons between SRAM-based and memristor-based CIM designs exist. While memristor-based designs are hypothesized to be more energy-efficient, this has not yet been proven.

This thesis compares SRAM-based and memristor-based CIM designs to determine which is better suited for CIM applications. This has been achieved by exploring the state of the art of memristive devices, memristor-based CIM macros and SRAM-based CIM macros. A selection of designs was chosen for comparison, including the 1T1R and 8T SRAM designs, which are the most popular memristor-based and SRAM-based CIM designs, respectively. The schematics of all the designs were recreated and simulated using the same parameters across designs wherever possible. The logic AND and MAC operations were simulated. Additionally, a layout of each design was made to extract the area. The designs were compared based on area, energy consumption and delay.

From the results it can be concluded that the best device for CIM depends on the application. The memristor design had the smallest area and consumed the least energy for reading, logic and MAC operations. However, the memristor design also consumed the most energy during writing, and its delay for all operations is longer than that of the SRAM-based designs. If area, energy consumption and delay are equally important for an application, then memristor-based CIM would be the better choice only if there are many more logic/read operations than write operations. It could be the better choice for MAC operations if a more energy-efficient ADC were used than the one used in this thesis.

Real-Time Detection and Classification of Purkinje-Cell Neural Activity

Master thesis (2023) - D.A. Vrijenhoek, S. Hamdioui, A.B. Gebregiorgis, M.A. Siddiqi, D.G. Muratore, C. Strydis

The Purkinje cell is a type of neuron that can be found in the cerebellum. Purkinje cell neural activity is characterised by two types of spiking behaviour: the so-called simple and complex spikes. These two types of spikes are thought to play a role in motor functionality. In order to better understand the relationship between Purkinje cell neural activity and the motor cortex, neuroscientists record such neural activity in mice. However, current experimental setups pose a challenge as they involve a wired connection between the animal’s head stage and the recording device, which limits the mouse’s natural behaviour by restricting its movement. This work proposes a lightweight neural-spike detection and classification architecture for acquiring Purkinje cell neural activity. The proposed design discards unneeded information by detecting and classifying spikes in real time. This type of compression enables data storage on a removable device in the head stage, freeing mice from wires. Its small form factor allows unrestricted movement during experiments, while a power-efficient design ensures long-term operation. The performance of the algorithm has been evaluated using a software implementation, yielding a combined accuracy for detection and classification ranging from 92.74% to 94.54%. The system has been synthesised using the 45 nm Nangate Open Cell library, resulting in an ASIC with an area of 0.22 mm² and a power consumption of 0.412 mW. ...

Memristor-Based Encryption For Free-Floating Neural Implants

Master thesis (2022) - J.A. Galvan Hernández, M.A. Siddiqi, A.B. Gebregiorgis

The recent advances in the semiconductor industry have given rise to the development of highly scalable, wireless and battery-free neural-implant interfaces that enable brain monitoring and brain stimulation with high spatial and temporal resolution. Such implants are referred to as Free-Floating Neural Implants (FFNI), as the small size and untethered communication allow them to be scattered throughout the cortex. Nevertheless, the plethora of proposed interfaces has failed to mention and act against the potential security implications that may arise in highly constrained FFNIs, even though the U.S. Food and Drug Administration (FDA) has recently acknowledged the possibility of short-/long-range attacks on wireless Implantable Medical Devices (IMD). Hence, in this project, the existing threats in FFNIs are revealed, followed by the proposal of a memristor-based lightweight security approach to secure intracranial electromagnetic transmissions whilst considering the anticipated physical limitations of these constrained topologies. More specifically, a consolidated envisioned system is highlighted for which a read-only GIFT cipher is implemented. This lightweight encryption block primarily consists of a One-Transistor-One-Memristor (1T1R) crossbar structure for carrying out operations such as Substitution, Permutation, and addRoundKey, without destroying the resistive states and by only performing ‘read’ operations to maintain low-power operation. With a footprint of 0.0034 mm², the 1T1R-GIFT cipher reaches an average power and energy consumption of only 60.38 µW and 241.52 pJ, respectively. However, the performance does not yet exceed that of a CMOS-based implementation, whose footprint is similar but which has roughly half the average power and energy consumption. This can be attributed mainly to the immaturity of the memristor technology.
This work demonstrates that only after further advancements in memristor logic gates, crossbar topologies and fabrication processes can highly constrained FFNIs fully benefit from the scalable memristor-based security paradigm.

The implementation of an MBAN gateway

Bachelor thesis (2021) - J.J.A. Vermeulen, T. Benaich, S. Hamdioui, R.K. Bishnoi, A.B. Gebregiorgis

Epilepsy is a severe neurological disorder that affects every aspect of a patient’s life. Unfortunately, there is not yet a complete cure on the market that works for everyone. However, a lot of work has been done on seizure prevention. The overall project details the proof-of-concept implementation of a secure and reliable MBAN (Medical Body Area Network) used for seizure prevention. The principal objective of the MBAN system is to set up and maintain secure connections between the nodes of the MBAN system and to store and analyze the received data in the cloud. Therefore, a suitable gateway is needed, which is created in this work. The gateway is a mobile application built with the Flutter SDK. The main function of the application is to communicate with the implantable medical device, which in the demonstration is the node the gateway is connected to using BLE. The application is designed for Android and iOS and is connected to the AWS cloud service, in which the data is stored and analyzed with a simple function that checks whether the received heart rate is above a certain threshold. This function can easily be replaced by a more extensive function. In addition, the application displays user health metrics such as the heart rate, shows the connection state, and can update the firmware of the implantable medical device. The security measures taken in this project concern setting up the BLE connection with an OOB (Out Of Band) channel for key sharing, after which the key is used to encrypt the data streams. Additionally, the data in the cloud is encrypted. ...
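
The simple threshold check described above could look roughly like the following sketch. The function name and the default threshold of 120 beats per minute are hypothetical; the abstract does not state the actual threshold or how the function is deployed in the cloud.

```python
def heart_rate_exceeds(heart_rate_bpm, threshold_bpm=120.0):
    """Return True when a received heart rate is above the threshold.

    Both the name and the default threshold are illustrative assumptions.
    """
    if heart_rate_bpm < 0:
        raise ValueError("heart rate cannot be negative")
    return heart_rate_bpm > threshold_bpm


if __name__ == "__main__":
    for sample in (72.0, 135.0):
        print(sample, heart_rate_exceeds(sample))
```

Keeping the check behind a single function with a fixed signature is one way to make it easy to swap in a more extensive analysis later.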