

## A Convoluted Journey from CMOS to Spin Waves

Anagnostou, Pantazis; Van Zegbroeck, Arne; Hamdioui, Said; Adelmann, Christoph; Ciubotaru, Florin; Cotofana, Sorin

**DOI**

[10.1109/ISCAS56072.2025.11043168](https://doi.org/10.1109/ISCAS56072.2025.11043168)

**Publication date**

2025

**Document Version**

Final published version

**Published in**

Proceedings of the 2025 IEEE International Symposium on Circuits and Systems (ISCAS)

**Citation (APA)**

Anagnostou, P., Van Zegbroeck, A., Hamdioui, S., Adelmann, C., Ciubotaru, F., & Cotofana, S. (2025). A Convoluted Journey from CMOS to Spin Waves. In *Proceedings of the 2025 IEEE International Symposium on Circuits and Systems (ISCAS)* (Proceedings - IEEE International Symposium on Circuits and Systems). IEEE. <https://doi.org/10.1109/ISCAS56072.2025.11043168>


**Copyright**

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.


**Green Open Access added to TU Delft Institutional Repository as part of the Taverne amendment.**

More information about this copyright law amendment can be found at <https://www.openaccess.nl>.

Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

# A Convolved Journey from CMOS to Spin Waves

Pantazis Anagnostou\*, Arne Van Zegbroeck\*, Said Hamdioui\*,  
Christoph Adelmann†, Florin Ciubotaru† and Sorin Cotofana\*

Email: P.A.Anagnostou@tudelft.nl

\*Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, 2600 AA Delft, The Netherlands

†IMEC, 3001 Leuven, Belgium

**Abstract**—In recent years, Spin Waves (SWs) have emerged as a promising CMOS alternative technology, and SW interference-based majority gates have been proposed and experimentally realized. In this paper, we pursue a different computation avenue and introduce a SW device able to evaluate $2 \times 2$ 2D convolution, which is a fundamental element for the implementation of Convolutional Neural Networks (CNNs). Assuming that the window pixels are $P = [p_1, p_2; p_3, p_4]$ and the kernel is $K = [k_1, k_2; k_3, k_4]$, we introduce a device which evaluates the convolution result $\sum_{i=1}^4 p_i k_i$ within the SW domain by leveraging inherent SW mechanisms, i.e., information encoding in SW amplitude and phase, SW amplitude decay due to Gilbert damping, and SW interference. After introducing the SW device structure, we demonstrate its proper behaviour by means of micromagnetic simulations. We also present power consumption, area, and delay estimates and argue that, because our proposal does not rely on standard adders and multipliers, it can substantially outperform traditional CMOS-based convolution implementations.

## I. INTRODUCTION

Over the past decades, numerous scientific and technological domains have experienced significant theoretical and experimental progress. One of the fields that has demonstrated great potential is spintronics [1]–[3], which is considered a promising avenue for extending, complementing, and eventually replacing Complementary Metal-Oxide-Semiconductor (CMOS) [4], [5] technology. A promising pathway towards ultra-low-power spintronic computation is to use the propagating disturbances in the ordering of a magnetic material, known as Spin Waves (SWs), as data carriers [6], [7]. Their unique properties, e.g., frequencies ranging from GHz to THz, wavelengths down to the atomic scale, pronounced non-linear and non-reciprocal phenomena, and low-energy data transport and processing [8]–[10], offer a variety of advantages for building SW-based nanotechnologies. As such, SWs provide the means for interference-based computation and exhibit substantial potential for enabling ultra-low-power computation [11]–[13].

While information can be carried by SWs in many ways, state-of-the-art SW-based computing relies on relative phase-based information encoding: a relative phase of $0^\circ$ (i.e., a spin wave in phase with a reference) represents a logic 0 and a relative phase of $180^\circ$ represents a logic 1. By encoding information in the phase difference and letting an odd number of SWs with the same wavelength ($\lambda$) and amplitude propagate into the same waveguide, majority voting can be implemented (in-phase SWs interfere constructively and out-of-phase SWs destructively). This principle underlies the implementation of the 3-input Majority Gate (*MAJ3*) [14], [15], whose output value is determined by the phase of the SW resulting from the interference of the 3 SW inputs. Given that *MAJ3* and the Inverter (*INV*), realized by simply shifting the gate output reading transducer (antenna) by $\pm \frac{\lambda}{2}$ and hence inverting the phase of the signal, form a universal gate set, any Boolean circuit can be implemented by means of SW interaction. However, while *MAJ3* gates have been practically demonstrated at the nanometer scale [16], many hurdles exist on the road from SW gate to circuit, e.g., gate cascading [17], [18] and fan-out achievement [19], [20], which for the time being preclude the design and practical implementation of functionally significant circuits within the SW domain and, thus, the full utilization of the ultra-low-power potential of the SW-based computing paradigm.
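The majority principle can be illustrated with an idealized phasor model. This is a sketch under simplifying assumptions (three equal-amplitude, equal-wavelength SWs that superpose linearly and reach the output without damping or dispersion); the function names `encode`, `decode`, `maj3`, and `inv` are illustrative only and not part of any SW design tool.

```python
import cmath
import math
from itertools import product


def encode(bit: int) -> complex:
    """Phase encoding: logic 0 -> 0 rad (in phase with the reference), logic 1 -> pi rad."""
    return cmath.exp(1j * math.pi * bit)


def decode(wave: complex) -> int:
    """Positive real part -> relative phase 0 deg (logic 0); negative -> 180 deg (logic 1)."""
    return 0 if wave.real > 0 else 1


def maj3(a: int, b: int, c: int) -> int:
    # Linear superposition of three unit-amplitude SWs: the sum is +-1 or +-3,
    # so its sign (phase) always follows the majority of the inputs.
    return decode(encode(a) + encode(b) + encode(c))


def inv(bit: int) -> int:
    # Moving the read-out antenna by lambda/2 adds a pi phase shift, inverting the output.
    return decode(encode(bit) * cmath.exp(1j * math.pi))


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, "->", maj3(a, b, c))
```

Because the number of inputs is odd, the phasor sum can never cancel completely, so the output phase is always well defined.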

On the other hand, spintronics has also been utilized for neuromorphic computing, which seeks to emulate in hardware the way the human brain processes data and makes decisions. Recent advancements in this field have been driven by spintronic-specific phenomena, including domain walls, skyrmions, and other magnetic effects [21], [22]. In particular, SWs have played a key role in advancing neuromorphic computing, contributing to the implementation of Deep Neural Network (DNN) architectures, e.g., multilayer [23], convolutional [24], and recurrent networks [25].

Inspired by these recent advancements, in this paper, we focus on a special DNN class, Convolutional Neural Networks (CNNs), and investigate the potential of SW technology to speed up convolution calculations. We propose a $2 \times 2$ 2D convolution accelerator device that leverages intrinsic SW properties to evaluate the required multiplications and additions by means of SW interference. We rely on a mixed (amplitude and relative phase) encoding for input data representation and make use of SW amplitude attenuation [8], [18], [26] to represent the kernel values. We design the $2 \times 2$ 2D convolution block and verify its correct behaviour by means of micromagnetic simulations. By relying on SW-specific properties, we achieve increased computation efficiency and reduced hardware complexity, as we eliminate the need for adders and multipliers, which are essential components of a CMOS convolution block counterpart.

This paper is organized as follows: In Section II, we present the theory of convolution computation in deep neural networks. In Section III, we introduce the concept of the SW-based convolution device and its verification, and in Section IV, we discuss the results and implications of our proposal. Section V concludes the paper with a few remarks and outlines potential future research avenues.

## II. CONVOLUTION THEORY

Convolutional Neural Networks (CNNs) are the backbone of modern DNNs [27] due to their efficiency in extracting features from datasets. Their applications [28] range from image and video recognition to natural language processing models, rendering their optimization and accuracy critical to the field. In image processing, 2D convolution layers (Conv2D) are utilized to extract image features, e.g., object edges, based on kernels (learned during training in CNNs), each emphasizing a specific feature. The output features are then passed to the following DNN layers, possibly other Conv2D layers, until the final output layer is reached and the CNN decision is issued.

For the 2D convolution of an image, the same process is repeated for each color channel over the entire image matrix, with a kernel window sliding across it to extract features. The values, or weights, of the kernel window determine the output of the 2D convolution, e.g., image blurring or edge detection. First, a pixel window with the same size as the kernel window is selected in the image matrix. Each color value in the pixel window is then multiplied by the respective kernel weight, and the products are summed according to Equation (1), which, for a $3 \times 3$ kernel, requires 9 multiplications and 8 additions.

$$\underbrace{\begin{bmatrix} p_1 & p_2 & p_3 \\ p_4 & p_5 & p_6 \\ p_7 & p_8 & p_9 \end{bmatrix}}_{\text{Pixels}} \ast \underbrace{\begin{bmatrix} k_1 & k_2 & k_3 \\ k_4 & k_5 & k_6 \\ k_7 & k_8 & k_9 \end{bmatrix}}_{\text{Kernel}} = \sum_{i=1}^{9} p_i \cdot k_i \qquad (1)$$

The accumulated value becomes the new color value of the corresponding pixel in the convolved image matrix, as depicted in Figure 1. This process is repeated across the original image matrix, with the kernel window sliding from left to right and top to bottom. Without padding, the resulting matrix has reduced dimensions; it contains the features extracted according to the nature of the utilized kernel.
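A minimal single-channel sketch of the sliding-window procedure, assuming stride 1 and no padding, so that an $H \times W$ image and a $k \times k$ kernel yield an $(H-k+1) \times (W-k+1)$ output. As in common CNN frameworks, the kernel is not flipped, so each output entry is exactly the weighted sum of Equation (1); the example image and kernel are arbitrary.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide a k x k kernel over an H x W image (stride 1, no padding)."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for r in range(h - k + 1):          # top to bottom
        row = []
        for c in range(w - k + 1):      # left to right
            # Weighted sum of the pixel window under the kernel (Equation (1)).
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2]]
kernel = [[1, 0],
          [0, -1]]
print(conv2d_valid(image, kernel))  # [[-4.0, -4.0, 2.0], [-4.0, -4.0, 4.0]]
```

The 3 x 4 image shrinks to a 2 x 3 output, illustrating the dimension reduction mentioned above.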



Fig. 1. Image Convolution Process

As suggested by Figure 1, the convolution process requires a number of weighted-sum evaluations of the form of Equation (1) that do not depend on each other's results and can therefore be performed in parallel. On the other hand, larger kernels are of interest, as they are more powerful and can reduce the number of CNN convolution stages. As previously mentioned, a $3 \times 3$ kernel requires 9 multiplications and 8 additions, while for larger kernel sizes the requirements increase significantly; for instance, a $5 \times 5$ kernel requires 25 multiplications and 24 additions. Thus, the availability of fast, low-cost multipliers and adders is essential for the performance of CNN implementations. In the next section, we introduce a device able to evaluate weighted sums by means of SW interactions.
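The operation counts above follow directly from Equation (1): a $k \times k$ window needs $k^2$ multiplications and $k^2 - 1$ additions, and every output pixel needs one window evaluation. A short sketch, assuming stride 1 and no padding; the 224 x 224 image size in the usage line is only an illustrative choice.

```python
def ops_per_window(k: int) -> tuple[int, int]:
    """(multiplications, additions) for one k x k weighted sum, Equation (1)."""
    return k * k, k * k - 1


def ops_per_channel(h: int, w: int, k: int) -> tuple[int, int]:
    """Total (multiplications, additions) over one H x W channel, stride 1, no padding."""
    windows = (h - k + 1) * (w - k + 1)
    mul, add = ops_per_window(k)
    return windows * mul, windows * add


for k in (2, 3, 5):
    print(f"{k}x{k} kernel:", ops_per_window(k))  # (4, 3), (9, 8), (25, 24)
print(ops_per_channel(224, 224, 3))               # (443556, 394272)
```

Since the windows are independent, these operations can be spread over as many parallel units as the hardware budget allows.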

## III. SW BASED CONVOLUTION

The main idea behind our proposal is to compute the convolution result (the weighted sum) by exploiting intrinsic SW properties instead of conventional adders and multipliers. For the sake of simplicity, we use a $2 \times 2$ kernel as a discussion vehicle to introduce the approach, but our proposal can be extended to accommodate larger kernel sizes. As graphically depicted in Figure 2, we make use of a magnetic conduit and generate 4 SWs with amplitudes $a_i \propto p_i$, $i = 1, \dots, 4$, by means of RF transducers (antennas) located at distances $w_i$ from the device output $O$, where each $w_i$ is chosen such that the SW amplitude decay over $w_i$ reflects $k_i$. While the SWs travel from their generation points towards $O$, their amplitudes are diminished by the waveguide material-dependent Gilbert damping [8], [26], and they interfere constructively or destructively when in phase or out of phase, respectively. Thus, by choosing $a_i$ and $w_i$ that properly reflect the image and kernel values, the SW detected at $O$ represents the convolution result $\sum_{i=1}^4 p_i k_i$.
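A minimal phasor sketch of this idea, under stated assumptions: linear superposition, $\lambda = 500$ nm, antennas at integer multiples of $\lambda$, and a hypothetical exponential decay with a made-up decay length `DECAY_LENGTH_NM` (the actual decay is obtained from simulation, as described next). Negative pixel values are carried by a $180^\circ$ launch phase. The names are illustrative, not the authors' tooling.

```python
import cmath
import math

WAVELENGTH_NM = 500.0     # lambda at f = 8 GHz
DECAY_LENGTH_NM = 3000.0  # HYPOTHETICAL exponential decay length, for illustration only


def arriving_wave(p: float, distance_nm: float) -> complex:
    """Complex SW amplitude reaching O from an antenna at distance_nm encoding pixel p."""
    launch_phase = 0.0 if p >= 0 else math.pi                  # sign -> 0 or 180 degrees
    attenuation = math.exp(-distance_nm / DECAY_LENGTH_NM)     # assumed damping model
    travel_phase = -2 * math.pi * distance_nm / WAVELENGTH_NM  # phase accumulated on the way
    return abs(p) * attenuation * cmath.exp(1j * (launch_phase + travel_phase))


def device_output(pixels, distances_nm) -> float:
    """Linear superposition of all SWs at O; the signed (real) amplitude is the result."""
    return sum(arriving_wave(p, d) for p, d in zip(pixels, distances_nm)).real


pixels = [4, -2, 1, 3]
distances = [n * WAVELENGTH_NM for n in (9, 3, 4, 7)]         # integer multiples of lambda
weights = [math.exp(-d / DECAY_LENGTH_NM) for d in distances]  # the k_i this placement realizes
print(device_output(pixels, distances))
print(sum(p * k for p, k in zip(pixels, weights)))            # same value up to rounding
```

Shifting an antenna by $\lambda/2$ (e.g., `device_output([1.0], [250.0])`) flips the sign of its contribution, the same mechanism the SW inverter relies on; this is why the antenna distances are kept at integer multiples of $\lambda$.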



Fig. 2.  $2 \times 2$  2D Convolution SW Device

To implement the proposed concept, particular attention should be paid to embedding the input (color) values, $p_i$, and the kernel weights, $k_i$, into SW amplitudes, $a_i$, and distances, $w_i$, respectively. Pixel color values are encoded by modulating the strength of the excitation field at each antenna, which sets the initial SW amplitude; for instance, an input value of 1 corresponds to 1 mT, 2 to 2 mT, and so on. The field strengths should remain moderate, with an upper limit that depends on the device configuration, to prevent spin flipping, which would disrupt the sine-like SW motion. To relate the antenna-to-output distances to the kernel weights, we perform a simulation in which a single RF transducer is placed on top of the magnetic conduit and capture the amplitude decay as the wave travels along it. This simulation yields an attenuation profile, an example of which is presented in Figure 3, reporting amplitude ratios $\frac{A_{w_i}}{A_0}$, where $A_0$ and $A_{w_i}$ represent the SW amplitude at the antenna region and at a distance $w_i = n_i \lambda$ from it, $n_i$ being a positive integer. Since all antennas are driven synchronously at the same frequency, restricting $w_i$ to integer multiples of $\lambda$ ensures that every SW reaches $O$ with the phase it was excited with, so that the relative phases carry only the signs of the terms. These amplitude reduction ratios define the available values for kernel weight selection. Negative kernel weights (and, likewise, negative pixel values, cf. Table II) are implemented by switching the respective input antenna's phase from $0^\circ$ to $180^\circ$, thus turning its contribution at $O$ from constructive to destructive interference. Therefore, by placing the 4 antennas at specific distances from the output port, each input is effectively multiplied (reduced) by its respective kernel weight, producing the weighted terms $p_i k_i$ at $O$, whose interference yields the final result.
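The encoding step can be sketched as follows. The sketch assumes that the attenuation ratios listed as kernel values in Table II form the available profile, and it uses a hypothetical field limit `MAX_FIELD_MT` in place of the configuration-dependent bound; kernel weights that fall between profile points are quantized to the nearest available ratio, which is one simple choice among several.

```python
# Attenuation ratios A_w/A_0 indexed by distance in wavelengths, copied from the kernel
# values in Table II (2*lambda does not appear there, so it is left out here).
ATTENUATION_PROFILE = {
    1: 0.8261, 3: 0.5969, 4: 0.5072, 5: 0.4311, 6: 0.3662,
    7: 0.3113, 8: 0.2647, 9: 0.2254, 10: 0.1922,
}
FIELD_PER_UNIT_MT = 1.0  # input value 1 -> 1 mT, 2 -> 2 mT, ... (as in the text)
MAX_FIELD_MT = 5.0       # HYPOTHETICAL limit standing in for the configuration-dependent bound


def encode_pixel(p: float) -> float:
    """Pixel value -> excitation field strength in mT (the sign is handled by the phase)."""
    field = abs(p) * FIELD_PER_UNIT_MT
    if field > MAX_FIELD_MT:
        raise ValueError(f"{field} mT exceeds the assumed safe excitation limit")
    return field


def encode_weight(k: float) -> tuple[int, float]:
    """|k| -> (antenna distance in wavelengths, realized attenuation ratio), nearest match."""
    n = min(ATTENUATION_PROFILE, key=lambda m: abs(ATTENUATION_PROFILE[m] - abs(k)))
    return n, ATTENUATION_PROFILE[n]


def antenna_phase(p: float, k: float) -> int:
    """0 degrees if the term p*k is non-negative, 180 degrees otherwise."""
    return 0 if p * k >= 0 else 180


print(encode_pixel(-1.5), encode_weight(0.6), antenna_phase(-1.5, 0.6))
# 1.5 (3, 0.5969) 180
```

When both $p_i$ and $k_i$ are negative, the two $180^\circ$ shifts cancel, so the antenna phase is set by the sign of the product $p_i k_i$.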



Fig. 3. SW Amplitude Decay

We verify the proposed concept by means of `mumax3` [29] micromagnetic simulations. First, we derive the dispersion relation, presented in Figure 4, for a waveguide with the dimensions and properties listed in Table I. We selected CoFeB as the waveguide material because its relatively high damping parameter, $\alpha$, induces faster SW amplitude decay and thus results in a compact device. Naturally, the dispersion relation changes if the material properties or waveguide parameters are modified, which affects the SW propagation characteristics. Therefore, a thorough investigation of the dispersion relation is essential before the actual implementation to ensure the desired device performance.



Fig. 4. Dispersion Relation

TABLE I  
SIMULATION PARAMETERS

| Length L | Width W | Thickness T | Saturation magnetization M<sub>s</sub> | Exchange stiffness A<sub>ex</sub> | Gilbert damping $\alpha$ | External field B<sub>ext</sub> | Anisotropy constant K<sub>anis</sub> |
|----------|---------|-------------|----------------------------------------|-----------------------------------|--------------------------|--------------------------------|---------------------------------------|
| 20 μm    | 200 nm  | 20 nm       | 1.2 MA m<sup>-1</sup>                  | 18 pJ m<sup>-1</sup>              | $4 \times 10^{-3}$       | 60 mT                          | 0.9 MJ m<sup>-3</sup>                 |

From the obtained dispersion relation, we chose an SW excitation frequency of $f = 8$ GHz, which corresponds to a wavelength $\lambda = 500$ nm. By placing and exciting a single antenna, we obtain the amplitude attenuation profile described above, presented in Figure 5, and thus the available values for kernel weight selection.
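For reference, the chosen operating point corresponds to the wavenumber and phase velocity below. This is plain arithmetic on $f$ and $\lambda$; the travel time of the signal to $O$ is instead set by the group velocity $v_g = \partial\omega/\partial k_{\mathrm{SW}}$, i.e., the slope of the dispersion relation in Figure 4. The farthest antenna position used in Table II, $10\lambda$, corresponds to $5\ \mu\text{m}$.

$$k_{\mathrm{SW}} = \frac{2\pi}{\lambda} = \frac{2\pi}{500\ \text{nm}} \approx 12.6\ \text{rad}\,\mu\text{m}^{-1}, \qquad v_p = f\lambda = 8\ \text{GHz} \times 500\ \text{nm} = 4\ \text{km}\,\text{s}^{-1}$$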



Fig. 5. SW Amplitude Attenuation Profile

We test three convolution cases, detailed in Table II, using kernel values randomly selected from the attenuation profile and varying input values, including negative ones, to cover different scenarios. We position the antennas at the required distances from the output port according to the attenuation profile, with the field strength and SW phase adjusted to match the input values.

TABLE II  
MICROMAGNETIC SIMULATIONS CONVOLUTION EXAMPLES

| Case #    | Pixel Window                                              | Kernel Window                                                                                                                                     | Ground Truth |
|-----------|-----------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|--------------|
| Example 1 | $p = \begin{bmatrix} 4 & 2 \\ 1 & 3 \end{bmatrix}$        | $k = \begin{bmatrix} 0.2254 & 0.5969 \\ 0.5072 & 0.3113 \end{bmatrix} = \begin{bmatrix} 9\lambda & 3\lambda \\ 4\lambda & 7\lambda \end{bmatrix}$ | 3.5365       |
| Example 2 | $p = \begin{bmatrix} -1 & 2 \\ -1.5 & 2.5 \end{bmatrix}$  | $k = \begin{bmatrix} 0.5969 & 0.4311 \\ 0.2647 & 0.3113 \end{bmatrix} = \begin{bmatrix} 3\lambda & 5\lambda \\ 8\lambda & 7\lambda \end{bmatrix}$ | 0.6465       |
| Example 3 | $p = \begin{bmatrix} -3 & 0.5 \\ -0.75 & 1 \end{bmatrix}$ | $k = \begin{bmatrix} 0.5072 & 0.1922 \\ 0.3662 & 0.8261 \end{bmatrix} = \begin{bmatrix} 4\lambda & 10\lambda \\ 6\lambda & \lambda \end{bmatrix}$ | -0.87405     |
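The Ground Truth column of Table II is Equation (1) applied to the listed windows. A quick check, assuming ideal linear superposition with the sign of each term carried by the antenna phase; it deliberately does not model the micromagnetic non-idealities behind the deviations reported in Table III.

```python
# Pixel and kernel windows from Table II, flattened row by row (p1, p2, p3, p4).
EXAMPLES = {
    "Example 1": ([4, 2, 1, 3], [0.2254, 0.5969, 0.5072, 0.3113]),
    "Example 2": ([-1, 2, -1.5, 2.5], [0.5969, 0.4311, 0.2647, 0.3113]),
    "Example 3": ([-3, 0.5, -0.75, 1], [0.5072, 0.1922, 0.3662, 0.8261]),
}


def ideal_output(pixels, weights) -> float:
    """Ideal interference: each antenna contributes p_i * k_i at O."""
    return sum(p * k for p, k in zip(pixels, weights))


for name, (p, k) in EXAMPLES.items():
    print(f"{name}: {ideal_output(p, k):.5f}")
# Example 1: 3.53650
# Example 2: 0.64650
# Example 3: -0.87405
```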

Based on the simulation results, we extract the time-domain output signal for each case, as illustrated in Figure 6, and calculate its amplitude ratio with respect to a reference signal obtained with a 1 mT excitation field; this ratio is the convolution result. In Table III, we present the ground truth of the convolution computations, the values obtained from our simulations, and each case's absolute and relative errors. The proposed SW convolution device exhibits an average relative error of 4.45%, a level of accuracy that is often acceptable in neural networks, where perfect precision is not always required. Thus, the micromagnetic simulations confirm the proper operation of the proposed SW $2 \times 2$ convolution block.

TABLE III  
SIMULATION RESULTS AND ERRORS

| Example # | Ground Truth | Obtained Result | Absolute Error | Relative Error (%) |
|-----------|--------------|-----------------|----------------|--------------------|
| Example 1 | 3.5365       | 3.6791          | 0.1426         | 4.03               |
| Example 2 | 0.6465       | 0.6753          | 0.0288         | 4.45               |
| Example 3 | -0.87405     | -0.9167         | 0.04265        | 4.88               |
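The error columns of Table III can be recomputed from the ground truth and the simulated values. A small sketch, assuming the relative error is taken with respect to the magnitude of the ground truth (which reproduces the listed percentages):

```python
# (ground truth, micromagnetic result) pairs from Table III.
RESULTS = {
    "Example 1": (3.5365, 3.6791),
    "Example 2": (0.6465, 0.6753),
    "Example 3": (-0.87405, -0.9167),
}


def errors(truth: float, obtained: float) -> tuple[float, float]:
    """Absolute error and relative error (in percent) with respect to the ground truth."""
    absolute = abs(obtained - truth)
    return absolute, 100 * absolute / abs(truth)


relative = []
for name, (truth, obtained) in RESULTS.items():
    absolute, rel = errors(truth, obtained)
    relative.append(rel)
    print(f"{name}: absolute {absolute:.5f}, relative {rel:.2f}%")
print(f"mean relative error: {sum(relative) / len(relative):.3f}%")
```

Averaging the rounded per-example percentages, as listed in Table III, gives the 4.45% quoted in the text; averaging the unrounded ratios gives about 4.456%.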



Fig. 6.  $2 \times 2$  Convolution Block Output Temporal Behavior

## IV. RESULTS & IMPLICATIONS

Following the successful verification of the proposed device, we evaluate its cost and performance. The proposed SW device comprises a magnetic conduit for SW propagation, 4 RF antennas, 1 readout port, and external CMOS circuitry to generate the RF signals for the antennas (based on the image input values) and to read the convolution result. As the design of the CMOS circuitry is outside the scope of this paper, we cannot precisely evaluate the area, delay, and power consumption of the complete device. However, we can make first-order estimates by considering that the SW generator, a magnetoelectric (ME) cell transducer, exhibits a delay of 0.42 ns and consumes 14.4 aJ, while SW reading consumes 2.7 fJ and introduces a delay of 0.03 ns [30], [31]. The convolution gate delay is the sum of the SW generation delay, the time the farthest SW needs to reach $O$ (8 ns in our case), and the SW reading delay, i.e., 8.45 ns. One evaluation thus consumes about 2.76 fJ ($4 \times 14.4$ aJ for generation plus 2.7 fJ for reading), which, spread over the 8.45 ns delay, corresponds to an average power consumption of 0.3263 $\mu$W. The area, determined by the dimensions of the magnetic conduit (20 $\mu$m $\times$ 200 nm, Table I), amounts to $4\ \mu\text{m}^2$.
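The 0.3263 µW figure is reproduced by dividing the energy of one evaluation by its delay. The bookkeeping is sketched below, assuming the four ME transducers are excited simultaneously and counting one read-out per evaluation; the CMOS peripheral circuitry is not included, as stated above.

```python
# First-order SW budget with the transducer figures quoted in the text ([30], [31]).
N_ANTENNAS = 4
GEN_ENERGY_J, GEN_DELAY_S = 14.4e-18, 0.42e-9   # ME-cell SW generation
READ_ENERGY_J, READ_DELAY_S = 2.7e-15, 0.03e-9  # SW read-out
PROPAGATION_S = 8e-9                            # farthest SW reaching O
CONDUIT_L_UM, CONDUIT_W_UM = 20.0, 0.2          # Table I: 20 um x 200 nm

# All antennas fire at once, so generation is counted once in the delay.
delay = GEN_DELAY_S + PROPAGATION_S + READ_DELAY_S
energy = N_ANTENNAS * GEN_ENERGY_J + READ_ENERGY_J  # energy of one convolution evaluation
power = energy / delay                              # average power over one evaluation
area = CONDUIT_L_UM * CONDUIT_W_UM

print(f"delay {delay * 1e9:.2f} ns, energy {energy * 1e15:.4f} fJ, "
      f"power {power * 1e6:.4f} uW, area {area:.1f} um^2")
# delay 8.45 ns, energy 2.7576 fJ, power 0.3263 uW, area 4.0 um^2
```

Note that the read-out energy (2.7 fJ) dominates the budget, so a more efficient detector would lower the power estimate most directly.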

On the other hand, a CMOS-based convolution counterpart requires at least 1 multiplier and 1 adder, or 4 multipliers and 3 adders for fast evaluation. The area, delay, and power consumption of these arithmetic units depend on the format and precision required to represent the image and kernel values, but they are complex blocks in any case. Assuming the single-precision floating-point (SP FP) format for data encoding, metrics for both the multiplier and the adder can be obtained from the literature. For the multiplier, [32] reports a delay of 9.71 ns, a power consumption of $2055.7\ \mu\text{W}$, and an area of $7997.3\ \mu\text{m}^2$ for a precise SP FP multiplier in 45 nm CMOS technology. Regarding the adder, [33] provides metric values for both accurate and approximate adder configurations. For precise computations, we use the exact adder block in the same CMOS technology, which consumes $1027\ \mu\text{W}$, exhibits a delay of 2.827 ns, and occupies an area of $2415.72\ \mu\text{m}^2$. For the fast configuration, in which the 4 multiplications are performed in parallel and their products are summed by a two-level adder tree, this results in a total power consumption of $11\,303.8\ \mu\text{W}$, a delay of 15.364 ns (one multiplier delay plus two adder delays), and an area footprint of $39\,236.36\ \mu\text{m}^2$.
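The CMOS totals follow from the unit metrics quoted above. The sketch assumes the fast configuration: four multipliers working in parallel and a balanced two-level adder tree, which reproduces the quoted 15.364 ns as one multiplier delay plus two adder delays; `fast_2x2_block` is an illustrative name.

```python
# SP FP unit metrics in 45 nm CMOS quoted in the text: multiplier [32], exact adder [33].
MUL = {"power_uW": 2055.7, "delay_ns": 9.71, "area_um2": 7997.3}
ADD = {"power_uW": 1027.0, "delay_ns": 2.827, "area_um2": 2415.72}


def fast_2x2_block(n_mul: int = 4, n_add: int = 3, tree_depth: int = 2) -> dict:
    """Fully parallel 2x2 convolution: 4 multipliers feeding a 2-level adder tree."""
    return {
        "power_uW": n_mul * MUL["power_uW"] + n_add * ADD["power_uW"],
        "area_um2": n_mul * MUL["area_um2"] + n_add * ADD["area_um2"],
        # Critical path: one multiplication followed by log2(4) = 2 additions.
        "delay_ns": MUL["delay_ns"] + tree_depth * ADD["delay_ns"],
    }


cmos = fast_2x2_block()
print({key: round(value, 3) for key, value in cmos.items()})
# {'power_uW': 11303.8, 'area_um2': 39236.36, 'delay_ns': 15.364}
```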

Based on the comparative analysis presented in Table IV, we conclude that our device outperforms its CMOS counterpart across all evaluated metrics (excluding the external CMOS circuitry of the SW device). The advantage is particularly pronounced in terms of power consumption and area, which are lower by factors of approximately $3.5 \times 10^4$ and $9.8 \times 10^3$, respectively, rendering the values for our device nearly negligible in comparison, while the delay is about $1.8\times$ lower.

TABLE IV  
COMPARISON OF CONVOLUTION BLOCK IN CMOS AND SW DOMAINS

| Technology | Topology                                 | Power ( $\mu\text{W}$ ) | Area ( $\mu\text{m}^2$ ) | Delay (ns) |
|------------|------------------------------------------|-------------------------|--------------------------|------------|
| CMOS       | $4 \times MUL + 3 \times ADD$            | 11303.8                 | 39236.36                 | 15.364     |
| SW         | $4 \times RF + 1 \times SW \text{ Read}$ | 0.3263                  | 4                        | 8.45       |

Extending the comparison to parallel computing further accentuates the absolute advantages of our device. In such settings, multiple device instances are employed to simultaneously compute different parts of the final result. For the CMOS-based implementation, $n$ instances would require $4n$ multipliers and $3n$ adders, whereas the SW implementation requires $n$ magnetic conduits, each with 4 RF antennas and 1 readout port. Since power and area grow linearly with $n$ in both domains while the delay remains that of a single instance, the per-instance advantages reported above are preserved and the absolute power and area savings grow with $n$. Figure 7 depicts the parallel configuration in both domains, CMOS and SW.
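A sketch of the parallel scaling argument, assuming $n$ fully independent instances and ignoring input distribution, interconnect, and the CMOS peripheral circuitry of the SW devices; the per-instance numbers are those of Table IV.

```python
# Per-instance metrics from Table IV.
CMOS = {"power_uW": 11303.8, "area_um2": 39236.36, "delay_ns": 15.364}
SW = {"power_uW": 0.3263, "area_um2": 4.0, "delay_ns": 8.45}


def parallel(metrics: dict, n: int) -> dict:
    """n independent instances: power and area scale with n; latency is that of one instance."""
    return {
        "power_uW": n * metrics["power_uW"],
        "area_um2": n * metrics["area_um2"],
        "delay_ns": metrics["delay_ns"],
    }


for n in (1, 16, 256):
    cmos, sw = parallel(CMOS, n), parallel(SW, n)
    ratios = {key: round(cmos[key] / sw[key], 1) for key in cmos}
    saved_mw = (cmos["power_uW"] - sw["power_uW"]) / 1000
    print(f"n={n}: CMOS/SW ratios {ratios}, power saved {saved_mw:.1f} mW")
```

Because both domains scale linearly, the CMOS-to-SW ratios stay at roughly $3.5 \times 10^4$ (power), $9.8 \times 10^3$ (area), and $1.8$ (delay) for every $n$; what grows with $n$ is the absolute saving.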



Fig. 7. SW vs CMOS Parallel  $2 \times 2$  Convolution

## V. CONCLUSION

In this paper, we first discussed the theory and applications of convolution computation within state-of-the-art DNNs, emphasizing their extensive use of multipliers and adders. Subsequently, we proposed a novel SW-based convolution block that leverages unique SW properties to perform convolution computation without relying on conventional adders and multipliers. After introducing the SW device, we validated its correct behavior by means of micromagnetic simulations. We concluded by comparing our device with its CMOS counterpart in terms of power consumption, area, and delay, and argued that, by not relying on standard adders and multipliers, it is very compact, consumes almost negligible power, and can substantially outperform traditional CMOS-based convolution implementations.

## ACKNOWLEDGMENT

This work was funded by the European Union, Horizon Europe programme, under grant agreement 101070417 (SPIDER project). For the purpose of Open Access, the author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.

## REFERENCES

[1] A. V. Chumak, V. Vasyuchka, A. Serga, and B. Hillebrands, "Magnon spintronics," *Nature Physics*, vol. 11, no. 6, pp. 453–461, Jun. 2015.

[2] A. Hirohata, K. Yamada, Y. Nakatani, I.-L. Prejbeanu *et al.*, "Review on spintronics: Principles and device applications," *Journal of Magnetism and Magnetic Materials*, vol. 509, p. 166711, Sep. 2020.

[3] A. V. Chumak, P. Kabos, M. Wu, C. Abert *et al.*, "Advances in Magnetics Roadmap on Spin-Wave Computing," *IEEE Transactions on Magnetics*, vol. 58, no. 6, pp. 1–72, Jun. 2022.

[4] D. E. Nikonov and I. A. Young, "Overview of Beyond-CMOS Devices and a Uniform Methodology for Their Benchmarking," *Proceedings of the IEEE*, vol. 101, no. 12, pp. 2498–2533, Dec. 2013.

[5] A. Mahmoud, N. Cucu-Laurenciu, F. Vanderveken, F. Ciubotaru *et al.*, "Would Magnonic Circuits Outperform CMOS Counterparts?" in *Proceedings of the Great Lakes Symposium on VLSI 2022*. Irvine CA USA: ACM, Jun. 2022, pp. 309–313.

[6] A. V. Chumak, "Fundamentals of magnon-based computing," 2019.

[7] A. Mahmoud, F. Ciubotaru, F. Vanderveken, A. V. Chumak *et al.*, "Introduction to spin wave computing," *Journal of Applied Physics*, vol. 128, no. 16, p. 161101, Oct. 2020.

[8] D. D. Stancil and A. Prabhakar, *Spin Waves: Theory and Applications*. Boston, MA: Springer US, 2009.

[9] M. Jamali, J. H. Kwon, S.-M. Seo, K.-J. Lee *et al.*, "Spin wave nonreciprocity for logic device applications," *Scientific Reports*, vol. 3, no. 1, p. 3160, Nov. 2013. [Online]. Available: <https://www.nature.com/articles/srep03160>

[10] A. V. Chumak, A. A. Serga, and B. Hillebrands, "Magnonic crystals for data processing," *Journal of Physics D: Applied Physics*, vol. 50, no. 24, p. 244001, Jun. 2017.

[11] R. Nakane, A. Hirose, and G. Tanaka, "Spin waves propagating through a stripe magnetic domain structure and their applications to reservoir computing," *Physical Review Research*, vol. 3, no. 3, p. 033243, Sep. 2021.

[12] A. Mahmoud, F. Vanderveken, F. Ciubotaru, C. Adelmann *et al.*, "Spin Wave Based Approximate Computing," *IEEE Transactions on Emerging Topics in Computing*, vol. 10, no. 4, pp. 1932–1940, Oct. 2022.

[13] R. M. Menezes, J. Mulkers, C. C. d. S. Silva, B. Van Waeyenberge *et al.*, "Towards Magnonic Logic and Neuromorphic Computing: Controlling Spin-Waves by Spin-Polarized Current," 2023.

[14] S. Klingler, P. Pirro, T. Brächer, B. Leven *et al.*, "Design of a spin-wave majority gate employing mode selection," *Applied Physics Letters*, vol. 105, no. 15, p. 152410, Oct. 2014.

[15] T. Fischer, M. Kewenig, D. A. Bozhko, A. A. Serga *et al.*, "Experimental prototype of a spin-wave majority gate," *Applied Physics Letters*, vol. 110, no. 15, p. 152401, Apr. 2017.

[16] F. Ciubotaru, G. Talmelli, T. Devolder, O. Zografos *et al.*, "First experimental demonstration of a scalable linear majority gate based on spin waves," in *2018 IEEE International Electron Devices Meeting (IEDM)*. San Francisco, CA: IEEE, Dec. 2018, pp. 36.1.1–36.1.4.

[17] A. N. Mahmoud, F. Vanderveken, C. Adelmann, F. Ciubotaru *et al.*, "Spin Wave Normalization Toward All Magnonic Circuits," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 68, no. 1, pp. 536–549, Jan. 2021.

[18] P. Anagnostou, A. Van Zegbroeck, S. Hamdioui, C. Adelmann *et al.*, "Spin Wave Majority Gates Cascading by Gilbert Damping Embracement (Can the Devil be Turned into an Angel?)," in *2024 IEEE 24th International Conference on Nanotechnology (NANO)*. Gijon, Spain: IEEE, Jul. 2024, pp. 610–614.

[19] A. Mahmoud, F. Vanderveken, C. Adelmann, F. Ciubotaru *et al.*, "Fan-out enabled spin wave majority gate," *AIP Advances*, vol. 10, no. 3, p. 035119, Mar. 2020.

[20] A. Mahmoud, C. Adelmann, F. Vanderveken, S. Cotofana *et al.*, "Fan-out of 2 Triangle Shape Spin Wave Logic Gates," in *2021 Design, Automation & Test in Europe Conference & Exhibition (DATE)*. Grenoble, France: IEEE, Feb. 2021, pp. 948–953.

[21] J. Grollier, D. Querlioz, K. Y. Camsari, K. Everschor-Sitte *et al.*, "Neuromorphic spintronics," *Nature Electronics*, vol. 3, no. 7, pp. 360–370, Mar. 2020.

[22] C. H. Marrows, J. Barker, T. A. Moore, and T. Moorsom, "Neuromorphic computing with spintronics," *npj Spintronics*, vol. 2, no. 1, p. 12, Apr. 2024.

[23] A. Papp, W. Porod, and G. Csaba, "Nanoscale neural network using nonlinear spin-wave interference," *Nature Communications*, vol. 12, no. 1, p. 6422, Nov. 2021.

[24] A. Fülöp, G. Csaba, and A. Horváth, "A Convolutional Neural Network with a Wave-Based Convolver," *Electronics*, vol. 12, no. 5, p. 1126, Feb. 2023.

[25] M. Hibat-Allah, M. Ganahl, L. E. Hayward, R. G. Melko *et al.*, "Recurrent neural network wave functions," *Physical Review Research*, vol. 2, no. 2, p. 023358, Jun. 2020. [Online]. Available: <https://link.aps.org/doi/10.1103/PhysRevResearch.2.023358>

[26] M. C. Hickey and J. S. Moodera, "Origin of Intrinsic Gilbert Damping," *Physical Review Letters*, vol. 102, no. 13, p. 137601, Mar. 2009.

[27] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," *Nature*, vol. 521, no. 7553, pp. 436–444, May 2015.

[28] D. Bhatt, C. Patel, H. Talsania, J. Patel *et al.*, "CNN Variants for Computer Vision: History, Architecture, Application, Challenges and Future Scope," *Electronics*, vol. 10, no. 20, p. 2470, Oct. 2021.

[29] A. Vansteenkiste, J. Leflaert, M. Dvornik, M. Helsen *et al.*, "The design and verification of MuMax3," *AIP Advances*, vol. 4, no. 10, p. 107133, Oct. 2014.

[30] O. Zografos, P. Raghavan, L. Amaru, B. Soree *et al.*, "System-level assessment and area evaluation of Spin Wave logic circuits," in *2014 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)*. Paris, France: IEEE, Jul. 2014, pp. 25–30.

[31] O. Zografos, P. Raghavan, Y. Sherazi, A. Vayset *et al.*, "Area and routing efficiency of SWD circuits compared to advanced CMOS," in *2015 International Conference on IC Design & Technology (ICICDT)*. Leuven, Belgium: IEEE, Jun. 2015, pp. 1–4.

[32] P. Yin, C. Wang, W. Liu, and F. Lombardi, "Design and performance evaluation of approximate floating-point multipliers," in *2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)*, 2016, pp. 296–301.

[33] H. Saleh and E. E. Swartzlander, "A floating-point fused add-subtract unit," in *2008 51st Midwest Symposium on Circuits and Systems*, 2008, pp. 519–522.