## Spin Wave Circuit Design

Mahmoud, A.N.N.

DOI: 10.4233/uuid:0c6cf4ab-f297-426f-9a4f-455f07f1f643

**Document Version:** Final published version

**Citation (APA):** Mahmoud, A. N. N. (2022). Spin Wave Circuit Design. [Dissertation (TU Delft), Delft University of Technology]. https://doi.org/10.4233/uuid:0c6cf4ab-f297-426f-9a4f-455f07f1f643

**Important note:** To cite this publication, please use the final published version (if applicable). Please check the document version above.

**Copyright:** Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Please contact us and provide details if you believe this document breaches copyrights. We will remove access to the work immediately and investigate your claim.

## **SPIN WAVE CIRCUIT DESIGN**

### Dissertation

for the purpose of obtaining the degree of doctor at Delft University of Technology, by the authority of the Rector Magnificus prof. dr. ir. T.H.J.J. van der Hagen, chair of the board for Doctorates, to be defended publicly on Tuesday 14 June 2022 at 12:30 o'clock

by

## Abdulqader Nael Nathmi MAHMOUD

Master of Science in Electrical and Computer Engineering, Khalifa University, United Arab Emirates, born in Kalba, United Arab Emirates.

This dissertation has been approved by the promotors.

- promotor: prof. dr. ir. S. Hamdioui
- promotor: dr. S. Cotofana

#### Composition of the doctoral committee:

- Rector Magnificus, chairperson
- Prof. dr. ir. S. Hamdioui, Delft University of Technology, promotor
- Dr. S. Cotofana, Delft University of Technology, copromotor

*Independent members:*

- Prof. dr. ir. W. D. van Driel, Delft University of Technology
- Prof. dr. A. Chumak, University of Vienna, Austria
- Jun. prof. dr. P. Pirro, Technical University of Kaiserslautern, Germany
- Dr. F. Ciubotaru, IMEC, Belgium
- Dr. ir. M. Taouil, Delft University of Technology
- Prof. dr. ir. R. E. Kooij, Delft University of Technology, reserve member

Keywords: Spin wave, logic gate, circuit, fanout, cascading, adder, multiplier, accurate computing, approximate computing, digital computing, analog computing, parallelism, wavepipelining, energy, area.

Printed by: Gildeprint Printing, the Netherlands

Front & Back: designed by Alaa Bani Nemra & Abdulqader Mahmoud

Copyright © 2022 by A. Mahmoud

ISBN 978-94-6419-525-5

An electronic version of this dissertation is available at https://repository.tudelft.nl/.

## **ACKNOWLEDGEMENTS**

First of all, I would like to express my gratitude to prof. dr. Baker Mohammad (Khalifa University, UAE), who recommended me to my promotor prof. dr. ir. Said Hamdioui. Prof. dr. Baker Mohammad taught me how to start research and how to write and present new ideas during my master's degree, and he introduced me to prof. dr. ir. Said Hamdioui while we were all attending the VLSI-SoC 2017 conference in Abu Dhabi. During the conference, I had many discussions with prof. dr. ir. Said Hamdioui, and only later did I discover that they were part of an informal interview. A couple of months later, prof. dr. ir. Said Hamdioui and his colleagues invited me to a formal interview, which was followed by the decision to accept me in his group at TU Delft. I would like to thank prof. dr. ir. Said Hamdioui for giving me the chance to work in a different professional environment and with a well-known company such as IMEC. I would also like to thank him for his continuous support and help during the different stages of my PhD, not only professionally but also personally, through his support and advice. Without both of you, I could never have become part of TU Delft and achieved this; thank you very much! I would like to thank my thesis supervisor and co-promoter dr. ir. Sorin Cotofana for his continuous support and encouragement during all phases of the thesis.
My promotors guided me toward the right path of research, and they were available for any question or concern that I had. They taught me how to think, how to find reliable sources, how to present and write, how to work under pressure, and how to manage my time. I took much of their time, and I hope they will be proud of what I will achieve. Thank you very much! I would also like to thank the IMEC team, Giacomo Talmelli, Frederic Vanderveken, Florin Ciubotaru, and Christoph Adelmann, for their continuous help, support, and guidance. I learnt many things from you, including the fundamentals of spin waves, micromagnetic simulations, and experimental work. We had a great time working together, I was lucky to work with such a great team, and I hope you enjoyed it as much as I did. I would also like to thank you, together with Federica Luciano, Daniele Narducci, and Léa Richard, for your help and support during my internship at IMEC; we had fruitful discussions and a great journey in the fabrication and RF laboratories. Thank you! I would also like to thank the CHIRON partners Dr. Philipp Pirro (TU Kaiserslautern), Prof. Burkard Hillebrands (TU Kaiserslautern), Dr. Thibaut Devolder (U Paris Sud), Dr. Silvia Matzen (U Paris Sud), Dr. Umesh Bhaskar (U Paris Sud), Dr. Madjid Anane (CNRS), Dr. Paolo Bortolotti (Thales), ir. Thomas Aukes (Solmates), Dr. George Konstantinidis (FORTH Heraklion), Dr. Qi Wang (TU Vienna), and Dr. Alexandru Muller (IMT Bucharest) for many valuable discussions. I would also like to acknowledge the help of Dr. Mottaqiallah Taouil and Dr. Ioan Lager, and I am gratefully indebted to them for their very valuable comments on this thesis. In addition, I would like to thank Andrii Chumak (TU Vienna) and Alexander Khitun (University of California Riverside) for their replies and feedback on my technical questions at the beginning of my PhD. Thank you!
Although I did not have lunch with the QCE colleagues very often, I enjoyed the times that we spent together at the university and outside, especially the gatherings near the EWI building, the barbecues, and the dinner in the Brazilian restaurant. Thank you, my friends and colleagues Moritz, Daniel, Mottaqiallah Taouil, Jintao, Lizhou, Anh, Guilherme, Mahdi, Michael, Arwa, Abid, Cezar, Abdullah, Haji, Troya, Mark, and Abhairaj. Best wishes to all of you! I would like to thank the managerial team of our department, Paul, Trisha, Laura, Joyce, and Lidwina, for their continuous help and support and for organizing different events. I would like to thank Guilherme and the MEST team for organizing the barbecues, drinks, and other gatherings; I really enjoyed them. Thank you very much! I would also like to mention our weekly QCE indoor football. I enjoyed it very much, as well as playing with the colleagues from EEMCS. Thanks to the department for organizing such a weekly sports activity. Special thanks to my friends Muath, Mohammad Hamed, Hani, Maruan, Khubaib, Mohammad Saad, Mohammad Fathi, Mostafa, Mouhannad, Emad, Kathem, Abdullah, Mahdi, Ali, Sharaf, Abid, Mohsen, Ali, Alghendoor, Said, Tanveer, Omar, Gorab, AlAlfy, Amr, Bakr, Moatasim, Mostafa, Mohsen, Yousef, Medina, Lamis, Mamoun, Mohammad, Mohaned Sousi, Mohaned Abu Nada, Alaa, and Karam for their continuous support and motivation. We greatly enjoyed our time together, including the gatherings, the chats, and the delicious food. Throughout my journey I have not forgotten my lovely country Palestine, which I have been forced to miss for ages; this thesis and all my work are dedicated to my lovely country Palestine. I want to thank the Palestinian community in the Netherlands, including Wathiq AlSaadeh, and the employees of the Palestinian representation for their continuous help and support.
Finally, I must express my very profound gratitude to my parents, Nael Mahmoud and Maha Assi, for providing me with continuous support and encouragement, not only throughout my PhD journey but throughout my life. This accomplishment would not have been possible without them, and no words can describe their impact on my life. I would also like to thank my brothers, Ahmed and Osaid, my sisters, Aseel and Tasneem, and their families for their continuous support and help. Lastly, I would like to express my gratitude to my wife Alaa, who supported me, encouraged me, and made my life more beautiful. Thank you, my love! Thank you very much, all of you; I wish you all the best in your personal and professional lives and hope to meet you again in the future.

Best regards and wishes,

Abdulqader Mahmoud

Delft, The Netherlands, May 2022.

## **ABSTRACT**

CMOS downscaling has provided the means to efficiently process the huge amount of raw data resulting from the information technology revolution. However, further downscaling is becoming increasingly difficult because of the leakage, reliability, and cost walls. To keep pace with the exploding market needs at affordable cost, novel alternative technologies are under investigation; one of them is the Spin Wave (SW), which is the collective excitation of the electron spins in ferromagnetic materials. SW technology stands apart as one of the most promising avenues because of its ultra-low energy consumption and high scalability. This thesis: a) develops and designs spin wave based logic gates and circuits, and b) investigates the requirements for spin wave technology to outperform CMOS technology from an energy efficiency point of view. **Logic gate:** SW circuit design requires SW logic gates with fan-out capabilities. Therefore, we propose and validate novel fan-out enabled spin wave logic gates, including (N)AND, (N)OR, X(N)OR, and majority gates.
In addition, we present and validate novel *n*-bit multi-frequency data parallel spin wave logic gates, i.e., SWs with different frequencies propagate in the same waveguide while interfering only with SWs of similar frequency. Moreover, we examine a SW 3-input Majority gate working under continuous and pulse mode operation regimes. Furthermore, we present and validate how pulse mode operation enables Wave Pipelining (WP) in SW technology. **Circuits:** We develop, design, and validate three major circuits, namely an adder, a multiplier, and a compressor, all of which make use of SW gate cascading. Firstly, we introduce and validate SW accurate and approximate full adders; the approximate full adder consumes 55% less energy than the accurate full adder but has a 25% error rate, making it suitable for error tolerant applications. We also propose a non-binary SW computing paradigm, which we use to build a non-binary SW adder. Then we develop SW accurate and approximate 4:2 compressors; the approximate compressor consumes 46% less energy than the accurate compressor but has a 31% error rate. Finally, we design accurate and approximate 2-bit input multipliers; the approximate multiplier consumes 64% less energy than the accurate multiplier but has a 25% error rate. **SW Technology Requirements:** We are interested in assessing the technological development horizon that needs to be reached to make SW circuits outperform their CMOS counterparts in terms of energy efficiency. We perform a reverse-engineering-like analysis to determine the transducer delay and power consumption upper bounds that can place SW circuits in the leading position. To this end, we compute the maximum transducer delay and power consumption of a 32-bit Brent-Kung adder that could potentially enable a SW implementation able to outperform its 7 nm CMOS counterpart.
Our evaluations indicate that 31 nW is the maximum transducer power consumption for which a 32-bit Brent-Kung SW implementation can outperform its 7 nm CMOS counterpart in terms of energy efficiency.

## **SAMENVATTING**

CMOS scaling has provided the means to efficiently process the enormous amount of raw data resulting from the information technology revolution. However, this is becoming more difficult because of leakage currents, reliability, and rising costs. To keep pace with the exploding market needs at affordable cost, new alternative technologies are being investigated; one of them is the spin wave (SW). A SW is the collective excitation of the electron spins in ferromagnetic materials. SWs are one of the most promising options because of their extremely low energy consumption and high scalability. This thesis: a) develops and designs SW-based logic gates and circuits, and b) investigates the requirements for SW technology to perform more energy-efficiently than CMOS. **Logic gate:** Designing SW circuits requires that the outputs of SW logic gates can drive multiple loads. We introduce and validate novel SW gates with a high fan-out capability, such as (N)AND, (N)OR, X(N)OR, and majority gates. In addition, we present and validate novel n-bit multi-frequency data parallel SW logic gates, i.e., SWs with different frequencies propagate in the same waveguide while interfering only with SWs of similar frequency. Moreover, we examine a three-input SW majority gate operating in both continuous and pulse mode. Furthermore, we present and validate how pulse mode enables wave pipelining for SWs. **Circuits:** We develop, design, and validate three important circuits, namely an adder, a multiplier, and a compressor. These make use of SW gate cascading. First, we introduce and validate exact and approximate SW full adders; the approximate full adder consumes 55% less energy than the exact full adder but has an error rate of 25%, which makes it suitable for error-tolerant applications. In addition, we introduce a non-binary SW computing paradigm, which we use to build a non-binary SW adder. Next, we develop exact and approximate SW 4:2 compressors; the approximate compressor consumes 46% less energy than the exact compressor but has an error rate of 31%. Finally, we design exact and approximate multipliers with 2-bit inputs; the approximate multiplier consumes 64% less energy than the exact multiplier but has an error rate of 25%. **SW Technology Requirements:** We want to assess the technological development horizon that must be reached for SW circuits to outperform their CMOS counterparts in terms of energy efficiency. We perform a reverse-engineering-like analysis to determine the upper bounds of transducer delay and power consumption that can place SW circuits in the leading position. To this end, we compute the maximum transducer delay and power consumption of a 32-bit Brent-Kung adder that could potentially enable a SW implementation to outperform its CMOS counterpart fabricated in 7 nm technology. Our evaluations indicate that 31 nW is the maximum transducer power consumption for which a 32-bit Brent-Kung SW implementation outperforms its CMOS counterpart fabricated in 7 nm technology in terms of energy efficiency.
## **CONTENTS**

- Acknowledgements
- Abstract
- Samenvatting
- 1 Introduction
- 2 Background and State-of-the-art
- 3 Fan-out Enabled Spin Wave …
- 4 Spin Wave Data Parallelism
- 5 Spin Wave Wavepipeline
- 6 Spin Wave Norma…
- 7 … Approximate Computing
- 8 … Spin Wave Computing Approach
- 9 Benchmarking of Spin Wave Technology
- 10 Conclusions
- Nomenclature
- Curriculum Vitæ
- List of Publications

## 1 INTRODUCTION

- 1.1. Introduction to Spin Wave Computing
- 1.2. Spin Wave Circuit Design Challenges
- 1.3. Thesis Contributions
- 1.4. Thesis Organization

This chapter introduces the thesis field, states the research questions, and presents the thesis contributions. Spin Wave (SW) computing is an emerging paradigm that makes use of wave interaction instead of charge movement. A SW is a collective excitation of the electron spins in a ferromagnetic material, and SW technology is one of the most promising technologies because of its ultra-low energy consumption in computing and its high scalability. Most state-of-the-art efforts have focused on devices, and little attention has been given to circuits; consequently, we focus on moving from SW devices to SW circuits. To do so, we identify and formulate the research questions. We first discuss the current computing paradigm, which relies on charge movement, and why researchers have to explore other technologies. We continue by providing the state of the art of SW logic gates and circuits, followed by an explanation of the SW circuit design challenges. After that, we present the research questions formulated to address these challenges. Next, we explain the thesis contributions to solving the SW circuit design challenges and enabling SW circuits. Finally, we provide the thesis organization.

#### 1.1. Introduction to Spin Wave Computing

Current computing systems rely on paradigms in which information is represented by electric charge or voltage and computation is performed by charge movement. The fundamental circuit element in this framework is the transistor, which can serve both as a switch and as an amplifier. Today's large-scale integrated circuits are based on complementary metal-oxide-semiconductor (CMOS) field-effect transistors because of their low power consumption, low fabrication cost, and high integration density [1]–[3]. Using CMOS transistors, logic gates can be built that perform a full set of Boolean algebraic operations. Efficient implementation of Boolean operations is fundamental for the design of mainstream logic circuits and, together with charge-based memory devices, of computing systems [4], [5]. After the introduction of CMOS technology into mainstream production in 1974, device density and performance have steadily improved through geometric Dennard scaling [6], following the famed Moore's law [7]. This progress was orchestrated first in the USA by the National Technology Roadmap for Semiconductors and, after 1998, worldwide by the International Technology Roadmap for Semiconductors (ITRS) [8]. This has allowed CMOS technology to simultaneously drive and respond to an exploding information technology market. Today, CMOS has clearly consolidated its leading position in the digital domain. In the last two decades, sustaining Moore's law has increasingly required disruptive changes to CMOS transistor and circuit architectures beyond Dennard scaling [9], [10], including Cu interconnects [11], high-$\kappa$ dielectrics [12], and the FinFET architecture [13]. However, CMOS scaling is expected to decelerate [14], mainly due to unsustainable power densities, high source-drain and gate leakage currents [15], [16], reduced reliability [17], and economic inefficiency [15], [17].
Yet, despite the slowdown, Moore's law and CMOS scaling are not expected to end in the next decade. The roadmap for future developments is summarized in the International Roadmap for Devices and Systems (IRDS) [18]. For many years, the effort to sustain Moore's law (especially under the threat of its end) has been accompanied by research on alternative computing paradigms that further improve the performance of computation platforms beyond the CMOS horizon [18]–[28]. Recently, this research has accelerated due to a surge of interest in non-Boolean computing approaches for machine learning applications [29]–[31], based on devices with transistor functionality (e.g., tunnel FETs) [32] or on alternative devices (e.g., memristors) [33], [34]. Amongst all beyond-CMOS approaches, spintronics, which uses magnetic degrees of freedom instead of electron charge for information coding [35]–[40], has been identified as particularly promising due to the low intrinsic energies of magnetic excitations as well as their collective nature [25]–[27], [41], [42]. Numerous implementations of spintronic Boolean logic devices have been proposed based on magnetic semiconductors [43], individual atomic spins [44], spin currents [45]–[47], nanomagnets [48]–[52], domain walls [53]–[55], skyrmions [56], [57], or spin waves [58]–[60]. While some approaches try to provide transistor-like functionality [43], [45]–[47], [61], others aim at replacing Boolean logic gates rather than individual transistors [58]–[60], [62], [63]. Within the latter group, spintronic majority gates have received particular attention due to the expected simplification of logic circuit implementations [27], [59], [64], [65]. Note that while majority gates have been researched for decades [66], their CMOS implementation is inefficient, and they have therefore not been widely used in circuit design. However, the advent of compact (spintronic) majority gate implementations has recently led to a revival of majority-based circuit synthesis [64], [67], [68].
One of the most energy-efficient spintronic technologies relies on the voltage-driven generation and manipulation of Spin Waves [26], [59], [69], [70]. Spin Waves (SWs) are oscillatory collective excitations of magnetic moments in ferromagnetic or antiferromagnetic materials [71]–[73]. SW technology stands apart as one of the most promising spintronic avenues [25], [26], [60], [69], [70], [74]–[80] because (i) it has ultra-low energy consumption, as it relies on spin precession rather than on electron movement, (ii) it is highly scalable, as the SW wavelength, which is the only scalability limitation, can reach down to a few nanometers, (iii) it has an acceptable delay, and (iv) it naturally supports parallelism, as SWs with different frequencies can simultaneously propagate through the same waveguide without affecting each other [70], [79]. Driven by this potential, different logic gates and circuits have been suggested [58], [59], [62], [75], [80]–[110]; in the sequel we briefly present some of them. A current-controlled Mach-Zehnder interferometer-based NOT gate was the first experimentally demonstrated SW logic gate [58], and by making use of a similar method, other logic gates including XNOR, NAND, and NOR were realized [62], [81], [82]. NOT, OR, and AND gates were designed using three-terminal devices with transmission lines [83]–[86], and voltage-controlled XNOR and NAND gates utilizing reconfigurable nano-channel magnonic devices were suggested [87]. In addition, an XOR gate was proposed by embedding magnon transistors between the Mach-Zehnder interferometer arms [88]. By relying on another information encoding method, i.e., on SW phase rather than on SW amplitude as is the case for the previously mentioned schemes, buffer, NOT, (N)AND, (N)OR, XOR, and Majority gates were introduced in [59]. Moreover, alternative Majority gate designs were suggested to decrease SW back propagation and increase SW transmission efficiency [89]–[91].
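Property (iv), frequency-based parallelism, rests on linear superposition: waves of different frequencies add up in the waveguide, and each one can be read back by projecting the output onto its own frequency. The Python sketch below illustrates this with an idealized, lossless signal model and the phase encoding used later in this chapter (phase 0 for logic 0, phase $\pi$ for logic 1). The carrier frequencies, sampling step, window length, and function names are hypothetical choices for illustration, and the model ignores damping, dispersion, and the nonlinear effects a real magnetic waveguide exhibits.

```python
import cmath
import math

# Hypothetical illustration values, not taken from the thesis: both carriers
# complete an integer number of periods in the 1 ns observation window.
F1, F2 = 10e9, 15e9   # carrier frequencies in Hz
DT, N = 1e-12, 1000   # 1 ps sampling step, 1000 samples = 1 ns window


def encode(bit):
    """Phase encoding: logic 0 -> phase 0, logic 1 -> phase pi."""
    return math.pi if bit else 0.0


def waveguide_signal(bit1, bit2):
    """Idealized linear superposition of two equal-amplitude SWs in one waveguide."""
    return [math.cos(2 * math.pi * F1 * k * DT + encode(bit1))
            + math.cos(2 * math.pi * F2 * k * DT + encode(bit2))
            for k in range(N)]


def decode(samples, freq):
    """Project the signal onto one frequency (I/Q demodulation) and read its phase."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq * k * DT)
              for k, s in enumerate(samples))
    # Phase near 0 -> logic 0, phase near +/-pi -> logic 1.
    return 1 if math.cos(cmath.phase(acc)) < 0 else 0


for b1 in (0, 1):
    for b2 in (0, 1):
        sig = waveguide_signal(b1, b2)
        print(b1, b2, "->", decode(sig, F1), decode(sig, F2))
```

Because both carriers complete whole periods inside the window, the cross terms between them sum to zero, so each decoded bit depends only on the wave at its own frequency.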
OR and NOR gates were designed using cross structures [92], and physically implemented Majority gates were reported in [93]–[96]. Moreover, multi-frequency spin wave logic gates were explained and utilized to enable parallelism in the SW domain [80]. In addition, a $\mu$m-range multiplexer [105] and mm-range prototypes were demonstrated [98]–[101], [111]. Worth mentioning is the mm-range prototype of the Magnonic Holographic Memory (MHM) [98], [100] and its potential utilization for parallel data processing [106]–[108], [112]. Reversible SW-based logic gates were also proposed [111], and the concept was used to build an AND gate and a comparator. Furthermore, different circuits have also been reported without simulation or experimental results [59], [102], [104], [109], [113]. Moreover, a multi-value magnon adder for all-magnonic neurons was presented in [110]; however, it only operates in the presence of large external fields, which makes the design non-scalable and energy hungry. Overall, the aforementioned designs either (i) cannot provide fan-out support, (ii) rely on the assumption that cascading is straightforward without providing an actual solution for it (which is not the case), or (iii) are not scalable and are energy hungry. This makes them inappropriate for designing energy-efficient and scalable SW-based circuits. In the following section, we detail the challenges one has to face when moving from the utilization of SW technology at the device (gate) level to the circuit level.

Figure 1.1: SW Gate Interconnect.

Figure 1.2: Metal/Optics Interconnect.

#### 1.2. SPIN WAVE CIRCUIT DESIGN CHALLENGES

While in computation paradigms relying on charge transport, e.g., CMOS-based Boolean logic, the path from gate level to circuit level is smooth and determined by fabrication technology capabilities and limitations, this is certainly not the case in the SW domain.
In the remainder of this section we discuss the main hurdles on this road, which are related to gate interconnection, input-output consistency, and fan-out achievement. Apart from the availability of SW-based gate structures, the design and realization of SW-based circuits require communication means that allow for data exchange between gates and, if applicable, for local and/or global synchronization. In traditional digital Integrated Circuits (ICs), the Boolean logic values 0 and 1 are voltage encoded, which allows for data and clock transmission via metal wires. Thus, gate interconnection and clock distribution solutions are quite mature and well understood in terms of their capabilities and associated overhead. However, SW gates operate on SW inputs whose interference produces a SW output that cannot be directly transmitted to the following gates by means of metal wires. A straightforward approach for connecting SW gates is by means of waveguides, such that SW gate outputs are directly utilized as inputs to the following gates, as depicted in Figure 1.1. Even though this approach seems natural, it substantially adds to the overall circuit delay, as SWs propagate rather slowly through waveguides. The actual interconnection delay overhead depends on the waveguide length and material, and it can range from 30 ps to 7.14 ns per $\mu$m, which impedes the utilization of this method for long-range interconnects [76]. Moreover, when a SW propagates through a waveguide, its strength decreases due to damping, which makes it useless as the next gate input if it travels beyond the free path limit. This limit is also material dependent and, for state-of-the-art waveguide materials, can range from $3.9\,\mu\text{m}$ to $14.1\,\text{mm}$ [76]. Thus, waveguide-based interconnects longer than the free path have to make use of repeaters to regenerate the propagating SWs or amplifiers to amplify them.
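The delay and damping limits above can be turned into a back-of-the-envelope link budget. The Python sketch below is a minimal model under stated assumptions: delay grows linearly with length, the amplitude decays purely exponentially with the free path taken as the 1/e attenuation length, and a hypothetical sizing rule places repeaters so that no waveguide segment exceeds the free path. Only the range bounds (30 ps to 7.14 ns per $\mu$m, 3.9 $\mu$m to 14.1 mm) come from the text; the 10 $\mu$m link length and all function names are illustrative.

```python
import math


def waveguide_delay_s(length_um, delay_per_um_s):
    """Propagation delay of a waveguide interconnect, linear in its length."""
    return length_um * delay_per_um_s


def residual_amplitude(length_um, free_path_um, a0=1.0):
    """Amplitude left at the far end, ASSUMING pure exponential damping
    A(L) = A0 * exp(-L / free_path), i.e. the free path taken as a 1/e length."""
    return a0 * math.exp(-length_um / free_path_um)


def repeaters_needed(length_um, free_path_um):
    """Hypothetical sizing rule: no waveguide segment may exceed the free path."""
    return max(0, math.ceil(length_um / free_path_um) - 1)


# Range bounds quoted in the text [76]; the 10 um link length is hypothetical.
LINK_UM = 10.0
for delay in (30e-12, 7.14e-9):
    print(f"{delay:.2e} s/um -> link delay {waveguide_delay_s(LINK_UM, delay):.2e} s")
for free_path in (3.9, 14.1e3):
    print(f"free path {free_path} um -> amplitude "
          f"{residual_amplitude(LINK_UM, free_path):.3f}, "
          f"repeaters {repeaters_needed(LINK_UM, free_path)}")
```

The delay and free-path bounds are evaluated separately on purpose: the text gives ranges across materials, not matched delay and free-path pairs for a single material.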
Alternatively, metal/optic interconnects can be utilized, as suggested in Figure 1.2. They exhibit less propagation delay than waveguides, as current and light travel very fast through metal wires and optical fibers, respectively. However, such an approach implies back and forth conversions between the SW and voltage/optic domains by means of transducers, at the expense of substantial delay and energy consumption overheads. As SWs are collective excitations of the magnetization (precessing spins) rather than moving charges, no actual charge is transported through the waveguide, and it is generally considered that SW interaction and propagation within a waveguide do not consume energy (or at least only consume a tiny amount of it). This implies that the energy efficiency of the hybrid interconnect method is determined by transducer performance and, if applied at small granularity, i.e., at gate level, the transducers' figure of merit dominates the energy consumption of the entire circuit. Thus, in order to take advantage of the "zero" energy consumption of SW logic gates, metal/optic based interconnects should not be utilized for fine grain inter-gate communication. However, given that data can travel long distances on metal wires and optical fibers without degradation, this method can be quite attractive for coarse grain interconnects between large SW computation blocks (islands). Based on this brief analysis, we can conclude that waveguides should be used for local interconnects between adjacent gates, while metal wires or optical fibers are more appropriate for long range interconnects. The domain granularity for switching between the two approaches depends on the relation between gate and transducer delay and energy consumption, and it is obviously technology dependent.

SW amplitude, phase, and frequency can be utilized to encode information [74], [79], whose processing is governed by the wave interference principle.
As an example of SW interference, SWs interfere constructively if they have the same phase ($\Delta\phi=0$) and destructively if they are out of phase ($\Delta\phi=\pi$). Moreover, assuming SW phase information encoding, i.e., phase 0 and $\pi$ represent logic 0 and 1, respectively, SW interaction supports Majority function evaluation. For instance, if 3 SWs having the same amplitude, frequency, and wavelength interfere in the same waveguide, the resultant SW has phase 0 if at least 2 SWs have phase 0, whereas it has phase $\pi$ if at least 2 SWs have phase $\pi$. Note that such an implementation in CMOS requires 18 transistors, whereas it can be directly implemented in SW technology with a single waveguide. One can easily deduce that more complex interference patterns can occur for SWs with different amplitude, frequency, wavenumber, and wavelength, which can be of great interest for developing future SW based computing paradigms.

A general SW device structure consists of four main regions [74], [76]: i) Excitation region (I), ii) Waveguide (B), iii) Functional Region (FR), and iv) Detection region (O). In the excitation region, the SW is excited by means of voltage- or current-driven techniques such as microstrip antennas [74], MagnetoElectric (ME) cells [74], or spin orbit torques [74]. After excitation, the SW propagates through the magnetic waveguide and reaches the functional region, where it can be manipulated, i.e., amplified, normalized, or made to interfere with other SWs. Finally, in the detection region, the SW is detected by the same or different methods as those used in the excitation region [74], [76]. The output is detected based on two main techniques: 1) phase detection and 2) threshold detection. Phase-based output detection means that the output phase is compared with a predefined phase, and a $0/\pi$ phase difference means logic 0/1.
In contrast, threshold-based output detection means that if the output spin wave magnetization is greater than a predefined threshold, the output is logic 1, and it is logic 0 otherwise. Note also that fixing one Majority gate input to a constant yields other Boolean functions, e.g., $AND(I_1, I_2) = MAJ(I_1, I_2, 0)$ and $OR(I_1, I_2) = MAJ(I_1, I_2, 1)$.

Moving from SW device to SW circuit raises another main hurdle, related to gate input-output consistency. In CMOS, this is a non-issue, as logic values 0 and 1 are represented by 0V and $V_{DD}$, respectively, at both gate inputs and outputs. Thus, gate outputs can directly drive other gates' inputs without requiring any type of post processing. Unfortunately, this is not the case for SW gates operating on phase encoded information. SW interference in such a gate generates the correct output phase-wise, but the output SW may have different strength (amplitude), i.e., a strong SW if the interference has been constructive (the interfering input waves have the same phase) or a weak SW if it was destructive (the interfering waves have different phases). For example, if two inputs of a SW Majority gate are 0 while the third one is 1, a weak 0 SW (amplitude A) is generated, while if all inputs are 0, a strong 0 SW (amplitude 3A) is produced. Thus, if two Majority gates are cascaded, the amplitude difference at the driving gate output can induce wrong results at the driven gate output, which has been designed to operate on amplitude A SW inputs and cannot properly accommodate a 3A SW input. For example, if the first gate in Figure 1.3 produces a strong 0 and the other inputs of the second gate are 1, the second gate output will not be 1, as it should, but 0, as the 0 SW input is dominant. Therefore, a mechanism for SW amplitude normalization is required at the SW gate output in order to guarantee proper circuit behavior, as indicated in Figure 1.3.

Figure 1.3: SW Gate Cascading.

Figure 1.4: SW Gate with Fanout of 2.
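The interference, detection, and cascading arguments above can be reproduced with a minimal phasor model. The sketch assumes an ideal, lossless, linear waveguide in which same-frequency SWs simply add as complex amplitudes; all helper names are hypothetical, and `normalize` is an idealized stand-in for whatever physical mechanism restores the amplitude, not a model of any specific device.

```python
import cmath
import math

A = 1.0  # nominal (normalized) input SW amplitude


def encode(bit, amplitude=A):
    """Phase encoding: logic 0 -> phase 0, logic 1 -> phase pi."""
    return cmath.rect(amplitude, math.pi if bit else 0.0)


def interfere(waves):
    """Ideal linear superposition of same-frequency SWs (complex phasors)."""
    return sum(waves, 0j)


def detect_phase(wave):
    """Phase detection: 1 if the output phase is closer to pi than to 0."""
    return 1 if wave.real < 0 else 0


def detect_threshold(wave, threshold):
    """Threshold detection: 1 if the output amplitude exceeds the threshold."""
    return 1 if abs(wave) > threshold else 0


def majority(bits):
    """3-input SW Majority gate: three equal-amplitude inputs interfere."""
    return interfere(encode(b) for b in bits)


def normalize(wave, amplitude=A):
    """Idealized amplitude normalization: keep the phase, restore amplitude A."""
    return cmath.rect(amplitude, cmath.phase(wave))


if __name__ == "__main__":
    strong_zero = majority([0, 0, 0])  # amplitude 3A, phase 0
    weak_zero = majority([0, 0, 1])    # amplitude A, phase 0
    print(abs(strong_zero), abs(weak_zero))
    # Cascading without normalization: MAJ(strong 0, 1, 1) is detected as 0.
    print(detect_phase(interfere([strong_zero, encode(1), encode(1)])))
    # With normalization, the driven gate sees amplitude A and outputs 1.
    print(detect_phase(interfere([normalize(strong_zero), encode(1), encode(1)])))
    # Two inputs, threshold at A: 1 only when the inputs are equal
    # (an XNOR under this detection convention).
    print([detect_threshold(interfere([encode(a), encode(b)]), A)
           for a in (0, 1) for b in (0, 1)])
```

The strong 0 (amplitude 3A) outweighs two amplitude-A logic 1 inputs, so the second gate outputs 0 instead of 1. Restoring the amplitude to A, while keeping the phase, gives the correct result.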
Finally, the realization of any relevant circuit requires gate fanout capabilities, as one gate output is often utilized as input for more than one gate, as depicted in Figure 1.4. In CMOS, fan-out achievement is straightforward, as a gate output can be directly connected to the following inputs by metal wires at the expense of some delay overhead due to a larger output capacitance, which can be dealt with by proper transistor dimensioning. Achieving fan-out in the SW domain is not straightforward, as it requires SW replication. For example, if a certain Majority gate has to provide its output to more than one following gate, it has to be replicated, which, in a non-toy circuit, results in an area explosion and substantial energy consumption overhead. For instance, if the output of a 32-bit adder has to be provided to two or more gate inputs, then the entire 32-bit adder must be replicated twice or more, depending on the required fanout value.

The previous discussion clearly indicates that, in the SW domain, the road from gate to circuit is not as straightforward as in charge based computing. Therefore, in this thesis, we investigate and propose solutions for the elimination of these hurdles. In the following section, we formulate research questions that tackle these hurdles and cover the thesis topics.

#### 1.3. RESEARCH QUESTIONS

This thesis addresses the following research question, which we formulate to cover the thesis topics in general:

## Can we build scalable energy efficient spin wave circuits able to match and potentially outperform state-of-the-art CMOS counterparts?

In this section, we discuss this question, which is further broken down into 8 subquestions whose answers provide a suitable answer to the main research question. As previously explained, different logic gates have been suggested in the literature [58], [59], [62], [81]–[87], [89]–[96].
However, all of them have a single output, which means that if this output is needed to drive multiple following gate inputs, the gate must be replicated multiple times, which generates energy and area overheads. Thus, spin wave logic gates must provide fanout capability in order to enable the design of efficient SW circuits. This leads us to the following research question:

#### Can we enable energy and area effective fanout in spin wave logic gates?

Related to this question, we investigate different avenues to design energy effective, domain conversion free, multi-output spin wave logic gates.

Another design enabling factor is direct gate cascading, which is not supported by state-of-the-art SW gates. Given that, as briefly discussed in the previous section, SW gate cascading is not as straightforward as for CMOS counterparts, as it currently requires domain conversion, we arrive at the following research question:

## Can we enable domain conversion free SW gate cascading while preserving correct gate functionality and minimizing energy consumption and area overheads?

After addressing the two most fundamental hurdles on the way towards the realization of energy effective SW circuits, we continue our investigations by considering SW technology specific phenomena that can further improve SW circuit performance. We start by observing that SWs of different frequencies can simultaneously propagate through the same waveguide without affecting each other, while only interfering with their own species. This opens the road towards data parallelism as, e.g., the evaluation of XOR(A,B), A and B being n-bit words, can be done with one instead of n XOR gates if each input pair $(a_i, b_i)$ is encoded with SWs of frequency $f_i$. This approach has been pursued in [80], which introduces a Majority gate structure able to simultaneously process 3 data sets encoded at 3 different SW frequencies. However, the suggested structure contains a magnonic crystal that induces a large delay overhead.
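The XOR(A,B) example above can be illustrated numerically for 8-bit words processed in one shared waveguide. This is a sketch under explicit assumptions: an ideal linear, lossless, non-dispersive medium (so different-frequency SWs superpose without interacting), hypothetical frequencies from 10.0 to 13.5 GHz chosen so that each completes an integer number of periods in the observation window, and a single-bin DFT at each $f_i$ as a stand-in for frequency-selective detection. Equal bits interfere constructively (amplitude 2A) and different bits cancel, so a low amplitude is read as XOR = 1, a convention assumed here.

```python
import cmath
import math


def superpose(a_bits, b_bits, freqs_ghz, t_ns):
    """Waveguide signal: bit pair (a_i, b_i) is phase-encoded at frequency f_i."""
    return [
        sum(math.cos(2 * math.pi * f * t + math.pi * a)
            + math.cos(2 * math.pi * f * t + math.pi * b)
            for a, b, f in zip(a_bits, b_bits, freqs_ghz))
        for t in t_ns
    ]


def amplitude_at(signal, f_ghz, t_ns):
    """Single-bin DFT: amplitude of the signal component at f_ghz."""
    acc = sum(s * cmath.exp(-2j * math.pi * f_ghz * t) for s, t in zip(signal, t_ns))
    return abs(2 * acc / len(signal))


def parallel_xor(a_bits, b_bits, freqs_ghz, window_ns=10.0, samples=1000, threshold=1.0):
    """Evaluate all bitwise XORs from one superposed signal."""
    t_ns = [k * window_ns / samples for k in range(samples)]
    signal = superpose(a_bits, b_bits, freqs_ghz, t_ns)
    # Equal bits add up (amplitude ~2), different bits cancel (amplitude ~0).
    return [1 if amplitude_at(signal, f, t_ns) < threshold else 0 for f in freqs_ghz]


FREQS_GHZ = [10.0 + 0.5 * i for i in range(8)]  # hypothetical: 8 frequencies

if __name__ == "__main__":
    a = [1, 0, 1, 1, 0, 0, 1, 0]
    b = [1, 1, 0, 1, 0, 1, 0, 0]
    print(parallel_xor(a, b, FREQS_GHZ))
```

Because every frequency fits an integer number of periods into the 10 ns window, the frequency components are exactly orthogonal, and each output bit depends only on its own input pair.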
In addition, no investigation has been performed in an attempt to determine the upper bound of the achievable data parallelism, i.e., the number of SW frequencies. Therefore, we investigate the following research question:

## How many frequencies can be utilized in parallel to store and process information while avoiding the use of magnonic crystals?

Another interesting concept that seems naturally applicable within the SW based computing paradigm framework is wavepipelining [114], [115], whose main idea is to allow for the coexistence and interference free handling of multiple data sets within a register free pipelined circuit. To be able to operate in such a manner, the circuit has to be redesigned such that all its propagation paths exhibit the same delay. This guarantees that input sets do not interfere within the circuit and reach the output in their chronological order. While the utilization of this concept in combination with CMOS technology is rather limited, we believe that it can be naturally combined with SW technology due to the very way information is encoded and processed in SW based circuits, which brings us to the following research question:

#### Can we perform wavepipelining efficiently within the spin wave domain?

Related to this question, we investigate different possibilities to enable efficient wavepipelining in spin wave circuits.

SWs are excited by means of voltage driven or current driven transducers. If SW excitation is performed by the continuous application of voltages/currents to the inputs, which is usually the case, the overall energy consumption is determined by the transducer power and the circuit critical path delay, which leads to high energy consumption because SWs are slow. However, if transducers are operated in pulses, the energy becomes circuit delay independent and is mainly determined by the transducer power and delay. In addition, wavepipelining requires SW generation by pulses, as otherwise distinct data waves cannot be created.
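Both the operation-mode argument and the wavepipelining condition above can be captured in a deliberately simple sketch. It assumes that (i) each transducer draws a constant power while active, (ii) in continuous mode all transducers stay on for the whole critical path delay, whereas in pulse mode they are on only for the pulse duration, and (iii) a circuit can be wavepipelined only if all paths reaching each node have the same delay. The netlist format, helper names, and all numbers in the example are hypothetical.

```python
def transducer_energy_aj(n_transducers, power_nw, active_time_ns):
    """Energy in attojoules (nW x ns = 1e-18 J)."""
    return n_transducers * power_nw * active_time_ns


def arrival_windows(netlist, primary_inputs):
    """Earliest/latest arrival time at every node.

    netlist maps gate name -> (gate delay in ns, list of input node names).
    """
    windows = {pi: (0.0, 0.0) for pi in primary_inputs}

    def visit(node):
        if node not in windows:
            delay, inputs = netlist[node]
            ins = [visit(i) for i in inputs]
            windows[node] = (min(lo for lo, _ in ins) + delay,
                             max(hi for _, hi in ins) + delay)
        return windows[node]

    for gate in netlist:
        visit(gate)
    return windows


def is_balanced(netlist, primary_inputs, tol=1e-12):
    """Wavepipelining condition: all paths reaching a node have equal delay."""
    return all(hi - lo <= tol
               for lo, hi in arrival_windows(netlist, primary_inputs).values())


if __name__ == "__main__":
    # Hypothetical: 4 transducers, 10 nW each, 5 ns critical path, 0.5 ns pulses.
    cmo = transducer_energy_aj(4, 10.0, 5.0)  # continuous mode
    pmo = transducer_energy_aj(4, 10.0, 0.5)  # pulse mode
    print(cmo, pmo, cmo / pmo)
    # Two cascaded Majority gates: g2 mixes g1's output with fresh inputs.
    unbalanced = {"g1": (1.0, ["a", "b", "c"]), "g2": (1.0, ["g1", "d", "e"])}
    # Delay-matching elements (e.g., extra waveguide length, assumed) on d, e.
    balanced = {"g1": (1.0, ["a", "b", "c"]),
                "d_dly": (1.0, ["d"]), "e_dly": (1.0, ["e"]),
                "g2": (1.0, ["g1", "d_dly", "e_dly"])}
    print(is_balanced(unbalanced, "abcde"), is_balanced(balanced, "abcde"))
```

In this model, the continuous-mode energy grows with the critical path delay while the pulse-mode energy does not. In the unbalanced netlist, inputs `d` and `e` reach `g2` one gate delay earlier than `g1`'s output, so consecutive data waves would mix.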
Therefore, we explore the advantages and disadvantages of each operation mode, determine the situations in which Pulse Mode Operation (PMO) or Continuous Mode Operation (CMO) is more appropriate, and investigate the possibility of implementing efficient wavepipelining in pulse operation mode, in the following research question:

#### Which is the most energy efficient operation mode for spin wave circuits?

We further noticed that all state-of-the-art SW gates and circuits [58], [59], [62], [75], [80]–[110] target accurate computing, where exact results are delivered. However, many applications, like multimedia and image processing [116], are error tolerant and can benefit from approximate computing approaches. Thus, it is of interest to explore the possibility of implementing efficient spin wave approximate circuits, which brings us to the following research question:

## Can we utilize the approximate computing concept to build efficient approximate spin wave circuits?

While most of the suggested SW gates and circuits [58], [59], [62], [75], [80]–[110] rely on Boolean algebra, non-Boolean algebra based logic circuits have been proposed within the context of other technologies [117]–[121]. Inspired by such approaches, and in an attempt to diminish fanout, cascading, and domain conversion overheads, we arrive at the following research question:

## Can we utilize SW technology to build efficient non-Boolean algebra based logic circuits?

To this end, we explore the avenues towards developing non-binary computing SW circuits and evaluate their potential by comparing them with Boolean algebra based counterparts.

The last item we address in this thesis relates to the evaluation of the potential of SW based computing to compete with the well established CMOS counterpart. However, given the early stage of SW technological development, it is difficult to provide a comprehensive performance assessment.
Nevertheless, some preliminary estimates can be drawn and are certainly of great interest, which brings us to the following research question:

## What is the transducer power consumption upper bound that allows spin wave circuits to match and potentially outperform CMOS counterparts?

By answering these questions, this thesis paves the way towards the implementation of scalable energy efficient spin wave circuits.

#### 1.4. THESIS CONTRIBUTIONS

In this thesis, we enable fanout and gate cascading without domain conversion, which opens the way toward designing efficient circuits in the spin wave domain. In addition, we achieve parallelism and wavepipelining in spin wave circuits, which saves area and increases throughput. Furthermore, we introduce approximate computing in the SW domain, which saves considerable energy and area in error-tolerant applications. Moreover, we develop a beyond Boolean algebra computing approach and introduce a non-binary SW computing paradigm that enables full non-binary SW circuit design. Finally, we determine the maximum transducer power consumption for which SW implementations can potentially outperform their 7 nm CMOS counterparts in terms of energy. Our contributions can be summarized as follows:

- We introduce novel ladder and triangle shape spin wave Majority gate devices that can achieve a fan-out of up to 4 and 2, respectively, and discuss how the ladder Majority gate can serve as a programmable logic gate and the triangle one as an XOR gate. The proposed designs are validated by means of OOMMF and MuMax3 micromagnetic simulations and compared with state-of-the-art spin wave and 16 nm CMOS counterparts. Our evaluation indicates that, while being 14x slower than the CMOS counterpart, the proposed ladder and triangle structures provide 9x and 10.5x energy consumption reduction, respectively.
Moreover, due to their fanout capabilities, they also provide 33% and 50% energy reduction, respectively, when compared with state-of-the-art SW gates, without inducing any area or delay overhead.
- We present a novel *n*-bit data parallel spin wave logic gate. In order to explain the proposed concept, we implement 8-bit 2-input XOR and 3-input Majority gates and validate them by means of OOMMF. Further, we propose an optimization algorithm to minimize the area overhead of the proposed multi-frequency gates and demonstrate that the algorithm diminishes the area by 30% and 41% for the XOR and MAJ gate implementations, respectively. Moreover, to assess the potential of our proposal, we evaluate and compare the proposed multi-frequency gates with functionally equivalent scalar SW gate based implementations in terms of area, delay, and power consumption. The results indicate that the byte-based XOR and Majority gates require 4.47x and 4.16x less area than the conventional (scalar) implementations, respectively, at the expense of 5% to 7% delay overhead and without inducing any power consumption overhead. Finally, we demonstrate that, for the current gate topology and materials, the maximum number of frequencies (gate parallelism) is 8 and 16 for phase and threshold based output detection, respectively.
- We propose a SW 3-input Majority gate and validate it by means of micromagnetic simulations under continuous and pulse mode operation regimes. We also evaluate the gate energy consumption, and our results indicate that Pulse Mode Operation (PMO) diminishes the gate energy consumption by a factor of 18 when compared with continuous mode operation. In addition, we present how PMO enables Wave Pipelining (WP) within SW circuits and validate WP on a circuit of 4 cascaded 3-input Majority gates by means of micromagnetic simulations. Furthermore, we demonstrate that WP utilization improves the circuit throughput by 3.6x.
- We introduce a novel energy efficient spin wave based Full Adder (FA). The FA is implemented by making use of a Majority gate and 2 XOR gates. In the proposed FA, two main detection mechanisms are utilized: phase detection for the Carry-out output and threshold detection for the Sum output. The correct functionality of the FA is validated by means of micromagnetic simulations, and the FA is evaluated and compared with a direct SW gate based implementation and with equivalent designs in five state-of-the-art technologies: 22 nm CMOS, MTJ, SHE, DWM, and Spin-CMOS. It is demonstrated that the proposed FA consumes 22.5% and 43% less energy than the direct SW gate based implementation and 22 nm CMOS, respectively, and requires 3 orders of magnitude less energy in comparison with the state-of-the-art MTJ, SHE, DWM, and Spin-CMOS based FAs. Also, the proposed FA requires more than 22% less area than all the other designs.
- We propose and validate by means of micromagnetic simulations a novel 4-2 Spin Wave (SW) compressor. The proposed compressor is assessed and compared with state-of-the-art SW, 22 nm CMOS, Magnetic Tunnel Junction (MTJ), Domain Wall Motion (DWM), and Spin-CMOS designs. The evaluation results show that the proposed compressor consumes 2.5x less energy than its 22 nm CMOS counterpart. In addition, it outperforms the MTJ, DWM, and Spin-CMOS designs by at least 3 orders of magnitude. Moreover, it consumes 1.25x less energy than the conventional SW compressor. Furthermore, it requires the smallest chip real-estate.
- We introduce a directional coupler-based SW amplitude renormalization method, which allows for conversion free, energy effective gate cascading. Three complex gates that cover the most common situations encountered in logic circuit implementations, and a spin wave multiplier with 2-bit inputs, have been presented and validated by means of micromagnetic simulations.
Our results indicate that they are energy effective and potentially open the road towards the full utilization of SW paradigm capabilities and the development of SW-only circuits. In particular, for the complex gates, our method provides 20%-33% energy savings when compared with conversion based equivalent designs, and the proposed SW multiplier requires $6.25\times$ and 31% less energy in comparison with the $16\,\mathrm{nm}$ CMOS and conversion-based SW counterparts, respectively.
- We propose and validate by means of micromagnetic simulations a novel approximate energy efficient spin wave based Full Adder (AFA). AFA is evaluated and compared with state-of-the-art counterparts. AFA saves 43% and 33% energy when compared with the state-of-the-art SW and 7 nm CMOS FAs, respectively, and 69% and 44% in comparison with the accurate and approximate 45 nm CMOS FAs, respectively. In addition, it consumes more than 2 orders of magnitude less energy than the accurate SHE FA and the accurate and approximate DWM, MTJ, and Spin-CMOS FAs. Moreover, it achieves the same error rate as the approximate 45 nm CMOS and Spin-CMOS FAs, whereas it exhibits a 50% lower error rate than the approximate DWM FA, and it requires at least 29% less chip real-estate than the other state-of-the-art designs.
- We introduce a Spin Wave (SW) based 4:2 approximate compressor, which consists of 3-input and 5-input Majority gates. We report the design of approximate circuits without directional couplers, which are essential to normalize gate output(s) when cascading gates in accurate circuit designs. Subsequently, we validate the proposed compressor by means of micromagnetic simulations and compare it with the state-of-the-art SW, 22 nm CMOS, 45 nm CMOS, and Spin-CMOS counterparts.
The evaluation results indicate that the proposed 4:2 compressor saves 31.5% energy in comparison with the accurate SW compressor and has the same energy consumption and error rate as the approximate compressor with directional couplers, while exhibiting 3x lower delay. Moreover, it consumes 14% less energy and has a 17% lower error rate when compared with the approximate 45 nm CMOS counterpart. Furthermore, it outperforms the approximate Spin-CMOS based compressor by 3 orders of magnitude in terms of energy consumption while providing the same error rate. Last but not least, the proposed compressor requires the smallest number of devices; thus, it potentially requires the lowest chip real-estate.
- We propose and validate by means of micromagnetic simulations a novel approximate energy efficient spin wave based multiplier with 2-bit inputs (AMUL). AMUL is evaluated and compared with state-of-the-art counterparts. AMUL consumes at least 2x and 5x less energy than the state-of-the-art accurate SW designs and the 16 nm CMOS accurate and approximate designs, respectively. Moreover, AMUL has an average error rate of 25%, whereas the approximate CMOS multiplier has an average error rate of 38%, and AMUL requires at least 64% less chip real-estate.
- We propose a novel non-binary SW computing paradigm, where information is encoded in the spin wave amplitude and computing is performed in the spin wave domain by the interference of SWs with different amplitudes. The result is detected at the outputs after a non-binary to binary conversion by means of the developed non-binary to binary converter, which is built from multiple directional couplers. Subsequently, we design a spin wave non-binary adder by relying on the proposed computing paradigm and the SW amplitude value converter, and validate its functionality by means of micromagnetic simulations. Also, we evaluate and compare a non-binary 2-bit adder with Boolean algebra based SW and 16 nm CMOS designs.
The results indicate that our approach diminishes the energy consumption by 3.14x and 37x when compared with the conventional SW and 16 nm CMOS counterparts, respectively. Furthermore, the proposed non-binary adder implementation requires the least number of devices, which indicates its potential for small chip real-estate realizations.
- We have performed a reverse-engineering-like analysis to determine the ME delay and power consumption upper bounds that can place SW circuits in the leading position. We have utilized a 32-bit Brent-Kung Adder (BKA) as discussion vehicle and computed the maximum ME delay and power consumption that could potentially enable a SW implementation able to outperform its 7 nm CMOS counterpart. We evaluate different BKA SW implementations that rely on conversion- or normalization-based gate cascading and consider continuous or pulsed SW generation scenarios. Our evaluations indicate that 31 nW is the maximum transducer power consumption for which a 32-bit Brent-Kung SW implementation can outperform its 7 nm CMOS counterpart in terms of energy consumption.

#### 1.5. THESIS ORGANIZATION

The thesis consists of 10 chapters; the remaining ones are organized as follows:

- Chapter 2 starts with an introduction to the physics of spin waves. Subsequently, the computation paradigm based on spin waves is introduced, and the fundamental requirements for the realization of spin-wave circuits are discussed. Next, we provide an overview of different spin-wave transducers and devices. This is followed by an overview of the state-of-the-art of spin-wave technology.
- Chapter 3 presents novel ladder and triangle shape spin wave Majority gate device concepts that can achieve a fan-out of up to 4 and 2, respectively. Moreover, we provide insight on how the ladder Majority gate can serve as a programmable logic gate and the triangle one as an XOR gate.
In addition, we discuss the validation of the proposed designs by means of OOMMF and MuMax3 micromagnetic simulations, and their comparison with state-of-the-art designs.
- Chapter 4 introduces the *n*-bit data parallel SW logic gate and the associated area optimization algorithm, and presents simulation experiments related to the validation of the 8-bit 3-input Majority and 2-input XOR gates. In addition, it presents evaluation results for the two byte-wide parallel gates, a comparison with functionally equivalent scalar implementations, and a discussion of the maximum achievable parallelism.
- Chapter 5 illustrates the SW Majority gate operation principle under continuous and pulse mode operation, validates it by means of micromagnetic simulations, and evaluates its energy effectiveness under the two modes. It also discusses wavepipelining in SW circuits, presents the micromagnetic simulation results of SW wavepipelining, and evaluates its throughput impact.
- Chapter 6 provides the design, validation, and evaluation of an efficient spin wave based adder and 4-2 compressor. In addition, it explains an efficient cascading concept that enables output spin wave amplitude normalization by means of a directional coupler. This enables building larger and more complex circuits, as the direct conversion free cascading of such gates is otherwise not possible due to input-output data inconsistency. Furthermore, it presents the validation of our proposal, demonstrates its potential towards building spin wave circuits, and discusses the comparison with state-of-the-art designs.
- Chapter 7 introduces the SW approximate computing concept and presents the design of the approximate full adder, 4:2 compressor, and multipliers. It also presents the validation, performance evaluation, and comparison with the state-of-the-art for all designs.
- Chapter 8 illustrates the non-Boolean SW computing paradigm and the spin wave amplitude converter, and their utilization in the framework of a SW non-binary adder. In addition, it describes the simulation platform, presents the simulation results, and compares the energy, delay, and estimated area of the proposed adder with SW and 16 nm CMOS counterparts.
- Chapter 9 introduces a preliminary attempt to scrutinize the chances of SW technology to outperform state-of-the-art CMOS technology implementations in terms of energy consumption. Given that not enough technological data are currently available, due to the early stage of development, a full-fledged benchmarking is neither possible nor relevant. Instead, we reverse the investigation direction in order to assess the requirements on technology, as seen from the circuit point of view, for SW paradigm supremacy in terms of energy effectiveness.
- Chapter 10 concludes the thesis with some final remarks and introduces the challenges ahead towards the design and realization of energy effective SW circuits and computation platforms, as well as possible future directions.

# 2

## BACKGROUND AND STATE-OF-THE-ART

- 2.1. PHYSICS OF SPIN WAVES
- 2.2. FUNDAMENTALS OF SPIN WAVE COMPUTING
- 2.3. GENERAL SPIN WAVE DEVICE STRUCTURE
- 2.4. DIRECTIONAL COUPLERS
- 2.5. STATE-OF-THE-ART
- 2.6. CONCLUSIONS

This chapter provides an overview of recent vigorous efforts to develop computing systems based on spin waves instead of charges and voltages. Spin-wave computing can be considered a subfield of spintronics, which uses magnetic excitations for computation and memory applications. The main goal of this chapter is to provide insight into SW related challenges and opportunities in order to facilitate synergistic interaction. While not going into deep details, the chapter starts by covering SW creation, propagation, and device technology basics. Subsequently, we introduce the SW based computation paradigm and the general structure of SW devices.
Finally, we provide an overview of the state-of-the-art of SW devices. The content of this chapter is based on the following publication:

**A. Mahmoud**, F. Ciubotaru, F. Vanderveken, A. V. Chumak, S. Hamdioui, C. Adelmann, and S. Cotofana, *Introduction to spin wave computing*, Journal of Applied Physics **128**, 161101 (2020).

Table 2.1: Material properties of representative ferromagnetic materials (saturation magnetization $M_s$, Gilbert damping $\alpha$, and exchange length $l_{ex}$), as well as propagation properties (group velocity $v_g$, lifetime $\tau$, and propagation distance Pd) of surface spin waves with a wavelength of $\lambda = 1$ µm in a 500 nm wide and 20 nm thick waveguide (external magnetic bias field $\mu_0 H = 100$ mT).

| Material | $M_s$ (MA/m) | $\alpha$ (×10⁻³) | $l_{ex}$ (nm) | $v_g$ (µm/ns) | $\tau$ (ns) | Pd (µm) | References |
|---|---|---|---|---|---|---|---|
| Fe | 1.7 | 60 | 3.4 | 5.8 | 0.08 | 0.5 | [122]–[126] |
| Co | 1.4 | 5 | 4.8 | 4.6 | 1.2 | 5.5 | [127]–[131] |
| Ni | 0.5 | 45 | 7.4 | 1.1 | 0.3 | 0.3 | [122], [132]–[135] |
| YIG ($Y_3Fe_5O_{12}$, µm films) | 0.14 | 0.05 | 17 | 42 | 600 | 25000 | [136]–[141] |
| YIG ($Y_3Fe_5O_{12}$, nm films) | 0.14 | 0.2 | 17 | 0.3 | 150 | 44 | [142]–[148] |
| Permalloy ($Ni_{80}Fe_{20}$) | 0.8 | 7 | 6.3 | 2.2 | 1.4 | 3.2 | [149]–[152] |
| CoFeB | 1.3 | 4 | 3.9 | 3.9 | 1.7 | 6.6 | [153]–[155] |
| $Co_2(Mn_xFe_{1-x})Si$ | 1.0 | 3 | 4.5 | 2.8 | 2.7 | 7.9 | [156]–[159] |

#### 2.1. PHYSICS OF SPIN WAVES

This section provides an introduction to spin waves and their characteristics. We start by explaining the relevant basic magnetic interactions, followed by a discussion of the resulting magnetization dynamics.

#### 2.1.1. MAGNETIZATION AND MAGNETIC INTERACTIONS

Magnetic materials contain atoms with a net magnetic dipole moment $\boldsymbol{\mu}$. Therefore, they can be considered as a lattice of magnetic dipoles with specific amplitude and orientation at every lattice site. At dimensions much larger than the interatomic distances, it is more convenient to work with a continuous vector field than with discrete localized magnetic dipoles, *i.e.*, with the so-called semiclassical approximation. The continuous vector field is called the magnetization and is defined as the magnetic dipole moment per unit volume [160]

$$\mathbf{M} = \frac{\sum_{i} \boldsymbol{\mu}_{i}}{\delta V}, \tag{2.1}$$

where the sum runs over the dipoles $\boldsymbol{\mu}_i$ contained in a small volume $\delta V$. At temperatures far below the Curie temperature, the magnetization norm is constant throughout the material and is called the saturation magnetization $M_s$. On the other hand, the magnetization orientation can be position dependent and is determined by various magnetic interactions. In the following, the most important magnetic interactions are briefly explained.

The Zeeman interaction describes the influence of an external magnetic field $\mathbf{H}_{\text{ext}}$ on the magnetization. The Zeeman energy density (energy per unit volume) is given by

$$\mathcal{E}_{\mathrm{Z}} = -\mu_0 \mathbf{M} \cdot \mathbf{H}_{\text{ext}}, \tag{2.2}$$

with $\mu_0$ the vacuum permeability. Hence, the energy is minimal when the magnetization is parallel to the external magnetic field.

Apart from external magnetic fields, the magnetization itself also generates a magnetic field, termed the dipolar magnetic field. For a given magnetization state, it is found by solving Maxwell's equations [71]. The dipolar magnetic field inside the magnetic material is called the demagnetization field, whereas the field outside is called the stray field.
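Two quick numerical checks on this material can be sketched as follows. First, the propagation distances in Table 2.1 are consistent, to within the rounding of the tabulated values, with the decay length $Pd \approx v_g \tau$ (the distance a SW covers during one lifetime); this relation is a standard definition assumed here rather than one stated in the text. Second, Eq. (2.2) gives the Zeeman energy density of, e.g., uniformly saturated permalloy ($M_s = 0.8$ MA/m) in the 100 mT bias field of the table caption. Helper names are hypothetical.

```python
# Table 2.1 entries: material -> (v_g [um/ns], tau [ns], tabulated Pd [um])
TABLE_2_1 = {
    "Fe": (5.8, 0.08, 0.5),
    "Co": (4.6, 1.2, 5.5),
    "Ni": (1.1, 0.3, 0.3),
    "YIG (um films)": (42.0, 600.0, 25000.0),
    "YIG (nm films)": (0.3, 150.0, 44.0),
    "Permalloy": (2.2, 1.4, 3.2),
    "CoFeB": (3.9, 1.7, 6.6),
    "Co2(MnxFe1-x)Si": (2.8, 2.7, 7.9),
}


def propagation_distance_um(v_g_um_per_ns, tau_ns):
    """Assumed decay-length relation: distance covered during one SW lifetime."""
    return v_g_um_per_ns * tau_ns


def relative_deviation(material):
    """Relative mismatch between v_g * tau and the tabulated Pd."""
    v_g, tau, pd_table = TABLE_2_1[material]
    return abs(propagation_distance_um(v_g, tau) - pd_table) / pd_table


def zeeman_energy_density(m_a_per_m, mu0_h_ext_tesla):
    """Eq. (2.2): E_Z = -mu0 M . H_ext in J/m^3, with mu0*H_ext given in tesla."""
    return -sum(m * b for m, b in zip(m_a_per_m, mu0_h_ext_tesla))


if __name__ == "__main__":
    for name, (v_g, tau, pd_table) in TABLE_2_1.items():
        print(f"{name:18s} v_g*tau = {propagation_distance_um(v_g, tau):9.2f} um"
              f" (table: {pd_table} um)")
    ms_permalloy = 0.8e6         # A/m, Table 2.1
    bias = (0.1, 0.0, 0.0)       # mu0*H = 100 mT along x, Table 2.1 caption
    print(zeeman_energy_density((ms_permalloy, 0.0, 0.0), bias))   # parallel
    print(zeeman_energy_density((-ms_permalloy, 0.0, 0.0), bias))  # antiparallel
```

All tabulated distances agree with $v_g\tau$ to within about 10%, and the Zeeman energy density is about $-8\times10^4$ J/m³ for parallel and $+8\times10^4$ J/m³ for antiparallel alignment, in line with the energy being minimal for parallel magnetization.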
The energy density of the self-interaction of the magnetization with its own demagnetization field is given by

$$\mathcal{E}_{\text{d}} = -\frac{\mu_0}{2} \mathbf{M} \cdot \mathbf{H}_{\text{d}}, \tag{2.3}$$

with $\mathbf{H}_{\mathrm{d}}$ the demagnetization field. The demagnetization field itself strongly depends on the *shape* of the magnetic element [160], [161]. The demagnetization energy is minimal when the magnetization is oriented along the longest dimension of the magnetic object. This magnetization anisotropy is therefore often called shape anisotropy.

The crystal structure of the magnetic material can also introduce an anisotropic behavior of the magnetization. This is called magnetocrystalline anisotropy and originates from the spin–orbit interaction, which couples the magnetic dipoles to the crystal orientation [162]. As a result, the magnetization may have preferred orientations with respect to the crystal structure. Magnetization directions that correspond to minimum energy are called easy axes, whereas magnetization orientations with maximum energy are called hard axes. Different types of magnetocrystalline anisotropy exist, depending on the crystal structure [162]. As an example, the energy density for uniaxial magnetocrystalline anisotropy can be expressed by

$$\mathcal{E}_{\text{ani}} = -K_1 (\mathbf{u} \cdot \boldsymbol{\zeta})^2 - K_2 (\mathbf{u} \cdot \boldsymbol{\zeta})^4, \tag{2.4}$$

with $\mathbf{u}$ the easy axis, $\boldsymbol{\zeta} = \mathbf{M}/M_s$ the magnetization direction, and $K_1$ and $K_2$ the first- and second-order anisotropy constants, respectively. It is often convenient to describe magnetic interactions by corresponding effective magnetic fields.
The general relation between a magnetic energy density and its corresponding effective field is given by

$$\mathbf{H}_{\text{eff}} = -\frac{1}{\mu_0} \frac{\partial \mathcal{E}(\mathbf{M})}{\partial \mathbf{M}}. \tag{2.5}$$

For the magnetocrystalline interaction, this becomes

$$\mathbf{H}_{\text{ani}} = \frac{2K_1}{\mu_0 M_s} (\mathbf{u} \cdot \boldsymbol{\zeta}) \mathbf{u} + \frac{4K_2}{\mu_0 M_s} (\mathbf{u} \cdot \boldsymbol{\zeta})^3 \mathbf{u}. \tag{2.6}$$

In polycrystalline materials, every grain may possess a different easy axis orientation. Therefore, the average magnetocrystalline anisotropy in macroscopic polycrystalline materials is zero and can be neglected, as it can be for amorphous materials.

Another important magnetic interaction is the exchange interaction. It describes the coupling between neighboring magnetic dipoles and has a quantum-mechanical origin. In continuum theory, the exchange energy density is given by

$$\mathcal{E}_{\text{ex}} = \frac{A_{\text{ex}}}{M_s^2} \left[ (\nabla M_x)^2 + (\nabla M_y)^2 + (\nabla M_z)^2 \right], \tag{2.7}$$

with $A_{\rm ex}$ the exchange stiffness constant. In ferromagnetic materials, the exchange stiffness constant is positive, which means that the exchange energy is minimal when the magnetization is uniform. In antiferromagnetic materials, the exchange stiffness constant is negative, and the exchange energy is minimal when neighboring atomic dipoles are antiparallel. The corresponding exchange field is given by

$$\mathbf{H}_{\text{ex}} = \frac{2A_{\text{ex}}}{\mu_0 M_s^2} \Delta \mathbf{M} = l_{\text{ex}}^2 \Delta \mathbf{M} \equiv \lambda_{\text{ex}} \Delta \mathbf{M}, \tag{2.8}$$

with $\Delta$ the Laplace operator, $\lambda_{ex}$ the exchange constant, and $l_{ex} = \sqrt{2A_{ex}/(\mu_0 M_s^2)}$ the exchange length. This length is typically a few nm (Table 2.1) and characterizes the competition between the exchange and dipolar interactions.
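The effective-field relations can be checked numerically. The sketch below assumes illustrative (hypothetical) anisotropy constants $K_1 = 5\times10^4$ J/m³ and $K_2 = 1\times10^4$ J/m³ together with the CoFeB value of $M_s$ from Table 2.1. It evaluates the uniaxial anisotropy field of Eq. (2.6) in closed form and compares it with a central finite-difference derivative of the energy density (2.4), following the definition (2.5). It also evaluates the exchange length implied by Eq. (2.8) for an assumed exchange stiffness that is not taken from the table. All function names are hypothetical.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability in T*m/A

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def anisotropy_energy(m_vec, u, k1, k2, ms):
    """Uniaxial anisotropy energy density of Eq. (2.4); m_vec in A/m, u a unit vector."""
    c = dot(u, m_vec) / ms                    # u . zeta
    return -k1 * c**2 - k2 * c**4

def anisotropy_field(m_vec, u, k1, k2, ms):
    """Closed-form anisotropy field of Eq. (2.6), in A/m."""
    c = dot(u, m_vec) / ms
    pref = (2 * k1 * c + 4 * k2 * c**3) / (MU0 * ms)
    return [pref * ui for ui in u]

def field_from_energy(energy, m_vec, h=1.0):
    """Eq. (2.5) by central differences: H = -(1/mu0) dE/dM, step h in A/m."""
    field = []
    for i in range(3):
        mp, mm = list(m_vec), list(m_vec)
        mp[i] += h
        mm[i] -= h
        field.append(-(energy(mp) - energy(mm)) / (2 * h * MU0))
    return field

def exchange_length(a_ex, ms):
    """l_ex = sqrt(2 A_ex / (mu0 M_s^2)), read off Eq. (2.8); result in metres."""
    return math.sqrt(2 * a_ex / (MU0 * ms**2))

MS = 1.3e6                  # A/m, CoFeB (Table 2.1)
K1, K2 = 5e4, 1e4           # J/m^3, illustrative values only
U = (0.0, 0.0, 1.0)         # easy axis along z
tilt = 0.3                  # rad between M and the easy axis
M = (MS * math.sin(tilt), 0.0, MS * math.cos(tilt))

analytic = anisotropy_field(M, U, K1, K2, MS)
numeric = field_from_energy(lambda m: anisotropy_energy(m, U, K1, K2, MS), M)
print("Eq. (2.6):       ", [f"{x:.4e}" for x in analytic])
print("-(1/mu0) dE/dM:  ", [f"{x:.4e}" for x in numeric])

A_EX = 1.6e-11              # J/m, assumed exchange stiffness (not from Table 2.1)
print(f"l_ex = {exchange_length(A_EX, MS) * 1e9:.2f} nm")
```

The two field vectors agree to within finite-difference rounding, and the assumed $A_{ex}$ yields $l_{ex} \approx 3.9$ nm, close to the CoFeB entry of Table 2.1.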
At length scales below $l_{ex}$, the exchange interaction is dominant, and the magnetization is uniform. At larger length scales, the dipolar interaction dominates and domains with different magnetization orientations can form. In addition to the previously described interactions, various other interactions exist, such as the Dzyaloshinskii–Moriya interaction or the magnetoelastic interaction. Detailed discussions of the physics of these different interactions can be found in [160]–[162].

#### 2.1.2. MAGNETIZATION DYNAMICS AND SPIN WAVES

The dynamics of the magnetization in the presence of one or several effective magnetic fields are described by the Landau–Lifshitz–Gilbert (LLG) equation [163], [164]

$$\frac{d\mathbf{M}}{dt} = -\gamma \mu_0 (\mathbf{M} \times \mathbf{H}_{\text{eff}}) + \frac{\alpha}{M_s} \left( \mathbf{M} \times \frac{d\mathbf{M}}{dt} \right), \tag{2.9}$$

where $\gamma$ is the absolute value of the gyromagnetic ratio, $\mu_0$ is the vacuum permeability, $\alpha$ is the Gilbert damping constant, and $\mathbf{H}_{\mathrm{eff}}$ is the effective magnetic field. This effective field is the sum of the effective fields of all magnetic interactions and the external magnetic field. Hence, every magnetic interaction contributes to the magnetization dynamics via the cross product of the magnetization with its corresponding effective field. In equilibrium, the magnetization is parallel to the effective field. When the magnetization is not parallel to the effective field, it precesses around this field, as described by the first term in the LLG equation. The second term describes the attenuation of the precession and represents the energy loss of the magnetic excitations into the lattice (phonons) and the electronic system (electrons, eddy currents). All these effects are subsumed in the phenomenological Gilbert damping constant $\alpha$.
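To make the interplay of the two torques concrete, the following minimal macrospin sketch integrates Eq. (2.9) for a single, spatially uniform magnetization in a static field. It uses the mathematically equivalent explicit Landau–Lifshitz form $d\mathbf{m}/dt = -\frac{\gamma}{1+\alpha^2}\left[\mathbf{m}\times\mathbf{B} + \alpha\,\mathbf{m}\times(\mathbf{m}\times\mathbf{B})\right]$, with $\mathbf{m} = \mathbf{M}/M_s$ and $\mathbf{B} = \mu_0\mathbf{H}_{\text{eff}}$, and a fourth-order Runge–Kutta step. The value of $\gamma$, the damping $\alpha = 0.05$ (larger than in Table 2.1 to shorten the simulation), the 100 mT field along z, and the 60° initial tilt are assumptions chosen for illustration; function names are hypothetical.

```python
import math

GAMMA = 1.76e11  # |gyromagnetic ratio| in rad/(s*T), assumed free-electron value

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def llg_rhs(m, b_eff, alpha):
    """dm/dt for the unit vector m = M/M_s (explicit Landau-Lifshitz form of Eq. 2.9).

    dm/dt = -gamma/(1 + alpha^2) * [m x B + alpha * m x (m x B)], B = mu0*H_eff in T.
    """
    pre = -GAMMA / (1.0 + alpha**2)
    mxb = cross(m, b_eff)
    mxmxb = cross(m, mxb)
    return tuple(pre * (p + alpha * q) for p, q in zip(mxb, mxmxb))

def integrate(m0, b_eff, alpha, dt, steps):
    """Fourth-order Runge-Kutta integration; m is renormalized after each step."""
    def shifted(m, k, h):
        return tuple(x + h * kx for x, kx in zip(m, k))

    m = m0
    traj = [m]
    for _ in range(steps):
        k1 = llg_rhs(m, b_eff, alpha)
        k2 = llg_rhs(shifted(m, k1, 0.5 * dt), b_eff, alpha)
        k3 = llg_rhs(shifted(m, k2, 0.5 * dt), b_eff, alpha)
        k4 = llg_rhs(shifted(m, k3, dt), b_eff, alpha)
        m = tuple(x + dt / 6.0 * (a + 2 * b + 2 * c + d)
                  for x, a, b, c, d in zip(m, k1, k2, k3, k4))
        norm = math.sqrt(sum(x * x for x in m))
        m = tuple(x / norm for x in m)
        traj.append(m)
    return traj

tilt = math.radians(60)                      # initial angle between m and B
traj = integrate((math.sin(tilt), 0.0, math.cos(tilt)),
                 b_eff=(0.0, 0.0, 0.1),      # 100 mT along z
                 alpha=0.05, dt=1e-12, steps=10_000)
for i in (0, 100, 1000, 10_000):
    print(f"t = {i * 1e-3:5.1f} ns  m = ({traj[i][0]:+.3f}, {traj[i][1]:+.3f}, {traj[i][2]:+.3f})")
```

The transverse components $m_x$ and $m_y$ oscillate at roughly $\gamma B/2\pi \approx 2.8$ GHz while $m_z$ grows towards 1, which is the spiral trajectory sketched in Figure 2.1(a). Renormalizing $\mathbf{m}$ after every step is a common pragmatic choice that keeps $|\mathbf{m}| = 1$ despite integration error.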
The combined effect of both terms in the LLG equation results in a spiral motion of the magnetization around the effective magnetic field towards the equilibrium state, as graphically depicted in Figure 2.1(a).

Figure 2.1: Schematic of the magnetization dynamics described by the LLG equation. (a) The trajectory of the magnetization is determined by the combination of two torques [Eq. (2.9)]: (i) the precessional motion stems from $\mathbf{M} \times \mathbf{H}_{eff}$, whereas (ii) the damping term $\mathbf{M} \times \frac{d\mathbf{M}}{dt} \propto -\mathbf{M} \times (\mathbf{M} \times \mathbf{H}_{eff})$ drives the magnetization towards the direction of $\mathbf{H}_{eff}$. (b) Schematic representation of a spin wave in a two-dimensional lattice of magnetic moments: top view of the first lattice row (top) and side view of the two-dimensional lattice (bottom).

The LLG equation indicates that small oscillations of the effective magnetic field in time result in a precession of the magnetization. The precession can be either uniform or nonuniform over the magnetic volume. The case of uniform precession with a spatially constant phase is called ferromagnetic resonance. For nonuniform precession, the phase of the precession is position dependent and wave-like excitations of the magnetization exist, called spin waves [see Figure 2.1(b)]. Spin waves can thus be considered as stable wave-like solutions of the LLG equation. The ansatz for the magnetization dynamics of a spin wave in a bulk ferromagnet can be written as

$$\mathbf{M}(\mathbf{r},t) = \mathbf{M}_0 + \mathbf{m} = \mathbf{M}_0 + \tilde{\mathbf{m}}e^{i(\omega t + \mathbf{k} \cdot \mathbf{r})}, \tag{2.10}$$

with $\mathbf{M}_0$ the static magnetization component, $\tilde{\mathbf{m}}$ the complex amplitude of the dynamic component $\mathbf{m}$, $\omega$ the angular frequency, and $\mathbf{k}$ the wavevector.
The effective magnetic field is then given by

$$\mathbf{H}_{\text{eff}}(\mathbf{r},t) = \mathbf{H}_0 + \mathbf{h} = \mathbf{H}_0 + \tilde{\mathbf{h}} e^{i(\omega t + \mathbf{k} \cdot \mathbf{r})}, \tag{2.11}$$

with $\mathbf{H}_0$ and $\mathbf{h}$ the static and dynamic components of the effective magnetic field, respectively. As discussed above, this effective magnetic field is the sum of the effective fields due to the relevant magnetic interactions. For weak excitations, *i.e.* $||\mathbf{m}|| \ll ||\mathbf{M}_0|| \approx M_s$, the LLG equation can be linearized by neglecting terms of second order in the dynamic components $\mathbf{m}$ and $\mathbf{h}$. After a temporal Fourier transform, we obtain

$$i\omega \mathbf{m} = -\gamma \mu_0 (\mathbf{M}_0 \times \mathbf{h} + \mathbf{m} \times \mathbf{H}_0) + \frac{i\omega \alpha}{M_s} (\mathbf{M}_0 \times \mathbf{m}). \tag{2.12}$$

For specific values of $\mathbf{k}$ and $\omega$, this linearized LLG equation has nontrivial solutions, which represent stable collective magnetization excitations of the form $\tilde{\mathbf{m}}e^{i(\omega(\mathbf{k})t+\mathbf{k}\cdot\mathbf{r})}$, *i.e.* spin waves. The function $\omega=f(\mathbf{k})$ that relates the spin-wave oscillation frequency to the wavevector is called the dispersion relation. The group velocity of a (spin) wave is defined by the gradient of the dispersion relation, $\mathbf{v}_g=\nabla_{\mathbf{k}}\omega$, and represents the direction and speed of the wave energy flow. In contrast, the phase velocity, $\mathbf{v}_p=\mathbf{k}\omega/||\mathbf{k}||^2$, describes the direction and speed of the propagation of the wave phase fronts.

Waveguide structures are of crucial importance for spin-wave devices and circuits. Therefore, in the following, we briefly discuss the behavior of spin waves in waveguides with dimensions comparable to or smaller than the wavelength. In such waveguides, the behavior and specifically the dispersion relation of spin waves are strongly affected by the waveguide boundaries and lateral confinement effects.
Considering a waveguide with a rectangular cross section and a thickness $d$ much smaller than its width $w$, the spin-wave dispersion relation is given by [165]

$$\omega_{\rm n} = \sqrt{(\omega_0 + \omega_{\rm M} \lambda_{\rm ex} k_{\rm tot}^2)(\omega_0 + \omega_{\rm M} \lambda_{\rm ex} k_{\rm tot}^2 + \omega_{\rm M} F)}, \tag{2.13}$$

with $\omega_0 = \gamma \mu_0 H_0$, $\omega_{\rm M} = \gamma \mu_0 M_s$, and the abbreviations

$$F = P + \sin^2 \phi \left(1 - P\left[1 + \cos^2(\theta_k - \theta_M)\right] + \frac{\omega_M P(1 - P)\sin^2(\theta_k - \theta_M)}{\omega_0 + \omega_M \lambda_{ex} k_{tot}^2}\right) \tag{2.14}$$

and

$$P = 1 - \frac{1 - e^{-dk_{\text{tot}}}}{dk_{\text{tot}}}. \tag{2.15}$$

Here, $k_{\text{tot}}^2 = k^2 + k_{\text{n}}^2$, with $k_{\text{n}} = n\pi/w$ the quantized wavenumber and n the mode number; k is the wavenumber in the propagation direction, $\theta_{\text{k}} = \arctan(k_{\text{n}}/k)$, $\phi$ is the angle between the magnetization and the normal to the waveguide, and $\theta_{\text{M}}$ is the angle between the magnetization and the longitudinal waveguide axis. Note that this equation is only valid if the waveguide is sufficiently thin, *i.e.* $kd \ll 1$, and the dynamic magnetization is uniform over the waveguide thickness. We also remark that, depending on the magnetization distribution and the demagnetization field at the waveguide edges, it may be necessary to use an effective width instead of the physical width to accurately describe the dispersion relations [166], [167].

For short wavelengths (large k), the exchange interaction is dominant. In this limit, the dispersion relation shows a quadratic behavior, $\omega_{n,ex} = \omega_M \lambda_{ex} k_{tot}^2$, independent of the magnetization orientation. In contrast, for long wavelengths (small k), the dipolar interaction becomes dominant.
Then, the dispersion relation is given by $\omega_{\rm n,dip} = \sqrt{\omega_0(\omega_0 + \omega_M F)}$. The factor F strongly depends on the magnetization orientation, indicating that the dipolar interaction leads to anisotropic spin-wave properties. In the limit of infinite wavelengths, the frequency approaches the ferromagnetic resonance frequency, which can be considered as a spin wave with k = 0.

Figure 2.2: Dispersion relation of backward volume spin waves (BVSW), surface spin waves (SSW), and forward volume spin waves (FVSW) in a 500 nm wide and 30 nm thick CoFeB waveguide. For BVSW and SSW, the dispersion relations of the first two laterally confined width modes ($n_1$ and $n_2$) are shown. The material parameters are listed in Table 2.1 and the external magnetic field was $\mu_0 H = 100$ mT. The top panel depicts the mode profiles (top view) for confined width modes with mode numbers as indicated.

Figure 2.2 shows the spin-wave dispersion relations for different geometries in a 500 nm wide CoFeB waveguide (see Table 2.1 for material parameters) for an external magnetic field of $\mu_0H=100$ mT. In general, the dispersion relation of long-wavelength dipolar spin waves depends on the direction of the wavevector (the propagation direction) and the static magnetization, as described by Eq. (2.13). It is however instructive to discuss three limiting cases of dipolar spin waves that are often called surface spin waves, forward volume waves, and backward volume waves. The first case corresponds to the geometry in which both the static magnetization and the propagation direction (the wavevector) lie in the plane of the waveguide and are perpendicular to each other, *i.e.* $\phi = \frac{\pi}{2}$ and $\theta_{\rm M} = \frac{\pi}{2}$. Such spin waves are called surface spin waves (SSW) since they decay exponentially away from the surface [168].
Despite their name, the magnetization can still be considered uniform across the film for sufficiently thin films with $kd \ll 1$. The dispersion relations of the first two SSW width modes ($n_1$ and $n_2$) in a 500 nm wide CoFeB waveguide are depicted in Figure 2.2 for an external field of $\mu_0H=100$ mT. The curves indicate that the group and phase velocities are parallel and point in the same direction.

In the second geometry, the static magnetization is perpendicular to both the propagation direction and the waveguide plane, *i.e.* $\theta_{\rm M} = \frac{\pi}{2}$ and $\phi = 0$. The spin waves in this geometry have dynamic magnetization components in the plane of the waveguide and a group velocity parallel to the phase velocity. Such spin waves are called forward volume spin waves (FVSW), and their dispersion relation is also shown in Figure 2.2.

In the third geometry, the static magnetization is parallel to the propagation direction, both lying in the plane along the waveguide, *i.e.* $\phi = \frac{\pi}{2}$ and $\theta_{\rm M} = 0$. In this case, dipolar spin waves have a negative group velocity, which is antiparallel to the positive phase velocity, *i.e.* group and phase velocities point in opposite directions. Therefore, such waves are referred to as backward volume spin waves (BVSW). Their dispersion relation is also depicted in Figure 2.2 for the first two width modes ($n_1$ and $n_2$).

When the external driving magnetic fields are removed, the spin-wave amplitude decreases exponentially with a characteristic lifetime given by [71]

$$\tau = \left(\alpha \omega_{\rm n} \frac{\partial \omega_{\rm n}}{\partial \omega_{0}}\right)^{-1}. \tag{2.16}$$

The spin-wave attenuation length is the distance over which the spin-wave amplitude decays to 1/e of its initial value. It is given by the product of the lifetime and the group velocity, $\delta=\tau \times v_g$.
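The sketch below evaluates Eqs. (2.13)–(2.16) numerically for the geometry of Table 2.1 (500 nm wide, 20 nm thick CoFeB waveguide, $\lambda = 1$ µm, 100 mT). It obtains $v_g$ and $\partial\omega_n/\partial\omega_0$ by central finite differences and then $\delta = \tau\times v_g$. Assumptions: $\gamma = 1.76\times10^{11}$ rad s⁻¹ T⁻¹, $\mu_0 H_0$ equal to the applied 100 mT (static demagnetizing fields are neglected), $\lambda_{ex} = l_{ex}^2$ as in Eq. (2.8), and the first width mode $n = 1$ with $k_n = \pi/w$. Function names are hypothetical.

```python
import math

GAMMA = 1.76e11          # |gyromagnetic ratio| in rad/(s*T), assumed
MU0 = 4e-7 * math.pi     # vacuum permeability in T*m/A

# CoFeB entries of Table 2.1 and the table's waveguide geometry
MS = 1.3e6               # saturation magnetization, A/m
ALPHA = 4e-3             # Gilbert damping
LAMBDA_EX = (3.9e-9)**2  # exchange constant lambda_ex = l_ex**2, m^2 (Eq. 2.8)
WIDTH = 500e-9           # waveguide width w, m
THICK = 20e-9            # waveguide thickness d, m
OMEGA_0 = GAMMA * 0.1    # omega_0 = gamma*mu0*H_0 with mu0*H_0 = 100 mT (assumed)
OMEGA_M = GAMMA * MU0 * MS

def omega_n(k, phi, theta_m, n=1, omega0=OMEGA_0):
    """Angular frequency (rad/s) of width mode n from Eqs. (2.13)-(2.15)."""
    k_n = n * math.pi / WIDTH
    k_tot2 = k**2 + k_n**2
    k_tot = math.sqrt(k_tot2)
    theta_k = math.atan2(k_n, k)                  # arctan(k_n / k) for k > 0
    p = 1.0 - (1.0 - math.exp(-THICK * k_tot)) / (THICK * k_tot)   # Eq. (2.15)
    stiff = omega0 + OMEGA_M * LAMBDA_EX * k_tot2
    dth = theta_k - theta_m
    f = p + math.sin(phi)**2 * (1.0 - p * (1.0 + math.cos(dth)**2)
                                + OMEGA_M * p * (1.0 - p) * math.sin(dth)**2 / stiff)  # Eq. (2.14)
    return math.sqrt(stiff * (stiff + OMEGA_M * f))                   # Eq. (2.13)

def propagation(k, phi, theta_m, n=1):
    """Return (v_g in m/s, tau in s from Eq. (2.16), attenuation length in m)."""
    dk = 1e-4 * k
    v_g = (omega_n(k + dk, phi, theta_m, n) - omega_n(k - dk, phi, theta_m, n)) / (2 * dk)
    dw0 = 1e-4 * OMEGA_0
    dw_dw0 = (omega_n(k, phi, theta_m, n, OMEGA_0 + dw0)
              - omega_n(k, phi, theta_m, n, OMEGA_0 - dw0)) / (2 * dw0)
    tau = 1.0 / (ALPHA * omega_n(k, phi, theta_m, n) * dw_dw0)
    return v_g, tau, abs(v_g) * tau               # delta = tau * |v_g|

k = 2 * math.pi / 1e-6                            # wavelength of 1 um
vg, tau, delta = propagation(k, phi=math.pi / 2, theta_m=math.pi / 2)   # SSW geometry
print(f"f = {omega_n(k, math.pi / 2, math.pi / 2) / (2 * math.pi) / 1e9:.1f} GHz")
print(f"v_g = {vg / 1e3:.2f} um/ns, tau = {tau * 1e9:.2f} ns, delta = {delta * 1e6:.2f} um")
```

Under these assumptions, the SSW case gives a frequency of about 14.6 GHz, $v_g \approx 4.1$ µm/ns, $\tau \approx 1.6$ ns, and $\delta \approx 6.6$ µm, close to the CoFeB row of Table 2.1. Switching to the BVSW geometry ($\theta_M = 0$) makes $\omega_n$ decrease with $k$ at small wavenumbers, *i.e.* the group velocity becomes negative, as described above.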
As shown in Table 2.1, spin-wave lifetimes are on the order of ns in metallic ferromagnets, such as CoFeB or Ni, whereas they can reach values close to the µs range in low-damping insulators, such as $Y_3Fe_5O_{12}$ (yttrium iron garnet, YIG). Since spin-wave group velocities are typically a few µm/ns (km/s), attenuation lengths range from the µm scale for metallic ferromagnets to the mm scale for YIG. The spin-wave group velocity, lifetime, and attenuation length (normalized to the wavelength) for the three cases of SSW, FVSW, and BVSW are plotted in Figure 2.3 as a function of the wavenumber for a CoFeB waveguide and an external magnetic field of $\mu_0H=100$ mT. Note that, when the static magnetization orientation is intermediate between the three limiting cases, the spin-wave properties also show intermediate characteristics. As a final remark, the BVSW and FVSW geometries both lead to volume waves, which means that increasing the waveguide thickness may lead to the formation of quantized spin-wave modes along the thickness of the film at higher frequencies.

For SSW and BVSW, the magnitude of the group velocity reaches a maximum at small wavenumbers, which stems from the dipolar interaction. For BVSW, the group velocity becomes zero at a finite wavenumber (frequency) beyond the maximum, due to the competition between the dynamic dipolar and exchange fields. In the exchange regime, the group velocities of SSW and BVSW become equal and further increase with the wavenumber. For logic applications, it is desirable to use spin waves with large group velocities that ensure fast signal propagation and thus reduced logic gate delays.
Moreover, large attenuation lengths reduce losses during spin-wave propagation and are therefore also favorable for spin-wave devices. This will be further discussed below.

Figure 2.3: Propagation characteristics of backward volume spin waves (BVSW), surface spin waves (SSW), and forward volume spin waves (FVSW) in a 500 nm wide and 30 nm thick CoFeB waveguide, derived from the dispersion relations in Figure 2.2. (a) Group velocity, (b) lifetime, and (c) attenuation length normalized to the wavelength, all as a function of the wavevector. For BVSW and SSW, data are shown for the first laterally confined width mode $(n_1)$. In all cases, the material parameters were those of CoFeB (see Table 2.1) and the external magnetic field was $\mu_0 H = 100$ mT.

Group velocities depend in general on the properties of the ferromagnetic medium, as shown in Table 2.1. The group velocity typically decreases strongly with decreasing film (or waveguide) thickness. This can be compensated by using magnetic materials with larger saturation magnetization $M_s$. The spin-wave lifetime in Eq. (2.16) is inversely proportional to the Gilbert damping $\alpha$. As the attenuation length is given by the product of the group velocity and the lifetime, the largest values are obtained for low-damping magnetic materials with large $M_s$. In practice, the two parameters $\alpha$ and $M_s$ may need to be traded off against each other, as indicated by Table 2.1. Further desirable properties of magnetic materials for logic applications are compatibility with CMOS co-integration and a high Curie temperature to ensure temperature insensitivity. This complicates the materials selection process, and no clearly preferred material has emerged yet. Future materials research in this field is thus of great interest to optimize conventional materials or to establish novel magnetic materials for spin-wave applications.

#### 2.1.3. NONLINEAR SPIN-WAVE PHYSICS

Section 2.1.2 has discussed spin-wave physics using the linearized LLG equation (2.12). Such an approach is valid for small amplitudes and describes noninteracting spin waves. However, the full LLG equation (2.9) is nonlinear, and nonlinear effects can thus arise for large spin-wave amplitudes. Since nonlinear effects are central to several spin-wave device concepts, this section provides a brief overview of the topic. More details can be found in [71], [161], [169]–[171].

The theoretical model for nonlinear spin-wave interactions was originally developed by Suhl, and nonlinear spin-wave processes are therefore often referred to as first- and second-order Suhl instabilities [161], [172], [173]. Later, a generalized quantum-mechanical description of nonlinear magnons (quantized spin waves), termed S-theory, was developed by Zakharov, L'vov, and Starobinets [174], [175]. Today, these models are primarily used to describe a variety of nonlinear and parametric spin-wave phenomena [136], [176]–[180].

In general, the diverse nonlinear effects can be categorized into two groups: (i) multimagnon scattering [161], [175] and (ii) the reduction of the saturation magnetization at large precession angles [176], [178]. However, (ii) can also be described by four-magnon scattering, so the separation into groups is not strict. Multimagnon scattering effects (i) primarily include three-magnon splitting (*i.e.* the decay of a single magnon into two), which can be used for the amplification of spin waves as a first-order parametric process [161], [181]; three-magnon confluence (*i.e.* the combination of two magnons into a single one); and four-magnon scattering (*i.e.* the inelastic scattering of two magnons), which is fundamental for some spin-wave transistor concepts [61]. In all nonlinear scattering processes, the total energy and momentum are conserved.
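Writing the energy and momentum of a magnon with angular frequency $\omega_i$ and wavevector $\mathbf{k}_i$ as $\hbar\omega_i$ and $\hbar\mathbf{k}_i$, the conservation laws for the three multimagnon processes listed above take the following form (the factor $\hbar$ cancels on both sides):

$$
\begin{aligned}
\text{three-magnon splitting:}\quad & \omega_1 = \omega_2 + \omega_3, & \mathbf{k}_1 &= \mathbf{k}_2 + \mathbf{k}_3,\\
\text{three-magnon confluence:}\quad & \omega_1 + \omega_2 = \omega_3, & \mathbf{k}_1 + \mathbf{k}_2 &= \mathbf{k}_3,\\
\text{four-magnon scattering:}\quad & \omega_1 + \omega_2 = \omega_3 + \omega_4, & \mathbf{k}_1 + \mathbf{k}_2 &= \mathbf{k}_3 + \mathbf{k}_4.
\end{aligned}
$$

For example, if the initial magnon in a three-magnon splitting process is the uniform mode ($\mathbf{k}_1 = 0$), the two secondary magnons must carry opposite wavevectors, $\mathbf{k}_3 = -\mathbf{k}_2$, and frequencies that add up to $\omega_1$.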
The magnon spectra in macroscopic structures always consist of a practically infinite number of modes with different wavevector directions. Hence, an initial pair of magnons, which participates *e.g.* in a four-magnon scattering process, can always find a pair of secondary magnons [161], [175]. However, in magnetic nanostructures [167], [182], the magnon density of states decreases with the structure size, which makes the "search" for secondary magnon pairs more complex [183], [184]. Thus, the downscaling of magnonic nanostructures leads to a strong modification of nonlinear spin-wave physics, which offers the possibility to control (in the simplest case, switch on or off) nonlinear processes by the selection of the operating frequency and the external magnetic field. In contrast, processes (ii), which describe nonlinear frequency shifts of the spin-wave dispersion with increasing spin-wave amplitude, are typically more pronounced at the nanoscale [167]. These phenomena do not require any specific adjustment of the operating point and can thus be useful for spin-wave devices. In particular, the nonlinear shift of the spin-wave dispersion relation allows for the realization of nonlinear directional couplers [178].

Figure 2.4: Schematic of a Von Neumann computer consisting of a central processing unit and a memory, interconnected by a data bus.

### 2.2. FUNDAMENTALS OF SPIN-WAVE COMPUTING

In this section, we discuss the fundamental principles of different disruptive computation paradigms based on spin waves to establish a framework for the architecture of a spin-wave-based computer. We start by introducing the basic components of a computing system, their implementations using spin waves, and the limitations of an all-spin-wave system.

#### 2.2.1. BASIC COMPUTER ARCHITECTURES

Despite many advances in computer architecture, the majority of today's computing systems are still conceptually related to the Von Neumann architecture, which was originally developed in the 1940s [185]. Such a system consists of three essential parts: (i) a central processing unit that processes the instructions of the computer program and controls the data flow, (ii) a memory to store data and instructions, and (iii) a data bus as interconnect that links the various parts within the processor and the memory and provides communication with the outside world. A schematic of such a system is shown in Figure 2.4. Hence, to design a computer system that operates entirely with spin waves, spin-wave processors, spin-wave memory, and spin-wave interconnects need to be developed. Moreover, interfaces between the spin-wave processor and the (presumably charge-based) outside periphery are required, including a power supply.

The performance of a computing system is generally limited by its weakest component: its computing throughput is restricted by the slowest part, and its power consumption is determined by the most power-hungry subsystem. As detailed below, there is currently no comprehensive concept for a full spin-wave computer. In the following, we discuss requirements, basic approaches, and potential spin-wave-based implementations of the main components of a computer and finally suggest what a spin-wave-based computing system may look like.

Figure 2.5: Different schemes to encode information in (spin) waves: (a) binary amplitude encoding, (b) binary phase encoding, and (c) quaternary (nonbinary) mixed amplitude and phase encoding.

Recently, there has been growing interest in alternative computing paradigms beyond Von Neumann architectures, especially in the field of machine learning [29]–[31].
Whereas the implementation of such architectures by spin waves is an intriguing prospect, research on this topic is still in its infancy [99], [186]–[191].

#### 2.2.2. INFORMATION ENCODING

Before discussing spin-wave computing concepts, we need to define how information can be encoded in a spin wave. Waves are characterized by amplitude (intensity), phase, wavelength, and frequency, which can all be used for information encoding. The encoding scheme determines the interactions that can be employed for information processing and computation. Presently, device proposals typically rely on information encoded in spin-wave amplitude and/or phase (see Figure 2.5). Moreover, the use of different frequency channels has been proposed to enable parallel data processing based on frequency-division multiplexing [80], [104].

In amplitude-based information encoding, two main schemes can be pursued: (i) amplitude level encoding, and (ii) amplitude threshold encoding. In amplitude level encoding, the presence of a spin wave in a waveguide represents a logic 1 and the absence of a spin wave a logic 0 [Figure 2.5(a)]. In contrast, in amplitude threshold encoding, a logic 1 is represented by a spin wave with an amplitude above a certain threshold and a logic 0 otherwise (or *vice versa*). Multiple thresholds can be defined to represent nonbinary information and enable multivalued logic and computing. For example, if $\{X,Y\}$ with $X < Y$ is defined as a set of thresholds, a spin-wave amplitude greater than Y can represent a 1, an amplitude between X and Y a 0, and an amplitude below X a $-1$.

Alternatively, information can be encoded in the (relative) spin-wave phase, such that *e.g.* a relative phase of 0 (*i.e.* a spin wave in phase with a reference) represents a logic 1, while a relative phase of $\pi$ represents a logic 0 [Figure 2.5(b)]. Furthermore, additional phases can be utilized for multivalued logic, e.g.
$\{1,0,-1\}$ can be represented by the set of phases $\{0, \frac{\pi}{2}, \pi\}$. Such ternary computing schemes can have advantages over binary ones, and the implementation of ternary logic circuits using (spin) waves may be an interesting future research topic, *e.g.* for computer arithmetic or neural networks. Combinations of amplitude and phase encoding schemes are also possible and open further pathways towards effective nonbinary data processing [Figure 2.5(c)]. For example, the data set $\{0,1,2,3\}$ can be encoded using two amplitude levels $\{A,2A\}$ and two phases $\{0,\pi\}$ by $0:=\{A,0\}$, $1:=\{A,\pi\}$, $2:=\{2A,0\}$, and $3:=\{2A,\pi\}$. Such schemes can be easily generalized to larger sets of nonbinary information.

The different encoding schemes have specific advantages and drawbacks when implemented in spin waves. Spin waves have typical propagation distances of µm to mm, depending on the host material. For amplitude encoding, the maximum size of a spin-wave circuit needs to be much smaller than the spin-wave attenuation length, since the logic level may otherwise change during propagation. In contrast, the phase of a wave is not affected by attenuation. While computing schemes may still require well-defined amplitudes, as further outlined below, the logic value encoded in the spin-wave phase is nonetheless stable during propagation. Moreover, the phase coherence times of spin waves are long, and phase noise can be kept under control even for nanofabricated waveguides with *e.g.* considerable line width roughness [192], rendering phase encoding rather stable. However, the largest differences between encoding schemes lie in the different interactions and processes required for computation.

We finally note that spin waves are noninteracting in the small-signal approximation, *i.e.* for small amplitudes. Therefore, parallel data processing is possible using *e.g.* frequency-division or wavelength-division multiplexing.
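The mixed amplitude/phase code described above can be represented compactly with complex phasors $a e^{i\varphi}$. The following sketch, in which the names and the unit amplitude $A = 1$ are illustrative assumptions, encodes the symbols $\{0,1,2,3\}$, decodes a received phasor by choosing the nearest code point, and implements the two-threshold ternary amplitude decoding with thresholds $X < Y$. Nearest-point decoding is one simple choice of detector, not a scheme prescribed by the text.

```python
import cmath
import math

A = 1.0  # amplitude unit, arbitrary (assumed)

# Mixed amplitude/phase code from the text: symbol -> (amplitude, phase)
QUATERNARY = {0: (A, 0.0), 1: (A, math.pi), 2: (2 * A, 0.0), 3: (2 * A, math.pi)}

def encode(symbol):
    """Complex phasor amplitude * exp(i * phase) carrying `symbol`."""
    amplitude, phase = QUATERNARY[symbol]
    return cmath.rect(amplitude, phase)

def decode(phasor):
    """Nearest-code-point decision on a received phasor."""
    return min(QUATERNARY, key=lambda s: abs(phasor - encode(s)))

def decode_threshold(amplitude, x, y):
    """Ternary amplitude threshold decoding with thresholds x < y."""
    if amplitude > y:
        return 1
    if amplitude > x:
        return 0
    return -1

received = encode(3) * 0.9 * cmath.exp(0.2j)   # 10% weaker, 0.2 rad phase error
print(decode(received))                         # -> 3
print(decode(encode(2) * 0.7))                  # -> 0 (30% amplitude loss)
print([decode_threshold(a, 0.5, 1.0) for a in (1.5, 0.7, 0.2)])   # -> [1, 0, -1]
```

Attenuating symbol 2 (amplitude $2A$, phase 0) by 30% makes it decode as symbol 0 although its phase is unchanged, which illustrates why amplitude-encoded levels constrain the circuit size relative to the attenuation length.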
An information encoding scheme can then be defined at each frequency or wavelength, and computation can occur in parallel in the same processor.

#### 2.2.3. HOW TO COMPUTE WITH (SPIN) WAVES?

Figure 2.6: Out-of-plane component of the magnetization ($M_z$) in a 50 nm wide and 5 nm thick CoFeB waveguide obtained by micromagnetic simulations: snapshots of the spin waves emitted by a single port (a), and by two in-phase (b) or anti-phase (c) ports at a frequency of 15 GHz. The corresponding amplitudes along the magnetic waveguide are shown in panels (d) and (e), respectively. The material parameters considered in the simulations were taken from Table 2.1. The magnetic waveguide was initially magnetized longitudinally, whereas the simulations of spin-wave propagation were carried out in zero magnetic bias field. Spin waves were excited by a uniform out-of-plane magnetic field at positions $P_1$ and $P_2$ in the waveguide center.

When logic levels are encoded in spin-wave amplitude or phase, performing a logic operation requires the combination of different input waves and the generation of an output wave with an amplitude or phase corresponding to the desired logic output state. In principle, the superposition of waves can lead to the addition of either their intensities or their amplitudes, depending on whether the waves are incoherent or coherent [193]. Since practical spin-wave signals typically have a large degree of phase coherence, further discussion can be limited to coherent superposition. In the absence of nonlinear effects, the interaction of coherent waves is described by interference, *i.e.* the addition of their respective amplitudes at each point in space and time. We also limit the discussion to the superposition of waves with identical frequency and wavelength.
Whether the interference of waves with different frequencies or wavelengths can also be (efficiently) utilized to evaluate logic functions is still an open research question, with the potential to open additional avenues towards novel computation paradigms.

For in-phase waves with equal frequency, constructive interference leads to a peak-to-peak amplitude of the generated wave that is equal to the sum of the peak-to-peak amplitudes of the input waves. In contrast, when their phase difference is $\pi$, destructive interference leads to a subtraction of the peak-to-peak amplitudes of the input waves. For spin waves, the corresponding magnetization dynamics are depicted in Figure 2.6. In narrow waveguides, the spin-wave modes [see Figure 2.6(a) for the mode pattern of the first width mode] may deviate from plane waves due to lateral confinement and the effect of the demagnetizing field. Nonetheless, micromagnetic simulations for a CoFeB waveguide [Figs. 2.6(b) and 2.6(c)], which rely on solving the LLG equation numerically [194], [195], indicate that confined spin waves still show the expected interference. By placing two spin-wave sources on the same waveguide, destructive [Figure 2.6(b)] or constructive [Figure 2.6(c)] interference is obtained for a relative phase of $\pi$ or 0, respectively. The incomplete destructive interference observed in Figure 2.6(b) can be linked to spin-wave attenuation, which leads to slightly different amplitudes of the two waves at both sides of the spin-wave sources.

Wave interference can be exploited to compute basic Boolean operations using the different encoding schemes. For example, using amplitude level encoding, the constructive interference of two waves generates the output of an OR operation, whereas their destructive interference (with a phase shift of $\pi$ between the waves) produces the output of an XOR operation.
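The OR/XOR behavior of amplitude level encoding can be reproduced with a few lines of phasor arithmetic. In this sketch, which is an idealization that ignores attenuation and mode profiles, a logic 1 is a unit-amplitude wave, a logic 0 is no wave, and the output is read against an assumed detection threshold of 0.5; the function names are hypothetical.

```python
import cmath
import math

def superpose(amplitudes, phases):
    """Coherent superposition of equal-frequency waves: sum of complex amplitudes."""
    return sum(cmath.rect(a, p) for a, p in zip(amplitudes, phases))

def amplitude_gate(a_bit, b_bit, relative_phase, threshold=0.5):
    """Amplitude level encoding: bit 1 -> unit-amplitude wave, bit 0 -> no wave.

    Returns 1 if the interference amplitude exceeds `threshold` (an assumed
    detector setting). relative_phase = 0 gives constructive interference (OR);
    relative_phase = pi gives destructive interference (XOR).
    """
    out = superpose([float(a_bit), float(b_bit)], [0.0, relative_phase])
    return int(abs(out) > threshold)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "OR:", amplitude_gate(a, b, 0.0), "XOR:", amplitude_gate(a, b, math.pi))
```

With a relative phase of 0, any input wave yields a nonzero output (OR); with a relative phase of π, two equal waves cancel (XOR). If attenuation makes the two amplitudes unequal, the cancellation is incomplete, and the detection threshold must sit above the residual amplitude.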
Many proposals and experimental studies have focused on phase encoding and the calculation of the majority function, MAJ [59], [64], [77], [89]–[91], [196]–[198]. This stems from the fact that the phase of the output wave resulting from the interference of three input waves is simply the majority of the phases of the input waves when logic 1 is encoded in phase 0 and logic 0 in phase $\pi$ (or *vice versa*). Together with recent advances in MAJ-based circuit design [67], [68], [199], [200], this has led to a strong interest in spintronics [42], [50], [52], [55], [64] and in particular in spin-wave majority gates [59], [64], [201], [202]. As an example, the carry-out bit of a full adder (a fundamental building block in processor design) is directly computed by a three-input majority function [*cf.* Eq. (2.17)]. In addition, many error detection and correction schemes rely on *n*-input majority logic [203], [204].

For novel computation paradigms, including (spin) wave computing, a main requirement is the possibility to implement any arbitrary logic function that can be defined within its basic formalism by means of a universal gate set. For example, within Boolean algebra, any logic function can be expressed as a sum of products or as a product of sums. Using double complements and De Morgan's laws, it can be demonstrated that any logic function can be implemented with NAND gates only or with NOR gates only. Therefore, NAND and NOR are each universal gates, and both have efficient CMOS implementations. As mentioned above, (spin) wave interference provides natural support for implementing majority gates, MAJ, which form a universal gate set in combination with inverters, INV. In phase encoding, an inverter can be realized by a passive delay line of length $(n-\frac{1}{2}) \times \lambda$ (with $\lambda$ the spin-wave wavelength and $n = 1, 2, 3, \dots$ an integer), which leads to a phase shift of $\pi$ during propagation.
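The same phasor picture shows why three-input interference computes MAJ under phase encoding, and it allows a quick check of the MAJ/INV constructions given in Eq. (2.17). The sketch below assumes ideal, equal-amplitude, lossless waves: logic 1 is phase 0, logic 0 is phase π, the constant inputs 0 and 1 are reference waves with these phases, and the output bit is read from the sign of the real part of the summed phasor (phase 0 versus π). The inverter is modeled as a π phase shift, as produced by a $(n-\frac{1}{2})\lambda$ delay line. All names are illustrative.

```python
import cmath
import itertools
import math

def to_phasor(bit):
    """Phase encoding: logic 1 -> phase 0, logic 0 -> phase pi (unit amplitude)."""
    return cmath.rect(1.0, 0.0 if bit else math.pi)

def read_phase(phasor):
    """Output logic value: 1 if the phase is closer to 0 than to pi."""
    return 1 if phasor.real > 0 else 0

def wave_maj(a, b, c):
    """Three-input majority gate: superpose three waves, read the resulting phase."""
    return read_phase(to_phasor(a) + to_phasor(b) + to_phasor(c))

def inv(bit):
    """Inverter: a pi phase shift, e.g. from a (n - 1/2)*lambda delay line."""
    return read_phase(to_phasor(bit) * cmath.exp(1j * math.pi))

def xor(a, b):
    """First line of Eq. (2.17)."""
    return wave_maj(wave_maj(a, inv(b), 0), wave_maj(inv(a), b, 0), 1)

def full_adder(a, b, c_in):
    """Sum and carry-out following Eq. (2.17)."""
    c_out = wave_maj(a, b, c_in)
    s = wave_maj(inv(c_out), wave_maj(a, b, inv(c_in)), c_in)
    return s, c_out

for a, b, c_in in itertools.product((0, 1), repeat=3):
    s, c_out = full_adder(a, b, c_in)
    print(a, b, c_in, "->", s, c_out)
```

Because three unit phasors with phases 0 or π always sum to ±1 or ±3, the sign of the real part is never ambiguous in this idealized setting; with unequal amplitudes, the read-out margin shrinks accordingly.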
In amplitude encoding, inverters are more complex and typically require active components. In this case, an inverter can be realized by interference with a reference wave with a phase of $\pi$. As an example, XOR, XNOR, and a full adder (sum $\Sigma$ and carry out $C_{out}$) can then be implemented with majority gates and inverters as follows:

$$
\begin{aligned}
A \oplus B &= \text{MAJ}\left(\text{MAJ}\left(A, \bar{B}, 0\right), \text{MAJ}\left(\bar{A}, B, 0\right), 1\right)\\
\overline{A \oplus B} &= \text{MAJ}\left(\text{MAJ}\left(\bar{A}, \bar{B}, 0\right), \text{MAJ}\left(A, B, 0\right), 1\right)\\
\Sigma &= \text{MAJ}\left(\overline{\text{MAJ}}\left(A, B, C_{in}\right), \text{MAJ}\left(A, B, \bar{C}_{in}\right), C_{in}\right)\\
C_{out} &= \text{MAJ}\left(A, B, C_{in}\right)
\end{aligned}
\tag{2.17}
$$

It should be mentioned that wave-based computing is not limited to the usage of spin waves. Similar concepts have been proposed for surface plasmon polaritons [205]–[208] or acoustic waves/phonons [209], [210].

Figure 2.7: Schematic of a clocked spin-wave interconnect. Reproduced with permission from S. Dutta, S.-C. Chang, N. Kani, D. E. Nikonov, S. Manipatruni, I. A. Young, and A. Naeemi, Sci. Rep. 5, 9861 (2015). Copyright 2015 Nature.

#### 2.2.4. SPIN-WAVE INTERCONNECTS

In Section 2.2.3, the basic principles of spin-wave interference have been discussed, and it has been shown that they can be used for logic operations. However, in a computing system, data need to be transmitted to the inputs of the logic circuit, exchanged between gates, and finally transmitted from the outputs to, *e.g.*, a memory. This is the task of the interconnect, which may also distribute clock signals and power. In conventional digital integrated circuits, the logic states 0 and 1 are encoded in voltages, which allows for data transmission by metal wires.
While interconnect performance often limits the overall performance of today's integrated circuits, interconnect solutions are mature and well understood in terms of their capabilities and associated overhead. A natural approach to connect spin-wave logic gates is by means of waveguides, in which spin waves propagate from, e.g., a gate output to an input of a subsequent gate. Besides cascading issues for specific implementations, the rather slow and lossy spin-wave propagation leads to fundamental limitations for spin-wave interconnects [85], [211], [212]. Since the spin-wave group velocity is much lower than that of electromagnetic waves in (nonmagnetic) metallic wires, interconnection by spin waves propagating in waveguides adds a considerable delay overhead, which depends on the waveguide length and material. Some representative numbers for the spin-wave group velocity are listed in Table 2.1. Typical delays are about 1 ns/$\mu$m (1 $\mu$s/mm), which means that spin waves propagating in waveguides cannot be efficiently utilized for long-range data transmission. Even for short-range data communication, the delay introduced by spin-wave propagation may not be negligible. As an example, for a spin-wave circuit with a waveguide length of a few $\mu$m, the propagation delay may already exceed the duration of a typical clock cycle of a high-performance CMOS processor, about 300 ps ($\sim$3 GHz clock frequency). It is worth noting that the overall delay is determined by the longest propagation path in the circuit. Hence, propagation delays may limit the computing throughput of a spin-wave circuit. Moreover, the spin-wave amplitude decays during propagation due to intrinsic magnetic damping. Such propagation losses remain limited when spin-wave circuits are much smaller than the attenuation length, which strongly depends on the waveguide material (see Table 2.1 for indicative numbers).
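A back-of-the-envelope check of these delay and loss arguments can be scripted as below. The group velocity of 1 km/s (i.e. 1 $\mu$m/ns, matching the ~1 ns/$\mu$m quoted above), the attenuation length of 5 $\mu$m, and the path lengths are placeholders rather than values from Table 2.1, and the amplitude is assumed to decay as $\exp(-L/L_{att})$.

```python
import math

CLOCK_PERIOD_NS = 1.0 / 3.0  # about 333 ps at a 3 GHz clock


def delay_ns(length_um, v_group_km_s):
    """Propagation delay in ns; 1 km/s equals 1 um/ns, so ns = um / (km/s)."""
    return length_um / v_group_km_s


def amplitude_ratio(length_um, attenuation_length_um):
    """Remaining amplitude fraction, assuming exponential decay exp(-L / L_att)."""
    return math.exp(-length_um / attenuation_length_um)


V_GROUP_KM_S = 1.0          # assumed group velocity (placeholder)
L_ATT_UM = 5.0              # assumed attenuation length (placeholder)
paths_um = [0.5, 1.2, 3.0]  # hypothetical input-to-output path lengths of one circuit

critical_um = max(paths_um)  # the longest propagation path sets the overall delay
t_ns = delay_ns(critical_um, V_GROUP_KM_S)
print(f"critical path: {critical_um} um -> {t_ns:.2f} ns = {t_ns / CLOCK_PERIOD_NS:.1f} clock cycles")
print(f"amplitude left on the critical path: {amplitude_ratio(critical_um, L_ATT_UM):.2f}")
```

Even this small hypothetical circuit needs several 3 GHz clock cycles for its critical path, which is the throughput concern raised above.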
This can impose severe limits on the size (and therefore the complexity) of spin-wave circuits. Losses can in principle also be compensated for by spin-wave amplifiers or repeaters. As an example, a clocked interconnect concept based on spin-wave repeaters has been reported in [213] (see Figure 2.7). While such approaches can mitigate the limitations of signal propagation by spin waves, they add a significant overhead to the circuit, which needs to be carefully considered when the energy consumption and delay of a spin-wave computing system are assessed.

#### 2.2.5. SPIN-WAVE MEMORY

To date, rather little work has been devoted to specific spin-wave memory elements that are required for computing systems based on spin waves only. Spin waves are volatile dynamic excitations that decay on timescales of ns to $\mu$s (see Table 2.1). There are two basic approaches to memories for spin waves. The natural spintronic memory element is a nanomagnet, in which the information is encoded in the direction of its magnetization. In such a memory element, an incoming spin wave deterministically sets (switches) the orientation of the magnetization of the nanomagnet. When phase encoding is used, the interaction between the spin wave and the nanomagnet needs to be phase dependent. The clocked interconnect concept [213] depicted in Figure 2.7 employs the deterministic phase-sensitive switching of nanomagnets with perpendicular magnetic anisotropy in the repeater stages. It therefore also offers some memory functionality. A 2D-mesh configuration of such structures has also been proposed [97], [214]. An alternative approach is the use of conventional charge-based memories after signal conversion in the hybrid spin-wave–CMOS systems discussed in Section 2.2.6. An introduction to charge-based memory devices can be found, *e.g.*, in [2], [215], [216].
#### 2.2.6. HYBRID SPIN-WAVE–CMOS COMPUTING SYSTEMS

Above, we have argued that spin-wave propagation in magnetic waveguides may add considerable delay and is therefore not competitive over distances of more than a few hundred nm to 1 $\mu$m. To address this issue, metallic or optical interconnects can be used for long-range data transmission after the spin-wave signals have been converted to electric or optical signals. Electric signals and light travel very fast through metal wires and optical fibers, respectively, with propagation velocities set by the speed of light in the host materials. Such solutions lead naturally to hybrid system concepts, in which spin-wave circuits coexist with conventional CMOS or mixed-signal integrated circuits, including memory. These concepts rely on (frequent) back-and-forth conversion between the spin-wave and charge domains using transducers, which may themselves add substantial delay and energy overhead. To minimize this overhead, the number of necessary transducers should remain limited. The acceptable conversion granularity depends on the relation between the delay and energy consumption of spin-wave circuits, transducers, and CMOS/mixed-signal circuits, and is therefore technology dependent. Today, design guidelines for such hybrid circuits are only emerging. Their development and the benchmarking of the ensuing hybrid circuits constitute a crucial step towards real-world applications of spin-wave computing. Since hybrid systems require efficient and scalable transducers, the approaches to generate and detect coherent spin waves are discussed in the next section; such transducers are critical elements of the spin-wave devices and circuits reviewed later in this chapter.

# 2.3. GENERAL SPIN WAVE DEVICE STRUCTURE

Conceptually speaking, a SW device includes four stages related to SW creation, propagation, processing, and detection.
At the first stage, spins are excited and a SW is generated, after which it propagates through the waveguide. While traveling through the waveguide, the SW can be manipulated or exposed to different factors within the so-called Functional Region, and finally a detector is required to produce the output value [75], [76]. A generic SW device is presented in Figure 2.8. This section discusses relevant aspects of each functional stage of a SW gate.

#### 2.3.1. EXCITATION CELL

The first and key stage, marked with A in Figure 2.8, targets SW creation. This can be realized by excitation methods relying on, e.g., a Microwave Antenna, the Magneto Optical Kerr Effect, Spin Transfer Torque, or a Magnetoelectric Cell [76]; the main requirements for SW excitation are expressed in terms of wavenumber range, efficiency, and coherence. Different excitation methods and their features are discussed thoroughly in [74]. The conventional SW excitation approach is to generate an alternating Oersted magnetic field by sending a current through a microstrip antenna, resulting in a torque that increases the spin precession around the bias magnetic field. The precessing spins under the antenna then interact with neighbouring spins through the exchange interaction, which results in the creation of a spin wave [76]. Different antenna designs, i.e., microstrip-line, coplanar-waveguide, and loop antennas, are presented in [217]. The effect of the antenna dimensions on the SW amplitude and on the k-distribution of the high-frequency magnetic field is investigated there, and it is suggested that: (i) the maximum H-field amplitude decreases when the microstrip line and inner conductor width increase, and (ii) the maximum amplitude increases with the loop antenna width. Moreover, the H-field amplitude is less uniform when the microstrip line and inner conductor become wider, whereas it is more uniform when the loop antenna width increases.
In addition, in all cases a wider SW bandwidth (k-distribution) is obtained as the antenna size decreases [217].

Figure 2.8: General Spin Wave Structure.

Figure 2.9: ME Cell Structure.

As previously mentioned, many different methods can be utilized for excitation, but most of them need large and complex equipment and cannot be integrated in magnonic devices of practical interest. Spin Transfer Torque (STT), which relies on the injection of a spin-polarized current into a magnetic metallic film to generate a spin transfer torque that in turn excites a spin wave, appears to be a promising excitation method [218], [219]. The spin-polarized current can be generated with a Spin-Torque Nano-Oscillator [220] or based on the Spin Hall Effect [221]. A more power-efficient option is voltage-controlled SW excitation, which is performed by an electric field and therefore results in a less power-hungry device [222]. Recently, an efficient Magneto Electric (ME) cell has been proposed [59], which exhibits high functional versatility and can be utilized as SW exciter, detector, amplifier, memory, and modulator. In ME-based devices, SW excitation, detection, and control are performed via external voltages, and only the information processing is performed in the SW domain. A possible ME cell introduced in [59] is graphically depicted in Figure 2.9. Roughly speaking, the ME cell operates as follows. When an input voltage is applied, an electric field is generated between the metallic and magnetoresistive layers. As a consequence, the magnetization of the magnetoresistive layer changes due to the stress that the electric field generates in the piezoelectric layer. As a result of the interaction between the spins in the magnetoresistive layer and the waveguide, a SW is created and propagates through the waveguide. An alternative SW excitation approach that makes use of an AC voltage controlled multiferroic ME cell is presented in [223].
ME cells have a higher scalability potential than microstrip antenna and STT based approaches, as they do not need a delocalized magnetic field; however, they are still at the concept level since, to date, no efficient ME cell implementation has been reported. A promising mechanism for transducer (excitation and detection stage) realizations is ferroelectric switching, which involves Voltage Controlled Magnetic Anisotropy (VCMA) [224]. VCMA is the variation of the perpendicular magnetic anisotropy at the interface between dielectric and ferromagnetic layers when an electric field is applied. VCMA depends linearly on the applied electric field, and it has been demonstrated that exchange SWs with large group velocity can be excited by it. Due to its low SW generation latency, this is a quite attractive approach for the development of future SW data processing devices [224].

#### 2.3.2. WAVEGUIDE

The most fundamental element for information processing and transfer by spin waves is a waveguide: the spin-wave conduit. In the conduit, information encoded in the spin-wave amplitude or phase propagates at the spin-wave group velocity, which depends on the material, the frequency, and the effective static magnetic bias field in the waveguide. When the spin-wave wavelength is comparable to the conduit length, the phase of the spin wave oscillates along the conduit. An ideal conduit material combines low Gilbert damping and high Curie temperature. A large saturation magnetization $M_s$ maximizes the spin-wave power transmission and increases the output signal detected by inductive antennas, but also reduces the magnetoelastic coupling. Typical materials include YIG, with very low Gilbert damping in single-crystal form, or more CMOS-compatible polycrystalline or amorphous metallic ferromagnets such as CoFeB or permalloy (Ni<sub>80</sub>Fe<sub>20</sub>), with Heusler alloys such as $Co_2(Mn_xFe_{1-x})Si$ emerging [159], [225]–[227]. Basic magnetic properties of these materials are listed in Table 2.1.
Spin-wave conduits show excellent scalability at the nanoscale, and the propagation of backward volume spin waves in YIG waveguides as narrow as 50 nm has been demonstrated, albeit with reduced attenuation length [182].

#### 2.3.3. FUNCTIONAL REGION

After the spin wave starts propagating through the waveguide, it eventually reaches the Functional Region (FR), where, if needed, it undergoes certain transformations, after which it propagates to the Detection Stage [75]. The FR can include elements such as modulators [228], [229], amplifiers [180], [197], [230]–[237], and repeaters [213], which allow for partial or full SW transmission, SW amplitude amplification, and SW phase regeneration, respectively. This is also where the actual gate calculation is performed by means of spin-wave interference. Furthermore, the spin-wave propagation can be manipulated by locally engineering the magnetic properties or the shape of the waveguide. Periodic manipulations lead to magnonic crystals, i.e., magnetic media whose magnetic properties change periodically in one [238], [239], two [240], [241], or three dimensions [242]–[244]. More details on magnonic crystals can be found, *e.g.*, in [245]. Finally, SW amplitudes can be normalized in this region by means of so-called directional couplers; such amplitude normalization is required in order to produce the correct output and to enable gate cascading, as described in the next section.

#### 2.3.4. DETECTION CELL

When the post-interference SW reaches the end of the functional region (if any), it enters the detector region, where different approaches, e.g., conventional microstrip antennas, Brillouin Light Scattering (BLS) spectroscopy, Spin-Polarized Electron Energy Loss Spectroscopy (SPEELS), or ME cells, can be utilized to detect the spin wave and extract the output result [76].
The key detector features are high sensitivity, a wide detectable frequency range, and high spatial and frequency resolution. Conventional SW detection relies either on a microstrip antenna, which uses the inverse of the principle utilized for SW excitation, or on BLS spectroscopy. In BLS, the inelastic scattering of photons by magnons is used to determine the SW wavenumber and frequency [76], [246]. Note that while many detection methods are available, most of them require large and complex equipment and thus cannot be integrated in magnonic devices. In the conventional antenna-based approach, the SW induces an AC current in a strip antenna, which is subsequently rectified with a diode. However, if a DC current is the preferred output, the detection process becomes complex. A promising SW detection technique is to combine the inverse spin Hall effect and spin pumping. Conceptually speaking, this allows for current detection in an attached nonmagnetic wire as a result of the SW-induced spin-polarized current; in this way, the spin current is converted into a DC voltage [247]. A more power-efficient option is the voltage-controlled spin-wave detection introduced in [222], where detection is performed by means of an electric field. As previously mentioned, ME cells can also be utilized as SW detectors. The operation in this case follows the inverse trajectory, i.e., the propagating SW changes the magnetization of the magnetoresistive layer, which induces a stress in the piezoelectric layer that in turn generates an electric field, which is finally detected as a voltage level at the ME cell's $V_{out}$ electrode, as indicated in Figure 2.9b) [59].

# 2.4. DIRECTIONAL COUPLERS

Two waveguides placed in close proximity constitute a dipolar coupler, as dipolar fields extend outside the waveguides and thus magnetically couple them.
This coupling induces energy transfer from one waveguide to the other depending on several parameters, as further discussed in the sequel. A schematic of such a dipolar coupler is presented in Figure 2.10a), where a SW is induced in the top waveguide and, due to coupling, part of its energy reaches $O_1$ while the rest is routed to $O_2$. Equations (2.18)–(2.29) describe the dispersion relations and the energy transfer within the directional coupler [248]–[251]. When the two waveguides are placed close enough to each other, the dipolar coupling splits the SW dispersion relation into a symmetric mode and an antisymmetric (asymmetric) mode, whose profiles over the two waveguides are symmetric and antisymmetric, respectively. The SW dispersion relation of the isolated top waveguide (without coupling) and those of the symmetric and asymmetric modes can be calculated by using Equations (2.18) and (2.19), and they are graphically presented in Figure 2.10b) [250]–[252].

$$f_o(k_x) = \frac{1}{2\pi} \sqrt{\Omega^{yy} \Omega^{zz}},\tag{2.18}$$

$$f_{s,as}(k_x) = \frac{1}{2\pi} \sqrt{\left(\Omega^{yy} \pm \omega_M F_{kx}^{yy}(d)\right)\left(\Omega^{zz} \pm \omega_M F_{kx}^{zz}(d)\right)},\tag{2.19}$$

where $f_o(k_x)$ is the dispersion relation of the isolated waveguide, $f_{s,as}(k_x)$ the symmetric and asymmetric dispersion relations for spin waves in the coupled waveguides, $\Omega^{ii}=\omega_H+\omega_M\left(\lambda_{ex}^2k_x^2+F_{kx}^{ii}(0)\right)$ with $i=y,z$, $\omega_H=\gamma B_{ext}$, $\omega_M=\gamma \mu_o M_s$, $M_s$ the saturation magnetization, $\lambda_{ex}=\sqrt{2A_{ex}/(\mu_o M_s^2)}$ the exchange length, $A_{ex}$ the exchange constant, $d=w+\delta$ the distance between the centers of the two waveguides, $w$ the waveguide width, $\delta$ the gap between the two waveguides, and $F_{kx}$ the tensor that describes the dynamic magneto-dipolar interaction, calculated according to Equations (2.20) and (2.21) [248]–[251].

Figure 2.10: a) Directional Coupler. b) Dispersion Relation (DR) of Single (I), Symmetric (S) and Asymmetric (As) Spin Wave Waveguide (WG) Modes in the Linear Region. c) Power Transmission Ratio between Coupled Waveguides with $L_w$ = 3 $\mu$m. d) Dispersion Relation of Single, Symmetric and Asymmetric Spin Wave Waveguide Modes in the Non-linear Region (with Frequency Shift Effect).

$$F_{kx}^{yy}(d) = \frac{1}{2\pi} \int \frac{|\sigma|^2 k_y^2}{\tilde{w}k^2} \left(1 - \frac{1 - e^{-kh}}{kh}\right) e^{ik_y d}\, dk_y,\tag{2.20}$$

$$F_{kx}^{zz}(d) = \frac{1}{2\pi} \int \frac{|\sigma|^2}{\tilde{w}} \frac{1 - e^{-kh}}{kh} e^{ik_y d}\, dk_y,\tag{2.21}$$

where $\sigma$ is the Fourier transform of the spin-wave profile across the waveguide width, $\tilde{w}$ the normalized mode profile constant, $k = \sqrt{k_x^2 + k_y^2}$, and $h$ the waveguide thickness. Note that $\tilde{w} = w$ and $\sigma = w\,\mathrm{sinc}(k_y w/2)$ if the electron spins are fully unpinned at the waveguide edges. Two spin-wave modes, i.e., a symmetric mode with wavenumber $k_s$ and an antisymmetric mode with wavenumber $k_{as}$, are simultaneously excited only if the excitation frequency is higher than the minimum frequency of the asymmetric mode. In this case, the spin-wave energy is transferred resonantly from one waveguide to the other after propagating over the coupling length $L_c$, as presented in Figure 2.10a) [250]–[254]. The $L_c$ value depends on different parameters, such as the wavelength, the applied magnetic field, the gap between the waveguides, the waveguide dimensions, the spin-wave amplitude, and the magnetization, and can be calculated as in Equation (2.22) [250], [251].

$$L_c = \frac{\pi}{|k_s - k_{as}|},\tag{2.22}$$

The amount of energy transferred between the waveguides can be tuned by means of the coupling length $L_c$ and the length of the coupled waveguides $L_w$, which jointly determine the strength of the coupling effect between the two waveguides.
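Equation (2.22) needs the wavenumbers of both modes at the operating frequency, i.e., the dispersion relations have to be inverted. The sketch below does this by bisection on two monotonic branches and then evaluates $L_c$. The quadratic branches `f_sym` and `f_asym` are hypothetical stand-ins chosen only to make the example runnable; in practice they would come from Equation (2.19) with the tensors of Equations (2.20) and (2.21). The `ValueError` path mirrors the condition that both modes are excited only above the minimum frequency of the asymmetric mode.

```python
import math


def solve_wavenumber(dispersion, f_target, k_lo=0.0, k_hi=100.0, tol=1e-10):
    """Bisection for dispersion(k) == f_target on a monotonically increasing branch."""
    if not dispersion(k_lo) <= f_target <= dispersion(k_hi):
        raise ValueError("frequency outside this branch: the mode is not excited")
    while k_hi - k_lo > tol:
        k_mid = 0.5 * (k_lo + k_hi)
        if dispersion(k_mid) < f_target:
            k_lo = k_mid
        else:
            k_hi = k_mid
    return 0.5 * (k_lo + k_hi)


def coupling_length(k_s, k_as):
    """Eq. (2.22): L_c = pi / |k_s - k_as|."""
    return math.pi / abs(k_s - k_as)


# Hypothetical toy branches (GHz versus rad/um), NOT Eqs. (2.19)-(2.21).
def f_sym(k):
    return 2.00 + 0.010 * k ** 2


def f_asym(k):
    return 2.05 + 0.010 * k ** 2


F_OP = 2.30  # GHz; above the bottom of the asymmetric branch (2.05 GHz here)
k_s = solve_wavenumber(f_sym, F_OP)
k_as = solve_wavenumber(f_asym, F_OP)
print(f"k_s = {k_s:.3f} rad/um, k_as = {k_as:.3f} rad/um, L_c = {coupling_length(k_s, k_as):.2f} um")
```

Bisection is used here only because it is robust for monotonic branches; any root finder would do.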
Equation (2.23) presents the relation between these two parameters and the energy transfer ratio [250]:

$$\frac{O_1}{O_1 + O_2} = \cos^2\left(\frac{\pi L_w}{2L_c}\right),\tag{2.23}$$

where $O_1$ is the output energy of the first waveguide, $O_2$ the output energy of the second waveguide, $L_w$ the length of the coupled waveguides, and $L_c$ the coupling length [250]. Figure 2.10c) presents the energy split according to Equation (2.23) for the particular case of $L_w = 3\,\mu\text{m}$; one can observe that the $L_c$ value modulates the energy transfer between the two waveguides. The above equations hold only if the spin-wave amplitude is low. As the amplitude increases, nonlinear effects become stronger and cause nonlinear frequency shifts of the symmetric and asymmetric spin-wave dispersion relations, as expressed in Equation (2.24):

$$f_{s,as}^{(nl)} = f_{s,as}^{(0)}(k_x) + T_{kx}|a_{kx}|^2,\tag{2.24}$$

where $a_{kx}$ is the spin-wave amplitude and $T_{kx}$ the nonlinear frequency shift coefficient, which can be calculated using Equation (2.25) [250], [251], [255], [256]:

$$T_{kx} = \frac{1}{2\pi}\left[\omega_H - A_{kx} + \frac{B_{kx}^2}{2\omega_o^2}\left(\omega_M\left(4\lambda_{ex}^2 k_x^2 + F_{2kx}^{xx}(0)\right) + 3\omega_H\right)\right],\tag{2.25}$$

where $\omega_o = 2\pi f_o(k_x)$,

$$A_{kx} = \omega_H + \frac{\omega_M}{2}\left(2\lambda_{ex}^2 k_x^2 + F_{kx}^{yy}(0) + F_{kx}^{zz}(0)\right),\tag{2.26}$$

$$B_{kx} = \frac{\omega_M}{2}\left(F_{kx}^{yy}(0) - F_{kx}^{zz}(0)\right),\tag{2.27}$$

and

$$F_{2kx}^{xx}(d) = \frac{1}{2\pi} \int \frac{|\sigma|^2\, 4k_x^2}{\tilde{w}k^2} \left(1 - \frac{1 - e^{-kh}}{kh}\right) e^{ik_y d}\, dk_y,\tag{2.28}$$

with $k = \sqrt{4k_x^2 + k_y^2}$ in Equation (2.28). Figure 2.10d) captures this effect for two different spin-wave amplitudes [248], [250]. As depicted in the figure, when the spin-wave amplitude increases from 0.080 to 0.160, the dispersion relations shift downward. Additionally, the energy splitting ratio is affected by the nonlinear frequency shift, as indicated by Equation (2.29) [251].
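Equation (2.23) can be evaluated directly and inverted to choose $L_w$ for a target split. The sketch below assumes the linear (low-amplitude) regime and uses $L_w = 3\,\mu$m as in Figure 2.10c); the swept $L_c$ values and the function names are arbitrary examples.

```python
import math


def split_ratio(L_w, L_c):
    """Eq. (2.23): fraction O1 / (O1 + O2) leaving through the first waveguide."""
    return math.cos(math.pi * L_w / (2.0 * L_c)) ** 2


def length_for_ratio(ratio, L_c):
    """Shortest coupled length L_w that yields the requested O1 / (O1 + O2), 0 <= ratio <= 1."""
    return 2.0 * L_c / math.pi * math.acos(math.sqrt(ratio))


L_W = 3.0  # um, as in Figure 2.10c)
for L_c in (1.0, 1.5, 2.0, 3.0, 6.0):  # um, example coupling lengths
    r = split_ratio(L_W, L_c)
    print(f"L_c = {L_c:.1f} um: O1 share {r:.2f}, O2 share {1.0 - r:.2f}")

print(f"50/50 split with L_c = 2 um needs L_w = {length_for_ratio(0.5, 2.0):.2f} um")
```

Because of the $\cos^2$ dependence, the split is periodic in $L_w/L_c$: all energy returns to the first waveguide whenever $L_w$ is an even multiple of $L_c$.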
$$\frac{O_1}{O_1 + O_2} = \cos^2\left(\frac{\pi L_w}{2L_c} - \frac{\pi L_w}{2L_c^2} \frac{\partial L_c}{\partial f} T_{kx} |a_{kx}|^2\right)\tag{2.29}$$

Equation (2.29) shows that as the ratio $L_w/L_c$ increases, the nonlinear effect increases, which makes the directional coupler very sensitive to SW amplitude variations. In the proposed non-binary to binary converter, two types of directional couplers are needed: one that works in the linear regime, such that the energy transfer is not affected by the SW amplitude level, and one that works in the nonlinear regime, such that the energy transfer is affected by the SW amplitude level. Therefore, for the first type, the ratio $L_w/L_c$ must be small and the distance between the coupled waveguides must be large to decrease the coupling effect. In contrast, for the second type, the ratio $L_w/L_c$ must be large and the distance between the coupled waveguides must be small to increase the coupling effect. For example, if the coupler is designed with a coupling length of 370 nm, a distance between the waveguides (DW) of 50 nm, a Yttrium Iron Garnet (YIG) thickness of 30 nm and width of 100 nm, and a SW wavelength of 340 nm and frequency of 2.282 GHz, the spin-wave energy splits equally between the waveguides regardless of its amplitude [251]. In contrast, if it is designed with a coupled waveguide length of 3 $\mu$m and a distance between the waveguides of 10 nm, with all other parameters as in the previous example, the SW energy splits differently between the waveguides depending on the input spin-wave amplitude: if the SW amplitude is 2A, nothing moves to the second waveguide, whereas if it is 3A, 50 % of it moves to the second waveguide, and if it is 4A, the SW moves completely to the second waveguide [251]. Note that these split ratios change as the parameters change, and that the mentioned parameter values were utilized to calculate the dispersion relations in Figure 2.10.
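The amplitude dependence captured by Equation (2.29) can be explored numerically. In the sketch below, $\partial L_c/\partial f$ and $T_{kx}$ are hypothetical placeholder values (their signs and magnitudes are illustrative only and are not taken from [251]), and amplitudes are in arbitrary normalized units. Comparing a short and a long coupled section shows that the split ratio becomes far more amplitude sensitive as $L_w/L_c$ grows.

```python
import math


def split_ratio_nl(L_w, L_c, dLc_df, T, amplitude):
    """Eq. (2.29): O1 / (O1 + O2) including the nonlinear shift T * |a|^2."""
    linear_term = math.pi * L_w / (2.0 * L_c)
    nonlinear_term = math.pi * L_w / (2.0 * L_c ** 2) * dLc_df * T * abs(amplitude) ** 2
    return math.cos(linear_term - nonlinear_term) ** 2


# Hypothetical placeholders (um, GHz, normalized amplitude); illustrative only.
DLC_DF = -0.5   # dL_c/df in um/GHz
T_SHIFT = -1.0  # T_kx in GHz per unit |a|^2
L_C = 1.5       # um

for L_w in (0.4, 3.0):  # short versus long coupled section
    ratios = [split_ratio_nl(L_w, L_C, DLC_DF, T_SHIFT, a) for a in (0.0, 0.5, 1.0)]
    print(f"L_w/L_c = {L_w / L_C:.2f}: O1 share for a = 0, 0.5, 1 ->",
          ", ".join(f"{r:.2f}" for r in ratios))
```

At zero amplitude the expression reduces to Equation (2.23), so the linear coupler is recovered as a special case.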
# 2.5. STATE-OF-THE-ART

After introducing the basic concepts of spin-wave computing and the transducers at the input and output ports of spin-wave devices, we now discuss practical implementations of logic elements and gates that can be used to design spin-wave logic circuits. While nonlinear devices such as spin-wave transistors and directional couplers are also reviewed, the section focuses on passive linear logic gates based on spin-wave interference. Linear passive gates take the most advantage of the wave computing paradigm and bear the highest promise for ultralow-power electronics.

#### 2.5.1. SPIN-WAVE TRANSISTORS

The basic building block of CMOS circuits is the transistor. Given the success of CMOS, one may thus find it natural to mimic the transistor functionality using spin waves. A conventional transistor can act both as a switch and as an amplifier and shows nonlinear characteristics. Spin-wave transistors thus typically employ nonlinear effects beyond the linear small-signal approximation [169], [170], [257], [258]. A nonlinear spin-wave transistor has been reported in [61]. It is based on nonlinear interactions of spin waves propagating in a waveguide from "source" to "drain" with spin waves that are injected in a "gate" section of the waveguide. The presence of spin waves in the gate modulates the spin-wave transmission along the "channel" via four-magnon scattering. To optimize the modulation and to confine the spin waves in the gate, the central section of the transistor consists of a magnonic crystal. Recently, a "linear" transistor that does not require nonlinear interactions between spin waves has been demonstrated [259]. In this device, spin waves propagate in a waveguide from source to drain and interfere constructively or destructively with spin waves of variable phase from the gate. In this way, the spin-wave flow from source to drain can be modulated by the gate spin waves.
The modulation of the spin-wave transmission between source and drain by spin-wave injection into the gate allows for the operation of such a device as a switch. However, the proposed spin-wave transistors show no (or at best weak) gain and thus cannot be operated as amplifiers, which complicates their usage in spin-wave circuits. Together with the rather weak modulation of the spin current (well below the typical on–off current ratios of 10<sup>6</sup> in CMOS transistors), this entails that spin-wave transistors are no direct alternative to CMOS transistors. Nevertheless, the spin-wave transistor prototype [61] opened a new research avenue for all-magnon data processing. In this concept, the spin-wave nonlinearity is used to process as much information as possible in the magnetic system instead of converting the spin-wave energy into electric signals after each gate. This approach was used for the realization of a directional coupler based on spin waves [260] and of a first integrated magnonic circuit in the form of a half-adder [178].

#### 2.5.2. SPIN-WAVE LOGIC GATES

Conventional CMOS logic circuits are not designed directly at the transistor level but are rather constructed from a set of universal building blocks (standard cells), such as *e.g.* NAND or NOR logic gates or SRAM cells. Therefore, it is interesting to develop an equivalent set of spin-wave-based logic gates. As argued above, constructing logic gates from spin-wave transistors does not currently appear promising. A better approach is the design of logic gates using the interference-based paradigm. Different concepts for the implementation of spin-wave logic gates have been proposed, using the different encoding schemes.
A main advantage is that these gates are linear passive devices and do not require any energy beyond the energy in the spin waves themselves, which renders such approaches promising for ultralow-power computing applications assuming that the involved spin waves can be efficiently excited.

#### INVERTERS AND PHASE SHIFTERS

Before discussing more complicated logic gates, it is instructive to review inverter concepts for different encoding schemes. The simplest inverter is obtained by using phase encoding since in this case, logic inversion corresponds simply to a phase shift of $\pi$. Such a phase shift can be achieved by propagation in a waveguide with a length of $L = \left(n - \frac{1}{2}\right) \times \lambda$ with $\lambda$ the spin-wave wavelength and $n = 1, 2, 3, \ldots$ an integer. The advantage of such inverters is that they are passive and do not require additional external power. In addition, phase shifting concepts can be based on the local modification of the spin-wave dispersion relation. Such inverters can potentially be even more compact than delay lines [261]–[263]. Local changes in saturation magnetization or waveguide width can lead to a local change in wavelength, leading to an additional phase shift with respect to an unperturbed waveguide. Alternatively, external magnetic bias fields can also be used, including effective fields generated by magnetoelectric effects or VCMA, which promise to be more energy efficient than Oersted fields generated by a current. An advantage of such concepts is that they can be reconfigurable, *e.g.* when a VCMA capacitor is used to generate the effective magnetic field. Magnonic crystals can also be used to generate phase shifts and invert a phase-coded signal. A disadvantage is the more complex device structure as well as potentially the required additional power, *e.g.* when an electromagnet is used.
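The delay-line inverter can be checked with a few lines of arithmetic: a wave accumulates a phase of $2\pi L/\lambda$ over a length $L$, so $L = (n-\frac{1}{2})\lambda$ gives $(2n-1)\pi$, i.e. $\pi$ modulo $2\pi$. The 340 nm wavelength below is only an example value (borrowed from the coupler example in Section 2.4), and an ideal, unperturbed straight waveguide is assumed; the function names are hypothetical.

```python
import math


def accumulated_phase(length, wavelength):
    """Propagation phase 2*pi*L/lambda, wrapped to [0, 2*pi)."""
    return (2.0 * math.pi * length / wavelength) % (2.0 * math.pi)


def inverter_length(n, wavelength):
    """Delay-line length (n - 1/2) * lambda producing a pi phase shift, n = 1, 2, 3, ..."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return (n - 0.5) * wavelength


WAVELENGTH_UM = 0.34  # example wavelength (340 nm)
for n in (1, 2, 3):
    L = inverter_length(n, WAVELENGTH_UM)
    print(f"n={n}: L = {L:.3f} um, phase = {accumulated_phase(L, WAVELENGTH_UM) / math.pi:.3f} pi")
```

The shortest option, $n = 1$, corresponds to extending a waveguide by $\lambda/2$.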
A highly beneficial property of such inverters is that they do not need to be separate logic gates but can be integrated in the design of, e.g., the spin-wave majority gates discussed below. Extending the length of an input or output waveguide by $\frac{\lambda}{2}$ renders the input or output inverting. In general, this can be expected to reduce the size of spin-wave circuits considerably. In the case of amplitude level encoding, inverters can be obtained by interference with a reference wave of phase $\pi$. For a suitably chosen geometry, the reference wave interferes destructively with a potential signal wave. If a signal wave is present, its amplitude is reduced to zero, *i.e.* an output of 0 is obtained for an input of 1. For an input of 0, the reference wave reaches the output, leading to a logic 1. Unlike the above delay lines, such inverters are not passive and therefore require additional power to generate the reference wave.

#### AMPLITUDE LEVEL ENCODING: LOGIC GATES BASED ON INTERFEROMETERS

Initial work on spin-wave logic gates has mainly focused on amplitude level coding in combination with a device design based on an analog of a Mach–Zehnder interferometer [58], [62], [81], [87], [264]. In such a spin-wave interferometer, an incoming spin wave is split into two waves in the interferometer arms. A current flowing through a wire perpendicular to the plane of the interferometer generates an Oersted field, which leads to a relative phase shift of the spin waves in the two interferometer arms. Subsequently, the waves are recombined and interfere. The relative phase shift depends on the current in the wire, so the amplitude of the output wave varies in an oscillatory way with this current. This approach can be used to design different logic gates, such as XNOR, NOR, or NAND. It should be mentioned that such logic gates are inherently hybrid devices since input signals are encoded in currents whereas output signals employ spin waves for information encoding.
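The oscillatory current dependence of such an interferometer gate can be sketched by splitting a wave into two equal arms, adding a relative phase $\Delta\varphi = \kappa I$ in one arm, and recombining them, which gives an output amplitude $|\cos(\Delta\varphi/2)|$. The linear phase-current relation, the coefficient `KAPPA`, the 10 mA on-current, and the 0.5 detection threshold are all hypothetical; the example is set up so that an input current yields destructive interference (logic 0) and no current yields constructive interference (logic 1).

```python
import cmath
import math

KAPPA = math.pi / 10.0  # rad per mA (hypothetical): 10 mA -> relative phase of pi
I_ON = 10.0             # mA, hypothetical current representing an input of logic 1


def mzi_amplitude(current_ma):
    """Recombined amplitude of two equal arms with relative phase KAPPA * I (= |cos(dphi/2)|)."""
    dphi = KAPPA * current_ma
    return abs(0.5 + 0.5 * cmath.exp(1j * dphi))


def mzi_gate(bit, threshold=0.5):
    """Current-in, spin-wave-out gate: current -> destructive (0), no current -> constructive (1)."""
    return int(mzi_amplitude(I_ON if bit else 0.0) > threshold)


for current in (0.0, 5.0, 10.0, 15.0, 20.0):
    print(f"I = {current:4.1f} mA -> output amplitude {mzi_amplitude(current):.3f}")
print("truth table:", [(b, mzi_gate(b)) for b in (0, 1)])
```

The sweep makes the periodicity explicit: doubling the current to 20 mA restores full constructive interference, which is why the operating currents must be chosen carefully.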
For logic gate operation, the parameters are chosen such that an input current leads to destructive spin-wave interference in the interferometer (logic 0), whereas the absence of a current leads to constructive interference (logic 1). Additional interference between spin waves emanating from different interferometers can in principle be used for more complex logic gates or circuits. Alternative proposals use voltages rather than currents, *e.g.* via VCMA or magnetoelectric effects, to modulate the spin-wave phase during propagation [87], [264]. Several logic gates, *e.g.* NOT, NAND, or XNOR, have been demonstrated experimentally [58], [81], with device sizes of a few mm. Since the device operation is based on Oersted fields generated by *currents*, scaling the devices down leads to strongly increasing current densities in the wires and to reliability issues (*e.g.* electromigration). If the distance between the wire and the waveguide is scaled as well, part of the increase in current density can be avoided. Nonetheless, such current-based devices scale significantly worse than devices operating with voltages or current densities. In addition, the hybrid character of the logic gates leads to cascading issues, since the output of a logic gate (spin-wave amplitude/intensity) cannot be used directly as an input for a subsequent gate, which requires encoding in a current. Therefore, practical spin-wave circuits entail additional electric circuits for signal conversion.

# PHASE ENCODING: SPIN-WAVE MAJORITY GATES

Beyond the initial hybrid devices, recent work has focused on spin-wave logic gates that encode both input and output signals in spin waves. Conventional AND and OR logic gates have been demonstrated using colinear [83], [85] or cross-junction [92] geometries. Multivalued logic gates have also been proposed by combining phase and amplitude encoding [102], [229]. The most studied device, however, is the spin-wave majority gate, originally proposed by Khitun and Wang [59].
Majority gates have recently elicited much interest due to potential reductions of circuit complexity with respect to conventional Boolean-based circuit design. It is rather natural to employ phase encoding for spin-wave majority gates since the interference of three (or any larger odd number of) input waves with phases 0 or $\pi$ generates an output wave whose phase corresponds to the majority of the input phases. Spin-wave majority gates generally consist of transducers and input waveguides that provide input spin waves to the logic gate, a region where the spin waves can interfere, and an output port where the phase of the output wave is detected or transferred to an input waveguide of a subsequent gate. The input spin waves must have the same wavelength $\lambda$ and amplitude in the interference region. When the amplitudes of the three spin waves decay differently during propagation, it may be necessary to compensate for the unequal decay at the input level. For correct operation, the spin waves representing the same logic level need to be in phase at the output. This is best realized in logic gates in which the path lengths of the three spin waves between their respective inputs and the output, $D_i$ ($i=1,2,3$), differ only by integer multiples of $\lambda$, *i.e.* $D_i-D_j=n\times\lambda$ with $n=0,1,2,\ldots$. Such "resonant" conditions are preferred since they allow for the use of the same input phases for all three waves. When such conditions are not met, the spin waves accumulate different phases during propagation to the output port, which need to be compensated for at the transducer or external signal level. Alternatively, an inverting input $I_i$ can be obtained when the path length of the corresponding spin wave, $D_i$, is extended or shortened so that the spin wave accumulates an additional phase of $\pi$ with respect to the others, i.e.
$D_i - D_j = \left(n - \frac{1}{2}\right) \times \lambda$ with $n = 1, 2, 3, \ldots$. Moreover, shifting the output port by the same distance leads to an inverted output signal $\overline{\text{MAJ}}$, i.e. to an inverted majority (or "minority") logic function. This indicates that inverters do not have to be distinct logic gates, as in the case of CMOS, but can be integrated into the majority gate design in a straightforward way. The initial proposals of spin-wave majority gates were based on a trident-shaped (also referred to as $\Psi$-shaped) device layout [59], [89], [90], [196]. In this layout, three parallel input waveguides are combined into a single output waveguide in a region where the spin waves interfere. It should be kept in mind that the three waveguides are generally not equivalent, and thus the lengths of the trident prongs must be adapted to the spin-wave wavelength and to the relative phase shifts accumulated during propagation [89]–[91]. Reducing the dimensions of such a structure to the nanoscale requires careful design and parameter selection to avoid strong spin-wave attenuation at the bends of the trident [89], [91]. As discussed above, using forward volume spin waves in devices with perpendicular magnetization can alleviate these constraints [77], [90], [91], [265]. The operation of a trident-shaped spin-wave majority gate has been demonstrated experimentally at the mm scale using YIG waveguides [93], [265]. The phase of the output wave was extracted from time-domain measurements and used to assemble the full truth table of the majority function. These proof-of-concept demonstrations clearly indicate the feasibility of the approach. However, to become competitive with CMOS, these gates need to be miniaturized to the nanoscale and their throughput needs to be improved, *e.g.* by selecting different spin-wave configurations with high group velocity.
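The phase-encoded majority operation, including inverting inputs and outputs obtained by an extra half-wavelength of propagation, can be reproduced with a simple phasor sum. The Python sketch below assumes ideal resonant conditions ($D_i - D_j = n\lambda$), equal amplitudes in the interference region, and no attenuation; an extra phase of $\pi$ on an input or on the output models a path extended by $\lambda/2$. The function names are illustrative only and do not describe any particular device.

```python
import cmath
import itertools
import math


def encode(bit: int) -> complex:
    """Phase encoding: logic 0 -> phase 0, logic 1 -> phase pi (unit amplitude)."""
    return cmath.exp(1j * math.pi * bit)


def decode(wave: complex) -> int:
    """Nearest of phase 0 / pi, i.e. the sign of the real part."""
    return 1 if wave.real < 0 else 0


def majority_gate(bits, input_offsets=None, output_offset=0.0) -> int:
    """Idealized spin-wave majority gate (phasor sum of equal-amplitude waves).

    input_offsets[i] = pi models input i with a path extended by lambda/2
    (an inverting input); output_offset = pi models an output port shifted
    by lambda/2 (minority function).
    """
    if len(bits) % 2 == 0:
        raise ValueError("an odd number of inputs is required")
    if input_offsets is None:
        input_offsets = [0.0] * len(bits)
    total = sum(encode(b) * cmath.exp(1j * off) for b, off in zip(bits, input_offsets))
    return decode(total * cmath.exp(1j * output_offset))


# Truth table of MAJ and of its inverted ("minority") counterpart.
for bits in itertools.product((0, 1), repeat=3):
    print(bits, majority_gate(bits), majority_gate(bits, output_offset=math.pi))
```

Because the equal-amplitude sum of an odd number of $\pm 1$ phasors can never vanish, the sign of the real part is always well defined in this idealized model; in a real gate, unequal decay would shrink this margin, which is why the text stresses amplitude compensation.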
To tackle the scaling challenge, colinear (inline) majority gate designs have been proposed, which are more compact, more scalable, and easier to fabricate than trident-shaped gates [63], [83], [85], [198], [266]. In inline majority gates, spin-wave transducers are placed along a single straight waveguide [267]. When the transducer distance $d_t$ is an integer multiple of the spin-wave wavelength $\lambda$, *i.e.* $d_t = n \times \lambda$ with $n = 1, 2, 3, \ldots$, in-phase electrical signals at the transducers generate in-phase spin waves throughout the device, which is ideal for spin-wave interference. Depending on the position of the output port, either a majority gate or, after additional propagation over $\frac{\lambda}{2}$, an inverted majority (minority) gate can be obtained. The output port can also be positioned between the input ports, which renders the design reconfigurable [266], [268]. The operation of an inline majority gate has recently been demonstrated experimentally using CoFeB as the waveguide material and surface spin waves with high group velocity [266], [268]. This approach has also allowed the waveguide width to be scaled down into the sub-$\mu$m range [268].

# **2.5.3.** Spin-wave amplifiers and repeaters

In addition to logic devices, spin-wave circuits may also require "auxiliary" elements, such as repeaters or amplifiers. As discussed above, spin waves have lifetimes of ns to $\mu$s and thus lose energy during computation or information transfer. Spin-wave amplifiers are thus crucial to compensate for such losses. Similarly, propagation losses can be compensated for by repeaters, which are devices that receive signals and retransmit them. Amplifiers and active repeaters can also provide gain in otherwise passive, linear, interference-based logic circuits. The amplification of spin-wave signals can be realized by different mechanisms. In principle, the transducer concepts can also be used for amplification.
The spin-wave signal can be enhanced by decreasing the magnetic damping in a waveguide using STT or SOT [269] generated by a DC current. Alternatively, spin waves can be amplified parametrically through a temporally periodic variation of a system parameter. For spin waves, two cases of parametric amplification can be distinguished: (i) parallel and (ii) perpendicular pumping. Perpendicular parametric pumping is often described in terms of multi-magnon (three- or four-magnon) scattering processes. This process requires the generation of large-amplitude spin waves to reach the nonlinear regime and is therefore potentially not energy efficient for logic applications. In the case of parallel pumping, the spin-wave signal can be amplified by generating an alternating magnetic field at twice the spin-wave frequency, parallel to the longitudinal component of the magnetization. This can be realized, *e.g.*, using inductive antennas [180], [197], [234], [235], [237], [270], but also STT [236], VCMA [87], [264], or magnetoelectric effects [230], [232], which intrinsically support the coupling to the longitudinal component of the magnetization. The similarity between transducers and amplifiers has the advantage that these components do not require very different integration schemes to be embedded in the same circuit and chip. Spin-wave repeaters are an alternative to amplifiers and can provide additional memory or clocking functionality [213]. As an alternative, the use of nanomagnets with canted magnetic anisotropy has been proposed [59], [63]. For suitably designed devices, spin waves propagating in a waveguide can switch a nanomagnet in a magnetoelectric element when synchronized electric signals are applied to the latter. Depending on the orientation of the nanomagnet magnetization, spin waves can then be re-emitted into the waveguide in a second clock cycle. In this way, a spin-wave signal can be transferred from one stage to the next within a clock cycle.
Micromagnetic simulations have indicated that the relative phase of the incoming and outgoing spin waves can be controlled. Such repeaters can compensate for losses or even provide gain, as well as regenerate and normalize spin-wave signals.

# 2.5.4. SPIN-WAVE MULTIPLEXERS

A multiplexer is a device that selects one of several analog or digital input signals and forwards it to a single output line. Multiplexers are mainly used to increase the amount of data that can be sent over a network with a fixed bandwidth. Conversely, a demultiplexer is a device that separates a single input signal into several output signals. Parallel data transmission can, *e.g.*, be enabled using different (spin-wave) frequencies in frequency-division multiplexing. Several approaches have been reported for the realization of a spin-wave (de-)multiplexer. Some of them operate by guiding spin waves into one arm of Y- or T-shaped structures by controlling the magnetization using magnetic fields [271], [272], including current-induced local magnetic field control [105]. A drawback of these approaches is their increased power consumption. In contrast, passive devices, which do not require electric currents, may offer much lower energy consumption. Two proposals for such passive (de-)multiplexers have been published to date. The first one is based on directional spin-wave couplers [178], [260]. The second one is based on the utilization of caustic spin-wave beams [273], [274]. Such caustic beams are nondiffractive spin-wave beams with a stable subwavelength transverse aperture [275] and are a consequence of the strong anisotropy of the spin-wave dispersion relation in in-plane magnetized films. In an anisotropic medium, the direction of the group velocity does not generally coincide with the direction of the phase velocity and the wavevector.
For sufficiently strong anisotropy, the direction of the group velocity can become independent of the wavevector in a certain part of the spectrum. In such a case, wave packets excited with a broad (angular) spectrum of wavevectors in this part of the dispersion relation are channeled along the direction of the group velocity [273]–[275]. These caustic beams are linear and do not interact with each other, allowing in principle for the realization of complex two-dimensional spin-wave networks in unpatterned magnetic films, and they have indeed been used to route spin waves in unpatterned thin magnetic films. The direction of such beams depends on the spin-wave frequency and can be controlled by an external magnetic field. Thus, caustics can selectively transfer information encoded in spin waves. The frequency dependence of the phenomenon was used to realize multiplexer and demultiplexer functionalities, first in micromagnetic simulations [273] and recently experimentally [274]. The device consists of a narrow 30 nm thick CoFeB waveguide as input and two output waveguides. In the unpatterned central part of the device, caustic beams propagate at different angles for different spin-wave frequencies. As a result, the spin-wave intensity is transferred to different output waveguides, depending on the frequency. This behavior can be used to separate information encoded in spin waves at different frequencies in frequency-division multiplexing schemes to enhance the computational throughput. It provides an "all-magnonic" alternative to demultiplexing in the electric domain after detection of the complex multifrequency signal by the output transducer, leading to reduced bandwidth requirements at the individual output ports.

#### 2.5.5. UNCONVENTIONAL AND ANALOG COMPUTING APPROACHES

Beyond digital spin-wave logic circuits and wave computing systems, spin-wave-based "unconventional" and analog circuits have also been proposed.
While less universal than digital systems, these concepts take particular advantage of the wave nature of spin waves and can be very efficient for specific tasks such as signal and data processing [99], [108], [112], prime factorization [106], [276], or Fourier transforms [103]. Pioneering work on wave-based computing in the 1970s and 1980s used photons to develop optical computers [19], [20], [24]. While optical data communication is ubiquitous today, optical computing has not become competitive with CMOS. The challenges of optical computing overlap with those of spin-wave computing: the realization of competitive optical computers has been hindered by the difficulty of confining photons at ultrasmall length scales and by the limited power efficiency at the transducer level [24], [277]. Nonetheless, both digital and analog computing concepts have been developed, and the work on optical computing has inspired spin-wave computing [103]. An example of an analog computing architecture is the magnonic holographic memory. It consists of a two-dimensional network of crossing waveguides with transducers for spin-wave excitation and detection at the edges [97]–[99], [214]. After spin waves have been excited, they propagate through the structure, interfere with each other, and generate an interference pattern in the network. In such a structure, all inputs directly affect all outputs, which can be used for parallel data processing [99], [103], [108], [112], [186]. Cellular nonlinear networks are structurally similar to magnonic holographic memories and also consist of an array of magnetic waveguides [186]. In contrast to holographic memories, however, active transducers at every waveguide crosspoint can be used to locally manipulate the magnetization. Wave superposition and interference can again be used for parallel data or image processing [106], [112], [278]. Spin waves can also be employed for the design of reversible logic gates [111].
Here, both the reversibility of the logic operation and that of the physical processes are used to perform ultralow-energy operations. Moreover, several spin-wave-based concepts for neuromorphic computing have been proposed [99], [113], [186]–[188], [279], [280]. Finally, the asymmetric propagation and nonlinear behavior of spin waves render them promising candidates for reservoir computing [189]–[191].

#### 2.5.6. THREE-DIMENSIONAL MAGNONICS

Spin-wave devices are typically based on films and multilayers that are prepared by thin film deposition techniques and lithographically patterned into the desired structures. Hence, the resulting structures are planar and two-dimensional. Recently, research to extend planar structures into the third dimension has intensified [281], and several proof-of-concept experiments have been demonstrated [282], [283]. The fabrication of such three-dimensional structures was enabled by recent advances in focused electron beam induced deposition (FEBID) [284]. FEBID is a promising three-dimensional direct-write nanofabrication technique [284], [285], which opens prospects for building magnonic three-dimensional nanoarchitectures with complex interconnectivity and for developing novel types of brain-inspired neuromorphic networks using spin waves. In addition, the ease of area-selective tuning of the magnetization in spin-wave conduits via postgrowth irradiation with ions [286] or electrons [287], or via proximity to superconductors [288], opens a pathway to the fabrication of spin-wave circuits with a graded refractive index for steering spin waves in curved waveguides or into the third dimension.

# 2.5.7. TOWARDS QUANTUM MAGNONICS

One of the prominent advantages of magnonics is the possibility to exploit complex data processing concepts at room temperature. Nevertheless, in recent years, increasing attention has been devoted to the behavior of spin waves at cryogenic temperatures for two reasons.
First, hybrid superconductor-ferromagnet structures provide access to fascinating new physics that may potentially be exploited for data processing or quantum computing. Second, decreasing the temperature below 100 mK leads to the freeze-out of thermal magnons, which enables experiments with single magnons. Such conditions thus give access to quantum magnonics. The combination of ferromagnetism and superconductivity in hybrid ferromagnet/superconductor (F/S) systems leads to emerging physical phenomena. For instance, in proximity-coupled S/F/S trilayers, a substantial reduction of the ferromagnetic resonance field has been attributed to the generation of unconventional spin-triplet superconductivity [289]. It has been demonstrated that coupling spin waves in F to S results in an enhanced spin-wave phase velocity due to the Meissner screening of AC magnetostatic stray fields by S [290]. Several novel effects emerge for proximity-decoupled S/F hybrids in out-of-plane magnetic fields [291]. When the S layer is in the mixed state, an external magnetic field can penetrate it in the form of a lattice of Abrikosov vortices (fluxons). The stray fields emanating from the vortex cores produce a periodic modulation of the magnetic order in F, such that the S/F bilayer can be viewed as a fluxon-induced magnonic crystal. It has been shown that the Bragg scattering of spin waves on a flux lattice moving under the action of a transport current in the S layer is accompanied by Doppler shifts [291]. An additional promising research direction is the experimental examination of Cherenkov-like radiation of spin waves by fast-moving fluxons when the vortex velocity exceeds a threshold value [292]. To prevent instabilities and the collapse of vortices at the required velocities of 5-15 km/s, one can use, *e.g.*, superconductors with fast nonequilibrium relaxation [293].
Hybrid systems based on superconducting circuits also allow for the engineering of quantum sensors that exploit different degrees of freedom. Quantum magnonics [294]–[299], which aims to control and read out single magnons, provides opportunities for advances both in the study of spin-wave physics and in the development of quantum technologies. The detection of a single magnon in a millimeter-sized YIG crystal with a quantum efficiency of up to 0.71 was reported recently [294]. The detection was based on the entanglement between a magnetostatic mode and a qubit, followed by a single-shot measurement of the qubit state. The strong coupling of magnons and cavity microwave photons is one of the routes towards quantum magnonics and is currently being intensively explored [295]–[301]. In addition to single-magnon operations, which are expected to be realized at mK temperatures, macroscopic quantum states such as magnon Bose-Einstein condensates (BECs) at room temperature have also been considered as potential data carriers. The fundamental phenomenon of Bose-Einstein condensation has been observed in different systems of both real particles and quasiparticles. The condensation of real particles is achieved through temperature reduction, while for quasiparticles such as magnons, an external injection of bosons by irradiation is required [302], [303] or, as demonstrated recently, a rapid-cooling mechanism can be exploited [304]. Moreover, a supercurrent in a room-temperature Bose-Einstein magnon condensate was demonstrated experimentally [305]. The observation of a supercurrent confirms the phase coherence of the magnon condensate and may potentially be used in future magnonic devices for low-loss information transfer and processing.

# 2.5.8. SPIN-WAVE SENSORS

The on-chip integrability and miniaturization of spin-wave devices can also be exploited for magnetic field sensing applications.
CMOS-compatible magnetic sensors play a crucial role in a variety of industries, including automotive, biomedical applications, navigation, robotics, *etc*. Notably, magnetoresistive sensors [40], [306], [307] based on anisotropic magnetoresistance, giant magnetoresistance, or tunnel magnetoresistance have found widespread commercial application due to their high sensitivity as well as low noise and power consumption [40], [306], [307]. Recently, several pioneering investigations have explored the possibility of using spin waves for magnetic sensors [308]–[314]. In particular, magnonic crystals, i.e. periodic magnetic structures, have been proposed as sensors with very high sensitivity [308], [309], [312], [313]. Magnonic crystals have also been used for the sensing of magnetic nanoparticles [310]. Finally, magnon polaritons in PT-symmetric cavities have been proposed for sensors with very high sensitivity [311]. Such miniature sensor applications share many properties with logic circuits and may also strongly benefit from optimized spin-wave transducers and read-out circuitry.

#### 2.5.9. MICROWAVE SIGNAL PROCESSING

To date, commercial applications of ferromagnetic resonance and spin waves mainly include macroscopic tunable microwave filters, power limiters, circulators, or gyrators based on ferrite materials, especially low-damping YIG [315], [316]. Much research was devoted to such devices between the 1960s and 1980s [317]–[321]. Several devices are commercially available today, although typically for niche applications. These devices typically employ magnetic elements in the mm size range. For such large quantities of magnetic material, the microwave absorption by ferromagnetic resonance or spin waves is large, leading to efficient power conversion between the electric (microwave) and magnetic domains.
Reducing the amount of magnetic material in scaled devices degrades the power conversion efficiency and leads to issues similar to those that need to be overcome for nanoscale logic circuits. Therefore, advances in spin-wave transducer technology may additionally enable nanoscale analog microwave applications with interesting prospects for telecommunication. More recently, increasing interest has been devoted to magnetoelectric antennas. Conventional dipolar antennas are difficult to scale due to the large wavelength of electromagnetic waves in air [322], [323], and often suffer from losses due to near-field interactions with the environment [324], [325]. Lately, an alternative antenna type based on magnetoelectric composites has been proposed [326], [327], which consists of a piezoelectric/magnetostrictive bilayer. Applying a microwave signal to such an antenna produces an oscillating magnetic dipolar field, which acts as a source of electromagnetic radiation [328]–[330]. The response can be enhanced by acoustic and magnetic resonances. Due to the much shorter wavelengths of acoustic and magnetic waves at microwave frequencies, magnetoelectric antennas can be more compact than conventional dipolar antennas and may require less power [326], [331], [332].

#### 2.5.10. Antiferromagnetic magnonics and terahertz applications

In recent years, antiferromagnetic spintronics has received increasing attention as an extension of established spintronic approaches based on ferromagnets or ferrimagnets [333]–[335]. The spin-wave frequencies in antiferromagnets are in the THz range [336]–[339]; therefore, antiferromagnetic magnonics is of interest for THz applications [340], [341]. In principle, antiferromagnetic media may enable spin-wave logic at THz frequencies with prospects of better scalability and higher operating speed [60]. However, methods for controlling and detecting magnetic excitations in antiferromagnets are only emerging [342]–[345].
To date, logic devices utilizing antiferromagnetic spin waves have not been demonstrated. In particular, the controlled excitation and detection of phase-coherent THz spin waves in antiferromagnetic waveguides is still lacking, as are concepts to efficiently generate THz logic signals with CMOS circuits. Yet, if fundamental research on antiferromagnetic spintronics continues at a fast pace, spin-wave logic at THz frequencies may become an interesting alternative to the GHz approaches based on ferromagnetic media.

# 2.6. CONCLUSIONS

In conclusion, we discussed SW creation as a collective spin excitation within a ferromagnetic material by means of an external magnetic field. Subsequently, we introduced the basic principles of SW-based computing and possible ways of encoding and processing information, and demonstrated that SW interaction provides a natural means for realizing Majority gates and Inverters, which together form a universal gate set. Afterwards, we discussed the generic structure of a spin wave device and the design of directional couplers, which are used in this thesis mainly for spin wave normalization, followed by an overview of the state of the art.

# FANOUT ENABLED SPIN WAVE MAJORITY GATES

- 3.1. LADDER SHAPE STRUCTURE
- 3.2. TRIANGLE SHAPE STRUCTURE
- 3.3. Performance Evaluation
- 3.4. CONCLUSIONS

To enable waveguide utilization as a local interconnect, SW gates must possess fan-out capabilities, which is not the case for previously proposed state-of-the-art SW gates [58], [59], [62], [77], [80]–[96]. In this chapter, we address this issue and introduce: (i) ladder-shaped 3-input majority (MAJ3) SW gates with a fan-out of 2 and 4, (ii) a Programmable Logic Gate (PLG) structure with a fan-out of 1, 2, 3, and 4, and (iii) triangle-shaped MAJ3 and 2-input XOR gates. In addition, we present the validation of these fan-out enabled logic gates and discuss their performance evaluation and comparison with the state of the art.
The content of this chapter is based on the following publications:

- **A. Mahmoud**, C. Adelmann, F. Vanderveken, S. Cotofana, F. Ciubotaru and S. Hamdioui, *Fan-out of 2 Triangle Shape Spin Wave Logic Gates*, 2021 Design, Automation and Test in Europe Conference and Exhibition (DATE), 2021, pp. 948-953.
- **A. Mahmoud**, F. Vanderveken, C. Adelmann, F. Ciubotaru, S. Hamdioui and S. Cotofana, *4-output Programmable Spin Wave Logic Gate*, 2020 IEEE 38th International Conference on Computer Design (ICCD), 2020, pp. 332-335.
- **A. Mahmoud**, F. Vanderveken, C. Adelmann, F. Ciubotaru, S. Cotofana and S. Hamdioui, *2-Output Spin Wave Programmable Logic Gate*, 2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2020, pp. 60-65.
- **A. Mahmoud**, F. Vanderveken, C. Adelmann, F. Ciubotaru, S. Hamdioui, and S. Cotofana, *Fan-out enabled spin wave majority gate*, AIP Advances **10**, 035119 (2020).

Figure 3.1: Fan-out of 2 MAJ3 Gate.

# 3.1. LADDER SHAPE STRUCTURE

In this section, we present the ladder-shaped Majority gate and programmable logic gate structures and discuss the OOMMF simulation results of the proposed gates.

# **3.1.1.** FANOUT ENABLED SPIN WAVE MAJORITY AND PROGRAMMABLE LOGIC GATES

The MAJ3 gate with a ladder-shaped structure is presented in Figure 3.1. The inputs are excited at $I_1$, $I_2$, $I_3$, and $I_4$, and the outputs are read from $O_1$ and $O_2$. To obtain a proper interference pattern at the cross points, the waveguide width $w$ has to be less than or equal to the wavelength $\lambda$. In addition, all excited SWs must have the same amplitude $A$ and the same frequency to achieve the desired interference pattern. The proposed device layout is generic: its dimensions and the critical distances $d_i$ (where $i=1,2,\ldots,7$) are expressed in terms of the spin wave wavelength.
For example, if SWs of wavelength $\lambda$ have to interfere constructively when they have the same phase and destructively otherwise, $d_1$, $d_2$, $d_3$, $d_4$, and $d_5$ must be equal to $n\lambda$ ($n=0,1,2,3,\ldots$). If the opposite behaviour is targeted, $d_1$, $d_2$, $d_3$, $d_4$, and $d_5$ must be equal to $(n+1/2)\lambda$. Moreover, to obtain a proper fan-out of 2, i.e., outputs with the same energy levels, the structure has to be symmetric; thus $d_1$ should be equal to $d_2$, $d_3$, $d_4$, $d_5$, $d_6$, and $d_7$. However, if a fan-out of 2 is not required, the dimensions can have different values. In contrast to CMOS gates, SW gates can provide both direct and inverted outputs by properly adjusting the output transducer position with respect to the output interference point. In this way, the direct and inverted results can be read at distances of $n\lambda$ and $(n+1/2)\lambda$ from the last interference point, respectively. In our case, MAJ($I_1,I_2,I_3$) and MAJ'($I_1,I_2,I_3$) are obtained at $d_6=d_7=n\lambda$ and $d_6=d_7=(n+1/2)\lambda$, respectively, and both exhibit the same energy due to the structure symmetry. Note that the gate evaluates MAJ($I_1,I_2,I_3$) and MAJ($I_1,I_2,I_4$) in parallel, which means that a fan-out of 2 is achieved when $I_3=I_4$. Intuitively speaking, the Majority gate operates as follows: (i) SWs with appropriate phase values representing the targeted logic values (logic 0 or 1) are excited at $I_1$, $I_2$, $I_3$, and $I_4$. (ii) The excited SWs propagate (in both directions in the horizontal and vertical waveguides) and interfere when they meet. The resulting waves propagate towards the outputs $O_1$ and $O_2$. Thanks to the device symmetry and the isotropic behaviour of SWs in this configuration, the waves arriving at the gate outputs are identical; thus, the 3-input Majority gate exhibits a fan-out of 2.

Figure 3.2: Fan-out of 4 MAJ3 Gate.
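At the logic level, the ladder-shaped gate described above can be summarized as two majority functions that share $I_1$ and $I_2$. The following Python sketch is a behavioural abstraction under stated assumptions, not an OOMMF model: it ignores the cross-coupling of $I_3$ into $O_2$ (and of $I_4$ into $O_1$), treats both outputs as ideal, and applies the read-out rule $d_6 = d_7 = n\lambda$ (direct) or $(n+1/2)\lambda$ (inverted). All function names are hypothetical.

```python
import itertools


def maj3(a: int, b: int, c: int) -> int:
    """Three-input majority function."""
    return 1 if a + b + c >= 2 else 0


def readout_distance(wavelength_nm: float, n: int = 1, inverted: bool = False) -> float:
    """Distance d6 = d7 between the last interference point and the output transducer."""
    return (n + 0.5) * wavelength_nm if inverted else n * wavelength_nm


def ladder_gate(i1: int, i2: int, i3: int, i4: int, inverted: bool = False):
    """Behavioural model: O1 = MAJ(I1, I2, I3), O2 = MAJ(I1, I2, I4).

    Cross-coupling between the two arms is ignored (idealization).
    """
    o1, o2 = maj3(i1, i2, i3), maj3(i1, i2, i4)
    if inverted:  # output transducers placed at (n + 1/2) * lambda
        o1, o2 = 1 - o1, 1 - o2
    return o1, o2


# With I3 = I4 both outputs carry the same MAJ3 result, i.e. a fan-out of 2.
for i1, i2, i3 in itertools.product((0, 1), repeat=3):
    o1, o2 = ladder_gate(i1, i2, i3, i3)
    print(f"I1={i1} I2={i2} I3=I4={i3} -> O1={o1} O2={o2}")
```

Calling `ladder_gate` with $I_3 \neq I_4$ shows the two independent majority results; in the physical device, this is exactly the case in which the cross-coupling neglected here requires extra design care.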
It is worth mentioning that, while $I_3$ contributes mostly to $O_1$, it also influences $O_2$, as the spin-wave signal excited at $I_3$ propagates through $I_1$ and $I_2$. The same holds true for the effect of $I_4$ on $O_1$, as the spin-wave signal excited at $I_4$ propagates through $I_1$ and $I_2$. While this is not an issue when $I_3 = I_4$, proper design precautions are required to minimize these effects when the gate is used to evaluate two different majority functions in parallel. In addition, the spin waves excited at $I_1$ and $I_2$ encounter edges while propagating towards the outputs, whereas $I_3$ and $I_4$ have a straight path to the outputs. Therefore, $I_3$ and $I_4$ should be excited at a lower energy level than $I_1$ and $I_2$. Once the resulting SWs reach the outputs, they can be interpreted by means of (i) Phase Detection (PD) or (ii) Threshold Detection (TD). With respect to a predefined reference phase, PD is performed as follows: a SW phase of 0 corresponds to logic 0 and a phase of $\pi$ to logic 1. For TD, the SW Magnetization Spinning Angle (MSA) is measured and compared with a predefined threshold value such that, if the MSA value is larger than the threshold, a logic 1 is reported, and a logic 0 otherwise. We note that if only one MAJ3 output is required, the structure can be simplified (i) physically, by removing one of its vertical waveguides (arms), or (ii) logically, by not providing an input signal to $I_4$. Moreover, the gate fan-out capabilities can be extended beyond 2 by vertically lengthening its arms. For example, if the outputs in Figure 3.1 are shifted downward and two extra outputs are placed on top of the $I_3$ and $I_4$ inputs, the new structure can accommodate 4 outputs, as indicated in Figure 3.2, and, if properly designed, the gate can provide a fan-out of 4.

Figure 3.3: Balanced Energy Fan-out of 4 MAJ3 Gate.
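The two read-out schemes can be written as simple decision functions. In the Python sketch below, the reference phase is assumed to be 0 and a received phase is assigned to whichever of 0 or $\pi$ is closer; the MSA threshold is a free, hypothetical parameter, since its value depends on the device and is not fixed by the text.

```python
import math


def phase_detect(phase_rad: float) -> int:
    """Phase Detection (PD): phase near 0 -> logic 0, phase near pi -> logic 1."""
    return 1 if math.cos(phase_rad) < 0 else 0


def threshold_detect(msa: float, threshold: float) -> int:
    """Threshold Detection (TD): logic 1 if the MSA exceeds the threshold, else logic 0."""
    return 1 if msa > threshold else 0


# A few example read-outs (phases in radians, MSA in arbitrary units).
print(phase_detect(0.2), phase_detect(math.pi - 0.3))
print(threshold_detect(0.8, threshold=0.5), threshold_detect(0.1, threshold=0.5))
```

Deciding on the sign of $\cos\phi$ tolerates phase errors of up to $\pm\pi/2$ around the nominal values, which is one reasonable choice for an idealized model; TD, by contrast, depends on output amplitudes being balanced, which is why the symmetry of the structure matters for threshold-based reading.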
Note that if $I_3$ and $I_4$ are used as control signals instead of data inputs, the structure can deliver fanout of 4 (N)AND and/or (N)OR gate behaviour when the outputs are phase read. If threshold-based output reading is utilized, X(N)OR functionality can be delivered at $O_1$ and $O_2$. However, this X(N)OR functionality cannot be obtained at $O_3$ and $O_4$, because they receive amplitude-unbalanced SWs: $I_3$ and $I_4$ are closer to $O_3$ and $O_4$ than to $O_1$ and $O_2$. The unbalanced SW amplitudes change the output energies and impede reliable threshold-based output reading. Thus, due to the lack of symmetry, the 4 outputs are not fully equivalent in terms of computation capabilities. To circumvent this limitation, we propose the symmetric, energy-balanced 4-input Programmable Logic Gate (PLG) depicted in Figure 3.3. The aim is to equalize the output energies and to capture all possible logic function results at all outputs. To this end, we relocate the control inputs to the middle of the vertical waveguide, so that each gate input is located at the same distance from the four gate outputs. Therefore, the waves propagate towards $O_1$, $O_2$, $O_3$, and $O_4$ on equal-length paths, which means that they reach the outputs with the same amplitude (energy). The previously described design procedure still applies, and all logic functions are achievable at each output. An extra advantage of this structure concerns fanout. When computing the same function at all outputs, it can provide a clean maximum fan-out of 4. When computing 2 functions, each of them can be produced with a fan-out of 2.

#### 3.1.2. SIMULATION SETUP AND RESULTS

Figure 3.4: Fan-out of 2 MAJ3 Gate OOMMF Simulation.

Figure 3.5: 2-output AND/OR Gate OOMMF Simulation.

The three structures are validated by means of OOMMF [194] simulations, and the results are discussed in the following. We validate the proposed logic gate structures
by means of micromagnetic simulations using $Fe_{60}Co_{20}B_{20}$ waveguides with a Perpendicular Magnetic Anisotropy (PMA) field greater than the magnetic saturation. As a result, no external magnetic field is required for proper gate operation. We instantiated a MAJ3 gate with waveguide width $w=50\,\mathrm{nm}$. To simplify the interference pattern, we selected a SW wavelength larger than w, $\lambda=110\,\mathrm{nm}$, which implies that $d_1=d_2=d_3=d_4=d_5=d_6=d_7=110\,\mathrm{nm}$. Further, we assume the following values of the relevant parameters: magnetic saturation $M_s=1.1\,\mathrm{MA/m}$, exchange stiffness $A_{ex}=18.5\,\mathrm{pJ/m}$, damping constant $\alpha=0.004$, perpendicular anisotropy constant $k_{ani}=0.83\,\mathrm{MJ/m^3}$, and waveguide thickness $t=1\,\mathrm{nm}$ [346]. We calculated the Forward Volume Magnetostatic Spin Wave (FVMSW) dispersion relation for these parameters. For $\lambda=110\,\mathrm{nm}$, i.e., $k=2\pi/\lambda\approx 57\,\mathrm{rad/}\mu\mathrm{m}$, we determined a spin wave frequency f of 9 GHz. Figure 3.4 presents OOMMF simulation results for the proposed w = 50 nm MAJ3 gate under all possible input combinations. Note that in the Figure, blue represents logic 1 (i.e., phase $\pi$) and red represents logic 0 (i.e., phase 0). If $I_1 = I_2 = I_3 = 0$, or the majority of the inputs are 0, then $O_1 = O_2 = 0$ (red). If the majority of the inputs are 1, the outputs $O_1$ and $O_2$ are 1 (blue), as expected. Figure 3.5 presents the simulation results when $I_3$ and $I_4$ are utilized as control inputs ($I_3 = I_4 = 0$ and $I_3 = I_4 = 1$) in order to achieve 2-input AND/OR instead of MAJ3 functionality.
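At the Boolean level, the programmability used in Figure 3.5 follows from fixing one input of each majority. The Python sketch below abstracts the ladder gate as $O_1 = \mathrm{MAJ}(I_1, I_2, I_3)$ and $O_2 = \mathrm{MAJ}(I_1, I_2, I_4)$, as stated earlier in this section. It ignores the $I_3$/$I_4$ cross-coupling discussed above. The helper names are hypothetical. The setting $I_3 = 0$, $I_4 = 1$ is our assumption for obtaining AND at $O_1$ and OR at $O_2$ at the same time.

```python
from itertools import product


def maj3(a: int, b: int, c: int) -> int:
    """Boolean 3-input majority."""
    return 1 if a + b + c >= 2 else 0


def ladder_outputs(i1: int, i2: int, i3: int, i4: int) -> tuple:
    """Idealized ladder gate: O1 = MAJ(I1, I2, I3), O2 = MAJ(I1, I2, I4)."""
    return maj3(i1, i2, i3), maj3(i1, i2, i4)


# A control input fixed at 0 turns MAJ into AND; fixed at 1 it turns MAJ into OR.
for i1, i2 in product((0, 1), repeat=2):
    o1, o2 = ladder_outputs(i1, i2, i3=0, i4=1)
    print(f"I1={i1} I2={i2} -> O1 (AND)={o1}, O2 (OR)={o2}")
```

With $I_3 = I_4$, both outputs evaluate the same function, which is the fanout-of-2 case described above.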
One can observe in Figure 3.5 that $O_1 = AND(I_1, I_2)$ and $O_2 = OR(I_1, I_2)$. Thus the gate simultaneously evaluates the two Boolean functions. $O_1 = 0$ for the input combinations $\{0,0\}$, $\{0,1\}$, and $\{1,0\}$, and $O_1 = 1$ for $\{1,1\}$. $O_2 = 1$ for the input combinations $\{0,1\}$, $\{1,0\}$, and $\{1,1\}$, and $O_2 = 0$ for $\{0,0\}$. It can also be noticed that by shifting the output reading position by $\lambda/2$, the inverted version of the output can be read. Thus AND, NAND, OR, and NOR functionalities can be obtained. In addition, Table 3.1 presents the normalized Magnetization Spinning Angle (MSA) values of $O_1$ and $O_2$ for $I_3=I_4=0$ and all possible $I_1$ and $I_2$ input combinations. The MSA values in the Table are normalized with respect to the highest achievable magnetization, which in this case is obtained for $\{I_1, I_2\} = \{0, 0\}$.

Table 3.1: Normalized MSA AND/OR Gate.

| $I_3 = I_4$ | $I_2$ | $I_1$ | $O_1/I_1$ | $O_2/I_1$ |
|-------------|-------|-------|-----------|-----------|
| 0 | 0 | 0 | 1 | 1 |
| 0 | 0 | 1 | 0.28 | 0.28 |
| 0 | 1 | 0 | 0.37 | 0.37 |
| 0 | 1 | 1 | 0.45 | 0.45 |

Figure 3.6: 4-output (a) AND/OR Gate (b) AND/AND Gate OOMMF Simulation.
Note that the results for the other possible control input combinations, i.e., $\{I_3, I_4\} = \{0, 1\}$, $\{1, 0\}$, and $\{1, 1\}$, are similar to those obtained for $\{I_3, I_4\} = \{0, 0\}$. The basic idea behind threshold-based output interpretation is to define an appropriate threshold and then apply a rule. For example, the gate output is classified as 0 if its magnetization value is larger than the threshold, and as 1 otherwise. We apply this principle to the Table 3.1 values and choose 0.41 as magnetization threshold. The two gate outputs are then both 0 if $\{I_1, I_2\} = \{0, 0\}$ or $\{I_1, I_2\} = \{1, 1\}$, and 1 otherwise, which captures the XOR functionality. The detection rule can also be changed so that logic 1 is reported when the normalized MSA is larger than the threshold, and logic 0 otherwise. In that case, the proposed structure evaluates an XNOR function. Thus the output reading location is not relevant here, as the inverted version of the output is obtained by switching the thresholding rule. Figure 3.6(a) and (b) present OOMMF simulation results for the 2-input 4-output AND/OR and AND/AND gates, respectively, for all possible $\{I_1, I_2\}$ combinations. One can easily observe in Figure 3.6(a) that the left arm provides AND functionality at outputs $O_1$ and $O_3$, whereas the right arm provides OR functionality at outputs $O_2$ and $O_4$. For example, $\{I_1, I_2\} = \{0, 0\}, \{0, 1\}, \{1, 0\}$ result in $O_1 = O_3 = 0$, and $\{I_1,I_2\}=\{1,1\}$ results in $O_1=O_3=1$, as expected. The OR gate functionality is obtained at $O_2$ and $O_4$. Figure 3.6(b) can be analysed likewise. Moreover, 2-input (N)OR/(N)OR gates can be obtained in the same manner if $I_3=I_4=1$. Therefore, the structure can provide AND, NAND, OR, and NOR gate functionalities. Each gate column can provide AND (OR) either in its direct and inverted format, or in the same format with a fanout of 2. Table 3.2 presents normalized MSA values at $O_1$, $O_2$, $O_3$, and $O_4$ for the structure in Figure 3.6. It covers $I_3=I_4=0$ and $I_3=I_4=1$ and all possible input combinations $\{I_1,I_2\}=\{0,0\},\{0,1\},\{1,0\},\{1,1\}$. Note that the cases $\{I_3,I_4\}=\{0,1\}$ and $\{1,0\}$ exhibit the same behaviour, i.e., they also enable an XOR gate. The MSA values in the Table indicate that $O_1$ and $O_2$ can provide X(N)OR functionality if an appropriate threshold value is chosen. An example is 0.35, the average of the $O_1$ and $O_2$ normalized MSA values for the input combinations $\{I_1,I_2\}=\{0,1\}$ and $\{1,1\}$. To implement the XOR gate, the rule is as follows: if the normalized magnetization is larger than 0.35, the output is logic 0, and logic 1 otherwise. The XNOR gate is obtained by flipping the condition.

Figure 3.7: 4-output Balanced (a) AND/OR Gate (b) AND/AND Gate OOMMF Simulation.

Table 3.2: FO4 Gate Normalized MSA.

| $I_3 = I_4$ | $I_2$ | $I_1$ | $O_1/I_1$ | $O_2/I_1$ | $O_3/I_1$ | $O_4/I_1$ |
|-------------|-------|-------|-----------|-----------|-----------|-----------|
| 0 | 0 | 0 | 0.9 | 0.9 | 1 | 1 |
| 0 | 0 | 1 | 0.25 | 0.25 | 0.45 | 0.43 |
| 0 | 1 | 0 | 0.32 | 0.32 | 0.26 | 0.27 |
| 0 | 1 | 1 | 0.38 | 0.39 | 0.33 | 0.33 |
| 1 | 0 | 0 | 0.38 | 0.39 | 0.33 | 0.33 |
| 1 | 0 | 1 | 0.32 | 0.32 | 0.26 | 0.27 |
| 1 | 1 | 0 | 0.25 | 0.25 | 0.45 | 0.43 |
| 1 | 1 | 1 | 0.9 | 0.9 | 1 | 1 |
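The threshold rule for the 2-output gate can be checked directly against the Table 3.1 values. In the Python sketch below, the helper names are hypothetical. It applies the XOR rule and its flipped XNOR variant to the tabulated $O_1$ (equal to $O_2$) values. The threshold is placed midway between the largest MSA that must read as 1 (0.37) and the smallest that must read as 0 (0.45). This is our interpretation of how 0.41 arises; the text does not state the procedure.

```python
# Normalized MSA of O1 (= O2) from Table 3.1, keyed by (I1, I2), with I3 = I4 = 0.
TABLE_3_1 = {(0, 0): 1.00, (1, 0): 0.28, (0, 1): 0.37, (1, 1): 0.45}


def threshold_read(msa: float, threshold: float, xnor_rule: bool = False) -> int:
    """XOR rule: MSA above threshold -> 0, otherwise 1. The XNOR rule flips it."""
    bit = 0 if msa > threshold else 1
    return 1 - bit if xnor_rule else bit


# Midpoint between the two closest MSA values of opposite logic classes (~0.41).
threshold = (TABLE_3_1[(0, 1)] + TABLE_3_1[(1, 1)]) / 2

for (i1, i2), msa in TABLE_3_1.items():
    print(i1, i2, threshold_read(msa, threshold), threshold_read(msa, threshold, xnor_rule=True))
```

Only the detector's comparison rule changes between XOR and XNOR; the detector position stays fixed.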
However, as the four outputs do not have the same magnetization, $O_3$ and $O_4$ cannot provide X(N)OR functionality. Thus, to balance the output energies and to enable XOR and XNOR at all four outputs, we place the control inputs as depicted in Figure 3.3. Figure 3.7(a) and (b) present OOMMF simulation results for the 2-input 4-output balanced AND/OR and AND/AND gates, respectively, for all possible $\{I_1, I_2\}$ combinations. Figure 3.7(a) shows that the left arm provides the AND functionality at $O_1$ and $O_3$, and the right arm the OR functionality at $O_2$ and $O_4$. The results in Figure 3.7(a) and (b) can be analysed in the same way as the previous 2-input cases. Thus this structure can also provide AND, NAND, OR, and NOR gate functionalities. Each gate column can provide AND (OR) either in its direct and inverted format, or in the same format with a fanout of 2. Moreover, as indicated in Table 3.3, the new gate layout balances the normalized MSA of the gate outputs $O_1$, $O_2$, $O_3$, and $O_4$. Consequently, XOR and XNOR can now be implemented at all four outputs using the same threshold value of 0.38. This value is obtained by averaging the normalized $O_1$, $O_2$, $O_3$, and $O_4$ MSAs for the input combinations $\{I_1,I_2\}=\{1,0\}$ and $\{1,1\}$.

Figure 3.8: Fan-out of 4 MAJ3 Gate OOMMF Simulation.

Table 3.3: Balanced FO4 Gate Normalized MSA.

| $C_1 = C_2$ | $I_2$ | $I_1$ | $O_1/I_1$ | $O_2/I_1$ | $O_3/I_1$ | $O_4/I_1$ |
|-------------|-------|-------|-----------|-----------|-----------|-----------|
| 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| 0 | 0 | 1 | 0.33 | 0.33 | 0.33 | 0.33 |
| 0 | 1 | 0 | 0.3 | 0.3 | 0.3 | 0.3 |
| 0 | 1 | 1 | 0.43 | 0.43 | 0.43 | 0.43 |
| 1 | 0 | 0 | 0.43 | 0.43 | 0.43 | 0.43 |
| 1 | 0 | 1 | 0.3 | 0.3 | 0.3 | 0.3 |
| 1 | 1 | 0 | 0.33 | 0.33 | 0.33 | 0.33 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 |
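The effect of the balanced layout can be checked per output using the tabulated data. The Python sketch below (hypothetical helper name) uses the rows of Tables 3.2 and 3.3 with the control inputs at 0. It tests whether the rule "MSA above threshold gives 0, otherwise 1" reproduces XOR at each output, with the thresholds 0.35 and 0.38 quoted above.

```python
# Normalized MSA per output (O1, O2, O3, O4), keyed by (I1, I2), control inputs at 0.
TABLE_3_2 = {  # unbalanced FO4 layout
    (0, 0): (0.90, 0.90, 1.00, 1.00),
    (1, 0): (0.25, 0.25, 0.45, 0.43),
    (0, 1): (0.32, 0.32, 0.26, 0.27),
    (1, 1): (0.38, 0.39, 0.33, 0.33),
}
TABLE_3_3 = {  # balanced FO4 layout
    (0, 0): (1.00,) * 4,
    (1, 0): (0.33,) * 4,
    (0, 1): (0.30,) * 4,
    (1, 1): (0.43,) * 4,
}


def xor_capable_outputs(table: dict, threshold: float) -> list:
    """For each output, check whether 'MSA > threshold -> 0, else 1' reproduces XOR."""
    return [
        all((0 if msas[k] > threshold else 1) == (i1 ^ i2) for (i1, i2), msas in table.items())
        for k in range(4)
    ]


print(xor_capable_outputs(TABLE_3_2, 0.35))  # [True, True, False, False]
print(xor_capable_outputs(TABLE_3_3, 0.38))  # [True, True, True, True]
```

In the unbalanced layout, $O_3$ and $O_4$ fail because their $\{1,0\}$ values (0.45 and 0.43) lie above the threshold. The balanced layout gives all four outputs identical values, so a single threshold serves them all.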
Therefore, the balanced structure can provide different combinations of XOR and XNOR and enables a fanout of up to 4. As an additional example, we used the proposed PLG to implement a 3-input Majority gate with a fanout of 4. The simulation results for this implementation are presented in Figure 3.8. The Figure shows that the outputs $O_1$, $O_2$, $O_3$, and $O_4$ are identical for all input cases. The results can be analysed in the same way as the previous cases. If the inputs $\{I_1, I_2, I_3\}$ are $\{0,0,0\}$, $\{0,0,1\}$, $\{0,1,0\}$, or $\{1,0,0\}$, the outputs are $O_1=0$, $O_2=0$, $O_3=0$, and $O_4=0$. Conversely, $O_1=1$, $O_2=1$, $O_3=1$, and $O_4=1$ for the input combinations $\{0,1,1\}$, $\{1,0,1\}$, $\{1,1,0\}$, and $\{1,1,1\}$. Thus the Majority behaviour is delivered. Moreover, according to Table 3.3 all outputs exhibit the same energy level, so a fanout of 4 is achieved.

# 3.2. Triangle Shape Structure

In this section, we introduce the triangle shape Majority and XOR gate structures and discuss the simulation results.

#### **3.2.1.** FANOUT ENABLED SPIN WAVE MAJORITY AND XOR GATES

Figure 3.9: Fan-out of 2 MAJ3 Triangle Gate.

We developed a novel triangle shape fanout of 2 (FO2) MAJ3 structure, illustrated in Figure 3.9, with 3 inputs $I_1$, $I_2$, and $I_3$ and 2 outputs $O_1$ and $O_2$ [347]. In contrast to the previous ladder shape structure, it does not need one of its inputs to be replicated, which makes it more energy effective. To obtain the desired pattern at the interference point, the waveguide width must be equal to or smaller than the wavelength $\lambda$, and all SWs must be excited with the same amplitude and frequency. The proposed structure is generic, and its dimensions are indicated in Figure 3.9. As previously discussed, the structure dimensions and the positions of the outputs $O_1$ and $O_2$ must be chosen accurately to provide the desired functionality.
The proposed gate operates as follows: (i) At $I_1$, $I_2$, and $I_3$, SWs are excited with the suitable phase (0 for logic 0 and $\pi$ for logic 1). (ii) The SWs excited at $I_1$ and $I_2$ propagate diagonally until they reach the crossing points. There they interfere with each other constructively or destructively, depending on their phases. (iii) The resulting SWs propagate further and, at both interference points, interfere constructively or destructively with the SW excited at $I_3$. (iv) The output SWs are captured at $O_1$ and $O_2$ by phase detection, i.e., phase 0 results in logic 0 and phase $\pi$ in logic 1. Because of the symmetry and the SWs' isotropic propagation through this structure, the two SWs reaching $O_1$ and $O_2$ are identical, which means that a fanout of 2 has been achieved.

Figure 3.10: Fan-out of 2 MAJ3 Gate MuMax3 Simulation.

#### **3.2.2.** SIMULATION SETUP AND RESULTS

We validated the structure by means of MuMax3 [195] simulations using a 50 nm wide and 1 nm thick $Fe_{60}Co_{20}B_{20}$ waveguide. A 70 nm wavelength, larger than the waveguide width, has been chosen to simplify the interference pattern. Therefore, $d_1=d_2=210$ nm, $d_3=490$ nm, and $d_4=105$ nm. Moreover, the FVMSW dispersion relation gives a SW frequency of 10 GHz for $k=2\pi/\lambda\approx 90$ rad/$\mu$m. In addition, the following parameters are utilized: magnetic saturation $M_s=1.1$ MA/m, exchange stiffness $A_{ex}=18.5$ pJ/m, and damping constant $\alpha=0.004$ [346]. Figure 3.10 presents MuMax3 simulation results for the 3-input 2-output Majority gate, where blue represents logic 0 and red logic 1. The results clearly indicate the correct functionality of the gate. $O_1$ and $O_2$ provide logic 0 in response to the input patterns $\{I_1,I_2,I_3\}=\{0,0,0\},\{0,0,1\},\{0,1,0\},\{1,0,0\}$. They provide logic 1 for $\{0,1,1\},\{1,0,1\},\{1,1,0\}$, and $\{1,1,1\}$.
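The triangle dimensions and wavenumbers can be checked with a short calculation. The Python sketch below (hypothetical helper names) expresses each quoted distance in units of $\lambda = 70$ nm. It also evaluates $k = 2\pi/\lambda$ for the two wavelengths used in this chapter.

```python
import math


def wavenumber_rad_per_um(wavelength_nm: float) -> float:
    """k = 2*pi / lambda, with lambda converted from nm to um."""
    return 2 * math.pi / (wavelength_nm * 1e-3)


def in_wavelengths(distance_nm: float, wavelength_nm: float) -> float:
    """Express a geometric distance as a multiple of the SW wavelength."""
    return distance_nm / wavelength_nm


LAMBDA_NM = 70.0
for name, d in {"d1": 210.0, "d2": 210.0, "d3": 490.0, "d4": 105.0}.items():
    print(name, in_wavelengths(d, LAMBDA_NM))  # 3.0, 3.0, 7.0, 1.5

print(round(wavenumber_rad_per_um(LAMBDA_NM), 1))  # 89.8 (triangle gate)
print(round(wavenumber_rad_per_um(110.0), 1))  # 57.1 (ladder gate)
```

$d_1$, $d_2$ and $d_3$ are integer multiples of $\lambda$, while $d_4$ is a half-integer multiple ($1.5\lambda$).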
To demonstrate the equivalence of the two outputs, i.e., that FO2 is achieved, we extracted the output SW energy from the MuMax3 simulations for all possible input patterns. The normalized MSA values at $O_1$ and $O_2$ are presented in Table 3.4. As one can observe, they are practically the same for all cases, which implies that a fanout of 2 has been successfully achieved.

Table 3.4: 2-output MAJ3 Normalized MSA.

| $I_3$ | $I_2$ | $I_1$ | $O_1$ | $O_2$ |
|-------|-------|-------|-------|-------|
| 0 | 0 | 0 | 1 | 1 |
| 0 | 0 | 1 | 0.083 | 0.084 |
| 0 | 1 | 0 | 0.16 | 0.16 |
| 0 | 1 | 1 | 0.164 | 0.164 |
| 1 | 0 | 0 | 0.164 | 0.164 |
| 1 | 0 | 1 | 0.16 | 0.16 |
| 1 | 1 | 0 | 0.083 | 0.084 |
| 1 | 1 | 1 | 1 | 1 |

Figure 3.11: Fan-out of 2 XOR Triangle Gate.

It is interesting to note that the triangle structure is versatile: it becomes an XOR gate when the third input is removed, as depicted in Figure 3.11. The same operation principle and design steps still apply, but threshold-based detection must be utilized to obtain the XOR functionality. We validated the structure by means of MuMax3 simulations using the same parameters as above. Table 3.5 presents the normalized MSA values of the triangle shaped XOR gate. The suitable threshold in this case is 0.5. For $\{I_1,I_2\}$ equal to $\{0,0\}$ and $\{1,1\}$ the MSA is approximately 1, whereas for $\{0,1\}$ and $\{1,0\}$ it is approximately 0.

# **3.3.** Performance Evaluation

To gain some insight into the potential practical implications of our proposals, we evaluate the ladder and triangle shape fanout-enabled gates in terms of delay and energy consumption. We compare them with state-of-the-art SW [348] and 16 nm CMOS counterparts. For comparison fairness, we maintain the assumptions in [348] as follows: (i) ME cells are used to excite and detect SWs. (ii) The ME cell energy consumption and delay are 14.4 aJ and 0.42 ns, respectively.
(iii) SWs consume negligible energy while propagating in the waveguide compared with the transducers. (iv) Pulse signals are used to excite SWs. (v) SW gate outputs directly drive the SW gates that follow them, so no delay or energy overhead is accounted for the ME cells at the gate output.

Table 3.5: 2-output XOR Gate Normalized Output Magnetization.

| $I_2$ | $I_1$ | $O_1$ | $O_2$ |
|-------|-------|-------|-------|
| 0 | 0 | 0.99 | 1 |
| 0 | 1 | ≈ 0 | ≈ 0 |
| 1 | 0 | ≈ 0 | ≈ 0 |
| 1 | 1 | 1 | 1 |

Table 3.6: Performance Comparison.

| Designs | CMOS [349] | SW [348] | SW Ladder [78], 2-output | SW Ladder [78], 4-output | SW Triangle |
|-----------------|----------------|------------|------------|------------|-------------|
| Technology | 16 nm CMOS | SW | SW | SW | SW |
| Number of cells | 16 transistors | 4 ME cells | 6 ME cells | 8 ME cells | 5 ME cells |
| Fanout | > 2 | 1 | 2 | 4 | 2 |
| Delay (ns) | 0.031 | 0.42 | 0.42 | 0.42 | 0.42 |
| Energy (aJ) | 466 | 43.3 | 57.6 | 57.6 | 43.3 |

We note that, because SW technology is at an early stage of development, these assumptions might be optimistic and might require re-evaluation in the near future. Furthermore, we assumed that the 3-input CMOS Majority gate is built with 4 NAND gates, and we estimated its energy consumption and delay from the numbers provided in [349]. Table 3.6 presents the evaluation results. As can be observed in the Table, the proposed ladder and triangle structures are 14x slower than the CMOS counterpart, but they provide a 9x and 10.5x energy consumption reduction, respectively. We also note that the design in [348] is slightly better in terms of delay, but it does not provide fanout capabilities. Therefore, if more outputs are needed, the circuit must be replicated, which results in energy and area overheads. For instance, if 2 outputs are needed for the design in [348], two copies of the structure are required, which raises the energy consumption to 86.6 aJ.
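The fanout comparison rests on simple arithmetic, which the following Python sketch makes explicit. It assumes that only input (excitation) ME cells consume energy, which is our reading of assumption (v). Each Table 3.6 cell count is split into inputs plus outputs as follows: [348] has 3 + 1, the ladder has 4 + 2 or 4 + 4, and the triangle has 3 + 2. Gates without enough outputs are replicated. The helper names are hypothetical. The model gives 43.2 aJ where Table 3.6 lists 43.3 aJ, presumably a rounding difference, so the sketch is best read for ratios rather than exact values.

```python
import math

ME_CELL_ENERGY_AJ = 14.4  # assumption (ii)


def gate_energy_aj(num_input_cells: int) -> float:
    """Assumption (v): output ME cells add no energy, so only excitation cells count."""
    return num_input_cells * ME_CELL_ENERGY_AJ


def energy_for_fanout(num_input_cells: int, outputs_per_gate: int, fanout: int) -> float:
    """Replicate a gate until the required number of equivalent outputs is available."""
    copies = math.ceil(fanout / outputs_per_gate)
    return copies * gate_energy_aj(num_input_cells)


# (input ME cells, outputs per gate), our reading of Table 3.6 and the text.
designs = {"SW [348]": (3, 1), "Ladder 2-out": (4, 2), "Ladder 4-out": (4, 4), "Triangle": (3, 2)}
for fanout in (2, 4):
    for name, (inputs, outs) in designs.items():
        print(fanout, name, round(energy_for_fanout(inputs, outs, fanout), 1))
```

With these assumptions, the 33%, 50%, 3x, 25% and "50% more" figures discussed below follow directly from ratios of these energies.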
The proposed 2-output ladder and triangle structures consume only 57.6 aJ and 43.3 aJ, respectively. They therefore provide a 33% and 50% energy reduction, respectively, without inducing any area or delay overhead. The advantage becomes even more substantial if a larger fanout is required. For example, if a gate fanout of 4 is needed, the structure in [348] must be replicated 4 times, leading to an energy consumption of 173 aJ. The ladder shape 4-output structure consumes only 57.6 aJ, which is a 3x energy reduction without any area or delay overhead. Comparing the ladder and triangle shape structures, one can observe that the latter is more energy effective: it enables a 25% energy reduction while exhibiting the same delay. However, if a fanout of 4 is targeted, the triangle structure must be replicated twice, resulting in 50% more energy consumption than the 4-output ladder shape gate. As a closing remark, we note that a fanout larger than 1 is an enabling factor for the realization of SW circuits. Logic synthesis of SW circuits intrinsically produces fanout nodes, and fanout avoids the circuit replication these nodes would otherwise require. Thus the implications of our proposals are a lot more substantial at the circuit level than at the gate level, both in terms of area and energy consumption. Both ladder and triangle shape structures exhibit a fanout of 2, while the ladder shape structure can achieve a fanout of 4 at the expense of an extra ME cell. We also note that, for proper gate operation, inputs may have to be excited at different energy levels. This depends on whether they have a straight path to the outputs or face bent regions at the edges. Furthermore, in the ladder shape structures all inputs affect all outputs. For example, in the 2-output gate $I_3$ has an effect on $O_2$ and $I_4$ on $O_1$, which might create problems when different gate behaviours should be delivered at the outputs.
Therefore, design precautions are required to guarantee proper gate operation. They must ensure that $I_3$, $I_1$, and $I_2$ contribute more to $O_1$ than $I_4$ does. Likewise, the contribution of $I_4$, $I_1$, and $I_2$ to output $O_2$ must dominate that of $I_3$. In contrast, the triangle shape structure does not need an extra ME cell to achieve the fanout capability. This saves energy and allows all inputs to be excited with equal energy, but it limits the achievable fanout to 2.

### 3.4. CONCLUSIONS

In this chapter, we introduced novel ladder and triangle shape spin wave majority gate device concepts that can achieve a fan-out of up to 4 and 2, respectively. We also discussed how the ladder Majority gate can serve as a programmable logic gate and the triangle one as an XOR gate. The proposed designs were validated by means of OOMMF and MuMax3 micromagnetic simulations. They were compared with state-of-the-art spin wave and 16 nm CMOS counterparts. Our evaluation indicated that the proposed ladder and triangle structures are 14x slower than the CMOS counterpart, but they provide a 9x and 10.5x energy consumption reduction, respectively. Moreover, due to their fanout capabilities, they also provide a 33% and 50% energy reduction, respectively, compared with state-of-the-art SW gates, without inducing any area or delay overhead.

# SPIN WAVE DATA PARALLELISM

- 4.1. N-BIT DATA PARALLEL SW LOGIC GATE
- 4.2. SIMULATION SETUP AND RESULTS
- 4.3. PERFORMANCE EVALUATION
- 4.4. CONCLUSIONS

As mentioned previously, different logic gates built on spin wave technology have been presented, e.g., [58], [59], [62], [77], [80]–[96]. All these designs operate on same-frequency SWs, i.e., on 1-bit inputs. Consider a multiple-bit input function, e.g., a bitwise XOR over two n-bit inputs $A = (a_1, a_2, ..., a_n)$ and $B = (b_1, b_2, ..., b_n)$. To evaluate it, an XOR gate structure must be replicated n times so that the n input bit pairs are processed in parallel, at the expense of area overhead.
However, SWs of different frequencies can simultaneously propagate through the same waveguide without affecting each other, interfering only with their own species. This suggests an alternative. If each input pair $(a_i, b_i)$ is encoded with $f_i$ frequency SWs, XOR(A, B) can potentially be evaluated with one XOR gate instead of n. This approach has been pursued in [80], which introduces a Majority gate structure able to simultaneously process 3 data sets encoded at 3 different frequencies. However, the suggested structure contains a magnonic crystal that induces a large delay overhead. In this chapter, we revisit the SW parallelism concept and propose a novel multi-frequency data parallel in-line generic SW gate structure. The content of this chapter is based on the following publications:

**A. N. Mahmoud**, F. Vanderveken, C. Adelmann, F. Ciubotaru, S. Hamdioui and S. Cotofana, *Multifrequency Data Parallel Spin Wave Logic Gates*, in IEEE Transactions on Magnetics, **57**, no. 5, pp. 1-12, May 2021, Art no. 3401012.

**A. Mahmoud**, F. Vanderveken, F. Ciubotaru, C. Adelmann, S. Cotofana and S. Hamdioui, *n-bit Data Parallel Spin Wave Logic Gate*, 2020 Design, Automation and Test in Europe Conference and Exhibition (DATE), 2020, pp. 642-645.

Figure 4.1: Conventional SW Logic Gate Structure

# **4.1.** *n*-BIT DATA PARALLEL SW LOGIC GATE

Figure 4.1 depicts the generic structure of a SW based logic gate. It consists of multiple inputs $(I_1, I_2, I_3, ..., I_n)$, a Functional Region (FR), and an output O. The FR might perform a Majority, AND, OR, or XOR function, or its inverted version. All inputs are excited at the same frequency, propagate from their sources through the waveguide, and interfere constructively or destructively based on their phases. The result is available at the output as a SW with the same frequency as the inputs.
This is a scalar gate, as each input SW represents one bit. If the same function has to be evaluated pairwise on n-bit inputs, there are two options, each with an associated overhead. The inputs can be processed in parallel by instantiating n such gates, at an area cost. Alternatively, they can be processed serially by a single gate, at a delay cost. In the following, we take advantage of the interaction behaviour of SWs with different frequencies. On this basis, we introduce data parallel SW gates that can process n-bit inputs without hardware replication or serialisation. Figure 4.2 presents the parallel spin wave logic gate, which is able to concurrently process m n-bit inputs. As indicated in the Figure, the input sets $\mathcal{I}_i = \{I_{i,1}, I_{i,2}, I_{i,3}, \ldots, I_{i,m}\}, i=1,2,\ldots,n$, are simultaneously encoded into SWs with frequency $f_i$ by means of, e.g., Magnetoelectric (ME) cells or antennas. Subsequently, the SWs corresponding to the sets $\mathcal{I}_i$, $i=1,2,\ldots,n$, propagate through the waveguide without affecting each other until they reach the Functional Region (FR). Once the $m \times n$ spin waves arrive at the FR, equal-frequency spin waves interfere constructively or destructively depending on their phases. This produces n output SWs $\mathcal{O}_i = \mathcal{F}(\mathcal{I}_i)$, $i=1,2,\ldots,n$, where $\mathcal{F}$ is the gate function, e.g., AND, OR, XOR. These SWs can be sensed and transformed into the voltage domain by the detection cells located at $O_1$, $O_2$, ..., $O_n$, or transmitted to the next SW gate.
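The frequency-multiplexing principle can be illustrated with an idealized signal model. This is a minimal sketch that observes all waves at a single point and ignores dispersion, damping and spatial propagation, so it is not the MuMax3 setup used later. The helper names are hypothetical. Bit $i$ of both operands rides on its own frequency; the 10 to 80 GHz grid is taken from the Figure 4.5 caption. A discrete Fourier projection stands in for a frequency-selective detector, and its output is read with the XOR threshold rule.

```python
import cmath
import math


def encode(bit: int) -> float:
    """Phase encoding: logic 0 -> 0 rad, logic 1 -> pi rad."""
    return math.pi * bit


def waveguide_signal(a_bits, b_bits, freqs_ghz, t_ns: float) -> float:
    """Sum of all SWs at one point: bit i of A and of B both ride on freqs_ghz[i]."""
    return sum(
        math.cos(2 * math.pi * f * t_ns + encode(a)) + math.cos(2 * math.pi * f * t_ns + encode(b))
        for a, b, f in zip(a_bits, b_bits, freqs_ghz)
    )


def parallel_xor(a_bits, b_bits, freqs_ghz, window_ns: float = 1.0, samples: int = 1000):
    """Bitwise XOR of two n-bit words carried on one waveguide by n frequencies."""
    times = [k * window_ns / samples for k in range(samples)]
    signal = [waveguide_signal(a_bits, b_bits, freqs_ghz, t) for t in times]
    result = []
    for f in freqs_ghz:
        # Project onto frequency f. Only the same-frequency pair contributes, because
        # the window holds an integer number of periods of every frequency.
        phasor = 2 / samples * sum(s * cmath.exp(-2j * math.pi * f * t) for s, t in zip(signal, times))
        # XOR threshold rule: in-phase pair (amplitude ~2) -> 0, cancelled pair (~0) -> 1.
        result.append(0 if abs(phasor) > 1.0 else 1)
    return result


FREQS = [10.0 * (i + 1) for i in range(8)]  # 10, 20, ..., 80 GHz
A = [1, 0, 1, 1, 0, 0, 1, 0]
B = [1, 1, 0, 1, 0, 1, 0, 0]
print(parallel_xor(A, B, FREQS))  # [0, 1, 1, 0, 0, 1, 1, 0]
```

The 1 THz sampling rate (1000 samples over 1 ns) stays well above twice the highest frequency. The per-frequency projections are therefore independent, which mirrors the statement that different-frequency SWs do not affect each other.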
Although the approach in Figure 4.2 is generic, its practical realization requires stacked waveguides and contains bent regions, which impede smooth SW propagation. We address these issues by applying the same idea to a single waveguide structure and constructing the in-line gate in Figure 4.3.

Figure 4.2: Multi-Frequency Spin Wave Logic Gate

Figure 4.3: *n*-bit Inputs In-line Spin Wave Logic Gate (inputs $I_{1,j}, I_{2,j}, \ldots, I_{n,j}$ of the $j$-th input at frequencies $F_1, F_2, \ldots, F_n$, followed by the outputs $O_1, O_2, \ldots, O_n$; spacings $d_1, \ldots, d_{n\times m+n}$)

#### Algorithm 1 Data Parallel Gate Area Optimization

```
Inputs: WE, L, D, w, d[i], i=1:n, λ[i], i=1:n
Outputs: TP[i,j], i=1:n; j=1:m+1, A
▷ WE is the waveguide end, L the transducer length, D the minimum distance
▷ between consecutive transducers, w the waveguide width, d the distance between
▷ two consecutive inputs of the same frequency, TP the transducer position,
▷ and A the gate area.
TP[1:n, 1:m+1] = 0
for j = 1 to m + 1 do
    for i = 1 to n do                  ▷ place set j right after the cells placed so far
        TP[i,j] = WE
        WE = WE + L + D
    end for
    if j > 1 then
        for i = 1 to n do              ▷ move cells to wavelength-multiple distances
            d[i] = TP[i,j] - TP[i,j-1]
            if ⌈d[i]/λ[i]⌉ × λ[i] = d[i] then
                TP[i,j] = TP[i,j]
            else
                TP[i,j] ← TP[i,j-1] + ⌈d[i]/λ[i]⌉ × λ[i]
            end if
            if i = 1 then
                TP[i-1,j] = TP[n,j-1]  ▷ TP[0,j]: last cell of the previous set
            end if
            if TP[i,j] - TP[i-1,j] > D + L then
                TP[i,j] = TP[i,j]
            end if
        end for
        for i = 1 to n do              ▷ area optimization: fill the gaps
            if i = 1 then
                TP[i-1,j] = TP[n,j-1]
            end if
            if TP[i,j] - TP[i-1,j] > D + L then
                for c = 1 to n do
                    if ⌈(TP[i,j] + D + L)/λ[c]⌉ × λ[c] = TP[i,j] + D + L then
                        TP[c,j] = TP[i,j] + D + L
                        TP ← Sort(TP)
                    end if
                end for
            end if
        end for
    end if
end for
WE = TP[n,m+1] + L
A = WE × w
```

As mentioned in the previous chapter, for proper gate operation, SWs with the same frequency must be excited with the same amplitude and wavelength. Moreover, the distances between input sources and interference locations are SW frequency specific and crucial for proper gate functionality, so they must be accurately determined. For example, suppose constructive interference is required for in-phase SWs and destructive interference for out-of-phase SWs. The distances between same-frequency sources must then be $j_q\lambda_i$, $i = 1,2,3,\ldots,n$, i.e., $d_1=j_1\lambda_1,\ d_2=j_2\lambda_2,\ \ldots,\ d_{nm}=j_{nm}\lambda_n$, where $j_q \in \{1,2,3,\ldots\}$, $q=1,2,3,\ldots,nm$. Note that $j_q=1$ is the preferred choice to minimize gate area and delay. This is feasible for scalar gates but not always possible for parallel gates. Conversely, if the opposite behaviour is desired, the distances must be $(j_q+\frac{1}{2})\lambda_i$, i.e., $d_1=(j_1+\frac{1}{2})\lambda_1,\ d_2=(j_2+\frac{1}{2})\lambda_2,\ \ldots,\ d_{nm}=(j_{nm}+\frac{1}{2})\lambda_n$.

Table 4.1: Parameters

| Parameters | Values |
|---------------------------------------------|------------------------------------|
| Magnetic saturation $M_s$ | $1.1 \times 10^6 \text{A/m}$ |
| Perpendicular anisotropy constant $k_{ani}$ | $8.3177 \times 10^5 \text{ J/m}^3$ |
| Damping constant $\alpha$ | 0.004 |
| Waveguide thickness t | 1 nm |
| Exchange stiffness $A_{exch}$ | 18.5 pJ/m |

Figure 4.4: Unoptimized 8-bit XOR Gate Time and Frequency Response. Note that logic 0 represents SW with phase 0 and logic 1 represents SW with phase $\pi$.

In view of the previous discussion, each output wave $\mathcal{O}_i$ becomes available for detection after a certain delay. This delay is determined by the distance between the most distant input cell of the $\mathcal{I}_i$ set, i.e., $I_{i,1}$ in Figure 4.3, and the output cell $O_i$. Thus full parallelism is achieved. Note that the actual gate delay can be optimized by choosing an appropriate waveguide material and geometry (e.g., dimensions and thickness). The gate area, in turn, can be minimized by changing the positions of the input and output transducers. One can observe in Figure 4.3 that input and output cells are ordered by bit position for clarity. However, they can be shuffled as long as the previously discussed constraints are still satisfied, which reduces the area (overall gate length). To this end, we introduce Algorithm 1. It identifies the transducer (source/detector) locations that minimize the waveguide length without infringing the wavelength-dependent inter-transducer distance constraints. The algorithm iteratively constructs the gate structure by instantiating one input set $\mathcal{I}_i$, $i=1,2,\ldots,n$, at a time. Each set's transducer positions are optimized relative to the already optimized structure embedding the previously instantiated sets $\mathcal{I}_j$, $j = 1, 2, \ldots, i - 1$.

Figure 4.5: Unoptimized 8-bit XOR Gate Outputs a) $f_1$=10 GHz, b) $f_2$=20 GHz, ..., h) $f_8$=80 GHz. Note that logic 0 represents SW with phase 0 and logic 1 represents SW with phase $\pi$.

The algorithm starts with a configuration in which all transducers are placed overlapped at the beginning of the waveguide. Subsequently, the input sets are processed one at a time. Each set is initially placed one cell after the other at distance D, regardless of the wavelength of the SW each cell processes (the first inner for loop). If the set currently processed is the first one, no further adjustments are required and the second set can be considered for placement. Otherwise, the second inner for loop moves the transducers to the correct positions, which are multiples of their wavelength. After this step, the transducer configuration for the sets processed so far is the same as in Figure 4.3. Next, the third inner for loop performs the area optimization by checking the spaces between transducers. Where possible, it moves a transducer into a gap, provided its wavelength-imposed distance condition is satisfied. If a transducer has been moved, **Sort** reorders the transducers in the **TP** matrix to capture the new configuration. These steps are repeated until all sets are placed and the gate length is optimized. At the end, the gate area is calculated by multiplying the waveguide width by the waveguide length.

Figure 4.6: Optimized 8-bit XOR Gate Time and Frequency Response. Note that logic 0 represents SW with phase 0 and logic 1 represents SW with phase $\pi$.

Let us assume a 3-bit 2-input gate operating on SWs with wavelengths $\lambda_1$=100 nm, $\lambda_2$=50 nm, and $\lambda_3$=19 nm, a 10 nm transducer length, and a 1 nm minimum distance between transducers. Following the structure in Figure 4.3, the second input set can begin 33 nm from the waveguide start. This is because the first three sources $I_{1,1}$, $I_{1,2}$, $I_{1,3}$ each occupy 10 nm and are spaced 1 nm apart.
As such, the initial order is $(I_{1,1}, I_{1,2}, I_{1,3}, I_{2,1}, I_{2,2}, I_{2,3}, O_1, O_2, O_3)$ with a corresponding waveguide length of 288 nm. The optimization algorithm changes the order to $(I_{1,1}, I_{1,2}, I_{1,3}, I_{2,3}, I_{2,2}, I_{2,1}, O_3, O_2, O_1)$, which corresponds to a 210 nm waveguide length and thus about 27% area savings.

Furthermore, as mentioned in the previous chapter, two main methods can be used for output detection: (i) phase detection and (ii) threshold detection. In the first case, a predefined phase is used as reference, and a phase difference of 0 represents a logic 0 while a phase difference of $\pi$ represents a logic 1. The second method assesses the SW magnetization (SWM) value and reports a logic 0 if the SWM is smaller than a predefined threshold value and a logic 1 otherwise. If phase detection is in place, the gate can provide a non-inverted or inverted output (or even both) by adjusting the reading location. For instance, referring to Figure 4.3, the detectors must be placed at a distance (from the last $f_i$ SW source) equal to $(j_q + \frac{1}{2})\lambda_i$, $i = 1, 2, 3, \ldots, n$, i.e., $d_{nm+1} = (j_{nm+1} + \frac{1}{2})\lambda_1$, $d_{nm+2} = (j_{nm+2} + \frac{1}{2})\lambda_2$, $\ldots$, $d_{nm+n} = (j_{nm+n} + \frac{1}{2})\lambda_n$, if non-inverted results are desired. Conversely, the detectors must be placed at a distance (from the last $f_i$ SW source) equal to $j_q\lambda_i$, i.e., $d_{nm+1} = j_{nm+1}\lambda_1$, $d_{nm+2} = j_{nm+2}\lambda_2$, $\ldots$, $d_{nm+n} = j_{nm+n}\lambda_n$, if the complement is required. In the case of threshold-based detection, the gate can provide non-inverted or inverted outputs without changing the output detector position, simply by switching the thresholding condition in the detector cell.
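The core constraint that Algorithm 1 enforces (same-frequency transducers an integer number of wavelengths apart, neighbouring transducers at least L + D apart) can be explored with the simplified greedy sketch below. It is not Algorithm 1: it only evaluates a given left-to-right order, and it assumes that distances are measured between transducer start positions and that every same-frequency distance is an integer multiple of λ (the half-wavelength read-out offsets are ignored). The function name `waveguide_length` is hypothetical.

```python
import math


def waveguide_length(order, wavelengths, L=10, D=1):
    """Place transducers in the given order; return (start positions, length) in nm.

    order       -- frequency index of each transducer (inputs, then outputs), left to right
    wavelengths -- SW wavelength per frequency index, in nm
    L, D        -- transducer length and minimum gap between transducers, in nm
    Assumption: every same-frequency pair must be an integer number of wavelengths apart.
    """
    positions = []
    last_same_freq = {}   # frequency index -> start position of its previous transducer
    earliest = 0          # first start position allowed by the L + D spacing rule
    for f in order:
        pos = earliest
        if f in last_same_freq:
            ref = last_same_freq[f]
            # snap forward to the next integer multiple of this frequency's wavelength
            pos = ref + math.ceil((pos - ref) / wavelengths[f]) * wavelengths[f]
        positions.append(pos)
        last_same_freq[f] = pos
        earliest = pos + L + D
    return positions, positions[-1] + L


lam = [100, 50, 19]                      # lambda_1, lambda_2, lambda_3 of the 3-bit example
ordered = [0, 1, 2, 0, 1, 2, 0, 1, 2]    # I11 I12 I13 I21 I22 I23 O1 O2 O3
shuffled = [0, 1, 2, 2, 1, 0, 2, 1, 0]   # I11 I12 I13 I23 I22 I21 O3 O2 O1
print(waveguide_length(ordered, lam)[1], waveguide_length(shuffled, lam)[1])
```

With these simplified conventions the shuffled order of the 3-bit example comes out at 210 nm and the bit-ordered layout at 241 nm. The latter differs from the 288 nm quoted above, presumably because Algorithm 1 uses spacing conventions this sketch does not reproduce, but the sketch still shows how reordering same-frequency transducers shortens the waveguide.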
Note that, regardless of the detection method, each read location should be as close as possible to the last input in its set, to reduce the SW energy lost due to damping and to process high-amplitude spin waves.

# 4.2. SIMULATION SETUP AND RESULTS

$Fe_{60}Co_{20}B_{20}$ waveguides with a width of 50 nm and Perpendicular Magnetic Anisotropy (PMA) are utilized for all gate constructions. We note that for this material the anisotropy field $H_{anisotropy} > M_s$, which means that there is no need to apply an external magnetic field [346]. Table 4.1 presents the parameters we utilize to validate the 8-bit 2-input XOR/XNOR and 3-input Majority gates.

Figure 4.7: Optimized 8-bit XOR Gate Outputs: a) $f_1$ =10 GHz, b) $f_2$ =20 GHz, ..., h) $f_8$ =80 GHz. Note that logic 0 represents SW with phase 0 and logic 1 represents SW with phase $\pi$.

The 8 SW frequencies are 10 GHz, 20 GHz, 30 GHz, 40 GHz, 50 GHz, 60 GHz, 70 GHz, and 80 GHz. By making use of the FVSW dispersion relation and given that the wavenumber $k=\frac{2\pi}{\lambda}$, we determine that the distances between transducers exciting/detecting SWs with the same frequency are: $d_1$ =166 nm (j=2), $d_2$ =100 nm (j=2), $d_3$ =117 nm (j=3), $d_4$ =165 nm (j=5), $d_5$ =174 nm (j=6), $d_6$ =130 nm (j=5), $d_7$ =168 nm (j=7), $d_8$ =176 nm (j=8), $d_9$ =166 nm (j=2), $d_{10}$ =100 nm (j=2), $d_{11}$ =117 nm (j=3), $d_{12}$ =132 nm (j=4), $d_{13}$ =145 nm (j=5), $d_{14}$ =104 nm (j=4), $d_{15}$ =144 nm (j=6), $d_{16}$ =44 nm (j=2), $d_{17}$ =166 nm (j=2), $d_{18}$ =150 nm (j=3), $d_{19}$ =156 nm (j=4), $d_{20}$ =66 nm (j=2), $d_{21}$ =87 nm (j=3), $d_{22}$ =78 nm (j=3), $d_{23}$ =72 nm (j=3), and $d_{24}$ =110 nm (j=5). Note that $d_1$ to $d_{16}$ apply to the XOR gate, and $d_1$ to $d_{24}$ to the Majority gate.
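Since each quoted distance is an integer multiple j of the wavelength at its frequency, dividing d by j recovers the wavelength used for each of the 8 frequencies, and the three groups of distances should agree. The check below assumes, from the ordering of the list, that $d_q$ belongs to the frequency with index $((q-1) \bmod 8) + 1$; the resulting wavelengths are derived values, not figures stated in the text, and `implied_wavelengths` is a hypothetical helper name.

```python
from fractions import Fraction

FREQS_GHZ = [10, 20, 30, 40, 50, 60, 70, 80]

# (d in nm, j) for d_1 ... d_24 as listed in the text
DISTANCES = [
    (166, 2), (100, 2), (117, 3), (165, 5), (174, 6), (130, 5), (168, 7), (176, 8),
    (166, 2), (100, 2), (117, 3), (132, 4), (145, 5), (104, 4), (144, 6), (44, 2),
    (166, 2), (150, 3), (156, 4), (66, 2), (87, 3), (78, 3), (72, 3), (110, 5),
]


def implied_wavelengths(distances, n_freqs=8):
    """Return {frequency in GHz: wavelength in nm}, checking that every d/j agrees."""
    result = {}
    for q, (d, j) in enumerate(distances):
        f = FREQS_GHZ[q % n_freqs]
        lam = Fraction(d, j)  # exact rational, avoids floating-point round-off
        if result.setdefault(f, lam) != lam:
            raise ValueError(f"inconsistent wavelength at {f} GHz: {result[f]} vs {lam}")
    return result


for f, lam in implied_wavelengths(DISTANCES).items():
    print(f"{f} GHz -> {lam} nm")
```

No exception is raised for the quoted list, i.e., all 24 distances are consistent with a single wavelength per frequency.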
Furthermore, a minimum separation distance of 1 nm between transducers is in place. Note that logic 0 represents a SW with phase 0, and logic 1 a SW with phase $\pi$. We perform the following simulation experiments:

• 8-bit 2-input XOR/XNOR gate with threshold detection. The two 8-bit inputs are simultaneously excited using the sources $(I_{1,1}, I_{2,1}, I_{3,1}, ..., I_{8,2})$. The excited spin waves propagate through the waveguide, and those with the same frequency interfere with each other. The resulting spin waves propagate towards the output, where they are captured at $O_1, O_2, ..., O_8$ based on threshold detection. We carry out the validation for both the area-unoptimized $(I_{1,1},I_{2,1},I_{3,1},I_{4,1},I_{5,1},I_{6,1},I_{7,1},I_{8,1},I_{1,2},I_{2,2},I_{3,2},I_{4,2},I_{5,2},I_{6,2},I_{7,2},I_{8,2},I_{1,3},I_{2,3},I_{3,3},I_{4,3},I_{5,3},I_{6,3},I_{7,3},I_{8,3})$ and optimized $(I_{1,1},I_{2,1},I_{3,1},I_{4,1},I_{5,1},I_{6,1},I_{7,1},I_{8,1},I_{2,2},I_{3,2},I_{1,2},I_{6,2},I_{4,2},I_{5,2},I_{7,2},I_{8,2},I_{2,3},I_{8,3},I_{3,3},I_{1,3},I_{6,3},I_{4,3},I_{5,3},I_{7,3})$ configurations. Note that, as the detector order is not important, the detectors follow the same pattern, i.e., $(O_1,O_2,O_3,O_4,O_5,O_6,O_7,O_8)$, in both cases.

• 8-bit 3-input Majority gate based on phase detection. We again considered area-unoptimized and optimized gate instances, but in this case the detector order is relevant; thus, after optimization the source and detector order is $I_{1,1}, I_{2,1}, I_{3,1}, I_{4,1}, I_{5,1}, I_{6,1}, I_{7,1}, I_{8,1}, I_{2,2}, I_{3,2}, I_{1,2}, I_{6,2}, I_{4,2}, I_{5,2}, I_{7,2}, I_{8,2}, I_{2,3}, I_{8,3}, I_{3,3}, I_{1,3}, I_{6,3}, I_{4,3}, I_{5,3}, I_{7,3}, O_6, O_8, O_4, O_2, O_5, O_1, O_7, O_3.$

Figure 4.8: Unoptimized 8-bit Majority Gate Time and Frequency Response. Note that logic 0 represents SW with phase 0 and logic 1 represents SW with phase $\pi$.

Figure 4.4 presents OOMMF simulation results for the area-unoptimized byte-based 2-input XOR gate instance. The y-axis shows the output SW $M_x/M_s$ ratio, i.e., the magnetization in the x-direction over the saturation magnetization. To simplify the Figure, we only consider all-0s and all-1s input sets; thus, only four input combinations are possible, and the gate response to any input combination is the same at all frequencies. As expected, same-frequency SW pairs interfere without affecting the other SWs, as is clear from Figure 4.4, which shows that 8 different frequency components coexist without distorting each other in the Fast Fourier Transform (FFT) amplitude spectrum for all considered input combinations. Moreover, as can be noticed from Figure 4.5, the output SWs are not distorted and can be properly detected at each frequency. Let us consider the first output detection cell, which is tuned to the 10 GHz SW. When reading the output at 0.5 ns for $\mathcal{I}_1 = \mathcal{I}_2 = 0$ and $\mathcal{I}_1 = \mathcal{I}_2 = 1$, the absolute SW magnetization value is greater than $0.0035\ M_s$ due to constructive interference, whereas it is less than $0.0035\ M_s$ when one input set is 0 and the other is 1. Therefore, if the detection threshold is set to $0.0035\ M_s$, an XOR function is obtained, as a SW magnetization greater (lower) than $0.0035\ M_s$ is read as a logic 0 (1).
An XNOR can be realized by flipping the condition such that a SW magnetization lower (greater) than $0.0035\ M_s$ is read as a logic 0 (1). Similarly, for the second detection cell, which targets the 20 GHz SW, a threshold value of $0.0032\ M_s$ is in place, and by following a similar line of reasoning, threshold values of 0.0028, 0.0025, 0.0022, 0.0017, 0.0015, and $0.001\ M_s$ can be determined for the remaining frequencies.

Figure 4.9: Unoptimized 8-bit Majority Gate Outputs a) $f_1$ =10 GHz, b) $f_2$ =20 GHz, ..., h) $f_8$ =80 GHz. Note that logic 0 represents SW with phase 0 and logic 1 represents SW with phase $\pi$.

Figures 4.6 and 4.7 present OOMMF simulation results for the optimized 8-bit 2-input XOR gate. As depicted in Figure 4.7, the simulation confirms the correct functionality of the XOR/XNOR gate. One can observe in the Figure that in this case the SW magnetization at all frequencies is higher, as the spin waves propagate over shorter distances than in the unoptimized case. In addition, the detection threshold values are higher, i.e., $0.007\ M_s$, $0.005\ M_s$, $0.0045\ M_s$, $0.0038\ M_s$, $0.0034\ M_s$, $0.0027\ M_s$, $0.0025\ M_s$, and $0.002\ M_s$; therefore, less sensitive detectors are required for the XOR/XNOR gate implementation.

The OOMMF simulation results for the unoptimized 8-bit 3-input Majority gate are presented in Figure 4.8. The same notations are in place and, again, to simplify the Figure, we only consider all-0s and all-1s input sets; thus, only 8 input combinations are presented. The Figure clearly demonstrates proper gate functionality, as 8 different frequency components coexist without distorting each other in the FFT amplitude spectrum for all possible input combinations ($\mathcal{I}_1 = \mathcal{I}_2 = \mathcal{I}_3 = 0$), ($\mathcal{I}_1 = \mathcal{I}_2 = 0$, $\mathcal{I}_3 = 1$), ..., ($\mathcal{I}_1 = \mathcal{I}_2 = \mathcal{I}_3 = 1$).
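The threshold read-out described for the unoptimized XOR gate amounts to a per-frequency decision rule. The sketch below uses the 8 thresholds quoted above (in units of $M_s$); the magnetization amplitudes fed to it are hypothetical placeholder readings, not simulation data, and the `xnor` flag simply flips the comparison as described in the text. The name `read_word` is hypothetical.

```python
# Detection thresholds (in units of Ms) quoted for the unoptimized 8-bit XOR gate, 10 ... 80 GHz
THRESHOLDS_MS = [0.0035, 0.0032, 0.0028, 0.0025, 0.0022, 0.0017, 0.0015, 0.001]


def read_word(amplitudes_ms, thresholds=THRESHOLDS_MS, xnor=False):
    """Map per-frequency |Mx/Ms| readings to output bits.

    XOR:  amplitude above threshold (constructive interference, equal inputs) -> 0,
          amplitude below threshold (destructive interference) -> 1.
    XNOR: the comparison is flipped.
    """
    bits = []
    for amp, thr in zip(amplitudes_ms, thresholds):
        above = abs(amp) > thr
        bits.append(int(above) if xnor else int(not above))
    return bits


# Hypothetical readings: frequencies 1-4 see equal inputs, frequencies 5-8 see differing inputs
readings = [0.0050, 0.0045, 0.0040, 0.0031, 0.0005, 0.0004, 0.0003, 0.0002]
print(read_word(readings))             # XOR bits
print(read_word(readings, xnor=True))  # XNOR bits
```

Because only the comparison changes, the same detector position serves both XOR and XNOR, which is the advantage of threshold detection noted earlier.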
Figure 4.9 indicates that the output SWs are not distorted and can be properly detected at each frequency. Let us concentrate on Figure 4.9a, which captures the 10 GHz 3-input Majority gate response, and consider the output at 0.75 ns. When the three inputs have the same phase of 0 $(I_1I_2I_3=000)$, they interfere constructively in the waveguide, resulting in a SW with phase 0, which corresponds to a logic 0. Also, when exactly one of the inputs is logic 1 $(I_1I_2I_3=001,\ I_1I_2I_3=010,\ I_1I_2I_3=100)$, i.e., has a phase of $\pi$, the SWs interfere both constructively and destructively, and the result is still a logic 0. In contrast, if exactly one of the inputs is logic 0 $(I_1I_2I_3=011,\ I_1I_2I_3=110,\ I_1I_2I_3=101)$, the output is logic 1 as a result of the interference. Further, when the three inputs have the same phase of $\pi$ $(I_1I_2I_3=111)$, the spin waves interfere constructively in the waveguide, resulting in a phase of $\pi$, which corresponds to a logic 1. The same line of reasoning can be applied to all the other 7 frequencies, as clearly indicated by Figure 4.9.

Figure 4.10: Optimized 8-bit Majority Gate Time and Frequency Response. Note that logic 0 represents SW with phase 0 and logic 1 represents SW with phase $\pi$.

The OOMMF simulation results for the optimized 8-bit 3-input Majority gate are presented in Figures 4.10 and 4.11. As can be observed from Figure 4.11, the gate functions correctly, while the SW amplitudes are higher because, owing to the optimization, the SWs propagate over shorter distances, which enables the use of less sensitive detectors.

# 4.3. PERFORMANCE EVALUATION

To gain insight into the practical potential of our proposal, we evaluate the 8-bit gates and compare them with the functionally equivalent state-of-the-art SW implementation obtained by instantiating 8 conventional (scalar) Majority/XOR gates, in terms of area, delay, and power consumption.
In our evaluations we make the following assumptions: (i) source/detector dimensions are $10 \text{ nm} \times 50 \text{ nm}$ as suggested in [79], (ii) SW propagation through the waveguide does not consume noticeable energy, and (iii) the transducer delay is 0.42 ns [348]. Under these assumptions, we first evaluate the impact of the optimization algorithm on the 8-bit gate area. Our calculations indicate that the unoptimized XOR and Majority gates have an area of 0.02525 $\mu m^2$ and 0.04725 $\mu m^2$, respectively, which become 0.01755 $\mu m^2$ and 0.0279 $\mu m^2$, respectively, after the optimization. This confirms the algorithm's efficiency, as it reduces the area by 30% and 41%, respectively. As the standard functionally equivalent implementations require 8 2-input XOR and 8 3-input Majority gates, they occupy 0.0784 $\mu m^2$ and 0.116 $\mu m^2$, respectively; thus, our proposal enables a 4.47x and 4.16x area reduction, respectively.

Figure 4.11: Optimized 8-bit Majority Gate Outputs a) $f_1$ =10 GHz, b) $f_2$ =20 GHz, ..., h) $f_8$ =80 GHz. Note that logic 0 represents SW with phase 0 and logic 1 represents SW with phase $\pi$.

Generally speaking, to calculate a SW gate delay, one needs to sum up the time associated with SW generation, propagation, and detection. The delay due to SW propagation through the waveguide depends on the distance travelled from generation to detection, and it can be computed by dividing that distance by the SW group velocity, which is 3500 m/s for CoFeB [76]. Given that the longest propagation paths for the 8-bit 2-input XOR and 3-input Majority gates are 351 nm and 558 nm, respectively, the propagation delays are 100 ps and 159 ps, respectively, which, after adding the input and output transducer delays ($2 \times 0.42$ ns), sum up to 940 ps and 999 ps, respectively.
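The area and delay figures above follow from simple arithmetic on the quoted values. The sketch below reproduces them, assuming, as in the text, a 3500 m/s group velocity and one input plus one output transducer of 0.42 ns each on the critical path; the helper names `gate_delay_ps` and `area_saving` are hypothetical.

```python
GROUP_VELOCITY = 3500.0       # m/s, CoFeB SW group velocity used in the text
TRANSDUCER_DELAY_PS = 420.0   # 0.42 ns per transducer


def gate_delay_ps(longest_path_nm):
    """Propagation delay over the longest path plus input and output transducer delays, in ps."""
    propagation_ps = longest_path_nm * 1e-9 / GROUP_VELOCITY * 1e12
    return propagation_ps + 2 * TRANSDUCER_DELAY_PS


def area_saving(before_um2, after_um2):
    """Relative area reduction achieved by the optimization."""
    return 1.0 - after_um2 / before_um2


print(round(gate_delay_ps(351)), round(gate_delay_ps(558)))          # 8-bit XOR, 8-bit MAJ
print(round(area_saving(0.02525, 0.01755), 2), round(area_saving(0.04725, 0.0279), 2))
print(round(0.0784 / 0.01755, 2), round(0.116 / 0.0279, 2))          # vs. 8 scalar gates
```

The same `gate_delay_ps` helper applies unchanged to the scalar gates, whose longest paths are given in the next paragraph.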
For the scalar 2-input XOR and 3-input Majority gates, the longest paths are 196 nm and 290 nm, respectively, which translate into propagation delays of 56 ps and 83 ps, respectively, and overall gate delays of 896 ps and 923 ps, respectively. Thus, the 8-bit 2-input XOR and 3-input Majority gates are 5% and 7% slower than their scalar counterparts, respectively. As both the parallel and the scalar gate implementations make use of the same number of transducers, and propagation through the waveguide consumes insignificant power, the two implementations are equivalent in terms of power consumption.

To gain some insight into the practical upper bound of data parallelism, we examined the consequences of increasing the number of bits per set, i.e., the number of utilized frequencies. To this end, we performed OOMMF simulations of 8-bit and 9-bit 3-input Majority gate instances, and we display in Figure 4.12 the 10 GHz output component for the input combinations $\mathscr{I}_1\mathscr{I}_2\mathscr{I}_3=000$ and $\mathscr{I}_1\mathscr{I}_2\mathscr{I}_3=100$.

Figure 4.12: MAJ Gate Outputs at $f_1$ =10GHz. Note that logic 0 represents SW with phase 0 and logic 1 represents SW with phase $\pi$.

Figure 4.13: XOR Gate Outputs at $f_2$ =20GHz. Note that logic 0 represents SW with phase 0 and logic 1 represents SW with phase $\pi$.

One can observe in the Figure that at 0.5 ns the 8-bit Majority gate output has the same phase for both considered input combinations, which reflects correct Majority gate functionality, as in both cases 0 is the majority. However, the 9-bit Majority gate output at 0.5 ns has different phases, 0 for $\mathscr{I}_1\mathscr{I}_2\mathscr{I}_3=000$ and approximately $\pi/4$ for $\mathscr{I}_1\mathscr{I}_2\mathscr{I}_3=100$, which indicates that the gate starts to malfunction. Based on this, we conclude that, for the proposed topology and utilized material, 8 is the maximum number of frequencies one can use to construct robust parallel SW gates with phase detection.
However, one can go beyond this limit if threshold-based detection is utilized. To examine the effect of embedding more than 8 frequencies, we evaluate, by means of OOMMF simulations, 2-input XOR gates with 8, 9, 10, and 16 frequencies. For illustration purposes, we display in Figure 4.13 the 20 GHz output component for the input combinations $\mathcal{I}_1 \mathcal{I}_2 = 00$ and $\mathcal{I}_1 \mathcal{I}_2 = 01$, which should give a 0 and a 1 output value, respectively, for all considered input widths. The Figure clearly indicates that, while the spin wave magnetization difference between the two input combinations decreases as the number of frequencies increases, which makes output detection more challenging, two different levels can still be distinguished and a threshold defined, such that if the spin wave magnetization is greater than that threshold the output is 0, and 1 otherwise. To clarify this, let us inspect the output value at 0.4 ns for the 8, 9, 10, and 16-bit XOR gates. For the input combination $\mathcal{I}_1 \mathcal{I}_2 = 00$, the output SW has a higher amplitude than the one corresponding to $\mathcal{I}_1\mathcal{I}_2 = 01$, which means that a threshold can be set and, based on threshold detection, X(N)OR can be detected. This suggests that threshold-detection-based gates are more robust and can operate with up to 16-bit inputs. Note that inputs wider than 16 bits might be realizable, but their investigation is part of planned future work.

Figure 4.14 presents OOMMF simulation results for the 16-bit 2-input XOR gate. As can be observed from the FFT magnitude spectrum in Figure 4.14, the information is encoded in SWs with 16 different frequencies, 10, 20, ..., 160 GHz, and the output for all possible input combinations ($\mathcal{I}_1 = \mathcal{I}_2 = 0$), ..., ($\mathcal{I}_1 = \mathcal{I}_2 = 1$) can be detected at each frequency.

Figure 4.14: Optimized 16-bit Majority Gate Response in Time and Frequency. Note that logic 0 represents SW with phase 0 and logic 1 represents SW with phase $\pi$.

To further examine the results, we filter each frequency component for the different input combinations separately in Figure 4.15, where one can observe that the output SWs are not distorted and can be properly detected at each frequency, which means that the 16-bit XOR/XNOR gate operates correctly. Let us consider the 20 GHz output at 0.75 ns and a detection threshold value of 0.04 $M_s$. For $\mathcal{I}_1 = \mathcal{I}_2 = 0$ or $\mathcal{I}_1 = \mathcal{I}_2 = 1$, the absolute SW magnetization value is greater than 0.04 $M_s$ due to constructive interference, which means a logic 0 output, as expected. For $\mathcal{I}_1 = 0, \mathcal{I}_2 = 1$ or $\mathcal{I}_1 = 1, \mathcal{I}_2 = 0$, the absolute SW magnetization value is lower than 0.04 $M_s$, which means a logic 1 output, as expected. An XNOR can be realized by flipping the condition such that a SW magnetization lower (greater) than 0.04 $M_s$ is read as a logic 0 (1). The same line of reasoning can be utilized to determine all threshold values, for the frequencies in increasing order, as 0.045, 0.04, 0.038, 0.033, 0.032, 0.03, 0.028, 0.025, 0.005, 0.004, 0.0035, 0.0068, 0.005, 0.0045, 0.004, 0.0035, and 0.002 $M_s$.

# 4.4. CONCLUSIONS

We presented a novel *n*-bit data-parallel spin wave logic gate. To explain the proposed concept, we implemented and validated, by means of OOMMF simulations, 8-bit 2-input XOR and 3-input Majority gates. Further, we proposed an optimization algorithm to minimize the area overhead of the proposed multi-frequency gates and demonstrated that the algorithm reduces the area by 30% and 41% for the XOR and MAJ gate implementations, respectively.
Moreover, to assess the potential of our proposal, we evaluated and compared the proposed multi-frequency gates with functionally equivalent implementations based on scalar SW gates in terms of area, delay, and power consumption. The results indicated that the byte-based XOR and Majority gates require 4.47x and 4.16x less area than the conventional (scalar) implementations, respectively, at the expense of a 5% to 7% delay overhead and without inducing any power consumption overhead. Finally, we demonstrated that, for the current gate topology and materials, the maximum number of frequencies (gate parallelism) is 8 and 16 for phase- and threshold-based output detection, respectively.

Figure 4.15: Optimized XOR Gate Outputs: a) $f_1$ =10 GHz, b) $f_2$ =20 GHz, ..., p) $f_{16}$ =160 GHz. Note that logic 0 represents SW with phase 0 and logic 1 represents SW with phase $\pi$.

# SPIN WAVE WAVEPIPELINE

- 5.1. SPIN WAVE MAJORITY GATE OPERATION MODE
- 5.2. WAVE PIPELINING ACHIEVEMENT IN SW TECHNOLOGY
- 5.3. CONCLUSIONS

As discussed previously, various SW-based logic gates and circuits have recently been reported [58], [59], [62], [77], [80]–[96], [109]. None of these works discusses the operation modes of SW devices; they assume that continuous SW excitation, from excitation until the resulting spin wave is detected, is the appropriate mode. Also, they did not examine the possibility of implementing wave pipelining in spin wave technology, with the exception of [109], which theoretically discussed, without validation, the possibility of SW wave pipelining. This chapter proposes, validates, and evaluates a SW 3-input Majority gate under both continuous and pulse mode operation. Furthermore, we evaluate the magnetization dynamics of rectangular, sinusoidal, and Gaussian pulse excitation.
Finally, we utilize pulse mode operation to introduce, validate, and evaluate wave pipelining in SW circuits.

This chapter content is based on the following publication:

**A. Mahmoud**, F. Vanderveken, C. Adelmann, F. Ciubotaru, S. Hamdioui and S. Cotofana, *Achieving Wave Pipelining in Spin Wave Technology*, 2021 22nd International Symposium on Quality Electronic Design (ISQED), 2021, pp. 54-59.

Figure 5.1: 3-input SW Majority Gate.

# **5.1.** Spin Wave Majority Gate Operation Mode

In the following, the 3-input Majority gate operating under Continuous Mode Operation (CMO) and Pulse Mode Operation (PMO) is explained. In addition, the simulation parameters, simulation results, and performance evaluation are discussed.

#### 5.1.1. CMO AND PMO SPIN WAVE MAJ CONCEPT

Figure 5.1 presents the 3-input SW Majority gate we use as a discussion vehicle to demonstrate and evaluate the impact of the two operation modes. To achieve the desired functionality, as mentioned in the previous chapters, the Majority gate parameters and dimensions must be carefully determined. The SW wavelength $\lambda$ must be larger than the waveguide width to simplify the interference pattern. In addition, to correctly capture the output values, the input SWs must be excited at the same time and with the same amplitude, frequency, and wavelength. Further, the structure dimensions $d_1$, $d_2$, $d_3$ are essential for the gate behavior. For instance, if the SWs should interfere constructively when they are in phase and destructively when they are out of phase, $d_1$, $d_2$, $d_3$ must be $n\lambda$ (where $n = 0, 1, 2, 3, \ldots$). In contrast, if they should interfere destructively when they are in phase and constructively when they are out of phase, $d_1$, $d_2$, $d_3$ must be $(n+1/2)\lambda$ (where $n = 0, 1, 2, 3, \ldots$).
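The role of the $n\lambda$ spacing can be illustrated with an idealized phasor model: each input contributes a unit-amplitude wave whose phase is its logic phase (0 or $\pi$) plus the propagation phase $2\pi d/\lambda$, and the detector reads the phase of the sum. This sketch assumes lossless propagation, equal amplitudes, and a simple path-length parametrization of Figure 5.1 (our assumption); it is not the micromagnetic model used in the simulations. The wavelength and path lengths reuse the 55 nm CMO design values given later in this section, and `maj_phasor` is a hypothetical name.

```python
import cmath
import math
from itertools import product


def maj_phasor(bits, distances_nm, wavelength_nm, read_offset_nm=0.0):
    """Idealized 3-input SW interference; returns the logic value read at the detector.

    bits           -- input logic values: 0 -> phase 0, 1 -> phase pi
    distances_nm   -- assumed path length from each input to the interference point
    read_offset_nm -- extra distance to the detector (half a wavelength flips the reading)
    Assumes lossless propagation and equal amplitudes (damping is ignored).
    """
    total = sum(
        cmath.exp(1j * (math.pi * b + 2 * math.pi * d / wavelength_nm))
        for b, d in zip(bits, distances_nm)
    )
    total *= cmath.exp(1j * 2 * math.pi * read_offset_nm / wavelength_nm)
    # phase closer to 0 -> logic 0, closer to pi -> logic 1
    return 0 if total.real > 0 else 1


LAM = 55.0
PATHS = (6 * LAM, 16 * LAM, 4 * LAM)   # integer multiples of the wavelength
for bits in product((0, 1), repeat=3):
    out = maj_phasor(bits, PATHS, LAM, read_offset_nm=LAM)        # read at 1 * lambda
    inv = maj_phasor(bits, PATHS, LAM, read_offset_nm=1.5 * LAM)  # read at 1.5 * lambda
    print(bits, out, inv)
```

With integer-wavelength paths the logic phases add up unchanged, so the sign of the sum is the majority vote; shifting the read point by half a wavelength inverts every reading.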
To detect the output correctly, $d_4$ must also be accurately chosen, as its value determines whether the gate computes the Majority function or its complement. For example, if the gate output itself is desired, $d_4$ must be $n\lambda$, and if the inverted Majority is desired, $d_4$ must be $(n+1/2)\lambda$.

From the SW excitation point of view, SW gates and circuits can operate in two main modes: Continuous Mode Operation (CMO) and Pulse Mode Operation (PMO).

- CMO: SWs are excited using continuous signals such that the excitation signal remains active until the output detection is complete. This mode is simple to work with because: i) SWs with a fixed frequency and wavelength are excited at the same time with the same amplitude, and ii) the SW resulting from the interference of the input SWs can be easily detected, as it has the same frequency and wavelength as the inputs. However, as the excitation signal is active from the beginning to the end of the computation, the energy consumption of CMO SW devices is large, especially because they are slow.
- PMO: In contrast to CMO, this mode is more energy efficient, since the excitation signal is active only for a short period of time. However, PMO is more complex to work with because:
    - SW excitation by means of a pulse signal produces multiple SWs with different wavelengths and frequencies, as the pulse covers a large band in the frequency domain. Therefore, all the covered frequencies are excited in the waveguide, which results in SWs with multiple wavenumbers ($k = 2\pi/\lambda$) and wavelengths. However, the number of frequencies can be limited by using sinusoidal or Gaussian excitation signals, as can be observed in Figure 5.2, which presents the magnetization dynamics, in the time and frequency domains (Fast Fourier Transform (FFT) of the propagated pulse), of pulses propagating in an in-line waveguide excited by three different signals: i) a 70 ps rectangular pulse, ii) a 300 ps rectangular pulse with a 15 GHz carrier frequency, and iii) a Gaussian pulse with $\sigma = 300$ ps and a 15 GHz carrier frequency. As can be seen in the figure, rectangular pulse excitation leads to strong signal deformation due to spin-wave dispersion. In addition, shorter pulses generate broad frequency ranges, which make it very difficult to build interference-based logic functions with this kind of pulse. Although the 300 ps rectangular pulse with a 15 GHz carrier improves the magnetization dynamics, multiple-frequency SWs are still excited, as can be observed in Figure 5.2. In contrast, the Gaussian pulse with $\sigma = 300$ ps and a 15 GHz carrier frequency filters out all other undesired frequencies and excites only the desired 15 GHz SW, as can be noticed from Figure 5.2. Note that this Figure is generated with the parameters specified below, but with a damping factor of 0.0025.
    - If the inputs are not located at the same distance from the output, the input SWs must be excited at different time moments and with different amplitudes. Otherwise, the gate malfunctions, as the SW from the closest input reaches the output before the SW from the furthest input, which also has a lower energy when reaching the output as a result of damping in the waveguide. This requires a complex clocking scheme, which can be simplified by attempting to equalize all input-to-output propagation paths by means of proper layout and/or delay buffers.
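The bandwidth argument for the three excitation shapes of Figure 5.2 can be checked numerically on the drive signals alone. The sketch below computes, for each waveform, the fraction of its positive-frequency energy lying within 15 ± 1.5 GHz, using a direct Fourier sum evaluated only inside that band and Parseval's theorem for the total. The 1 ps time step, 3 ns window, band edges, pulse start times, and the reading of $\sigma$ as the standard deviation of the Gaussian envelope are our assumptions; no magnetization dynamics (dispersion, damping) are modelled, and all function names are hypothetical.

```python
import cmath
import math

DT = 1e-12     # 1 ps sampling step (assumed)
N = 3000       # 3 ns window (assumed)
F0 = 15e9      # carrier frequency used in Figure 5.2
TIMES = [n * DT for n in range(N)]


def rectangular(duration, start=1e-9):
    return [1.0 if start <= t < start + duration else 0.0 for t in TIMES]


def sine_burst(duration, start=1e-9):
    return [math.sin(2 * math.pi * F0 * (t - start)) if start <= t < start + duration else 0.0
            for t in TIMES]


def gaussian_burst(sigma, center=1.5e-9):
    return [math.exp(-((t - center) ** 2) / (2 * sigma ** 2)) * math.sin(2 * math.pi * F0 * (t - center))
            for t in TIMES]


def band_fraction(x, f_lo=13.5e9, f_hi=16.5e9, n_freq=61):
    """Fraction of the positive-frequency energy of x that lies in [f_lo, f_hi]."""
    total = sum(v * v for v in x) * DT                 # Parseval: energy over all frequencies
    samples = [(v, t) for v, t in zip(x, TIMES) if v != 0.0]
    df = (f_hi - f_lo) / (n_freq - 1)
    band = 0.0
    for m in range(n_freq):
        f = f_lo + m * df
        spectrum = sum(v * cmath.exp(-2j * math.pi * f * t) for v, t in samples) * DT
        weight = 0.5 if m in (0, n_freq - 1) else 1.0  # trapezoidal rule
        band += weight * abs(spectrum) ** 2 * df
    return band / (total / 2)                          # a real signal splits its energy evenly


for name, x in [("70 ps rectangular", rectangular(70e-12)),
                ("300 ps sinusoidal", sine_burst(300e-12)),
                ("Gaussian, sigma = 300 ps", gaussian_burst(300e-12))]:
    print(f"{name}: {band_fraction(x):.3f}")
```

Under these assumptions the Gaussian burst should keep nearly all of its energy in the band, the 300 ps sinusoidal burst noticeably less (its sinc-shaped spectrum spills into side lobes), and the 70 ps rectangular pulse almost none, mirroring the qualitative trend in Figure 5.2.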
The MAJ gate output O (see Figure 5.1) is generated as follows: SWs are excited at $I_1$, $I_2$, and $I_3$ and propagate through the waveguides, driven either by a continuous signal that remains active until the result is detected (CMO) or by pulse signals (PMO). The SW resulting from the interference between the input SWs is detected based on its phase, such that if the output SW has a phase difference of 0 relative to a predefined phase, the output is logic 0, whereas the output is logic 1 if the phase difference is $\pi$.

Figure 5.2: Magnetization dynamics of SWs excited by a 70 ps rectangular pulse in a) the time domain and b) the frequency domain, by a 300 ps 15 GHz sinusoidal pulse in c) the time domain and d) the frequency domain, and by a 300 ps 15 GHz Gaussian pulse in e) the time domain and f) the frequency domain.

#### **5.1.2.** SIMULATION SETUP AND RESULTS

We use MuMax3 [195] to validate the correct functionality of the proposed concepts and structure. The MAJ3 gate is built with a 50 nm wide $Fe_{60}Co_{20}B_{20}$ waveguide with the following parameters [346]: saturation magnetization $M_s$ =1.1 MA/m, damping constant $\alpha$ =0.004, exchange stiffness $A_{exch}$ =18.5 pJ/m, and thickness t=1 nm. For the CMO, we choose a SW wavelength of 55 nm because, for proper gate operation, it has to be larger than the waveguide width w. Accordingly, the dimensions of the Figure 5.1 structure were determined as $d_1$ =330 nm (n = 6), $d_2$ = 880 nm (n = 16), $d_3$ =220 nm (n = 4), and $d_4$ = 55 nm (n = 1). Furthermore, using the parameters in Table 5.1 and w, we calculate the SW dispersion relation [74] and determine the SW frequency f = 10 GHz and the wavenumber $k = 2\pi/\lambda \approx 114$ rad/$\mu$m. As the target is to compare the two operation modes, we keep the same dimensions for PMO and, to minimize the number of excited frequencies, we make use of a sinusoidal pulse.
Note that we avoided the Gaussian pulse, although it excites a single SW frequency, because Gaussian pulses are complex to generate experimentally.

Figure 5.3 presents the simulation results of the 3-input Majority gate (Figure 5.1) operating under CMO for $\{I_1, I_2, I_3\} = \{0, 0, 0\}, \{0, 0, 1\}, \{0, 1, 0\}, \{0, 1, 1\}, \{1, 0, 0\}, \{1, 0, 1\}, \{1, 1, 0\},$ and $\{1, 1, 1\}$, respectively, from a) to h). Note that in the Figure logic 0 and 1 are represented in blue and red, respectively. As mentioned previously, the input SW activation signal is **ON** all the time and keeps exciting SWs until the detection of output *O* is completed. As can be observed from Figure 5.3, the output *O* is correctly detected. For example, *O* is logic 0 for the input patterns $\{I_1, I_2, I_3\} = \{0, 0, 0\}, \{0, 0, 1\}, \{0, 1, 0\},$ and $\{1,0,0\}$, whereas O=1 for the input combinations $\{I_1,I_2,I_3\}=\{0,1,1\},\{1,0,1\},\{1,1,0\}$, and $\{1,1,1\}$, which indicates that the Majority gate behaves correctly under the CMO scenario.

Table 5.1: Parameters

| Parameters | Values |
|---------------------------------------------|------------------------------------|
| Magnetic saturation $M_s$ | $1.1 \times 10^6$ A/m |
| Perpendicular anisotropy constant $k_{ani}$ | $8.3177 \times 10^5$ J/m$^3$ |
| Damping constant $\alpha$ | 0.0002 |
| Waveguide thickness t | 1 nm |
| Exchange stiffness $A_{exch}$ | 18.5 pJ/m |

Figure 5.4 presents the simulation results of the 3-input Majority gate (Figure 5.1) operating under PMO for $\{I_1,I_2,I_3\}=\{0,0,0\},\{0,0,1\},\{0,1,0\},\{0,1,1\},\{1,0,0\},\{1,0,1\},\{1,1,0\}$, and $\{1,1,1\}$, respectively, from a) to h). In this case, we make use of a 100 ps sinusoidal excitation signal to decrease the number of excited frequencies. As can be observed from Figure 5.4, O is still correctly detected.
For example, O=0 for the input patterns $\{I_1,I_2,I_3\}=\{0,0,0\},\{0,0,1\},\{0,1,0\}$, and $\{1,0,0\}$, whereas O=1 for the input combinations $\{I_1,I_2,I_3\}=\{0,1,1\},\{1,0,1\},\{1,1,0\}$, and $\{1,1,1\}$, which indicates that the Majority gate behaves correctly under the PMO scenario. As can be observed in the Figure, PMO generates multiple SW wavelengths and, because the SWs do not travel the same distance, the results are, in contrast to CMO, not accurate at all positions. However, as the output position is accurately determined, the results are correctly captured at the output position, as depicted in the Figure. To conclude, the simulation results demonstrate that the Majority gate works correctly under both CMO and PMO scenarios.

#### **5.1.3.** Performance Evaluation

To gain insight into the practical implications of the operation mode for the MAJ gate, we estimate its energy consumption under both CMO and PMO. To this end, the following assumptions are in place [348]: a single SW transducer exhibits a 0.42 ns delay and a power consumption of 34.3 nW. Note that, because we analyze only one gate, we do not include clocking complexity and overhead in our evaluation. However, clocking will certainly play an important role when analyzing large complex circuits with unequal paths. Moreover, as SW technology is immature, these assumptions might need re-evaluation as the technology matures. Note that the SW propagation delay is extracted directly from the MuMax3 simulations and is 1 ns. Thus, by adding the input and output transducer delays to the SW propagation delay, the total delay of the SW MAJ gate sums up to 1.84 ns. Under these assumptions, the PMO 3-input SW MAJ gate consumes 13.7 aJ, as the sources are active for 100 ps and there are four transducers, whereas the CMO 3-input SW MAJ gate consumes 252.5 aJ, as the sources are active for 1.84 ns and there are four transducers.
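These energy figures are the transducer power multiplied by the excitation time and the number of transducers. The sketch below reproduces them under the stated assumptions (34.3 nW per transducer, 4 transducers, 0.42 ns per transducer, 1 ns propagation); `energy_aj` is a hypothetical helper name.

```python
POWER_W = 34.3e-9             # per-transducer power [348]
N_TRANSDUCERS = 4             # 3 inputs + 1 output
TRANSDUCER_DELAY_S = 0.42e-9  # per-transducer delay [348]
PROPAGATION_S = 1e-9          # SW propagation delay extracted from MuMax3


def energy_aj(active_time_s):
    """Energy in attojoules when all transducers are active for active_time_s."""
    return POWER_W * active_time_s * N_TRANSDUCERS * 1e18


gate_delay_s = PROPAGATION_S + 2 * TRANSDUCER_DELAY_S   # 1.84 ns
pmo = energy_aj(100e-12)        # 100 ps excitation pulse
cmo = energy_aj(gate_delay_s)   # excitation held for the whole gate delay
print(f"PMO {pmo:.2f} aJ, CMO {cmo:.2f} aJ, ratio {cmo / pmo:.1f}x")
```

The ratio is simply the CMO activity time over the pulse width (1.84 ns / 100 ps), which is why the PMO energy does not grow with the gate delay.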
Hence, PMO reduces the energy consumption by a factor of about 18. This can be easily explained by the fact that when SWs are excited by the continuous application of voltages/currents, the overall energy consumption is determined by the transducer power and the gate critical path length (delay). However, if transducers are operated in pulses, the energy becomes independent of the gate delay, as it is mainly determined by the transducer power and delay; thus, pulse operation should be targeted. Note that regardless of the operation mode, SWs are excited and detected at different time moments, which makes clocking unavoidable. However, its complexity analysis requires further SW technology developments and constitutes future work. While PMO has a substantial impact on energy consumption reduction, it is also an enabling factor for the realization of SW circuits operating under the Wave Pipelining paradigm [114], [115], [350], [351], which increases circuit throughput without requiring inter-stage registers. Based on this observation, the next section introduces, validates, and evaluates a SW wave-pipelined circuit.

Figure 5.3: CMO 3-input SW MAJ MuMax3 Simulation.

Figure 5.4: PMO 3-input SW MAJ MuMax3 Simulation.

# 5.2. WAVE PIPELINING ACHIEVEMENT IN SW TECHNOLOGY

This section introduces the Wave Pipelining (WP) concept in the context of SW technology and discusses it on a simple circuit composed of 4 cascaded MAJ gates. In addition, micromagnetic simulation results and a performance evaluation are presented.

Figure 5.5: 4 Cascaded 3-input SW MAJ Gates.

#### **5.2.1.** WAVE PIPELINING

Pipelining consists of processing multiple sets of inputs before the first set reaches the output [350], [351]. This is achieved by slicing the function into multiple stages, where each stage is isolated from the previous and the next stage by means of registers that store intermediate processed data [350], [351]. Each stage processes a data set independently from the other stages.
After a stage completes the processing of its data set, the result is stored in a register to be passed to the next stage on the following clock cycle [350], [351]. In order to minimize (preferably totally remove) the number of registers in a pipelined system, Wave Pipelining (WP) was introduced [114], [115]. The main idea is to allow for the coexistence and interference-free handling of multiple data sets within a register-free processing pipeline [114], [115]. To operate in this manner, the circuit has to be designed such that all its propagation paths exhibit the same delay. This guarantees that input sets do not interfere within the circuit and reach the output in their chronological order. Figure 5.5 presents the SW circuit we use as WP discussion vehicle. It comprises 4 MAJ gates and 3 directional couplers [352], has 9 inputs $I_1$, $I_2$, $I_3$, $I_4$, $I_5$, $I_6$, $I_7$, $I_8$, $I_9$, and one output O, which evaluates MAJ(MAJ($I_1$, $I_2$, $I_3$), MAJ($I_4$, $I_5$, $I_6$), MAJ($I_7$, $I_8$, $I_9$)). Note that to allow for input SW excitation at the same moment in time and with the same amplitude, all inputs have to be placed at the same distance from the gate output. Moreover, as the MAJ gate SW output has an input-data-dependent amplitude and cannot be directly cascaded, we make use of 3 directional couplers to normalize the outputs of the layer one MAJ gates [352]. The aforementioned design steps hold true for this case as well. In addition, the directional couplers must be designed, i.e., the coupler length $L_w$ and the gap $DW$ between the directional coupler and the main waveguide must be determined such that they normalize the outputs of the layer one MAJ gates and the layer two gate can operate properly. This can be performed by making use of the equations in Chapter 2.
To correctly detect the output O (see Figure 5.5), the proposed circuit operates as follows: SWs are excited at each source $I_1$, $I_2$, $I_3$, $I_4$, $I_5$, $I_6$, $I_7$, $I_8$, and $I_9$. Then the SWs interfere in groups of three per MAJ gate, and the resulting SWs are normalized using the directional couplers. After that, the three SWs produced by the directional couplers interfere constructively or destructively depending on their phases, and finally, the resulting SW is detected at the output by means of phase detection. As we want to use the structure in Figure 5.5 to demonstrate the SW WP concept, PMO must be utilized, as it is the WP enabling factor: a new input set can be applied before the evaluation of the previous one is completed. Therefore, the sources $I_1$, $I_2$, $I_3$, $I_4$, $I_5$, $I_6$, $I_7$, $I_8$, and $I_9$ are used to excite multiple SW sets by means of pulse signals separated by a time gap chosen such that each set does not affect the previously excited set. In this way, the SWs of multiple data sets coexist in the circuit, properly interfere in the level 1 MAJ gates, are normalized, interfere again in the level 2 MAJ gate, and the corresponding output value is detected at O. As such, the circuit throughput is increased, by a factor determined by the time gap, without inter-stage register insertion.

#### **5.2.2.** SIMULATION SETUP AND RESULTS

To validate the correct functionality of the circuit in Figure 5.5 under WP operation, we make use of MuMax3 simulations while keeping the waveguide material and width, the wavelength, and the parameters reported previously. Consequently, the dimensions of the Figure 5.5 structure are: $d_1$ = 330 nm (n = 6), $d_2$ = 880 nm (n = 16), $d_3$ = 220 nm (n = 4), $d_4$ = 2750 nm (n = 50), $d_5$ = 935 nm (n = 17), $d_6$ = 3300 nm (n = 60), $d_7$ = 2145 nm (n = 39), and $d_8$ = 55 nm (n = 1).
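All listed distances are integer multiples $n\lambda$ of the 55 nm wavelength, with n given in parentheses. The snippet below is a simple consistency check rather than a design tool: it recomputes n for each distance and rejects any distance that is not a whole number of wavelengths. The dictionary layout and the function name are illustrative choices.

```python
# Check that the WP circuit distances (Figure 5.5) are integer multiples of lambda.
WAVELENGTH_NM = 55

DISTANCES_NM = {
    "d1": 330, "d2": 880, "d3": 220, "d4": 2750,
    "d5": 935, "d6": 3300, "d7": 2145, "d8": 55,
}


def wavelength_multiples(distances: dict[str, int], wavelength: int) -> dict[str, int]:
    """Return n = d / lambda for every distance; fail if a distance is not n * lambda."""
    result = {}
    for name, d in distances.items():
        n, remainder = divmod(d, wavelength)
        if remainder != 0:
            raise ValueError(f"{name} = {d} nm is not a multiple of {wavelength} nm")
        result[name] = n
    return result


for name, n in wavelength_multiples(DISTANCES_NM, WAVELENGTH_NM).items():
    print(f"{name} = {n} x {WAVELENGTH_NM} nm")
```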
The directional coupler dimensions are determined based on the equations in Chapter 2 as $L_w$ = 1500 nm and $DW$ = 10 nm.

Figure 5.6: First 4 SWs Sets Wave Pipelined Normalized Magnetization.

Figure 5.7: Second 4 SWs Sets Wave Pipelined Normalized Magnetization.

$\{1,1,1\}$, where $II_1$, $II_2$, and $II_3$ are the normalized outputs of the first, second, and third layer 1 MAJ gate, respectively. For the same material-related reasons, we split the 8 combinations into two groups and apply them separately to the circuit. In this way, each new set in the group is still slightly affected by the previous set, but the circuit still functions correctly if up to four sets are injected using 0.5 ns pulse signals and a time gap of 1.5 ns. Based on this timing scheme, SW sets are excited from 0 ns to 0.5 ns, from 1.5 ns to 2.0 ns, from 3.0 ns to 3.5 ns, and from 4.5 ns to 5.0 ns, and the corresponding MuMax3 simulation results are presented in Figure 5.6. In order to validate the proper WP behavior, we make use of a reference SW, which is obtained by exciting the input SWs with {0,0,0,0,0,0,0,0,0} using a sinusoidal pulse signal with a duration of 0.5 ns. As can be observed from the Figure, the output SW has a phase difference of 0 in the time slot starting at 13.9 ns (slot width 0.5 ns). As the SW sets are pipelined with a 1.5 ns time difference, the result for the next set should be ready 1.5 ns later. Therefore, the second set result is detected in the time slot starting at 15.4 ns (slot width 0.5 ns), in which the SW has a phase difference of approximately 0. Likewise, the third set result is detected 1.5 ns later: the resulting SW has a phase difference of approximately 0 in the time slot starting at 16.9 ns (slot width 0.5 ns). Finally, the fourth set result is detected in the time slot starting at 18.4 ns (slot width 0.5 ns) and has a phase difference of approximately $\pi$. Therefore, all outputs are correctly computed and detected.
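The timing scheme above (0.5 ns pulses whose starts are 1.5 ns apart, first result slot at 13.9 ns) can be tabulated with a short sketch. It assumes, as the text describes, that every later set's detection slot is shifted by the same 1.5 ns. The `decode` helper is a hypothetical phase-detection rule (closer to 0 gives logic 0, closer to $\pi$ gives logic 1) and not the detection circuit used in the thesis.

```python
import math

PULSE_NS = 0.5          # excitation pulse width
SPACING_NS = 1.5        # spacing between the starts of consecutive pulses
FIRST_RESULT_NS = 13.9  # start of the first detection slot (Figure 5.6)
SLOT_NS = 0.5           # detection slot width


def wp_schedule(n_sets: int = 4) -> list[tuple[tuple[float, float], tuple[float, float]]]:
    """(excitation window, detection window) in ns for each wave-pipelined set."""
    rows = []
    for k in range(n_sets):
        offset = k * SPACING_NS
        excitation = (offset, offset + PULSE_NS)
        detection = (FIRST_RESULT_NS + offset, FIRST_RESULT_NS + offset + SLOT_NS)
        rows.append((excitation, detection))
    return rows


def decode(phase_diff_rad: float) -> int:
    """Phase detection against the all-zeros reference: ~0 -> logic 0, ~pi -> logic 1."""
    wrapped = abs(math.remainder(phase_diff_rad, 2 * math.pi))  # fold into [0, pi]
    return 0 if wrapped < math.pi / 2 else 1


for k, (exc, det) in enumerate(wp_schedule(), start=1):
    print(f"set {k}: excite {exc[0]:.1f}-{exc[1]:.1f} ns, detect {det[0]:.1f}-{det[1]:.1f} ns")

# Phase differences reported for the four sets of Figure 5.6: 0, 0, 0, pi.
print([decode(p) for p in (0.0, 0.0, 0.0, math.pi)])
```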
Similarly, the results in Figure 5.7 $\{1,1,1,1,1,1,0,0,0\}, \{1,1,1,0,0,0,1,1,1\},$ and $\{1,1,1,0,0,0,0,0,0\},$ the first three SW sets will have a phase difference of $\pi$ with the reference signal, whereas the last one has a phase difference of 0, as can be noticed in Figure 5.7 in the time slots starting at 13.9 ns, 15.4 ns, 16.9 ns, and 18.4 ns, respectively (slot width 0.5 ns). Note that the same reference signal as in Figure 5.6 is utilized. To conclude, the simulation results validate the WP concept within the SW technology framework and show that up to 4 SW sets can be wave pipelined in the waveguide. This limit stems from the fact that SWs persist in the waveguide for longer than 5 ns; therefore, each new set might be affected by previous sets, but since the newly excited SW is stronger, their effect is limited. Note that there may be room for increasing the number of WP sets by: i) exciting SW sets with different energy levels, such that the first SW set has the lowest energy level while the last one has the highest; ii) decreasing the pulse duration; iii) increasing the time difference between the excited SW sets; iv) using as waveguide a material with a slightly higher damping, such that SWs can still propagate over long enough distances but at the same time vanish more quickly.

#### **5.2.3.** Performance Evaluation

To gain insight into the SW WP potential, we examine the throughput of the circuit in Figure 5.5 with and without WP. To do so, we assume that an idle time of 5 ns is required to avoid the effect of previous input SWs on newly applied SWs. This time overhead is required after each input set in normal circuit operation and after each input set group when WP is in place. From Figures 5.6 and 5.7, we can notice that the results of the 8 cases are ready in 36.8 ns, as the result of each wave-pipelined group is ready after 18.4 ns.
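The throughput comparison in this section follows simple bookkeeping. The sketch mirrors the accounting used in the text (one idle period per evaluation without WP, one idle period between the two 4-set groups with WP); extending it to other operation counts through `math.ceil` is an assumption made for illustration only.

```python
import math

IDLE_NS = 5.0          # idle time needed for previous SWs to fade out
LATENCY_NS = 13.9      # a set's result appears 13.9 ns after excitation
GROUP_DONE_NS = 18.4   # last result of a 4-set wave-pipelined group
GROUP_SIZE = 4


def time_without_wp(n_ops: int) -> float:
    """Sequential operation: every evaluation is followed by an idle period."""
    return n_ops * (LATENCY_NS + IDLE_NS)


def time_with_wp(n_ops: int, group_size: int = GROUP_SIZE) -> float:
    """Groups of wave-pipelined sets, with one idle period between groups."""
    groups = math.ceil(n_ops / group_size)
    return groups * GROUP_DONE_NS + (groups - 1) * IDLE_NS


t_plain = time_without_wp(8)
t_wp = time_with_wp(8)
print(f"without WP: {t_plain:.1f} ns, with WP: {t_wp:.1f} ns, "
      f"throughput gain: {t_plain / t_wp:.2f}x")
```

For 8 operations this yields 151.2 ns and 41.8 ns, i.e., the 3.6x throughput gain reported for the structure in Figure 5.5.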
However, due to the idle time, the second group has to be delayed by 5 ns, which implies that 8 operations can be completed in 41.8 ns. In contrast, the 8 evaluations without WP take 151.2 ns ($8 \times (13.9 + 5)$ ns), taking into consideration that each result becomes available 13.9 ns after its inputs are applied and that a 5 ns idle time is required between successive input sets. Thus, WP utilization increases the throughput by about 3.6x for the structure in Figure 5.5.

# **5.3.** CONCLUSIONS

We proposed and validated, by means of micromagnetic simulations, a SW 3-input Majority gate under continuous and pulse mode operation regimes. We also evaluated the gate energy consumption, and our results indicate that Pulse Mode Operation (PMO) reduces the gate energy consumption by a factor of 18 when compared with the continuous mode operation. In addition, we presented how PMO enables Wave Pipelining (WP) within SW circuits and validated WP on a circuit of 4 cascaded 3-input Majority gates by means of micromagnetic simulations. Furthermore, we demonstrated that WP utilization improves the circuit throughput by 3.6x.

# SPIN WAVE NORMALIZATION TOWARD ALL MAGNONIC CIRCUITS

- 6.1. Spin Wave Gate Cascading Challenge
- 6.2. Spin Wave Full Adder
- 6.3. Spin Wave 4:2 Compressor
- 6.4. SW Gate Cascading without Domain Conversion
- 6.5. Conclusions

A feature that is common to all interference-based wave computing devices is that the gates cannot be cascaded directly when information is encoded in the phase of unit-amplitude input waves. In this case, the phase of the output wave corresponds to the majority of the phases of the input waves, but its amplitude depends on whether a weak or a strong majority is calculated. Consequently, the output wave does not necessarily have unit amplitude and therefore cannot be directly fed as input to a follow-up majority gate.
This means that amplitude renormalization is required when spin waves are transmitted between gates via local waveguide interconnects. Note that this problem occurs regardless of the way the output spin wave is interpreted, i.e., by phase or by thresholding. Solutions that circumvent the issue have been proposed for rather simple circuits, such as a full adder and a 4:2 compressor, which are presented in the next two sections, respectively. However, for the design of more complex general circuits, majority gates have to be augmented by a second type of device able to renormalize their output SW amplitude while preserving the phase. It should be noted that spin wave inverters (phase shifters) do not affect the amplitude and therefore can be connected via SW waveguides without the need for renormalization. Amplitude renormalization is a non-linear process; therefore, there is no simple modification of a linear majority gate that achieves this effect, e.g., by interference with reference waves or duplicated inputs.

This chapter content is based on the following publications:

**A. Mahmoud**, F. Vanderveken, F. Ciubotaru, C. Adelmann, S. Cotofana and S. Hamdioui, *Spin Wave Based 4-2 Compressor*, 2021 28th IEEE International Conference on Electronics, Circuits, and Systems (ICECS), 2021, pp. 1-4, doi: 10.1109/ICECS53924.2021.9665499.

**A. Mahmoud**, F. Vanderveken, F. Ciubotaru, C. Adelmann, S. Cotofana and S. Hamdioui, *Spin Wave Based Full Adder*, 2021 IEEE International Symposium on Circuits and Systems (ISCAS), 2021, pp. 1-5.

**A. N. Mahmoud**, F. Vanderveken, C. Adelmann, F. Ciubotaru, S. Cotofana and S. Hamdioui, *Spin Wave Normalization Toward All Magnonic Circuits*, in IEEE Transactions on Circuits and Systems I: Regular Papers, **68**, no. 1, pp. 536-549, Jan. 2021.
As discussed above, SW gates can be connected by means of conventional interconnects in the electric (or optical) domain, which requires information conversion from spin to charge representation and back. The information can then be stored in a charge-based memory device or used to generate SWs that feed adjacent gate inputs. However, as also remarked above, conversion is a power-hungry process, and low energy consumption can only be achieved if a significant amount of data is processed without leaving the SW domain. As such, an energy-effective SW computation platform should include large SW islands, formed by directly connectable SW gates, linked by electrical (or optical) global interconnects. Therefore, spin wave amplitude renormalization methods that allow for direct SW gate cascading are essential to open the road towards magnetic-domain-only circuit designs. In the following, the SW gate cascading challenge, the spin wave full adder, the 4:2 compressor, and the proposed gate cascading solution are thoroughly discussed.

# **6.1.** SPIN WAVE GATE CASCADING CHALLENGE

To evaluate complex Boolean functions, one needs to be able to interconnect spin wave gates to form the required circuit. However, directly cascading Majority or any other type of SW gates may produce wrong results. To clarify this issue, let us consider the situation in Figure 6.1a, where a 3-input Majority (MAJ3) gate output is connected to one of the inputs of another MAJ3 gate. All input SWs are excited with the same amplitude A and frequency f; a 0 phase corresponds to logic 0, and a $\pi$ phase to logic 1. Given that MAJ3 operation is governed by SW interference, both the amplitude and the phase of the SW gate inputs contribute to the output SW parameters. While, from the point of view of an individual gate, the output value is solely determined by the output SW phase, this is no longer the case when that output is used as input for follow-up gates.
Figure 6.1b and c present the SW interference within the circuit when $I_1I_2I_3I_4I_5 = 00011$ and $I_1I_2I_3I_4I_5 = 00111$, respectively. As one can observe in Figure 6.1b, the spin waves excited at $I_1$, $I_2$, and $I_3$ interfere constructively and produce on WG D a spin wave with the same phase as $I_1$, $I_2$, and $I_3$, but with a 3A amplitude (strong majority). Subsequently, the WG D SW interacts with the $I_4$ and $I_5$ SWs in the second MAJ3 gate, which produces an output SW with amplitude A and phase 0, which is wrong given that MAJ3(0,1,1) = 1. This wrong result is induced by the fact that the MAJ3 gate operates properly only on equal-amplitude SWs, which is not the case for $I_1I_2I_3I_4I_5 = 00011$. Figure 6.1c presents the situation for the $I_1I_2I_3I_4I_5 = 00111$ case, in which the first MAJ3 produces a SW with amplitude A and phase 0 (weak majority), and the second gate produces the correct result, as expected. Thus, cascading MAJ3 gates may induce wrong output results when the driving gate produces a strong majority 0 or 1 output.

Figure 6.1: a) Cascaded MAJ3 Gates, Spin Wave Waveform Analysis at b) $I_1I_2I_3I_4I_5$ =00011, c) $I_1I_2I_3I_4I_5$ =00111.

To clarify things even more, we built the structure depicted in Figure 6.2, which corresponds to two cascaded MAJ3 gates, and evaluated its behaviour by means of OOMMF simulations. Figure 6.3 presents the OOMMF results obtained with the parameters specified later. Three different cases were tested: $I_1I_2I_3I_4I_5 = 00000$, $I_1I_2I_3I_4I_5 = 00111$, and $I_1I_2I_3I_4I_5 = 00011$. In the Figure, red represents logic 0, and blue logic 1. As can be observed from the Figure, $I_1I_2I_3I_4I_5 = 00000$ results in an output O = 0, while $I_1I_2I_3I_4I_5 = 00111$ results in an output O = 1. However, in the case of $I_1I_2I_3I_4I_5 = 00011$, the output lies between logic 0 and logic 1 as a result of the strong 0 generated by the first MAJ3 gate (a SW with amplitude 3A).
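The failure mechanism can be reproduced with an idealized phasor model, assuming lossless linear superposition and that every path length is an integer multiple of $\lambda$, so propagation adds no relative phase. It ignores damping, dispersion, and the actual geometry of Figure 6.2, and the function names are illustrative. The optional `normalize` flag stands in for an ideal amplitude normalizer that keeps the phase of the WG D wave but resets its amplitude to A.

```python
import cmath
import math
from itertools import product

A = 1.0  # common excitation amplitude


def sw(bit: int, amplitude: float = A) -> complex:
    """Phase encoding: logic 0 -> phase 0, logic 1 -> phase pi."""
    return amplitude * cmath.exp(1j * math.pi * bit)


def interfere(*waves: complex) -> complex:
    """Ideal linear superposition at the interference point."""
    return sum(waves)


def phase_to_bit(wave: complex) -> int:
    """Phase detection: closer to 0 -> logic 0, closer to pi -> logic 1."""
    return 0 if abs(cmath.phase(wave)) < math.pi / 2 else 1


def maj(a: int, b: int, c: int) -> int:
    return int(a + b + c >= 2)


def cascaded_maj3(bits, normalize: bool = False) -> int:
    """MAJ3(MAJ3(I1, I2, I3), I4, I5) evaluated with phasors."""
    i1, i2, i3, i4, i5 = bits
    d = interfere(sw(i1), sw(i2), sw(i3))  # wave on WG D: amplitude A or 3A
    if normalize:
        d = A * d / abs(d)                  # keep the phase, force amplitude A
    return phase_to_bit(interfere(d, sw(i4), sw(i5)))


wrong = [b for b in product((0, 1), repeat=5)
         if cascaded_maj3(b) != maj(maj(*b[:3]), b[3], b[4])]
print("patterns failing without normalization:", wrong)
```

In this model only 00011 and 11100 fail without normalization, i.e., exactly the strong-majority cases opposed by two equal inputs, and all 32 patterns are correct once the WG D wave is normalized.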
Figure 6.2: Cascaded In-Line MAJ3 Gates.

Figure 6.3: Cascaded In-Line MAJ3 Gates Simulation Results.

Thus, as also suggested by the theoretical analysis, wrong results are generated, which calls for augmenting the MAJ3 gate with an amplitude normalizer able to enable SW gate cascading and, by implication, circuit design in the spin wave domain. This can be performed by means of either two transducers that convert the spin wave to and from the charge domain, or a repeater that regenerates the spin wave with the same phase and a predefined amplitude A, or a magic device able to let A amplitude SWs pass unchanged and throw away two thirds of the energy of 3A amplitude SWs. In the following sections, the first option is examined by implementing a spin wave full adder, presented in Section 6.2, the second option by implementing a spin wave 4:2 compressor, introduced in Section 6.3, whereas the third option is explained thoroughly in Section 6.4.

# **6.2.** SPIN WAVE FULL ADDER

This section presents the spin wave based full adder, its simulation setup and results, and its performance evaluation.

#### **6.2.1.** SPIN WAVE FULL ADDER STRUCTURE

Figure 6.4 presents the proposed energy-efficient 1-bit Full Adder (FA) structure with inputs X, Y, and carry-in $C_i$, and outputs Sum S and Carry-out $C_o$. It is implemented by means of two XOR gates and one Majority gate. The XOR gates are used to determine the Sum output, and the Majority gate is used to determine the Carry-out output. The output of the first XOR gate, O = XOR(X,Y), is fed into the second XOR gate together with $C_i$ to produce the FA Sum $S = XOR(I,C_i)$. Note that O is connected to I by a metal wire that allows the excitation of a spin wave at I with the same phase as the one detected at O. The Majority gate is used to generate the carry-out $C_o = MAJ(X, Y, C_i)$.

Figure 6.4: Spin Wave Based Full Adder.
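The gate-level dataflow of Figure 6.4 can be checked independently of the spin wave physics. The following sketch abstracts each SW gate into its Boolean function (the names are illustrative) and is only meant to show that the XOR/XOR/MAJ decomposition satisfies $X + Y + C_i = S + 2C_o$ for all inputs.

```python
from itertools import product


def xor(a: int, b: int) -> int:
    return a ^ b


def maj(a: int, b: int, c: int) -> int:
    return int(a + b + c >= 2)


def sw_full_adder(x: int, y: int, ci: int) -> tuple[int, int]:
    """Logic-level view of the FA in Figure 6.4 (two XOR gates + one MAJ gate)."""
    o = xor(x, y)        # first XOR gate, read out at O
    i = o                # O drives the excitation cell I via a metal wire (same phase)
    s = xor(i, ci)       # second XOR gate -> Sum
    co = maj(x, y, ci)   # Majority gate -> Carry-out
    return s, co


print(" X Y Ci | S Co")
for x, y, ci in product((0, 1), repeat=3):
    s, co = sw_full_adder(x, y, ci)
    print(f" {x} {y} {ci}  | {s} {co}")
```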
As mentioned previously, the FA excitation and detection cells can be voltage-driven or current-driven cells, depending on the utilized excitation and detection methods. Different options for spin wave excitation and detection can be used, such as magnetoelectric cells [74], [353], microstrip antennas [74], [354], and spin orbit torques [74], [355]. The FA parameters must be carefully designed to achieve the desired functionality. The waveguide width must be equal to or less than the SW wavelength $\lambda$ in order to obtain a proper interference pattern. In addition, all SWs must be excited with the same amplitude, wavelength, and frequency to guarantee the desired SW interference results. Moreover, the waveguide lengths must be chosen accurately to obtain the desired outputs. For example, if SWs with the same phase have to interfere constructively and SWs with opposite phases have to interfere destructively, then the distances $d_1$ and $d_2$ must be equal to $n\lambda$ (where n = 0, 1, 2, 3, ...). In the other case, when SWs with the same phase have to interfere destructively and SWs with opposite phases have to interfere constructively, the distances $d_1$ and $d_2$ must be equal to $(n + 1/2)\lambda$. Two main techniques are available to detect the spin wave output, namely phase detection and threshold detection. Phase detection detects the phase of the spin wave and compares it with a predefined value. If the phase difference between the detected and the predefined phase is 0, the output is logic 0, whereas if the phase difference is $\pi$, the output is logic 1. On the other hand, threshold detection detects the spin wave amplitude and compares it with a predefined value. If the spin wave amplitude is larger than the predefined threshold, the output is logic 0, whereas the output is logic 1 if the spin wave amplitude is less than or equal to the predefined threshold.
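The two detection schemes can be illustrated with complex phasors, where a unit SW has phase 0 (logic 0) or $\pi$ (logic 1). This is a sketch under idealized assumptions (lossless superposition, no propagation phase); the threshold value of 1.0 and the function names are hypothetical, since the actual thresholds come from simulation (see Table 6.2). It shows why thresholding the amplitude of two interfering waves yields XOR, while phase detection of three interfering waves yields the majority.

```python
import cmath
import math


def phase_detect(wave: complex, reference_phase: float = 0.0) -> int:
    """Logic 0 if the phase matches the reference, logic 1 if it is pi away."""
    diff = abs(math.remainder(cmath.phase(wave) - reference_phase, 2 * math.pi))
    return 0 if diff < math.pi / 2 else 1


def threshold_detect(amplitude: float, threshold: float) -> int:
    """Logic 0 if the amplitude exceeds the threshold, logic 1 otherwise."""
    return 0 if amplitude > threshold else 1


def sw(bit: int) -> complex:
    """Unit-amplitude SW with phase 0 (logic 0) or pi (logic 1)."""
    return cmath.exp(1j * math.pi * bit)


# Two unit waves: equal phases give amplitude 2, opposite phases cancel.
# Thresholding the amplitude at 1.0 (a hypothetical value) therefore yields XOR.
for a in (0, 1):
    for b in (0, 1):
        out = sw(a) + sw(b)
        print(a, b, "->", threshold_detect(abs(out), threshold=1.0))

# Three unit waves: the phase of the sum follows the majority.
print("MAJ(1, 1, 0) by phase detection:", phase_detect(sw(1) + sw(1) + sw(0)))
```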
When phase detection is used, the distances $d_4$ and $d_5$ must be chosen accurately, because both the non-inverted and the inverted versions can be detected, depending on the distance between the output and the last interference point. For instance, if the desired result is the non-inverted output, $d_4$ and $d_5$ must be $n\lambda$, whereas they must be $(n+1/2)\lambda$ if the inverted output is desired. The distances $d_4$ and $d_5$ must also be as small as possible, i.e., the outputs must be placed as close as possible to the last interference point, in order to detect a large spin wave amplitude, which is crucial for both phase and threshold detection.

Table 6.1: Parameters.

| Parameters | Values |
|---|---|
| Magnetic saturation $M_s$ | $1.1 \times 10^{6} \text{A/m}$ |
| Damping constant $\alpha$ | 0.004 |
| Exchange stiffness $A_{exch}$ | 18.5 pJ/m |
| Thickness t | 1 nm |

To correctly detect the outputs S and $C_o$ (see Figure 6.4), the proposed FA operates as follows:

- Sum *S*: SWs excited at *X* and *Y* interfere with each other, and the resulting SW is detected at *O* based on threshold detection. Next, the detected output at *O* feeds the input of the second XOR gate by exciting a SW with a suitable phase. Finally, the SWs excited at *I* and *C<sub>i</sub>* interfere with each other, and the resulting SW is detected at *S* based on threshold detection.
- Carry-out $C_o$: The SWs excited at X and Y interfere constructively or destructively with each other depending on their phases. Then the resulting SW propagates and interferes with the SW excited at $C_i$. Finally, the phase of the resulting SW is detected at $C_o$.

#### **6.2.2.** SIMULATION SETUP AND RESULTS

We made use of a $w=50\,\mathrm{nm}$ wide $Fe_{60}Co_{20}B_{20}$ waveguide to validate the proposed FA by means of MuMax3 [195], with the parameters specified in Table 6.1 [346]. We set the SW wavelength $\lambda$ to 55 nm, which is larger than the waveguide width.
Based on this, the device dimensions are calculated as: $d_1$ = 330 nm (n = 6), $d_2$ = 880 nm (n = 16), $d_3$ = 220 nm (n = 4), $d_4$ = 55 nm (n = 1), and $d_5$ = 55 nm (n = 1). To calculate the SW frequency, the SW dispersion relation [356] is first determined based on the parameters of Table 6.1 and the waveguide width. From the SW dispersion relation, by setting the wavenumber to $k=2\pi/\lambda\approx 114\,\mathrm{rad}/\mu\mathrm{m}$, the frequency is derived to be $f=10\,\mathrm{GHz}$. Table 6.2 presents the normalized magnetization values of the FA Sum S output for the input combinations $\{X,Y,C_i\}=\{0,0,0\},\{0,0,1\},\{0,1,0\},\{0,1,1\},\{1,0,0\},\{1,0,1\},\{1,1,0\},$ and $\{1,1,1\}$. Note that threshold detection is used to generate the output S. As can be observed from the Table, the XOR of X and Y at the first intermediate cell O can be obtained by choosing a suitable threshold such that O = 0 if the normalized magnetization is greater than the threshold, and O = 1 otherwise. The appropriate threshold in this case is 0.515, which is the average of 1 and 0.03. In this case, O = 0 for the input combinations $\{X,Y\}=\{0,0\}$ and $\{1,1\}$, whereas O = 1 for the input combinations $\{X,Y\}=\{0,1\}$ and $\{1,0\}$.

Table 6.2: Normalized Full Adder Sum Output Magnetization.

| $C_i$ | X | Y | O | I | S |
|-------|---|---|------|---|------|
| 0 | 0 | 0 | 1 | 0 | 0.98 |
| 0 | 0 | 1 | 0.03 | 1 | 0.59 |
| 0 | 1 | 0 | 0.03 | 1 | 0.58 |
| 0 | 1 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 | 0 | 0.59 |
| 1 | 0 | 1 | 0.03 | 1 | 1 |
| 1 | 1 | 0 | 0.03 | 1 | 0.99 |
| 1 | 1 | 1 | 1 | 0 | 0.58 |

As mentioned previously, the phase of the second intermediate cell I equals that of the first intermediate cell O. To generate the output S, which is realized by the XOR of I and $C_i$, a new threshold has to be selected: the average of 0.98 and 0.59, i.e., 0.785.
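A quick way to confirm that the two thresholds decode every row of Table 6.2 correctly is to apply them programmatically. The magnetization values below are copied from Table 6.2, and the rule "above the threshold gives logic 0" follows the text; the helper names are hypothetical.

```python
# Normalized magnetizations from Table 6.2, keyed by (C_i, X, Y): (O, S).
TABLE_6_2 = {
    (0, 0, 0): (1.00, 0.98),
    (0, 0, 1): (0.03, 0.59),
    (0, 1, 0): (0.03, 0.58),
    (0, 1, 1): (1.00, 1.00),
    (1, 0, 0): (1.00, 0.59),
    (1, 0, 1): (0.03, 1.00),
    (1, 1, 0): (0.03, 0.99),
    (1, 1, 1): (1.00, 0.58),
}

TH_O = (1.00 + 0.03) / 2  # 0.515, threshold for the intermediate cell O
TH_S = (0.98 + 0.59) / 2  # 0.785, threshold for the Sum output S


def detect(magnetization: float, threshold: float) -> int:
    """Threshold detection: above the threshold -> logic 0, otherwise logic 1."""
    return 0 if magnetization > threshold else 1


def mismatches() -> list:
    """Input patterns whose detected O or S differ from the expected XOR values."""
    bad = []
    for (ci, x, y), (m_o, m_s) in TABLE_6_2.items():
        o, s = detect(m_o, TH_O), detect(m_s, TH_S)
        if o != x ^ y or s != x ^ y ^ ci:
            bad.append((ci, x, y))
    return bad


print("thresholds:", round(TH_O, 3), round(TH_S, 3))
print("mismatches:", mismatches())
```

An empty mismatch list means both thresholds separate the logic levels for all eight input combinations.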
In this case, S=0 for the input combinations $\{I,C_i\}=\{0,0\}$ and $\{1,1\}$, whereas S=1 for the input combinations $\{I,C_i\}=\{0,1\}$ and $\{1,0\}$, which reflects the correct detection of the FA Sum output. Hence, the simulation validates the correct generation of the FA Sum output when appropriate thresholds are selected. Figure 6.5 a) to h) present the results of the proposed FA Carry-out $C_o$ output for the input combinations $\{X,Y,C_i\}=\{0,0,0\},\{0,0,1\},\{0,1,0\},\{0,1,1\},\{1,0,0\},\{1,0,1\},\{1,1,0\},$ and $\{1,1,1\}$, respectively. In the Figure, blue represents logic 0 and red represents logic 1, and the output $C_o$ of the adder is correctly captured. For instance, $C_o=0$ for the input combinations $\{X,Y,C_i\}=\{0,0,0\},\{0,0,1\},\{0,1,0\},$ and $\{1,0,0\},$ whereas $C_o=1$ for the input patterns $\{X,Y,C_i\}=\{0,1,1\},\{1,0,1\},\{1,1,0\},$ and $\{1,1,1\},$ which confirms that the FA Carry-out output is correctly generated. Note that although the Sum output is also shown in the Figure, its color is not relevant, as threshold-based detection is used for it (see Table 6.2). In conclusion, the simulation results demonstrate that a 1-bit FA can be designed by combining threshold detection and phase detection.

#### **6.2.3.** Performance Evaluation

The proposed FA is assessed and compared with the state-of-the-art CMOS [357], Magnetic Tunnel Junction (MTJ) [358], [359], Spin Hall Effect (SHE) [360], Domain Wall Motion (DWM) [361], and Spin-CMOS [362] based FAs in terms of energy, delay, and area (the number of utilized devices). In the evaluation and comparison, the following assumptions are made: (i) Excitation and detection cells are magnetoelectric (ME) cells. (ii) The ME cell energy consumption and delay are 14.4 aJ and 0.42 ns, respectively [348]. (iv) SWs do not consume noticeable energy in the waveguide in comparison with the transducer energy consumption.
(v) SWs are excited using pulse signals. Note that these assumptions might not reflect the reality of spin wave technology because of its early stage of development, and they might need to be re-evaluated in the future. The SW FA delay is determined by adding the delay of the 4 ME cells in the critical path (2 excitation and 2 detection cells) to the SW propagation delay in the waveguide, which is extracted from micromagnetic simulations and amounts to $1.18\,\mathrm{ns}$. Therefore, the SW FA delay is $2.86\,\mathrm{ns}$.

Figure 6.5: Spin Wave Based FA MuMax3 Simulation.

Table 6.3: FA Performance Comparison.

| | Energy (fJ) | Delay (ns) | Device No. |
|-----------------|-------------|------------|------------|
| CMOS [363] | 0.176 | 0.1 | 22 |
| MTJ [359] | 5685 | 3.02 | 29 |
| SHE [360] | 4970 | 7 | 26 |
| DWM [361] | 74.8 | 0.88 | 68 |
| Spin-CMOS [362] | 166.8 | 3 | 34 |
| Conv. SW | 0.129 | 2.86 | 7 |
| Prop. SW | 0.1 | 2.86 | 7 |

The straightforward approach to build a SW FA is to use 3 MAJ gates. However, as direct MAJ gate cascading is not possible in the spin wave domain, amplitude normalization is required, which can be performed by converting the SW gate outputs to the charge domain and back by means of two transducers. We therefore compare our implementation with such a conversion-based (Conv.) SW FA. Table 6.3 summarises the performance of the proposed SW FA and the considered contenders. As can be observed from the Table, the SW FA saves 43% energy at the cost of a 28.6x longer delay when compared with the 22 nm CMOS based FA design. Moreover, it consumes 4 orders of magnitude less energy and exhibits 5% and 59% less delay than the MTJ and SHE based FAs, respectively. When compared with the DWM based FA, it consumes 2 orders of magnitude less energy at the expense of a 3× higher delay.
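The SW FA delay and the relative figures quoted against Table 6.3 can be recomputed with a few lines of Python. The energy line assumes that all 7 devices listed for the proposed FA in Table 6.3 are pulse-driven ME cells consuming 14.4 aJ each, which is an interpretation rather than a statement from the text; the table values are copied as given and the helper names are illustrative.

```python
ME_ENERGY_AJ = 14.4     # per ME cell [348]
ME_DELAY_NS = 0.42      # per ME cell [348]
PROPAGATION_NS = 1.18   # SW propagation delay from micromagnetic simulation

# Table 6.3: name -> (energy in fJ, delay in ns, device number)
TABLE_6_3 = {
    "CMOS": (0.176, 0.1, 22),
    "MTJ": (5685, 3.02, 29),
    "SHE": (4970, 7, 26),
    "DWM": (74.8, 0.88, 68),
    "Spin-CMOS": (166.8, 3, 34),
    "Conv. SW": (0.129, 2.86, 7),
}
PROPOSED = (0.1, 2.86, 7)


def sw_fa_delay_ns(cells_in_critical_path: int = 4) -> float:
    """2 excitation + 2 detection ME cells plus SW propagation."""
    return cells_in_critical_path * ME_DELAY_NS + PROPAGATION_NS


def sw_energy_fj(n_cells: int) -> float:
    """Pulse-driven ME cells only; waveguide losses neglected (assumption (iv))."""
    return n_cells * ME_ENERGY_AJ / 1000.0


def compare(name: str) -> dict[str, float]:
    """Relative energy and delay of the proposed SW FA versus a contender."""
    energy, delay, _ = TABLE_6_3[name]
    e0, d0, _ = PROPOSED
    return {"energy_saving": 1 - e0 / energy, "delay_ratio": d0 / delay}


print(f"SW FA delay: {sw_fa_delay_ns():.2f} ns")
print(f"SW FA energy with 7 ME cells: {sw_energy_fj(7):.4f} fJ")
for name in TABLE_6_3:
    c = compare(name)
    print(f"{name:>9}: energy saving {c['energy_saving']:+.1%}, delay ratio {c['delay_ratio']:.2f}x")
```

A delay ratio below 1 means the SW FA is faster than the contender (e.g., about 0.41 for SHE, i.e., 59% less delay).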
Furthermore, the proposed FA consumes 3 orders of magnitude less energy and exhibits 5% less delay in comparison with the Spin-CMOS based FA. Last but not least, the proposed SW FA consumes 22.5% less energy than the conversion-based (Conv.) MAJ SW implementation while having the same delay. Note that the MTJ-based FA [359] consists of 25 transistors and 4 MTJs, whereas the SHE-based FA [359] consists of 23 transistors and 3 SHE-MTJs. Also, the DWM-based FA [359] consists of 20 transistors, 4 MTJs, and 2 Domain Walls (DW), whereas the Spin-CMOS FA [359] consists of 28 transistors, 4 MTJs, and 2 DWs. Note that the proposed FA needs the lowest number of devices, which indicates that it potentially requires a small chip real estate. Note that we did not consider the FA in [59] in the comparison as, to date, it has not been validated. Our attempts to validate it by means of MuMax3 failed, as it relies on unattainable assumptions, e.g., output detection at the interference point, output initialization to 0 before computing, zero ME cell delay, and 4.8 aJ energy consumption.

Figure 6.6: a) 8 Dots in an 8-bit Multiplier Partial Product Stage Processed by Full and Half Adders. b) 8 Dots in an 8-bit Multiplier Partial Product Stage Processed by 4:2 Compressor.

Figure 6.7: 4:2 Compressor.

# **6.3.** SPIN WAVE 4:2 COMPRESSOR

For many state-of-the-art applications, e.g., artificial neural networks and machine vision, which heavily rely on multiplications, the availability of fast multipliers is essential. Tree multipliers are the fastest and can perform a multiplication within 2 processor clock cycles [364]. They embed 3 stages, i.e., partial product generation, reduction tree, and carry propagation adder.
In an n-bit multiplier, the first stage requires $n^2$ gates to produce the partial product matrix, the second stage provides a logarithmic-depth reduction of n n-bit partial products to two numbers (n:2 reduction) without carry propagation, and the final stage is a carry propagate adder that sums up the reduction tree outputs [364]. n:2 reduction is a carry-propagation-free addition and has traditionally been implemented by means of Full and Half Adders, resulting in a reduction tree depth of $O(\log n)$. More recently, the theoretical concept of n:2 compressors has been introduced and practical implementations, e.g., 4:2 compressors, have been reported [365]–[368]. When built with 4:2 compressors, each element in the reduction tree processes 4 instead of 3 bits, which results in shallower reduction trees with a more regular layout [364]. To gain some insight into this, let us take 8-bit multiplication as discussion vehicle. In this case, eight 8-bit partial products have to be reduced to 2 numbers, and Figure 6.6 presents the processing of an 8-bit column in dot notation with Full and Half adders (Figure 6.6 a)) and with 4:2 compressors (Figure 6.6 b)). As one can observe in the Figure, the Full and Half adder based reduction requires 2 stages, while the 4:2 compressor based reduction requires only one stage. Essentially speaking, a 4:2 compressor can be implemented by 2 cascaded Full Adders processing 4 bits in the same column and generating one sum bit in the current column and a carry to the next column, as depicted in Figure 6.7. As the 4:2 compressor output (a sum bit and a carry bit) can only represent a value between 0 and 3, while the input value can be between 0 and 4, a transport to the 4:2 compressor in the adjacent column is required in order to conserve the input value.
Thus, the 4:2 compressor applied in column i of the partial product matrix processes the 4 dots in that column ($I_1$, $I_2$, $I_3$, and $I_4$) and a carry-in $C_{in}$ provided by the 4:2 compressor in column i-1 ($C_{o1}(i-1)$), and generates 3 outputs: an intermediate transport $C_{o1}(i)$ that serves as $C_{in}$ for the compressor in column i+1, the sum S(i), and the carry-out $C_{o2}(i)$. Note that as $C_{o1}(i-1)$ participates in the second stage of the calculation, no extra delay is induced and the reduction is still performed in a carry-propagation-free manner. We note that if the 4:2 compressor is implemented by 2 cascaded Full Adders as in Figure 6.7, the two reduction schemes in Figure 6.6 have the same delay. Fortunately, most state-of-the-art CMOS 4:2 compressor implementations rely on different circuitry and are faster than 2 cascaded Full Adders [365]–[368]. Given that multiplication-dominated, error-tolerant applications exist, e.g., multimedia processing and social media [116], approximate CMOS 4:2 compressors have been proposed [367] to enable significant energy and area savings. A straightforward SW 4:2 compressor implementation can be built using the SW full adder proposed in [369], which provides accurate results with acceptable delay and energy efficiency. However, as previously mentioned, many applications are error tolerant and work properly within certain error limits [116]; therefore, by enabling approximate computing, a more energy-efficient SW 4:2 compressor can be made.

#### **6.3.1. SPIN WAVE 4:2 COMPRESSOR STRUCTURE**

Figure 6.8 presents the proposed 4:2 compressor, consisting of 5 inputs X1, X2, X3, X4, and $C_i$, 3 outputs $C_{o1} = MAJ(X1, X2, X3)$, $C_{o2} = MAJ(XOR(XOR(X1, X2), X3), X4, C_i)$, and $S = XOR(XOR(XOR(XOR(X1, X2), X3), X4), C_i)$, and 3 intermediate cells $I_1$, $I_2$, and $I_3$, which are repeaters that receive SWs and re-excite them with the suitable amplitude and phase.
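The output equations above can be sanity-checked at the Boolean level. The minimal Python sketch below models only the logic (no spin wave physics) and verifies exhaustively that $X1+X2+X3+X4+C_i = S + 2(C_{o1}+C_{o2})$, i.e., that the compressor conserves the column value as a pair of cascaded full adders would. The function names `maj` and `compressor_4_2` are illustrative, not part of the original design flow.

```python
from itertools import product


def maj(a: int, b: int, c: int) -> int:
    """3-input majority: 1 if at least two inputs are 1."""
    return 1 if a + b + c >= 2 else 0


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, ci: int):
    """Boolean model of the S, C_o1, C_o2 equations of Section 6.3.1."""
    co1 = maj(x1, x2, x3)   # intermediate transport to column i+1
    p = x1 ^ x2 ^ x3        # XOR(XOR(X1, X2), X3): sum of the first (virtual) full adder
    co2 = maj(p, x4, ci)    # carry-out of the second (virtual) full adder
    s = p ^ x4 ^ ci         # sum bit kept in column i
    return s, co1, co2


# Exhaustive check: weighted outputs must equal the number of 1s at the inputs
for bits in product((0, 1), repeat=5):
    s, co1, co2 = compressor_4_2(*bits)
    assert sum(bits) == s + 2 * (co1 + co2), bits
print("Column value conserved for all 32 input combinations")
```

Since $C_{o1}$ depends only on X1, X2, and X3, it never waits for $C_i$, which is the carry-propagation-free property discussed above.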
In order to ensure the correct functionality of the proposed 4:2 compressor, all SWs must be excited with the same amplitude, wavelength, and frequency. The SW wavelength must be larger than the waveguide width to simplify the interference pattern. Moreover, the structure must be designed carefully, because its dimensions affect the interference results. For example, if constructive interference is required at the intersection point when the waves have the same phase and destructive interference otherwise, then the device dimensions $d_1$, $d_2$, $d_3$, $d_5$, $d_6$, $d_7$, and $d_8$ must be equal to $n\lambda$, where n=0,1,2,...; this is the case in our design. The outputs $C_{o1}$ and $C_{o2}$ must be located at specific positions as they are based on phase detection; by changing their location, either the inverted or the non-inverted output can be extracted. For example, to capture the non-inverted output, the distance $d_4$ must be equal to $n\lambda$, which is the case for $C_{o1}$ and $C_{o2}$. On the other hand, as the output S is based on threshold detection, the resultant SW is compared with a predefined threshold value, as previously discussed. To detect the largest possible SW amplitude, the outputs S, $C_{o1}$, and $C_{o2}$ must be located as close as possible to the interference point, i.e., $d_4$ and $d_9$ must be as small as possible.

Figure 6.8: Spin Wave Based 4:2 Compressor.

The proposed 4:2 SW compressor works as follows:

- Carry-out1 $C_{o1}$: The SWs excited at X2 and X3 interfere constructively or destructively depending on their phases at the connection point. The SW interference result then propagates further through the waveguide and interferes with the SW excited at X1 at the next intersection point between the waveguides. Finally, the resultant SW is captured at $C_{o1}$ based on phase detection.
- Carry-out2 $C_{o2}$: The SWs excited at X2 and X3 interfere constructively or destructively depending on their phases at the connection point. The resultant wave is then received by repeater $I_1$, which excites a SW with a suitable phase depending on the received SW magnetization: if the received magnetization is larger than a threshold, a SW with phase 0 is excited; otherwise, a SW with phase $\pi$ is excited. Then, the SW excited by $I_1$ interferes with the SW excited at X1, and the resultant SW is received by repeater $I_2$, which in turn excites a SW with a suitable phase depending on the received magnetization. Meanwhile, the SWs excited at X4 and $C_i$ interfere at the next connection point. Finally, their resultant SW interferes with the SW excited by $I_2$, and the result is captured at $C_{o2}$ based on phase detection.
- Sum *S*: The SWs excited at X4 and $C_i$ interfere at the intersection point between the two waveguides, and the result is detected by repeater $I_3$. Next, repeater $I_3$ excites a SW with a suitable phase depending on the received SW magnetization, as previously discussed. Finally, the output S captures the result of the interference between the SWs excited by $I_2$ and $I_3$ based on threshold detection.

Figure 6.9: Normalized 4:2 Compressor Carry-out $C_{o1}$.

#### **6.3.2.** SIMULATION SETUP AND RESULTS

We utilize the parameters introduced in Subsection 6.2.2 to validate the proposed structure by means of MuMax3 [195]. We excite the SWs with a 10 GHz pulse with a sigma of 500 ps to save energy, guarantee a single-frequency SW excitation, and achieve a high group velocity. The wavenumber k is determined from the SW dispersion relation, which gives a wavelength of $\lambda = 2\pi/k = 170$ nm.
As discussed, the distances $d_1$, $d_2$, $d_3$, $d_6$, $d_7$, and $d_8$ are equal to $n\lambda$; the distances are thus determined to be: $d_1 = 340$ nm (n=2), $d_2 = 850$ nm (n=5), $d_3 = 680$ nm (n=4), $d_4 = 170$ nm (n=1), $d_5 = 50$ nm, $d_6 = 340$ nm (n=2), $d_7 = 340$ nm (n=2), $d_8 = 1020$ nm (n=6), and $d_9 = 50$ nm. Figure 6.9 presents the MuMax3 simulation results of the proposed compressor carry-out1 $C_{o1}$ for $\{X_1, X_2, X_3\} = \{0,0,0\}$, $\{0,0,1\}$, $\{0,1,0\}$, $\{0,1,1\}$, $\{1,0,0\}$, $\{1,0,1\}$, $\{1,1,0\}$, and $\{1,1,1\}$, respectively. Inspecting the figure, $C_{o1}$ is captured correctly based on phase detection: at time = 2.25 ns, $C_{o1} = 0$ for $\{X_1, X_2, X_3\} = \{0,0,0\}$, $\{0,0,1\}$, $\{0,1,0\}$, and $\{1,0,0\}$, whereas $C_{o1} = 1$ for $\{X_1, X_2, X_3\} = \{0,1,1\}$, $\{1,0,1\}$, $\{1,1,0\}$, and $\{1,1,1\}$. Table 6.4 presents the normalized magnetization of the SW received by repeater $I_1$ and the SW excited by $I_1$ for the input combinations $\{X_2, X_3\} = \{0,0\}$, $\{0,1\}$, $\{1,0\}$, and $\{1,1\}$. Note that the threshold technique is used to detect and excite the SW at $I_1$: if the SW magnetization is larger than the threshold, $I_1$ excites a SW with $\phi = 0$; otherwise, it excites a SW with $\phi = \pi$. The threshold is calculated by averaging the two nearest cases of opposite classes, i.e., $\{X_2, X_3\} = \{1,0\}$ (0.18) and $\{1,1\}$ (0.99), resulting in 0.585 for this case.
Inspecting the table, we can see that the SW magnetization received by $I_1$ is larger than 0.585 for the input combinations $\{X_2, X_3\} = \{0,0\}$ and $\{1,1\}$, whereas it is less than 0.585 for $\{X_2, X_3\} = \{0,1\}$ and $\{1,0\}$.

| Inputs (X2X3) | Normalized SW Magnetization received by $I_1$ | SW excited by $I_1$ |
|---------------|----------------------------------------------|----------------------|
| 00 | 1 | SW with $\phi = 0$ |
| 01 | 0.18 | SW with $\phi = \pi$ |
| 10 | 0.18 | SW with $\phi = \pi$ |
| 11 | 0.99 | SW with $\phi = 0$ |

Table 6.4: Normalized SW Magnetization at I1

| Inputs (X1I1) | Normalized SW Magnetization received by $I_2$ | SW excited by $I_2$ |
|---------------|----------------------------------------------------|----------------------|
| 00 | 1 | SW with $\phi = 0$ |
| 01 | 0.65 | SW with $\phi = \pi$ |
| 10 | 0.64 | SW with $\phi = \pi$ |
| 11 | 0.99 | SW with $\phi = 0$ |

Table 6.5: Normalized SW Magnetization at I2

The same reasoning holds for $I_2$, for which the results are presented in Table 6.5. Here, the threshold is set to 0.82, which is the average of the two cases $\{X_1, I_1\} = \{0,1\}$ (0.65) and $\{1,1\}$ (0.99). Inspecting the table, we can see that the SW magnetization received by $I_2$ is larger than 0.82 for the input combinations $\{X_1, I_1\} = \{0,0\}$ and $\{1,1\}$, whereas it is less than 0.82 for $\{X_1, I_1\} = \{0,1\}$ and $\{1,0\}$. Following the same analysis, correct results are obtained for $C_{o2}$, which is detected based on phase detection like $C_{o1}$, and for S, which is detected based on threshold detection. Therefore, the micromagnetic simulation results demonstrate that the 4:2 SW compressor functions correctly.
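The repeater thresholds can be reproduced from Tables 6.4 and 6.5 with a few lines of Python. This sketch assumes, as described above, that the threshold is the midpoint between the two closest magnetizations of opposite classes (equal inputs interfere constructively, unequal inputs destructively) and that a re-excited phase of 0/$\pi$ encodes logic 0/1. Under these assumptions each repeater behaves as an XOR of its two inputs; the helper names are hypothetical.

```python
# Normalized magnetizations copied from Table 6.4 (inputs X2X3) and Table 6.5 (inputs X1I1)
TABLE_I1 = {"00": 1.00, "01": 0.18, "10": 0.18, "11": 0.99}
TABLE_I2 = {"00": 1.00, "01": 0.65, "10": 0.64, "11": 0.99}


def midpoint_threshold(table: dict) -> float:
    """Average of the two closest magnetizations belonging to opposite classes.

    Equal inputs ("00", "11") form the high class, unequal inputs the low class.
    """
    high = [m for k, m in table.items() if k[0] == k[1]]
    low = [m for k, m in table.items() if k[0] != k[1]]
    return (min(high) + max(low)) / 2


def repeater_bit(magnetization: float, threshold: float) -> int:
    """Phase 0 (logic 0) is re-excited above the threshold, phase pi (logic 1) otherwise."""
    return 0 if magnetization > threshold else 1


for name, table in (("I1", TABLE_I1), ("I2", TABLE_I2)):
    th = midpoint_threshold(table)
    for inputs, m in table.items():
        # With this encoding the repeater output is the XOR of its two inputs
        assert repeater_bit(m, th) == int(inputs[0]) ^ int(inputs[1])
    print(f"{name}: threshold = {th:.3f}")
```

Note how the margin for $I_2$ (0.65 vs. 0.99) is much narrower than for $I_1$ (0.18 vs. 0.99), which is why the threshold placement matters more at the second repeater.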
#### **6.3.3.** Performance Evaluation

In order to assess the performance of the proposed 4:2 SW compressor and the potential of such an approach, we evaluate it and compare it with the state-of-the-art 22 nm CMOS [370], Magnetic Tunnel Junction (MTJ) [362], Domain Wall Motion (DWM) [362], and Spin-CMOS [362] technologies in terms of energy, delay, and area. We make the following assumptions for the performance evaluation [352]: (i) The excitation, detection, and repeater cells are Magnetoelectric (ME) cells with a power consumption of 34 nW and a delay of 0.42 ns. (ii) SWs do not consume noticeable energy while interfering with each other or propagating in the waveguide. Note that these assumptions might need re-evaluation in the near future, as SW technology is still in its infancy. Table 6.6 presents the performance evaluation of the proposed compressor and the comparison with the state of the art. As can be observed from the Table, the proposed SW compressor consumes 2.5x less energy than the 22 nm CMOS counterpart while requiring 119x more delay [370]. In addition, the proposed SW compressor consumes at least 3 orders of magnitude less energy than the MTJ, DWM, and Spin-CMOS counterparts, while requiring 1.84x and 1.26x more delay, and 1.28x less delay, than the MTJ, DWM, and Spin-CMOS counterparts, respectively [362]. Moreover, the proposed compressor requires the lowest number of devices in comparison with the other designs, as can be seen in Table 6.6. Note that the SW delay can be improved by using other materials with a higher group velocity.

| Design | Technology | Energy (fJ) | Delay (ns) | Device No. |
|-------------|------------|-------------|------------|------------|
| [370] | CMOS | 0.4 | 0.048 | 38 |
| [362] | MTJ | 85680 | 20.4 | 76 |
| [362] | DWM | 630 | 3.7 | 58 |
| [362] | Spin-CMOS | 667 | 6 | 68 |
| Proposed SW | Spin Wave | 0.16 | 4.68 | 11 |

Table 6.6: 4:2 Compressor Performance Comparison
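The SW energy entry of Table 6.6 can be approximated from assumption (i) alone. The sketch below assumes (our reading, not stated explicitly above) that each of the 11 devices, i.e., 5 inputs, 3 repeaters, and 3 outputs, is an ME cell that draws 34 nW for one 0.42 ns switching interval, and that waveguide propagation and interference are free, as in assumption (ii). It then expresses the other designs' energies relative to the 0.16 fJ SW entry.

```python
ME_POWER_W = 34e-9    # assumption (i): 34 nW per ME cell
ME_DELAY_S = 0.42e-9  # assumption (i): 0.42 ns per ME cell


def transducer_energy_fj(n_cells: int) -> float:
    """Energy in fJ if each ME cell draws ME_POWER_W for one ME_DELAY_S interval."""
    return n_cells * ME_POWER_W * ME_DELAY_S * 1e15


# Our count of ME cells: 5 inputs + 3 repeaters + 3 outputs (the "Device No." of Table 6.6)
N_CELLS = 5 + 3 + 3
e_sw = transducer_energy_fj(N_CELLS)
print(f"{N_CELLS} ME cells -> {e_sw:.3f} fJ")

# Energy of the other designs relative to the 0.16 fJ SW entry of Table 6.6
TABLE_6_6_ENERGY_FJ = {"CMOS [370]": 0.4, "MTJ [362]": 85680, "DWM [362]": 630, "Spin-CMOS [362]": 667}
for design, energy in TABLE_6_6_ENERGY_FJ.items():
    print(f"{design}: {energy / 0.16:,.1f}x the SW compressor energy")
```

The estimate rounds to the tabulated 0.16 fJ, and the ratios make the "2.5x" (CMOS) and "at least 3 orders of magnitude" (MTJ, DWM, Spin-CMOS) energy statements easy to check.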
To get some insight into the implications of our proposal at the application level, we consider as discussion vehicle the well-known JPEG encoding, which relies on the Discrete Cosine Transform (DCT) [371]. Given that JPEG encoding is error tolerant and DCT is a multiplication-dominated algorithm, 4:2 compressor-based tree multipliers are quite attractive for practical JPEG codec implementations. Such an approach has been presented in [362] and can be briefly summarized as follows: DCT and Inverse DCT (IDCT) are implemented by means of accurate adders and compressor-based multipliers, thus additions provide accurate results and multiplications results. The $16 \times 16$ signed multiplier implementations are based on the Baugh-Wooley algorithm and Dadda [364] partial product reduction implemented with 4:2 counters. As discussed previously, the proposed compressor consumes 3 orders of magnitude less energy than the Spin-CMOS counterpart, which indicates that a DCT/IDCT based on the proposed 4:2 SW compressor would consume at least 3 orders of magnitude less energy than a DCT/IDCT based on the Spin-CMOS 4:2 compressor [362].

## **6.4.** GATE CASCADING IN SW WITHOUT DOMAIN CONVERSION

In this section, we discuss the magic device able to let A-amplitude SWs pass unchanged and throw away two thirds of the energy of 3A-amplitude SWs. In addition, we illustrate the construction of cascaded gates and circuits using the proposed solution. Furthermore, we explain the simulation platform and present the simulation results and the performance evaluation.

#### **6.4.1.** DIRECTIONAL COUPLER BASED SW GATE CASCADING

The magic device, which normalizes the SW when necessary, can be a directional coupler [250] properly designed to adjust the output SW amplitude of the driving Majority gate, before passing it to the next Majority gate, to A in case of a strong majority (3A), or to leave it unchanged in weak majority cases.
This behaviour can be achieved by making use of the nonlinear properties of high-amplitude SWs, which cause a dispersion relation shift that in turn induces a wavelength shift. When two waveguides are placed close to each other, they are said to be dipolar coupled and form a directional coupler, which enables a wavelength-dependent energy transfer between the two waveguides, as explained in detail in Chapter 2. Thus, by properly controlling this energy transfer via the nonlinear characteristics, the spin wave amplitude can be normalized to the desired value, i.e., *A* in our case. The directional coupler and its design specifications are presented and discussed thoroughly in Chapter 2 Section 2.4.

Figure 6.10: (a) Proposed Gates Cascading Solution. Spin Wave Waveform Analysis (b) $I_1I_2I_3I_4I_5$ =00011, (c) $I_1I_2I_3I_4I_5$ =00111.

Figure 6.10a revisits the situation in Figure 6.1a and augments the waveguide connecting the two majority gates with a directional coupler acting as amplitude normalizer. The spin waves excited at $I_1$, $I_2$, and $I_3$ interfere constructively or destructively depending on their phases, and the output of the first MAJ3 gate is normalized or not by the directional coupler, depending on whether it signals a strong or a weak majority. If the output SW amplitude is greater than a predefined threshold, in our case the input amplitude A, it is normalized to A while preserving the SW phase. Otherwise, no normalization occurs and only a tiny portion of the SW power is transferred to the second waveguide due to the coupling effect. The two input combinations previously utilized to explain the gate cascading issue, i.e., $I_1I_2I_3I_4I_5$ =00011 and $I_1I_2I_3I_4I_5$ =00111, are revisited to demonstrate that the directional coupler enables proper gate cascading.
Assuming that all input spin waves are excited with the same amplitude A and frequency, in the first case the SWs excited at $I_1$, $I_2$, and $I_3$ interfere constructively, resulting in a spin wave with 0 phase and 3A amplitude, as depicted by WG D BN in Figure 6.10b. Given that this SW amplitude is greater than A, it is normalized by the directional coupler to A, producing WG D AN in Figure 6.10b. At the second majority gate, the WG E and WG F SWs interfere constructively, and their result interferes destructively with the WG D AN SW. As a result of the overall interference process, the output SW corresponds to a logic 1, as it should. In the other case, the $I_1$ SW interferes constructively with the $I_2$ SW, and their result interferes destructively with the $I_3$ SW, resulting in a spin wave with 0 phase and amplitude A in WG D BN. Since the amplitude equals the threshold, no normalization occurs and the WG D AN spin wave approximately equals the WG D BN SW, as depicted in Figure 6.10c. Then, the spin waves excited at $I_4$ and $I_5$ interfere constructively with each other and destructively with the spin wave in WG D AN, which results in a SW with $\pi$ phase and amplitude A, i.e., a logic 1, as expected. Note that the above holds true for all logic gate types, i.e., (N)AND and (N)OR, and the proposed solution can be utilized to normalize the output of these gates when they are cascaded with other gates. In order to validate our proposal and demonstrate its potential towards building spin wave circuits, we design three complex gates that make use of it. While most of the time circuit design requires the utilization of one gate output as input for only one follow-up gate, there are situations when that output has to drive more than one gate input. To cover the most common situations encountered in logic circuit implementations, we selected three different structures for demonstration purposes, as follows: (i) single output MAJ3 gates, (ii) fully cascadable dual output MAJ3 gates, and (iii) partially cascadable dual output MAJ3 gates.
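The two cases above can be replayed with an idealized phasor model. In the Python sketch below, each SW is a complex amplitude A·e^{iφ} (logic 0 → φ = 0, logic 1 → φ = π), interference is plain linear superposition, and the hypothetical `normalize` function stands in for the directional coupler by clipping any amplitude above A back to A while keeping the phase. Dispersion, attenuation, nonlinearity, and the real coupler physics are deliberately ignored; the model only captures the intended input-output behaviour of the normalizer.

```python
import cmath
import math

A = 1.0  # common input SW amplitude (arbitrary units)


def excite(bit: int) -> complex:
    """Logic 0 -> phase 0, logic 1 -> phase pi, amplitude A."""
    return cmath.rect(A, math.pi * bit)


def interfere(*waves: complex) -> complex:
    """Idealized linear superposition of the interfering SWs."""
    return sum(waves)


def normalize(wave: complex, threshold: float = A) -> complex:
    """Stand-in for the directional coupler: amplitude > A is clipped to A, phase is kept."""
    if abs(wave) > threshold + 1e-9:
        return cmath.rect(A, cmath.phase(wave))
    return wave


def detect(wave: complex) -> int:
    """Phase detection: phase ~0 -> logic 0, phase ~pi -> logic 1."""
    return 0 if wave.real > 0 else 1


def cascaded_maj(bits, use_normalizer: bool = True) -> int:
    """MAJ(MAJ(I1, I2, I3), I4, I5) built from two interfering MAJ3 gates."""
    i1, i2, i3, i4, i5 = (excite(b) for b in bits)
    wg_d = interfere(i1, i2, i3)       # first MAJ3 output (WG D BN)
    if use_normalizer:
        wg_d = normalize(wg_d)         # after the coupler (WG D AN)
    return detect(interfere(wg_d, i4, i5))


for pattern in ((0, 0, 0, 1, 1), (0, 0, 1, 1, 1)):
    print(pattern, "with coupler:", cascaded_maj(pattern),
          "without:", cascaded_maj(pattern, use_normalizer=False))
```

For 00011 the unnormalized 3A wave outvotes the two logic-1 inputs and yields the wrong value 0, whereas the normalized cascade yields the expected 1; for 00111 the first output is already of amplitude A and both variants agree.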
While the first structure (Figure 6.11) can provide only one output, the second (Figure 6.12) and third (Figure 6.13) structures can provide two outputs. In addition, the three inputs in the first structure contribute approximately equally to the output, which is not the case in the second and third structures; this might require exciting different inputs at different energy levels in the second and third structures. Note that the introduced approach is scalable and can be applied to SW gates with more outputs. Further, the proposed structures can mimic (N)AND, (N)OR, and X(N)OR gate behavior, as indicated in Chapter 3. Additionally, in order to assess the potential of the cascading approach at circuit level, we instantiate the 2-bit inputs spin wave multiplier presented in Figure 6.14, whose SW-domain-only design is not possible without the proposed approach.

#### **6.4.2.** CASCADED IN-LINE MAJ3 GATES

The structure in Figure 6.10a provides a generic gate cascading solution containing multiple bent regions, which are not SW propagation "friendly". To minimize them, we implemented the two cascaded in-line Majority gates as a compound with only one bent region, as depicted in Figure 6.11. Note that the normalized output of the first Majority gate acts as the third input of the second Majority gate. To guarantee proper results, the structure dimensions must fulfill certain constraints, as follows. If SWs should interfere constructively when they have the same phase and destructively otherwise, $d_1 = d_2 = ... = d_5 = n \times \lambda$, where n = 0, 1, 2, 3, ... If the opposite behaviour is desired, i.e., SWs interfere constructively if they are out of phase and destructively otherwise, $d_1 = d_2 = ... = d_5 = (n + \frac{1}{2}) \times \lambda$. The output of the first Majority gate must be normalized to the amplitude of the second Majority gate inputs.

Figure 6.11: In-Line MAJ3 Cascaded Gates.
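The placement rule above can be expressed as a small check that classifies a distance as an integer ($n\lambda$, same-phase SWs interfere constructively, non-inverted output) or half-integer ($(n+\frac{1}{2})\lambda$, inverted behaviour) multiple of the wavelength. This is a hypothetical helper for illustration; the example values are the YIG dimensions of the Figure 6.11 structure given in Section 6.4.6 with $\lambda$ = 340 nm.

```python
def placement_multiple(d_nm: float, wavelength_nm: float, tol: float = 1e-6):
    """Classify a distance d with respect to the wavelength.

    Returns ("in-phase", n) if d = n * lambda, ("anti-phase", n) if
    d = (n + 1/2) * lambda, and None if neither rule is satisfied.
    """
    ratio = d_nm / wavelength_nm
    n = round(ratio)
    if abs(ratio - n) < tol:
        return ("in-phase", n)       # same-phase SWs interfere constructively
    n_half = round(ratio - 0.5)
    if abs(ratio - (n_half + 0.5)) < tol:
        return ("anti-phase", n_half)  # same-phase SWs interfere destructively
    return None


LAMBDA_NM = 340.0  # YIG wavelength from Table 6.7
for name, d in {"d1": 340, "d3": 3740, "d5": 4080, "d6": 340}.items():
    print(name, placement_multiple(d, LAMBDA_NM))
```

All four distances come out as integer multiples (n = 1, 11, 12, and 1), consistent with the non-inverted MAJ3 configuration.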
Assuming that all input SWs have an amplitude of A, the output of the first Majority gate must be normalized to A in case it reports a strong majority result, i.e., a 3A amplitude SW. Therefore, if the output amplitude is A no normalization is required, whereas if the output amplitude is 3A a normalization is performed such that 66% of the spin wave power moves into the second waveguide towards X and only 33% of it passes to the second Majority gate. To obtain this behaviour, the directional coupler is designed by making use of the equations presented in Chapter 2 Section 2.4, while taking into consideration different parameters, including the applied magnetic field, the spacing between the waveguides, the waveguide dimensions, the static magnetization orientation, and the spin wave wavelength, frequency, and amplitude. The output position must be determined accurately to obtain the desired results, i.e., MAJ3 and inverted MAJ3 are obtained when $d_6 = n \times \lambda$ and $d_6 = (n + \frac{1}{2}) \times \lambda$, respectively. Moreover, the output value can be phase detected with respect to a predefined phase, i.e., $\Delta \phi = 0$ represents logic 0 and $\Delta \phi = \pi$ logic 1.

#### **6.4.3.** FULLY CASCADED LADDER MAJ3 GATES

As the efficient implementation of real-life circuits requires gates with fanout capabilities, a fanout-of-2 ladder-shaped MAJ3 gate has been introduced in Chapter 3. Before discussing the augmentation of such a gate with directional couplers, we briefly discuss its operation principle. The upper part of the structure presented in Figure 6.12 constitutes a MAJ3 gate that is able to evaluate $MAJ(I_1,I_2,I_3)$ and $MAJ(I_1,I_2,I_4)$ in parallel; thus, if $I_3=I_4$ the two values are equal and the gate exhibits a fanout of 2.
As discussed in [77], the waveguide topology and dimensions are determined in such a way that the input SWs can properly interfere and generate the correct output values according to the Majority function truth table, and the SW present in the left/right arm before the directional coupler carries the $MAJ(I_1,I_2,I_3)/MAJ(I_1,I_2,I_4)$ value. Simply speaking, the MAJ3 gate operates as follows: (i) At $I_1$, $I_2$, $I_3$, and $I_4$, SWs are excited with a suitable phase, i.e., phase 0 for logic 0 and phase $\pi$ for logic 1, (ii) The excited SWs propagate through the horizontal and vertical waveguides, (iii) At the "meeting" points, they interfere constructively or destructively depending on their phases, and (iv) Finally, the resultant SWs propagate downwards through the left and right arms. Note that while the ladder structure is meant to compute a Majority function, it can also evaluate basic Boolean functions. If output phase detection is in place, which means that the output phase is compared with a predefined phase and a $0/\pi$ phase difference means logic 0/1, $(N)AND = MAJ(I_1, I_2, 0)$ and $(N)OR = MAJ(I_1, I_2, 1)$. In contrast, if threshold detection is utilized, such that the output is logic 1 if the output spin wave magnetization is greater than a predefined threshold and logic 0 otherwise, then $XOR = MAJ(I_1, I_2, 0)$. To make the FO2 MAJ3 gate outputs directly connectable as inputs to the following SW gates, they have to be normalized by means of 2 directional couplers, as presented in Figure 6.12.

Figure 6.12: Fully Cascaded Ladder MAJ3 Gates.
The circuit in the Figure operates as follows: (i) At $I_1$, $I_2$, $I_3$, $I_4$, $I_5$, and $I_6$, SWs are excited with a suitable phase, (ii) The excited spin waves propagate horizontally and vertically and, at the intersection points, they interfere constructively or destructively depending on the phases of the excited SWs in both arms, (iii) The resulting spin waves from the first Majority gate propagate towards the couplers to be normalized, (iv) The normalized SWs propagate downwards to interfere with the spin waves excited at $I_5$ and $I_6$, and (v) Finally, the resulting SWs propagate towards $O_1$ and $O_2$ such that $O_1 = MAJ(MAJ(I_1, I_2, I_3), I_5, I_6)$ and $O_2 = MAJ(MAJ(I_1, I_2, I_4), I_5, I_6)$. Note that in case $I_3 = I_4$ the two outputs are equal, thus the gate compound exhibits a fanout of 2, but when $I_3 \neq I_4$ the circuit evaluates two different functions, which can be beneficial in terms of circuit complexity.

Figure 6.13: Partially Cascaded Ladder MAJ3 Gates.

To guarantee correct behaviour, the input SWs must have the same amplitude and wavelength $\lambda$, which, to simplify the interference pattern, must be greater than the waveguide width w. The structure dimensions $d_i$, $i=1,2,\ldots,6$, must be determined in terms of $\lambda$. For instance, if SWs have to interfere constructively when they have the same phase and destructively when they are out of phase, $d_1, d_2, \ldots, d_6$ must be equal to $n\lambda$, where $n=1,2,3,\ldots$ However, if the other way around is desired, i.e., SWs with the same phase should interfere destructively and constructively when they are out of phase, $d_1, d_2, \ldots, d_6$ must be equal to $(n+\frac{1}{2})\lambda$, where $n=1,2,3,\ldots$ Additionally, the outputs can be captured at $O_1$ and $O_2$, located at $d_7$ and $d_8$ from the last interference point, which should be $n\lambda$ or $(n+\frac{1}{2})\lambda$ if the non-inverted or inverted output is desired, respectively.
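At the Boolean level, the fully and partially cascaded ladder structures reduce to nested majority functions. The sketch below encodes $O_1$ and $O_2$ as given above for the fully cascaded case and, for the partially cascaded structure of Figure 6.13, assumes (as reported in Section 6.4.6) that the right arm is not cascaded, so that $O_2 = MAJ(I_1, I_2, I_4)$. It lists the expected outputs for the three input patterns simulated in Section 6.4.6 and checks the fanout-of-2 property; all function names are illustrative.

```python
def maj(a: int, b: int, c: int) -> int:
    """3-input majority."""
    return int(a + b + c >= 2)


def ladder_fully_cascaded(i1, i2, i3, i4, i5, i6):
    """Figure 6.12: both arms are normalized and feed a second MAJ3 with I5, I6."""
    o1 = maj(maj(i1, i2, i3), i5, i6)
    o2 = maj(maj(i1, i2, i4), i5, i6)
    return o1, o2


def ladder_partially_cascaded(i1, i2, i3, i4, i5, i6):
    """Figure 6.13 (assumed): only the left arm is cascaded; O2 is a primary output."""
    o1 = maj(maj(i1, i2, i3), i5, i6)
    o2 = maj(i1, i2, i4)  # I5 and I6 do not affect O2
    return o1, o2


# Fanout of 2: whenever I3 == I4, both outputs of the fully cascaded compound agree
for n in range(64):
    bits = [(n >> k) & 1 for k in range(6)]
    if bits[2] == bits[3]:
        o1, o2 = ladder_fully_cascaded(*bits)
        assert o1 == o2

for pattern in ("000000", "001111", "000011"):
    bits = [int(c) for c in pattern]
    print(pattern, "fully:", ladder_fully_cascaded(*bits),
          "partially:", ladder_partially_cascaded(*bits))
```

These expectations ($O_1 = O_2 = 0$ for 000000 and $O_1 = O_2 = 1$ otherwise in the fully cascaded case; $O_2 = 0$ throughout in the partial case) are what the micromagnetic simulations are compared against.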
Note that the couplers needed to normalize the outputs of the first Majority gates are designed in the same way as described in Chapter 2 Section 2.4.

#### **6.4.4.** Partially Cascaded Ladder MAJ3 Gates

In this situation, the FO2 MAJ3 gate provides input to one follow-up MAJ3 gate while its second output constitutes a circuit primary output, i.e., it is read out by a SW detection cell. Consequently, only one directional coupler is required, as depicted in Figure 6.13, while the operation principle and the design steps are the same as for the previously discussed structures.

Figure 6.14: 2-bit Inputs Spin Wave Multiplier.

#### **6.4.5.** 2-BIT INPUTS SPIN WAVE MULTIPLIER

Figure 6.14 presents a 2-bit inputs SW multiplier that makes use of the proposed normalizer. The multiplier inputs are the operands $X=(X_1,X_0)$ and $Y=(Y_1,Y_0)$ and the control signals $C_1$ and $C_2$. The structure requires 18 excitation cells and generates a 4-bit output $Q=(Q_0,Q_1,Q_2,Q_3)$. Following the multiplication algorithm, $Q_0=AND(X_0,Y_0)$ and $Q_1=XOR(AND(X_1,Y_0),AND(X_0,Y_1))$ and, as depicted in Figure 6.14, the two AND gate outputs are normalized by 2 directional couplers to enable their cascading such that the XOR gate can correctly evaluate and detect $Q_1$. Further, $Q_2=XOR(AND(X_1,Y_1),AND(X_0,Y_0,X_1,Y_1))$, and again 2 directional couplers are required to normalize the outputs of the $AND(X_0,Y_0,X_1,Y_1)$ and $AND(X_0,Y_0)$ and enable their cascading such that the follow-up XOR gate can correctly evaluate and detect $Q_2$. Finally, $Q_3=AND(X_0,Y_0,X_1,Y_1)$, as can be observed in Figure 6.14. As previously discussed, the distances depend on the chosen SW wavelength and must be accurately determined, i.e., $d_i = n\lambda$, where $i \in \{1, 2, ..., 35\}$, n = 0, 1, 2, ... and $n \neq \{5, 16, 33, 35\}$, as the SWs have to interfere constructively if they have the same phase and destructively if they are out of phase ($\Delta \phi = \pi$).
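The four output equations can be checked against ordinary integer multiplication. The Python sketch below evaluates them exhaustively over all 16 operand pairs, treating $Q_0$ as the least significant bit, and also lists, in the $X_0Y_0X_1Y_1$ order used in the simulation discussion, the input combinations for which $Q_1$ must be 1. It is a purely logical check of the equations; the gate helper names are hypothetical and nothing about the SW implementation is modelled.

```python
from itertools import product


def and_gate(*bits: int) -> int:
    return int(all(bits))


def xor_gate(a: int, b: int) -> int:
    return a ^ b


def sw_multiplier_2bit(x1, x0, y1, y0):
    """Boolean view of the Figure 6.14 outputs (Q0 is the least significant bit)."""
    q0 = and_gate(x0, y0)
    q1 = xor_gate(and_gate(x1, y0), and_gate(x0, y1))
    q2 = xor_gate(and_gate(x1, y1), and_gate(x0, y0, x1, y1))
    q3 = and_gate(x0, y0, x1, y1)
    return q0, q1, q2, q3


# Exhaustive check against integer multiplication of X = 2*X1 + X0 and Y = 2*Y1 + Y0
for x1, x0, y1, y0 in product((0, 1), repeat=4):
    q0, q1, q2, q3 = sw_multiplier_2bit(x1, x0, y1, y0)
    assert q0 + 2 * q1 + 4 * q2 + 8 * q3 == (2 * x1 + x0) * (2 * y1 + y0)

# Input combinations (ordered X0Y0X1Y1, as in the text) for which Q1 must be 1
Q1_ONES = [f"{x0}{y0}{x1}{y1}"
           for x0, y0, x1, y1 in product((0, 1), repeat=4)
           if sw_multiplier_2bit(x1, x0, y1, y0)[1] == 1]
print("Q1 = 1 for X0Y0X1Y1 in", Q1_ONES)
```

The check confirms that $AND(X_0,Y_0,X_1,Y_1)$ plays the role of the carry out of bit position 1, which is why it appears both in $Q_2$ and as $Q_3$.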
Moreover, as the circuit includes AND and XOR gates, phase-based detection is required for $Q_0$ and $Q_3$ and threshold-based detection for $Q_1$ and $Q_2$. To ensure correct output detection, $d_5$ and $d_{35}$ must be $n\lambda$ to read the non-inverted output. In contrast, $Q_1$ and $Q_2$ can be placed at different locations, as they are read based on thresholding. In addition, all outputs should be located as near as possible to the interference point to minimize SW amplitude attenuation.

| Parameters | Values |
|-----------------------------|-----------------------|
| Magnetic saturation $M_s$ | $1.4 \times 10^5$ A/m |
| Damping constant $\alpha$ | 0.0002 |
| Waveguide thickness t | 30 nm |
| Exchange stiffness $A_{ex}$ | 3.5 pJ/m |
| Frequency f | 2.282 GHz |
| $L_w$ | $3 \mu \mathrm{m}$ |
| DW | 8 nm |
| λ | 340 nm |

Table 6.7: Simulation Parameters

#### **6.4.6.** SIMULATION SETUP AND RESULTS

We make use of OOMMF [194] and MuMax3 [195] to validate the correct functionality of the proposed normalization solution and gate cascading structures. In the simulations, blue represents a logic 1 and red a logic 0. The parameters provided to the micromagnetic software are presented in Table 6.7 [250]. The structure dimensions are multiples of the spin wave wavelength. Therefore, the dimensions of the structure in Figure 6.11 are $d_1=d_2=d_4=340$ nm, $d_3=3.74$ $\mu$m, $d_5=4.08$ $\mu$m, and $d_6$=340 nm, whereas the dimensions of the structures in Figures 6.12 and 6.13 are $d_1 = d_2 = d_3 = d_4 = d_5 = d_6 = d_7 = d_8 = 340 \,\mathrm{nm}$ and $d_1 = d_2 = d_3 = d_4 = d_5 = d_6 = d_7 = d_8 = d_9 = 340 \,\mathrm{nm}$, respectively. Moreover, as further discussed afterwards, when making use of a YIG waveguide, the directional coupler induced delay is 150 ns, which can be decreased by scaling down the structure or by utilizing another material with a higher spin wave group velocity. In this work, $Fe_{60}Co_{20}B_{20}$ was also utilized as waveguide material.
The material parameters are: magnetic saturation $M_s=1.1\times10^6$ A/m, exchange stiffness $A_{ex}=18.5$ pJ/m, damping constant $\alpha = 2 \times 10^{-4}$, and perpendicular anisotropy constant $k_{ani} = 8.3177 \times 10^{5}\ \text{J/m}^{3}$ [346]. The waveguide width is 30 nm and its thickness 1 nm. SWs are excited at a frequency of 15 GHz and have a wavelength of 100 nm. In addition, as the waveguide length should be a multiple of the wavelength, we have chosen it to be 5 times the wavelength, i.e., 500 nm, to decrease the mutual effects of the gate arms and directional couplers on each other. By making use of the equations presented in Chapter 2 Section 2.4, we determined the directional coupler dimensions as $L_w$=2.55 $\mu$m and DW=8 nm. Delay, power, and energy consumption are the metrics of interest to evaluate the gate cascading structures and the multiplier. The transducers' energy and delay are based on the estimation in [348], and the SW delay through the waveguides was estimated directly from the OOMMF and MuMax3 simulation results. The following assumptions are made: i) The excitation and detection cells are ME cells with $C_{ME}$=1 fF, $V_{ME}$=119 mV, energy $E = k \times C_{ME} \times V_{ME}^2$ (where k is the number of excitation cells), and 0.42 ns ME cell switching delay [348], ii) SWs consume negligible energy in the waveguides and directional couplers compared to the energy consumed by the transducers, and iii) SWs are excited by means of pulse signals. We note that, due to the early stage of development of SW technology, these assumptions might not be accurate and the assumed values may change in the near future.

Figure 6.15: Cascaded In-line MAJ3 Gates: (a) $I_1I_2I_3I_4I_5 = 00000$, (b) $I_1I_2I_3I_4I_5 = 00111$, and (c) $I_1I_2I_3I_4I_5 = 00011$.
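As a worked example of assumption i), the sketch below evaluates the transducer energy model, taken here as $E = k \times C_{ME} \times V_{ME}^2$, for a single ME cell and for the 18 excitation cells of the 2-bit multiplier in Figure 6.14. Only excitation cells are counted, since k is defined as the number of excitation cells; the per-cell value is also compared with the 34 nW × 0.42 ns figure used in Section 6.3.3. The resulting numbers are illustrative arithmetic on the stated assumptions, not reported results.

```python
C_ME = 1e-15   # F, ME cell capacitance (assumption i)
V_ME = 0.119   # V, ME cell voltage (assumption i)


def excitation_energy_aj(k: int) -> float:
    """E = k * C_ME * V_ME**2, returned in attojoules."""
    return k * C_ME * V_ME ** 2 * 1e18


per_cell = excitation_energy_aj(1)
multiplier = excitation_energy_aj(18)  # 18 excitation cells in Figure 6.14
print(f"per ME cell: {per_cell:.2f} aJ, multiplier excitation: {multiplier:.1f} aJ")

# Cross-check against the power x delay estimate used for the 4:2 compressor
print(f"34 nW x 0.42 ns = {34e-9 * 0.42e-9 * 1e18:.2f} aJ")
```

The two per-cell estimates (about 14.2 aJ and 14.3 aJ) are close, so both ways of stating the ME cell cost lead to essentially the same transducer energy.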
Figure 6.15 (a), (b), and (c) presents the simulation results of the two in-line cascaded MAJ3 gates (see Figure 6.11) for the input patterns $I_1I_2I_3I_4I_5 = 00000$, $I_1I_2I_3I_4I_5 = 00111$, and $I_1I_2I_3I_4I_5 = 00011$, respectively. By inspecting the Figures, it is clear that the output results are as expected, i.e., the output corresponding to $I_1I_2I_3I_4I_5 = 00000$ is logic 0 because all inputs are logic 0, and it is logic 1 in the other cases because two inputs of the second Majority gate are logic 1 and one input is logic 0, due to the proper amplitude correction induced by the directional coupler. Figure 6.16 (a), (b), and (c) presents the MuMax3 simulation results for the structure in Figure 6.12, corresponding to 2 fully cascaded ladder MAJ3 gates, for the input combinations $I_1I_2I_3I_4I_5I_6=000000$, $I_1I_2I_3I_4I_5I_6=001111$, and $I_1I_2I_3I_4I_5I_6=000011$, respectively. It is clear from the Figure that the outputs $O_1$ and $O_2$ are correct, i.e., $O_1=O_2=0$ when $I_1I_2I_3I_4I_5I_6=000000$ because all circuit inputs are logic 0, while $O_1=O_2=1$ when $I_1I_2I_3I_4I_5I_6=001111$ and $I_1I_2I_3I_4I_5I_6=000011$ because two inputs of the second MAJ3 gate are logic 1 and the other is logic 0, which demonstrates the correct behaviour of the circuit. Figure 6.17 (a), (b), and (c) presents the MuMax3 simulation results for the structure in Figure 6.13, corresponding to the partial cascading of 2 ladder MAJ3 gates, for the input combinations $I_1I_2I_3I_4I_5I_6 = 000000$, $I_1I_2I_3I_4I_5I_6 = 001111$, and $I_1I_2I_3I_4I_5I_6 = 000011$, respectively. By inspecting the figures, it is clear that in all cases $O_1$ assumes the correct value (logic 0 for $I_1I_2I_3I_4I_5I_6 = 000000$ because all inputs are logic 0, and logic 1 in the other cases because two inputs of the second MAJ3 gate are logic 1 and the third one is logic 0).
On the other hand, $O_2$ is located on the second arm, which is not cascaded with the second MAJ3 gate, and is therefore not normalized. It also assumes the correct value: logic 0 in all cases, as $I_5$ and $I_6$ do not affect it.

Figure 6.16: Fully Cascaded Ladder MAJ3 Gates: (a) $I_1I_2I_3I_4I_5 = 00000$, (b) $I_1I_2I_3I_4I_5 = 00111$, and (c) $I_1I_2I_3I_4I_5 = 00011$.

The 2-bit inputs spin wave multiplier in Figure 6.14 is validated with MuMax3 using the same parameters as for the 30 nm wide $Fe_{60}Co_{20}B_{20}$ waveguide. Figure 6.18 presents the simulation results for the first output $Q_0$. Since $Q_0 = AND(X_0, Y_0) = MAJ(0, X_0, Y_0)$, $C_1$ in Figure 6.14 is set to logic 0. Figure 6.18 confirms that $Q_0$ behaves correctly. Note that $Q_0$ is placed at $d_5 = 510$ nm (n = 5). As $Q_1$ and $Q_2$ are computed as XOR functions, threshold detection is required to determine their values. Table 6.8 therefore presents the normalized SW magnetization of $Q_1$ and $Q_2$ for all input combinations, from $X_0Y_0X_1Y_1=0000$ to $X_0Y_0X_1Y_1=1111$. Note that the Table lists the inputs in $X_1Y_1X_0Y_0$ order. To achieve proper circuit functionality, the $C_2$ SW amplitude has to be 2.25 times higher than the input SW amplitude, which is the value required to realize the 4-input AND over the input bits. For the threshold detection, an appropriate threshold is determined for each output: 0.42 for $Q_1$ and 0.315 for $Q_2$ (normalized values).

The input combinations $X_0Y_0X_1Y_1 = 0000$, 0001, 0010, 0011, 0100, 0101, 1000, 1010, 1100, and 1111 result in an output magnetization below the threshold, as presented in the Table. For these inputs, $Q_1=0$. For $X_0Y_0X_1Y_1 = 0110$, 0111, 1001, 1011, 1101, and 1110, $Q_1=1$, because these combinations result in output SW amplitudes above the threshold. Similarly, the input combinations $X_0Y_0X_1Y_1 = 0011$, 0111, and 1011 result in an output magnetization above the threshold, thus $Q_2=1$. For the remaining cases, $Q_2=0$. Each threshold is the average of the two normalized magnetizations closest to it on either side. For $Q_1$, these are the values for inputs $X_0Y_0X_1Y_1 = 0001$ (0.38) and 1001 (0.46). For $Q_2$, they are the values for inputs 0101 (0.3) and 1011 (0.33). The $Q_1$ values have a quasi-continuous distribution mainly because normalization is not ideal: some SW energy is transferred to the second waveguide even when no normalization is required. Relying on a different coupling effect, e.g., exchange coupling, might improve the performance and make the design more reliable.

Figure 6.17: Partially Cascaded Ladder MAJ3 Gates: (a) $I_1I_2I_3I_4I_5 = 00000$, (b) $I_1I_2I_3I_4I_5 = 00111$, and (c) $I_1I_2I_3I_4I_5 = 00011$.

Figure 6.19 presents the simulation results for the fourth output $Q_3$ for $X_0Y_0X_1Y_1 = 0000$, $X_0Y_0X_1Y_1 = 0001$, ..., and $X_0Y_0X_1Y_1 = 1111$. The Figure shows that $Q_3 = AND(X_0, Y_0, X_1, Y_1)$ is correctly evaluated. Normalization-based cascading does not consume a noticeable amount of energy in comparison with its transducer-based counterpart. No ME cells are required for domain conversion, and no electrons move; the spins merely interact with each other via the dipolar coupling effect. However, it induces a significant delay overhead. The delay is the maximum time it takes for the SW outputs to become available for further processing. To estimate it, we make use of the numerical simulation results, and for all considered YIG waveguide-based structures we computed a coupler-induced delay of 150 ns.

Figure 6.18: $Q_0$ Output Simulation (a) $X_0Y_0 = 00$, (b) $X_0Y_0 = 01$, (c) $X_0Y_0 = 10$, and (d) $X_0Y_0 = 11$.

Figure 6.19: Fourth Spin Wave Multiplier Output (a) $X_1Y_1X_0Y_0 = 0000$, (b) $X_1Y_1X_0Y_0 = 0001$, ..., and (p) $X_1Y_1X_0Y_0 = 1111$.

Although this delay overhead is rather large, it can be reduced by down-scaling the structure and by relying on alternative materials with a higher SW group velocity. Another promising way to decrease the delay is to use a coupling effect other than the dipolar one, which is inherently slow. The potential utilization of the significantly faster exchange coupling is currently under investigation. To get an indication of the scaling effect, we used MuMax3 simulations to validate the cascading of FO2 MAJ3 gates built from 30 nm wide $Fe_{60}Co_{20}B_{20}$ waveguides. The simulation results for $I_1I_2I_3I_4I_5I_6 = 000000$, $I_1I_2I_3I_4I_5I_6 = 001111$, and $I_1I_2I_3I_4I_5I_6 = 000011$ are presented in Figure 6.20, and one can easily check that the output values are correct.
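The threshold detection of $Q_1$ and $Q_2$ can be cross-checked against Table 6.8 with a short script. The Python sketch below is purely illustrative. The names `TABLE_6_8`, `ideal_q1_q2`, `midpoint_threshold`, and `detection_errors` are hypothetical. It copies the normalized magnetizations of Table 6.8 and derives the expected bits from the exact product $(X_1X_0) \times (Y_1Y_0)$. Each threshold is placed midway between the largest logic-0 value and the smallest logic-1 value, and the script lists any input pattern that would be misclassified. It assumes that an output reads as logic 1 when its normalized magnetization exceeds the threshold.

```python
"""Cross-check of the Q1/Q2 threshold detection of the 2-bit SW multiplier.

Keys are (X1, Y1, X0, Y0), i.e. the row order of Table 6.8; values are the
normalized (Q1, Q2) magnetizations reported in that table.
"""

TABLE_6_8 = {
    (0, 0, 0, 0): (0.03, 0.06),
    (0, 0, 0, 1): (0.08, 0.03),
    (0, 0, 1, 0): (0.22, 0.016),
    (0, 0, 1, 1): (0.15, 0.04),
    (0, 1, 0, 0): (0.38, 0.17),
    (0, 1, 0, 1): (0.03, 0.3),
    (0, 1, 1, 0): (0.46, 0.09),
    (0, 1, 1, 1): (0.74, 0.09),
    (1, 0, 0, 0): (0.32, 0.3),
    (1, 0, 0, 1): (1.0, 0.16),
    (1, 0, 1, 0): (0.1, 0.006),
    (1, 0, 1, 1): (0.54, 0.0003),
    (1, 1, 0, 0): (0.002, 1.0),
    (1, 1, 0, 1): (0.52, 0.7),
    (1, 1, 1, 0): (0.52, 0.33),
    (1, 1, 1, 1): (0.22, 0.2),
}


def ideal_q1_q2(x1, y1, x0, y0):
    """Bits 1 and 2 of the exact product (X1X0) * (Y1Y0)."""
    product = (2 * x1 + x0) * (2 * y1 + y0)
    return (product >> 1) & 1, (product >> 2) & 1


def midpoint_threshold(output_index):
    """Midpoint between the largest logic-0 and the smallest logic-1 value."""
    zeros, ones = [], []
    for bits, values in TABLE_6_8.items():
        target = ideal_q1_q2(*bits)[output_index]
        (ones if target else zeros).append(values[output_index])
    return (max(zeros) + min(ones)) / 2


def detection_errors(output_index, threshold):
    """Input patterns whose thresholded value disagrees with the exact product."""
    return [
        bits
        for bits, values in TABLE_6_8.items()
        if int(values[output_index] > threshold) != ideal_q1_q2(*bits)[output_index]
    ]


if __name__ == "__main__":
    t_q1, t_q2 = midpoint_threshold(0), midpoint_threshold(1)
    print(f"Q1 threshold {t_q1:.3f}, misclassified {detection_errors(0, t_q1)}")
    print(f"Q2 threshold {t_q2:.3f}, misclassified {detection_errors(1, t_q2)}")
```

With the values of Table 6.8, this midpoint rule yields thresholds of 0.42 for $Q_1$ and 0.315 for $Q_2$, and both lists of misclassified patterns are empty.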
Remarkably, scaling down the structure and changing the waveguide material from YIG to $Fe_{60}Co_{20}B_{20}$ reduced the coupler delay overhead from 150 ns to 20 ns. This is because the structure is smaller and the SW group velocity is higher in $Fe_{60}Co_{20}B_{20}$. It suggests that the overhead can potentially be decreased further, towards the ps range.

| $X_1$ | $Y_1$ | $X_0$ | $Y_0$ | $Q_1$ | $Q_2$ |
|-------|-------|-------|-------|-------|--------|
| 0 | 0 | 0 | 0 | 0.03 | 0.06 |
| 0 | 0 | 0 | 1 | 0.08 | 0.03 |
| 0 | 0 | 1 | 0 | 0.22 | 0.016 |
| 0 | 0 | 1 | 1 | 0.15 | 0.04 |
| 0 | 1 | 0 | 0 | 0.38 | 0.17 |
| 0 | 1 | 0 | 1 | 0.03 | 0.3 |
| 0 | 1 | 1 | 0 | 0.46 | 0.09 |
| 0 | 1 | 1 | 1 | 0.74 | 0.09 |
| 1 | 0 | 0 | 0 | 0.32 | 0.3 |
| 1 | 0 | 0 | 1 | 1 | 0.16 |
| 1 | 0 | 1 | 0 | 0.1 | 0.006 |
| 1 | 0 | 1 | 1 | 0.54 | 0.0003 |
| 1 | 1 | 0 | 0 | 0.002 | 1 |
| 1 | 1 | 0 | 1 | 0.52 | 0.7 |
| 1 | 1 | 1 | 0 | 0.52 | 0.33 |
| 1 | 1 | 1 | 1 | 0.22 | 0.2 |

Table 6.8: Normalized Second and Third Spin Wave Multiplier Outputs.

#### **6.4.7.** Performance Evaluation

To evaluate the practical implications of our proposal, we compare coupler-based and conversion-based cascading in terms of delay, power, and energy consumption. The conversion-based circuits are obtained by replacing each directional coupler in Figures 6.11, 6.12, and 6.13 with two transducers. These convert the SW to the charge domain and back to the SW domain. Given the assumptions in the previous subsection, the following conjectures are used in the evaluation:

(i) the transducers (ME cells) are the main contributors to the circuit power consumption, while the power consumed by SW propagation through the waveguides and directional couplers is insignificant;
(ii) the SW propagation delay in the waveguide is neglected;
(iii) the ME transducer power consumption and delay are 34.3 nW and 0.42 ns, respectively [348]; and
(iv) SWs are excited by means of pulse signals.
For the delay calculations, we identify the critical path through each considered structure. In the coupler-based designs it spans two ME cells and one directional coupler; in the conversion-based designs it spans four ME cells. The delay therefore sums up to 20.84 ns and 1.68 ns, respectively. As SW propagation, interference, and normalization are assumed to happen at zero power cost, the power consumed by each design is determined by the number of ME cells it includes. The conversion-based designs require 8, 12, and 10 ME cells for the in-line, ladder fully, and ladder partially cascaded structures, respectively. Their power thus sums up to 274.4 nW, 411.6 nW, and 343 nW. The coupler-based structures require 6, 8, and 8 ME cells, which results in 205.8 nW, 274.4 nW, and 274.4 nW, respectively.

Figure 6.20: Scaled Down Fully Cascaded MAJ3 Gates at (a) $I_1I_2I_3I_4I_5 = 00000$, (b) $I_1I_2I_3I_4I_5 = 00111$, and (c) $I_1I_2I_3I_4I_5 = 00011$.

In general, the energy consumption is derived as the power-delay product. Due to the pulse operation paradigm, however, ME activation follows a domino behaviour. Each ME cell is active only for the short period of time necessary to create its output SW, i.e., the ME cell delay of 0.42 ns [348], and it is idle for the rest of the calculation. As the power consumed by SW propagation through the waveguides can be neglected, the overall power consumption is determined by the number of ME cells in the circuit and the ME cell power consumption. The general power-delay rule therefore does not hold for pulse mode operation. Each ME cell is active only once per circuit input evaluation, and only for a period of time corresponding to its latency, i.e., 0.42 ns under our assumptions.
In view of this, the energy consumption can be determined by multiplying the overall power consumption by the ME cell delay, without considering the directional coupler delay. The energy consumption is therefore independent of the overall circuit delay, and the coupler delay overhead does not contribute to it. Accordingly, the energy of both cascading approaches is calculated by multiplying the total power by the delay of a single ME cell, i.e., 0.42 ns. This yields 115.2 aJ, 172.8 aJ, and 144 aJ for the conversion-based in-line, ladder fully, and ladder partially cascaded structures, respectively. The coupler-based counterparts consume 86.4 aJ, 115.2 aJ, and 115.2 aJ. Table 6.9 compares the coupler-based and conversion-based implementations in terms of power, delay, and energy consumption. In the Table, IL, LFC, and LPC stand for In-Line, Ladder Fully Cascaded, and Ladder Partially Cascaded structures, respectively.

| | Conversion cascading | | | Coupler cascading | | |
|---------------------|-------|-------|------|-------|-------|-------|
| Structure | IL | LFC | LPC | IL | LFC | LPC |
| Power (nW) | 274.4 | 411.6 | 343 | 205.8 | 274.4 | 274.4 |
| Delay (ns) | 1.68 | 1.68 | 1.68 | 20.84 | 20.84 | 20.84 |
| Energy (aJ) <sup>1</sup> | 115.2 | 172.8 | 144 | 86.4 | 115.2 | 115.2 |

Table 6.9: Comparison with Conversion-based Cascading.

Table 6.10: 2-bit Input Multiplier Performance.

| Technology | 16 nm CMOS | 30 nm waveguide width SW | 30 nm waveguide width SW |
|----------------------------|------------|----------------------------|--------------------------|
| Implementation methodology | - | Conversion-based Cascading | Coupler-based Cascading |
| Energy (fJ) | 2 | 0.43 | 0.32 |
| Delay (ns) | 0.1 | 1.68 | 21 |
| Area (μm²) | 6 | 5 | 21 |
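The arithmetic behind Table 6.9 can be reproduced in a few lines of Python. This is a sketch under the assumptions of Section 6.4.7. Each ME cell is taken to draw 34.3 nW, the unit for which the power totals reproduce the aJ energies of Table 6.9, and to have a 0.42 ns delay. The directional coupler is assumed to add 20 ns, the value for the scaled $Fe_{60}Co_{20}B_{20}$ structure. The names `ME_CELLS` and `evaluate` are illustrative only.

```python
"""Sketch reproducing the power, delay, and energy figures of Table 6.9.

Assumptions (illustrative): 34.3 nW and 0.42 ns per ME cell; SW propagation,
interference, and normalization are free; one directional coupler adds
20 ns to the critical path of the coupler-based designs.
"""

ME_POWER_NW = 34.3        # power of one ME cell (nW)
ME_DELAY_NS = 0.42        # delay of one ME cell (ns)
COUPLER_DELAY_NS = 20.0   # directional coupler delay, scaled structure (ns)

# ME cells per structure: (conversion-based, coupler-based)
ME_CELLS = {"IL": (8, 6), "LFC": (12, 8), "LPC": (10, 8)}

CONVERSION_DELAY_NS = 4 * ME_DELAY_NS                       # 4 ME cells on the path
COUPLER_PATH_DELAY_NS = 2 * ME_DELAY_NS + COUPLER_DELAY_NS  # 2 ME cells + 1 coupler


def evaluate(n_me, delay_ns):
    """Return (power in nW, delay in ns, energy in aJ) for n_me ME cells."""
    power_nw = n_me * ME_POWER_NW
    # Pulse mode: every ME cell is active only for its own latency, so the
    # energy is total power x ME delay, independent of the circuit delay.
    energy_aj = power_nw * ME_DELAY_NS  # 1 nW x 1 ns = 1e-18 J = 1 aJ
    return power_nw, delay_ns, energy_aj


for name, (n_conv, n_coup) in ME_CELLS.items():
    p_cv, d_cv, e_cv = evaluate(n_conv, CONVERSION_DELAY_NS)
    p_cp, d_cp, e_cp = evaluate(n_coup, COUPLER_PATH_DELAY_NS)
    print(f"{name}: conversion {p_cv:.1f} nW, {d_cv:.2f} ns, {e_cv:.1f} aJ | "
          f"coupler {p_cp:.1f} nW, {d_cp:.2f} ns, {e_cp:.1f} aJ | "
          f"saving {n_conv / n_coup:.2f}x")
```

The energy depends only on the number of ME cells. The coupler-based savings (8/6, 12/8, and 10/8, i.e., 1.33x, 1.5x, and 1.25x) are therefore the same for power and energy, even though the coupler-based critical path is about 12x slower.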
As expected, the coupler-based approach provides a power reduction of 1.33x, 1.5x, and 1.25x for the in-line, ladder fully, and ladder partially cascaded circuits, respectively. Because pulse SW operation is used, the directional coupler delay overhead does not negatively affect the energy consumption, and the same savings are obtained in terms of energy. Coupler-based cascading may become more delay-effective by further scaling down the structure and by using other materials and/or faster coupling effects. To get more insight into the potential implications of our proposal, we compare the proposed 2-bit inputs multiplier with its conversion-based SW and 16 nm CMOS counterparts. The CMOS implementation requires 6 AND and 2 XOR gates. Its area, delay, and energy consumption are estimated based on the figures reported in [349]. The coupler-based SW implementation is the one described in Figure 6.14. The conversion-based implementation is obtained by replacing each directional coupler with two transducers that convert the SW to the charge domain and back. The assumptions and calculation methodology used for the cascaded MAJ3 circuits also apply here. Table 6.10 compares the three considered 2-bit inputs multiplier implementations in terms of energy, delay, and area. The SW implementations are more energy efficient than the 16 nm CMOS counterpart: they consume $6.25\times$ and $4.65\times$ less energy for coupler-based and conversion-based cascading, respectively.

<sup>1</sup> Due to pulse mode operation, each ME cell is active only for the time necessary to create its output SW and is idle for the rest of the calculation. Thus, regardless of the overall circuit delay, the energy is evaluated as the product of the power consumption and the ME cell delay (0.42 ns).
Moreover, the proposed solution consumes 1.34x less energy than the approach relying on back-and-forth conversion between the SW and charge domains, while having a $12.5\times$ larger delay and a $4\times$ larger area. Although the proposed solution is much slower and requires more area, its main strength is the ultra-low energy consumption enabled by the directional coupler. As previously mentioned, the delay can be further reduced by scaling down and by using other materials and/or faster coupling effects. We are thus still far from reaching the ultimate energy consumption reduction horizon.

## **6.5.** CONCLUSIONS

In this chapter, we first presented a novel energy efficient SW-based FA, implemented by means of a Majority gate and 2 XOR gates. The proposed FA relies on two detection mechanisms: phase detection for the Carry-out output and threshold detection for the Sum output. Its correct functionality was validated by means of micromagnetic simulations. It was then evaluated and compared with a direct SW gate-based implementation and with equivalent designs in five state-of-the-art technologies: 22 nm CMOS, MTJ, SHE, DWM, and Spin-CMOS. The proposed FA consumes 22.5% less energy than the direct SW gate-based implementation and 43% less than 22 nm CMOS. It saves more than 3 orders of magnitude in energy in comparison with the state-of-the-art MTJ, SHE, DWM, and Spin-CMOS based FAs. The proposed FA also requires more than 22% less area than all the other designs. Subsequently, we proposed a novel 4:2 Spin Wave (SW) compressor and validated it by means of micromagnetic simulations. The proposed compressor was assessed and compared with state-of-the-art SW, 22 nm CMOS, Magnetic Tunnel Junction (MTJ), Domain Wall Motion (DWM), and Spin-CMOS designs.
The evaluation results showed that the proposed compressor consumes 2.5x less energy than its 22 nm CMOS counterpart and outperforms the MTJ, DWM, and Spin-CMOS designs by at least 3 orders of magnitude. Moreover, it consumes 1.25x less energy than the conventional SW compressor and occupies the smallest chip real estate. Finally, we introduced a directional coupler-based SW amplitude re-normalization method, which allows for conversion-free, energy effective gate cascading. We presented three complex gates, which cover the most common situations encountered in logic circuit implementations, and a 2-bit inputs SW multiplier, and validated them by means of micromagnetic simulations. Our results indicate that these designs are energy effective and potentially open the road towards the full utilization of the SW paradigm capabilities and the development of SW-only circuits. In particular, for the complex gates our method provides 20%-33% energy savings when compared with conversion-based equivalent designs. The proposed SW multiplier requires 6.25x less energy than its 16 nm CMOS counterpart and 1.34x less than its conversion-based SW counterpart.

# SPIN WAVE APPROXIMATE COMPUTING

- 7.1. SW APPROXIMATE FULL ADDER
- 7.2. SW 4:2 COMPRESSOR
- 7.3. SW APPROXIMATE 2-BIT INPUTS MULTIPLIER
- 7.4. CONCLUSIONS

All the logic gates and circuits presented in the previous chapters, as well as the state-of-the-art designs [58], [59], [62], [77], [80]–[96], [109], have been designed to provide accurate results. However, many current applications, e.g., multimedia processing and social media, are error tolerant and, within certain bounds, are not fundamentally perturbed by computation errors [116].
Therefore, such applications can benefit from approximate computing circuits, which can save significant amounts of energy, delay, and area while providing acceptable accuracy. In view of this, this chapter introduces a novel energy efficient approximate SW-based full adder, an approximate SW 4:2 compressor, and an approximate 2-bit inputs multiplier.

This chapter content is based on the following publications:

**A. N. Mahmoud**, F. Vanderveken, F. Ciubotaru, C. Adelmann, S. Hamdioui and S. Cotofana, *A Spin Wave-Based Approximate 4:2 Compressor: Seeking the most energy-efficient digital computing paradigm,* in IEEE Nanotechnology Magazine, vol. 16, no. 1, pp. 47-56, Feb. 2022, doi: 10.1109/MNANO.2021.3126095.

**A. Mahmoud**, F. Vanderveken, F. Ciubotaru, C. Adelmann, S. Hamdioui and S. Cotofana, *Spin Wave Based Approximate Computing*, in IEEE Transactions on Emerging Topics in Computing, doi: 10.1109/TETC.2021.3136299.

Figure 7.1: Approximate Spin Wave Based FA.

Table 7.1: Accurate and Approximate SW-based FA

| $XYC_i$ | $C_o$ | $S_{ac}$ | $S_{ap}$ |
|---------|-------|----------|----------|
| 000 | 0 | 0 | <u>1</u> |
| 001 | 0 | 1 | 1 |
| 010 | 0 | 1 | 1 |
| 011 | 1 | 0 | 0 |
| 100 | 0 | 1 | 1 |
| 101 | 1 | 0 | 0 |
| 110 | 1 | 0 | 0 |
| 111 | 1 | 1 | <u>0</u> |

## 7.1. SW APPROXIMATE FULL ADDER

In this section, we present the SW approximate full adder, together with its simulation setup, results, and performance evaluation.

#### 7.1.1. SW APPROXIMATE FULL ADDER STRUCTURE

Figure 7.1 presents the proposed Approximate Full Adder (AFA) structure. It has 3 inputs (X, Y, and $C_i$) and 2 outputs (S and $C_o$). It is a 3-input Majority gate that evaluates $C_o = MAJ(X,Y,C_i)$ and $S = \overline{C_o} = \overline{MAJ(X,Y,C_i)}$, as suggested in [362]. The AFA generates $C_o$ without any error, as this output is detected as the Majority of X, Y, and $C_i$, exactly as in accurate FAs.
On the other hand, S is detected with a 25% error rate, as $S = \overline{MAJ(X,Y,C_i)}$ only approximates the accurate FA Sum $S = XOR(XOR(X,Y),C_i)$. Table 7.1 presents the FA and AFA truth tables. They show that the approximate sum $S_{ap}$ is erroneous only when all inputs are 0 or all inputs are 1, i.e., in 2 of the 8 cases. To achieve the AFA behaviour, the design in Figure 7.1 has to be properly dimensioned.

Table 7.2: Simulation Parameters

| Parameters | Values |
|--------------------------------|--------------------------------|
| Saturation magnetization $M_s$ | $1.1 \times 10^{6}$ A/m |
| Damping constant $\alpha$ | 0.004 |
| Exchange stiffness $A_{exch}$ | 18.5 pJ/m |

The waveguide width must be smaller than or equal to the SW wavelength $\lambda$. The SW amplitude, wavelength, and frequency must be the same at every excitation cell. Furthermore, the structure dimensions must be precisely determined, because the interference pattern depends on the locations of, and distances between, the excitation and detection cells. For example, constructive interference may be desired when the SWs are in phase ($\Delta \phi = 0$) and destructive interference when they are out of phase ($\Delta \phi = \pi$). In that case, $d_1$, $d_2$, and $d_3$ must be equal to $n\lambda$ (where $n = 0, 1, 2, 3, \ldots$). In addition, if the inverted Majority is of interest, which is the case for S, $d_4$ must be $(n + 1/2) \times \lambda$. If the non-inverted output is required, which is the case for $C_o$, $d_5$ must be $n\lambda$. The AFA operation principle relies on a combined process of SW propagation and interference. First, SWs are excited at X and Y and propagate diagonally until they meet at the connection point, where they interfere constructively or destructively depending on their phases. Then, the resulting SW propagates to the next connection point and interferes constructively or destructively with the SW excited at $C_i$.
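The AFA logic and its 25% sum error rate can be checked exhaustively. The following Python sketch models only the Boolean behaviour, not the SW interference that realizes it. The helper names `maj`, `accurate_fa`, and `approximate_fa` are hypothetical.

```python
"""Sketch: exhaustive comparison of the accurate FA and the majority-based AFA."""
from itertools import product


def maj(*bits):
    """Majority of an odd number of bits."""
    return int(sum(bits) > len(bits) // 2)


def accurate_fa(x, y, ci):
    """Exact full adder: returns (Co, S)."""
    return maj(x, y, ci), x ^ y ^ ci


def approximate_fa(x, y, ci):
    """AFA: Co = MAJ(X, Y, Ci) and S = NOT MAJ(X, Y, Ci)."""
    co = maj(x, y, ci)
    return co, 1 - co


rows = list(product((0, 1), repeat=3))  # (X, Y, Ci) from 000 to 111
carry_errors = [r for r in rows if accurate_fa(*r)[0] != approximate_fa(*r)[0]]
sum_errors = [r for r in rows if accurate_fa(*r)[1] != approximate_fa(*r)[1]]

print("XYCi Co S_ac S_ap")
for r in rows:
    co, s_ac = accurate_fa(*r)
    print("".join(map(str, r)), co, s_ac, approximate_fa(*r)[1])
print(f"sum error rate {len(sum_errors) / len(rows):.0%}, wrong rows {sum_errors}")
```

The two rows flagged as wrong are 000 and 111, as in Table 7.1. The carry output is never wrong, since both adders compute it as the same majority.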
The final SW, generated by the interference at the second connection point, travels toward the outputs, where $\overline{MAJ(X,Y,C_i)}$ is detected at S and $MAJ(X,Y,C_i)$ at $C_o$.

#### 7.1.2. SIMULATION SETUP AND RESULTS

We validate the proposed AFA by means of MuMax3 [195], using a 50 nm wide and 1 nm thick $Fe_{60}Co_{20}B_{20}$ waveguide and the parameters specified in Table 7.2 [346]. No external field is applied, as the shape anisotropy is strong enough to keep the magnetization in plane, along the waveguide length. This configuration allows the propagation of backward volume spin waves. As previously mentioned, the SW wavelength should be larger than the waveguide width to improve the interference pattern; therefore, a 55 nm SW wavelength was chosen. The AFA dimensions were then determined as follows: $d_1$ = 330 nm (n = 6), $d_2$ = 880 nm (n = 16), $d_3$ = 220 nm (n = 4), $d_4$ = 80 nm, and $d_5$ = 110 nm (n = 2). Last, based on the SW dispersion relation, the SW frequency for a wavenumber $k = 2\pi/\lambda = 50$ rad/μm was calculated to be 10 GHz.

Figure 7.2 a) to h) present the AFA MuMax3 simulation results for $\{X,Y,C_i\}$ = $\{0,0,0\}$, $\{0,0,1\}$, $\{0,1,0\}$, $\{0,1,1\}$, $\{1,0,0\}$, $\{1,0,1\}$, $\{1,1,0\}$, and $\{1,1,1\}$, respectively. Blue represents logic 0 and red logic 1. The Figure shows that the outputs S and $C_o$ are detected as expected. $C_o = 1$ for $\{X,Y,C_i\}$ = $\{0,1,1\}$, $\{1,0,1\}$, $\{1,1,0\}$, and $\{1,1,1\}$, while $C_o = 0$ for $\{X,Y,C_i\}$ = $\{0,0,0\}$, $\{0,0,1\}$, $\{0,1,0\}$, and $\{1,0,0\}$. Moreover, S is the inverse of $C_o$, as expected. The MuMax3 simulations therefore prove that the proposed approximate full adder provides the expected functionality.

Figure 7.2: Approximate Spin Wave Based FA MuMax3 Simulation.

#### 7.1.3. PERFORMANCE EVALUATION

To gain more insight into the practical implications of our proposal, we compare the AFA with several state-of-the-art counterparts in terms of energy, delay, and area (number of utilized devices). These are the accurate SW [352], 7 nm CMOS [372], SHE [360], DWM [361], accurate and approximate 45 nm CMOS [373], MTJ [359], and Spin-CMOS [362] designs. To evaluate the AFA, we make the following assumptions:

(i) the excitation and detection cells are Magnetoelectric (ME) cells, whose power consumption and delay are 34 nW and 0.42 ns, respectively [348];
(ii) during propagation and interference, SWs consume a negligible amount of energy; and
(iii) pulse signals are used to excite the SWs.

The energy and delay of the pulse signal generation and of the synchronization are not taken into account, because it is not yet known which transducer will be used to excite the SWs. Given the early stage of development of SW technology, these assumptions might also need to be re-evaluated as it matures. The AFA delay equals 1.84 ns. It is obtained by adding the delay of two ME cells to the SW propagation delay through the waveguide, which is determined by means of micromagnetic simulation. Table 7.3 presents the results of the evaluation and comparison. The AFA outperforms the state-of-the-art 7 nm CMOS [372] accurate FA with an energy reduction of approximately 6%, while exhibiting a delay more than 2 orders of magnitude larger. Furthermore, it saves approximately 56% and 20% energy compared with the 45 nm CMOS based accurate and approximate FAs, respectively, while having a 15x and 18x larger delay. It has the same error rate as the approximate FA in [373].

| Technology | Type | Error Rate | Energy (fJ) | Delay (ns) | Device No. |
|-----------------|-------------|------------|-------------|------------|------------|
| CMOS [372] | Accurate | 0 | 0.066 | 0.005 | 28 |
| CMOS [373] | Accurate | 0 | 0.14 | 0.12 | 24 |
| CMOS [373] | Approximate | 0.25 | 0.077 | 0.1 | 14 |
| MTJ [359] | Accurate | 0 | 5685 | 3.019 | 29 |
| MTJ [359] | Approximate | 0.5 | 5109 | 3.016 | 25 |
| MTJ [359] | Approximate | 0.5 | 2471 | 3.152 | 29 |
| SHE [360] | Accurate | 0 | 4970 | 7 | 26 |
| DWM [361] | Accurate | 0 | 74.5 | 0.877 | 26 |
| Spin CMOS [362] | Accurate | 0 | 166.7 | 3 | 34 |
| Spin CMOS [362] | Approximate | 0.25 | 58 | 2 | 34 |
| Spin Wave [352] | Accurate | 0 | 0.1 | 2.86 | 7 |
| Spin Wave | Approximate | 0.25 | 0.062 | 1.84 | 5 |

Table 7.3: Full Adder Performance Comparison

Compared with other emerging technology-based designs, the AFA consumes 5 orders of magnitude less energy than the MTJ-based accurate and approximate FAs. It also exhibits a 42% lower delay and a 50% better error rate than the MTJ approximate FA in [359]. The AFA consumes 5 and 3 orders of magnitude less energy than the SHE- and DWM-based accurate FAs, respectively. Its delay is 3.8x lower than that of the SHE-based FA [360] and 2.1x larger than that of the DWM-based FA [361]. Furthermore, the AFA consumes approximately 4 and 3 orders of magnitude less energy than the accurate and approximate Spin-CMOS based FAs, respectively, while providing a 38% and 8% lower delay. It has the same error rate as the approximate FA in [362]. Last but not least, the AFA outperforms the SW-based accurate FA [352] by 38% and 35% in terms of energy and delay, respectively. As a chip real-estate estimate, the proposed approximate FA requires the lowest number of devices.

## 7.2. SW 4:2 COMPRESSOR

As stated in the previous chapter, many applications rely heavily on multiplications. This makes fast multipliers, which usually have 3 stages, important.
In addition, the multiplier's n-to-2 reduction stage has traditionally been implemented by means of full and half adders. Reduction trees based on n:2 compressors, however, can be shallower and have a more regular layout [364]. Based on that, we developed a SW 4:2 compressor built from the proposed SW full adder, which provides accurate results with acceptable delay and energy efficiency. However, as previously mentioned, many applications are error tolerant and work properly within certain error limits [116]. Therefore, a more energy efficient SW 4:2 compressor can be made by enabling approximate computing.

#### 7.2.1. SW APPROXIMATE 4:2 COMPRESSOR STRUCTURE

A straightforward implementation of a SW approximate 4:2 compressor uses two of the approximate SW full adders proposed in the previous section. This requires cascading two Full Adders (FAs), which cannot be done directly because different FA input combinations generate different output SW strengths (see Chapter 6, Section 6.4). To solve this issue and make the compressor function correctly, a directional coupler is required to normalize the output of the first FA before it is passed to the second FA. Figure 7.3 presents the approximate compressor obtained by cascading two approximate FAs by means of a normalizer (directional coupler).

Figure 7.3: Approximate Spin Wave Based FA with Normalizer.

However, the directional coupler induces substantial delay and area overheads, which makes working without it desirable. We therefore propose the novel directional coupler-free approximate compressor depicted in Figure 7.4. The behaviour of the 2 directly cascaded FAs is now obtained with a 3-input Majority gate computing $C_{o1} = MAJ(I_1, I_2, I_3)$ and a 5-input Majority gate computing $S = \overline{C_{o2}} = MAJ(I_1, I_2, I_3, \overline{I_4}, \overline{C_{in}})$,
where $I_1$, $I_2$, $I_3$, $I_4$, and $C_{in}$ are the excitation cells, and $C_{o1}$, S, and $C_{o2}$ are the detection cells. Each input must be excited by a separate transducer, and each output must be sensed by a separate transducer. The proposed 4:2 approximate compressor generates $C_{o1}$ without any error, and S and $C_{o2}$ with error rates of 31.25% and 18.75%, respectively. Table 7.4 presents the truth tables of three compressors:

- the accurate 4:2 compressor ($C_{o1}$, $S_{ac}$, and $C_{o2ac}$);
- the approximate 4:2 compressor without directional coupler ($C_{o1}$, $S_{ap1}$, and $C_{o2ap1}$);
- the approximate 4:2 compressor with directional coupler ($C_{o1}$, $S_{ap2}$, and $C_{o2ap2}$).

As can be observed from the Table, the approximate 4:2 compressors with and without directional coupler have the same average error rate of 25%. $S_{ap2}$ and $C_{o2ap2}$ have error rates of 37.5% and 12.5%, respectively, whereas $S_{ap1}$ and $C_{o2ap1}$ have error rates of 31.25% and 18.75%, respectively. The erroneous output values in the Table are underlined and typeset in bold.

Figure 7.4: Approximate Spin Wave Based FA without Normalizer.

To achieve proper functionality of the structure in Figure 7.4, three conditions must hold. The waveguide width must be smaller than or equal to the SW wavelength, to simplify the interference patterns. All SWs must be excited with the same amplitude, wavelength, and frequency. The waveguide lengths must be accurately computed, as they determine the SW interaction modes. For example, if SW constructive (destructive) interference is envisaged for in-phase (out-of-phase) SWs, the distances must be equal to $n \times \lambda$, where n = 0, 1, 2, ...; this is the case for $d_1$, $d_3$, $d_4$, and $d_6$ in Figure 7.4.
In contrast, if SW constructive (destructive) interference is envisaged for out-of-phase (in-phase) SWs, the distances must be equal to $(n + 1/2) \times \lambda$; this is the case for $d_2$ and $d_5$ in Figure 7.4. On the output side, each output must be detected at a specific position. If the non-inverted output is desired, which is the case for $C_{o1}$ in Figure 7.4, $d_7$ must be equal to $n \times \lambda$. If the inverted output is desired, the distance must be equal to $(n + 1/2) \times \lambda$. Moreover, the outputs must be detected as close as possible to the last interference point to capture a large SW amplitude. The operation principle of the proposed SW 4:2 compressor is as follows:

- $C_{o1}$: SWs are excited at $I_1$, $I_2$, and $I_3$ at the same time moment, with the same amplitude, wavelength, and frequency. The $I_2$ SW interferes constructively or destructively with the $I_3$ SW, depending on their phase difference. The resulting SW propagates through the waveguide and subsequently interferes with the $I_1$ SW. The final SW is captured at the output $C_{o1}$ based on phase detection.
- S and $C_{o2}$: the $I_2$ SW interferes constructively or destructively with the $I_3$ SW, depending on their phase difference. The resulting SW propagates through the waveguide and interferes with the SWs excited at $I_4$ and $C_{in}$. This SW then propagates further and interferes with the $I_1$ SW. Finally, the resulting SW is captured at the outputs S and $C_{o2}$ based on threshold detection.

#### 7.2.2. SIMULATION SETUP AND RESULTS

To validate the proposed structure by means of MuMax3 [195], we made use of the parameters specified in Subsection 7.1.2. The SWs are excited with a 10 GHz carrier frequency modulated by Gaussian pulses with a 500 ps sigma. This saves energy and guarantees the excitation of single-frequency SWs.
From the backward volume spin wave dispersion relation, we determine k at 10 GHz as 26 rad/μm, which results in $\lambda = 2\pi/k \approx 240$ nm.

Table 7.4: Accurate and Approximate SW-based 4:2 Compressor. Columns: $C_{in}I_4I_3I_2I_1$, $C_{o1}$, $C_{o2ac}$, $C_{o2ap1}$, $C_{o2ap2}$, $S_{ac}$, $S_{ap1}$, $S_{ap2}$.

As discussed previously, the distances $d_1$, $d_3$, $d_4$, $d_6$, $d_7$, and $d_9$ should be integer multiples of $\lambda$. The distances $d_2$, $d_5$, and $d_8$ should be odd multiples of $\lambda/2$. The values are: $d_1 = 240$ nm (n = 1), $d_2 = 600$ nm (n = 2.5), $d_3 = 1440$ nm (n = 6), $d_4 = 720$ nm (n = 3), $d_5 = 840$ nm (n = 3.5), $d_6 = 1680$ nm (n = 7), $d_7 = 240$ nm (n = 1), $d_8 = 120$ nm (n = 0.5), and $d_9 = 240$ nm (n = 1). Figure 7.5 presents the MuMax3 simulation results for $C_{o1}$ for $\{I_1,I_2,I_3\}=\{0,0,0\}, \{0,0,1\}, \{0,1,0\}, \{0,1,1\}, \{1,0,0\}, \{1,0,1\}, \{1,1,0\}$, and $\{1,1,1\}$. The Figure shows that $C_{o1}$ is detected correctly for a 0.4 ns reading window starting 1.80 ns after the input application. As it should, $C_{o1}=0$ for $\{I_1,I_2,I_3\}=\{0,0,0\}, \{0,0,1\}, \{0,1,0\}$, and $\{1,0,0\}$, whereas $C_{o1}=1$ for $\{I_1,I_2,I_3\}=\{0,1,1\}, \{1,0,1\}, \{1,1,0\}$, and $\{1,1,1\}$.

Figure 7.5: Normalized 4:2 Compressor Output $C_{o1}$.

The SW magnetization in Table 7.5 is normalized with respect to the highest achieved magnetization, which is obtained when $\{C_{in},I_4,I_3,I_2,I_1\}=\{0,0,1,1,1\}$. For $C_{o2}$ detection, a threshold T = 0 is appropriate. As it should, this results in $C_{o2}=1$ for the input combinations $\{C_{in},I_4,I_3,I_2,I_1\}=\{0,0,0,0,0\}$, $\{0,1,0,0,0\}$, $\{0,1,0,0,1\}$, $\{0,1,0,1,0\}$, $\{0,1,1,0,0\}$, $\{1,0,0,0,0\}$, $\{1,0,0,0,1\}$, $\{1,0,0,1,0\}$, $\{1,0,1,0,0\}$, $\{1,1,0,0,0\}$, $\{1,1,0,0,1\}$, $\{1,1,0,1,0\}$, $\{1,1,0,1,1\}$, $\{1,1,1,0,0\}$, $\{1,1,1,0,1\}$, and $\{1,1,1,1,0\}$, and in $C_{o2}=0$ for the remaining cases.
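The quoted error rates of the two approximate compressors can be reproduced from the Boolean functions alone, as can the set of inputs for which $C_{o2} = 1$. The Python sketch below rests on three assumptions. Without the directional coupler, $S = MAJ(I_1, I_2, I_3, \overline{I_4}, \overline{C_{in}})$ and $C_{o2} = \overline{S}$. The variant with the coupler cascades two AFAs: the first is fed by $I_1$, $I_2$, and $I_3$, and the second by the first sum together with $I_4$ and $C_{in}$. The accurate reference is two cascaded exact full adders. All function names are hypothetical.

```python
"""Sketch: Boolean models of the accurate and approximate 4:2 compressors.

Inputs are ordered (Cin, I4, I3, I2, I1); each model returns (Co2, S).
Co1 = MAJ(I1, I2, I3) is exact in all variants and is omitted here.
"""
from itertools import product


def maj(*bits):
    """Majority of an odd number of bits."""
    return int(sum(bits) > len(bits) // 2)


def accurate(cin, i4, i3, i2, i1):
    """Two cascaded exact full adders."""
    s1 = i1 ^ i2 ^ i3
    return maj(s1, i4, cin), s1 ^ i4 ^ cin


def approx_without_dc(cin, i4, i3, i2, i1):
    """S = MAJ(I1, I2, I3, NOT I4, NOT Cin) and Co2 = NOT S."""
    s = maj(i1, i2, i3, 1 - i4, 1 - cin)
    return 1 - s, s


def approx_with_dc(cin, i4, i3, i2, i1):
    """Two cascaded AFAs (Co = MAJ, S = NOT MAJ); input assignment assumed."""
    s1 = 1 - maj(i1, i2, i3)   # sum of the first AFA
    co2 = maj(s1, i4, cin)     # carry of the second AFA
    return co2, 1 - co2


ROWS = list(product((0, 1), repeat=5))


def error_rates(design):
    """Fraction of the 32 input patterns with a wrong Co2 and a wrong S."""
    co2_err = sum(design(*r)[0] != accurate(*r)[0] for r in ROWS) / len(ROWS)
    s_err = sum(design(*r)[1] != accurate(*r)[1] for r in ROWS) / len(ROWS)
    return co2_err, s_err


# Patterns {Cin, I4, I3, I2, I1} for which the design without DC gives Co2 = 1.
co2_ones = ["".join(map(str, r)) for r in ROWS if approx_without_dc(*r)[0] == 1]

print("without DC (Co2, S):", error_rates(approx_without_dc))
print("with DC    (Co2, S):", error_rates(approx_with_dc))
print(len(co2_ones), "patterns with Co2 = 1:", co2_ones)
```

Under these assumptions, the variant without the coupler yields error rates of 18.75% for $C_{o2}$ and 31.25% for S. The variant with the coupler yields 12.5% and 37.5%. Both average 25%, and $C_{o2} = 1$ for 16 of the 32 input patterns.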
The same threshold value is suitable for S, but the threshold condition is flipped. If the resulting normalized SW magnetization is larger than 0, S is logic 0; otherwise, it is logic 1. As it should, this results in S = 0 for $\{C_{in},I_4,I_3,I_2,I_1\}=\{0,0,0,0,0\}$, $\{0,1,0,0,0\}$, $\{0,1,0,0,1\}$, $\{0,1,0,1,0\}$, $\{0,1,1,0,0\}$, $\{1,0,0,0,0\}$, $\{1,0,0,0,1\}$, $\{1,0,0,1,0\}$, $\{1,0,1,0,0\}$, $\{1,1,0,0,0\}$, $\{1,1,0,0,1\}$, $\{1,1,0,1,0\}$, $\{1,1,0,1,1\}$, $\{1,1,1,0,0\}$, $\{1,1,1,0,1\}$, and $\{1,1,1,1,0\}$, and in S = 1 for the remaining cases. The MuMax3 simulations therefore prove that the proposed 4:2 approximate compressor provides the expected functionality.

#### 7.2.3. PERFORMANCE EVALUATION

We evaluate the proposed SW approximate 4:2 compressor and compare it with state-of-the-art SW, 45 nm CMOS [374], and Spin-CMOS [362] counterparts. The comparison covers error rate, energy consumption, delay, and area (number of utilized devices). To assess the performance of our proposal, we make the following assumptions:

(i) Magnetoelectric (ME) cells with a power consumption of 34 nW and a delay of 0.42 ns [348] are used for SW excitation and detection; and
(ii) SWs consume negligible energy during interference and propagation through the waveguides.

These assumptions might need to be revisited to better capture future developments of SW technology. The delay of the proposed compressor is the sum of the SW propagation delay, determined by means of micromagnetic simulations, and the delay of the excitation and detection cells. It amounts to 11.4 ns with a Directional Coupler (DC) and 3.4 ns without one. To perform amplitude normalization, the DC has to be rather long [352], which results in a large delay overhead.
Table 7.5: Normalized Approximate SW-based 4:2 Compressor Outputs $C_{o2}$ and S.

Note that two approximate CMOS 4:2 compressor designs were reported in [374]: the first one (CMOS1) consists of an approximate full adder and two multiplexers (9 AND, 2 XOR, and 6 OR 2-input gates); the second one (CMOS2) consists of two approximate full adders (6 2-input XOR gates). In addition, two approximate Spin-CMOS 4:2 compressor designs were suggested in [362]: the first design (Spin-CMOS1) consists of 2 approximate full adders (2 3-input Majority gates); the second design (Spin-CMOS2) consists of an accurate and an approximate full adder (2 3-input and 1 5-input Majority gates). Table 7.6 presents the evaluation results. When compared with the accurate SW compressor, which is a direct implementation consisting of two accurate SW adders [369], the proposed 4:2 compressor saves 31.5% energy and is 1.93x faster. Moreover, it has the same energy consumption and error rate as the approximate compressor with DC, but it is about 3x faster. In addition, when compared with the CMOS1 and CMOS2 designs in Table 7.6, it consumes 20% and 14% less energy and exhibits a 61% higher and a 17% lower average error rate, respectively, while having an approximately 2 orders of magnitude higher delay. When compared with the Spin-CMOS design with the same error rate (Spin-CMOS1 in Table 7.6), it consumes 3 orders of magnitude less energy and provides a 17% delay reduction. Although the Spin-CMOS2 design provides a 19% better average error rate, it consumes 3 orders of magnitude more energy and is slower.
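The relative figures quoted above follow directly from the energy and delay columns of Table 7.6. A small sketch that recomputes them from the copied table values (the dictionary and helper names are ours):

```python
# (energy in fJ, delay in ns), copied from Table 7.6.
table = {
    "SW accurate": (0.2, 6.56),
    "SW approx. with DC": (0.137, 11.4),
    "SW approx. without DC": (0.137, 3.4),
    "CMOS1": (0.172, 0.049),
    "CMOS2": (0.16, 0.048),
    "Spin-CMOS1": (173, 3),
}

# The proposed design is the approximate SW compressor without DC.
E_prop, t_prop = table["SW approx. without DC"]


def energy_saving(ref: str) -> float:
    """Fractional energy saving of the proposed compressor relative to a reference design."""
    return 1 - E_prop / table[ref][0]


print(f"vs SW accurate: {energy_saving('SW accurate'):.1%} less energy, "
      f"{table['SW accurate'][1] / t_prop:.2f}x faster")
print(f"vs SW with DC: {table['SW approx. with DC'][1] / t_prop:.2f}x faster")
print(f"vs CMOS1: {energy_saving('CMOS1'):.1%}, vs CMOS2: {energy_saving('CMOS2'):.1%} less energy")
print(f"vs Spin-CMOS1: {table['Spin-CMOS1'][0] / E_prop:.0f}x less energy")
```

The delay ratio against the DC-based variant comes out at about 3.35, which the text rounds to 3x.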
Note that the proposed compressor requires the smallest number of devices, which indicates that it potentially requires the lowest chip real-estate.

| Technology | Type | Error Rate | Energy (fJ) | Delay (ns) | Device No. |
|------------------------|-------------|------------|-------------|------------|------------|
| Spin Wave | Accurate | 0 | 0.2 | 6.56 | 14 |
| Spin Wave (with DC) | Approximate | 0.31 | 0.137 | 11.4 | 8 |
| Spin Wave (without DC) | Approximate | 0.31 | 0.137 | 3.4 | 8 |
| CMOS1 [374] | Approximate | 0.125 | 0.172 | 0.049 | 40 |
| CMOS2 [374] | Approximate | 0.375 | 0.16 | 0.048 | 28 |
| Spin-CMOS1 [362] | Approximate | 0.31 | 173 | 3 | 28 |
| Spin-CMOS2 [362] | Approximate | 0.25 | 338 | 4 | 42 |

Table 7.6: Approximate 4:2 Compressor Performance Comparison.

To get some insight into the implications of our proposal at the application level, we consider as discussion vehicle the well-known JPEG encoding, which relies on the Discrete Cosine Transform (DCT) [371]. Given that JPEG encoding is error tolerant and DCT is a multiplication-dominated algorithm, tree multipliers based on approximate 4:2 compressors are quite attractive for practical JPEG encoder implementations. Such an approach has been presented in [362], and given that the approximate 4:2 compressor in [362] has the same average error rate as the one we propose, we can infer that replacing their compressor with ours does not change the image quality, while resulting in 3 orders of magnitude less energy consumption.

#### 7.3. SW APPROXIMATE 2-BIT INPUTS MULTIPLIER

In this section, we discuss the structure of the 2-bit inputs multiplier, as well as its simulation setup, results, and performance evaluation.

Figure 7.6: Approximate SW-based Multiplier.
| $X_1 X_0 Y_1 Y_0$ | $Q_0$ | $Q_{1ac}$ | $Q_{1ap}$ | $Q_{1ap}*$ | $Q_{2ac}$ | $Q_{2ap}$ | $Q_{3ac}$ | $Q_{3ap}$ | $Q_{3ap}*$ |
|-------------------|-------|-----------|-----------|------------|-----------|-----------|-----------|-----------|------------|
| 0000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0001 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0010 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0011 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 0100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0101 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0110 | 0 | 1 | 0 | <u>0</u> | 0 | 0 | 0 | 0 | 0 |
| 0111 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1001 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1010 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
| 1011 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
| 1100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1101 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1110 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| 1111 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 |

Table 7.7: Accurate and Approximate SW-based Multiplier

#### 7.3.1. SW APPROXIMATE 2-BIT INPUTS MULTIPLIER STRUCTURE

Figure 7.6 presents the proposed Approximate 2-bit inputs SW-based Multiplier (AMUL). Its inputs are the 2-bit operands $X=(X_1,X_0)$ and $Y=(Y_1,Y_0)$ and its 4-bit output is $Q=(Q_0,Q_1,Q_2,Q_3)$. The AMUL consists of 4 excitation cells, 4 detection cells, and 3 AND gates that evaluate the AMUL outputs as $Q_0=AND(X_0,Y_0)$, $Q_1=Q_2=AND(X_1,Y_1)$, and $Q_3=AND(X_0,X_1,Y_1)$. To evaluate the error rate, we note that the accurate multiplier (MUL) output bits are computed as $Q_0 = AND(X_0, Y_0)$, $Q_1 = XOR(AND(X_0, Y_1), AND(X_1, Y_0))$, $Q_2 = XOR(AND(AND(X_0, Y_1), AND(X_1, Y_0)), AND(X_1, Y_1))$, and $Q_3 = AND(AND(X_0, Y_0), AND(X_1, Y_1))$. AMUL and MUL output bit values for all possible input combinations are summarized in Table 7.7, where $Q_0$, $Q_{1ac}$, $Q_{2ac}$, and $Q_{3ac}$ designate the MUL outputs and $Q_0$, $Q_{1ap}$, $Q_{2ap}$, and $Q_{3ap}$ the AMUL outputs.
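The accurate MUL expressions above can be checked exhaustively, and the un-thresholded $Q_2$ and $Q_3$ approximations compared against them. A minimal sketch with our own helper names, using the input ordering $(X_1, X_0, Y_1, Y_0)$ of Table 7.7:

```python
from itertools import product


def accurate_bits(x1, x0, y1, y0):
    """(Q0, Q1, Q2, Q3) from the Boolean expressions given for the accurate MUL."""
    p01, p10 = x0 & y1, x1 & y0          # cross partial products
    q0 = x0 & y0
    q1 = p01 ^ p10
    q2 = (p01 & p10) ^ (x1 & y1)         # carry out of Q1 XOR the X1*Y1 partial product
    q3 = (x0 & y0) & (x1 & y1)
    return q0, q1, q2, q3


def amul_q2(x1, x0, y1, y0):
    return x1 & y1                       # AMUL: Q2 = AND(X1, Y1)


def amul_q3(x1, x0, y1, y0):
    return x0 & x1 & y1                  # AMUL: Q3 = AND(X0, X1, Y1)


cases = list(product((0, 1), repeat=4))  # (X1, X0, Y1, Y0)

# The expressions reproduce the integer product X * Y for all 16 input pairs.
for x1, x0, y1, y0 in cases:
    q = accurate_bits(x1, x0, y1, y0)
    assert sum(b << i for i, b in enumerate(q)) == (2 * x1 + x0) * (2 * y1 + y0)

# Per-bit error rates of the two approximations.
q2_err = sum(amul_q2(*c) != accurate_bits(*c)[2] for c in cases) / len(cases)
q3_err = sum(amul_q3(*c) != accurate_bits(*c)[3] for c in cases) / len(cases)
print(q2_err, q3_err)  # 0.0625 0.0625
```

Each approximation fails in exactly one of the 16 cases, in line with the 6.25% error rates given for $Q_2$ and $Q_3$.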
We note that, since $Q_0$ is computed as $AND(X_0, Y_0)$ in both MUL and AMUL, $Q_{0ap}$ is omitted from the Table, and the erroneous AMUL output values are typeset in bold and underlined. One can observe in the Table that the AMUL outputs $Q_{1ap}$, $Q_{2ap}$, and $Q_{3ap}$ approximate $Q_1$, $Q_2$, and $Q_3$, respectively, with 31.25%, 6.25%, and 6.25% error rates. The error rates can be further reduced if threshold-based output detection is utilized, which reduces the $Q_1$ and $Q_3$ approximation error rates to 25% and 0%, respectively. Table 7.7 also includes the AMUL output values $Q_{1ap}*$ and $Q_{3ap}*$ obtained via threshold detection, while the thresholded $Q_0$ and $Q_2$ values are not reported, as they are identical to $Q_0$ and $Q_{2ap}$, respectively. Thus, the AMUL error rate becomes 25%, as it produces an erroneous result for 4 out of the 16 possible input combinations. The previously mentioned design parameters hold true for the AMUL as well. However, in contrast to the AFA, the AMUL relies on threshold-based output detection, which means that the detection cells must be as close as possible to the last interference point. Therefore, the $d_4$, $d_5$, $d_6$, and $d_7$ values should be minimized for the AMUL design.

Figure 7.7: Normalized First AMUL Output.

Figure 7.8: Normalized Second AMUL Output.

#### 7.3.2. SIMULATION SETUP AND RESULTS

We make use of the same parameters as in SubSection 7.1.2 to validate the AMUL by means of MuMax3. Following the same design steps, the AMUL dimensions are $d_1$=330 nm (n = 6), $d_2$=880 nm (n = 16), $d_3$=220 nm (n = 4), $d_4$=40 nm, $d_5$=40 nm, $d_6$=40 nm, and $d_7$=80 nm. Figures 7.7 to 7.10 present the AMUL MuMax3 simulation results. In the figures, the y-axis presents the SW $M_x$ over $M_s$ ratio, where $M_x$ is the magnetization projection along the x-direction and $M_s$ the saturation magnetization.
Inspecting Figure 7.7, we observe that the dynamic magnetization amplitude at the output $Q_0$ at time 2.7 ns for the input values $X_1Y_1X_0Y_0=\{0011,0111,1011,1111\}$ lies between $0.001M_s$ and $0.01M_s$, whereas it is below $0.001M_s$ for the rest of the input combinations. Thus, by setting the detection threshold to $0.001M_s$, $Q_0$ is always properly detected. A similar approach can be applied to Figure 7.8 for the determination of the $Q_1$ threshold value. For instance, the SW amplitude for the input combinations $X_1Y_1X_0Y_0=\{0101,0111,1001,1011,1100,1101,1110,1111\}$ is larger than 0 when read at time 2.76 ns, whereas for the other input combinations the magnetization amplitude is less than 0. Therefore, if the detection threshold is set to 0, the $Q_1$ value can be derived. Note that this way of determining the threshold further reduces the theoretically predicted $Q_1$ error rate from 31.25% to 25%. The threshold in Figure 7.9 is determined in the same way. The SW magnetization for the input combinations $X_1Y_1X_0Y_0=\{1100,1101,1110,1111\}$ is larger than $0.0005M_s$ when read at time 2.76 ns, whereas in the other cases the magnetization amplitude is less than $0.0005M_s$. Therefore, if the detection threshold is set to $0.0005M_s$, $Q_2$ is properly obtained with a 6.25% error rate. Finally, Figure 7.10 is analyzed in the same manner. The SW magnetization for the input combination $X_1Y_1X_0Y_0=1111$ is larger than $0.0014M_s$ when read at time 2.76 ns, whereas in the other cases the magnetization amplitude is less than $0.0014M_s$. Therefore, if the detection threshold is set to $0.0014M_s$, $Q_3$ can be obtained with a 0% error rate.

Figure 7.9: Normalized Third AMUL Output.

Figure 7.10: Normalized Fourth AMUL Output.

#### 7.3.3. PERFORMANCE EVALUATION

Under the same assumptions as in SubSection 7.1.3, the AMUL delay is 3.6 ns, and we compare it with the state-of-the-art SW [352] and CMOS [375] counterparts. As the energy and delay figures are not reported for the approximate multiplier in [375], they were estimated based on the 16 nm CMOS figures provided in [349]. Table 7.8 presents the results of the evaluation and comparison.

Table 7.8: 2-bit inputs Multiplier Performance Comparison

| Design | CMOS [349], [375] | CMOS [349], [375] | SW | SW | Proposed MUL |
|--------------------|-------------------|-------------------|-------------------|----------------------|--------------|
| Implemented method | - | - | Coupler Cascading | Conversion Cascading | - |
| Type | Accurate | Approximate | Accurate | Accurate | Approximate |
| Error Rate | 0 | 0.38 | 0 | 0 | 0.25 |
| Energy (aJ) | 959 | 300 | 320 | 430 | 115 |
| Delay (ns) | 0.1 | 0.06 | 21 | 1.68 | 3.6 |
| Device No. | 52 | 30 | 22 | 30 | 8 |

Inspecting the Table, it is clear that the AMUL outperforms the accurate and approximate 16 nm CMOS [375] counterparts in terms of energy, reducing the energy consumption by 8x and 2.6x, respectively, at the cost of a 36x and 60x larger delay. The AMUL provides an error rate of 25%, while the approximate CMOS counterpart [375] has an error rate of 38%. Note that the error rate is calculated as the number of erroneous multiplication results generated by the multiplier divided by the total number of input cases, which is 16 in this case. When compared with accurate SW MUL implementations, the AMUL provides a 2.8x and 3.7x energy reduction, and has an approximately 6x lower and 2.5x higher delay than the SW coupler and conversion based MUL implementations, respectively. We note that the SW propagation delay is neglected in the evaluation of the SW conversion based MUL in [352].
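The error-rate definition above can be checked directly against Table 7.7. A minimal sketch, assuming the thresholded AMUL outputs of that Table ($Q_0$ exact, $Q_1$ taken from the $Q_{1ap}*$ column, $Q_2 = AND(X_1, Y_1)$, and $Q_3$ equal to the exact bit since $Q_{3ap}*$ has 0% error), counts the input pairs whose 4-bit result differs from $X \times Y$. The set and function names are ours.

```python
from itertools import product

# Inputs X1 X0 Y1 Y0 for which the thresholded Q1 output is 1 (Table 7.7, Q1ap* column).
Q1_THRESH_ONES = {"0011", "0111", "1001", "1010", "1011", "1101", "1110", "1111"}


def amul_product(x1, x0, y1, y0):
    """4-bit AMUL result assembled from the thresholded outputs."""
    q0 = x0 & y0
    q1 = int(f"{x1}{x0}{y1}{y0}" in Q1_THRESH_ONES)
    q2 = x1 & y1
    q3 = x1 & x0 & y1 & y0  # thresholded Q3 matches the exact bit
    return q0 + 2 * q1 + 4 * q2 + 8 * q3


errors = 0
for x1, x0, y1, y0 in product((0, 1), repeat=4):
    exact = (2 * x1 + x0) * (2 * y1 + y0)
    errors += amul_product(x1, x0, y1, y0) != exact

error_rate = errors / 16
print(errors, error_rate)  # 4 0.25
```

The four failing inputs are 0011, 0110, 1010, and 1111 (in $X_1X_0Y_1Y_0$ order), which matches the 4-out-of-16 count stated for the AMUL.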
One can observe from the Table that the proposed MUL requires fewer ME cells than the SW designs in [352], which indicates that the designs in [352] have a larger area and, by implication, a larger delay when SW propagation is also considered.

#### 7.4. CONCLUSIONS

We proposed and validated by means of micromagnetic simulations a novel approximate energy-efficient spin wave based Full Adder (AFA), which we evaluated and compared with the state-of-the-art counterparts. The AFA saves 43% and 33% energy when compared with the state-of-the-art SW and 7 nm CMOS, respectively, and 69% and 44% in comparison with accurate and approximate 45 nm CMOS, respectively. In addition, it saves more than 2 orders of magnitude in energy when compared with accurate SHE, and accurate and approximate DWM, MTJ, and Spin-CMOS FAs. Moreover, it achieves the same error rate as the approximate 45 nm CMOS and Spin-CMOS FAs, whereas it exhibits a 50% lower error rate than the approximate DWM FA and requires at least 29% less chip real-estate in comparison with the other state-of-the-art designs. In addition, we introduced a Spin Wave (SW) based approximate 4:2 compressor, which consists of 3-input and 5-input Majority gates. We reported the design of approximate circuits without directional couplers, which are essential to normalize gate output(s) when cascading them in accurate circuit designs. We validated the proposed compressor by means of micromagnetic simulations and compared it with the state-of-the-art SW, 22 nm CMOS, 45 nm CMOS, and Spin-CMOS counterparts. The evaluation results indicated that the proposed 4:2 compressor saves 31.5% energy in comparison with the accurate SW compressor and has the same energy consumption and error rate as the approximate compressor with DC, while being 3x faster. Moreover, it consumes 14% less energy while having a 17% lower error rate when compared with the approximate 45 nm CMOS counterpart.
Furthermore, it outperforms the approximate Spin-CMOS based compressor by 3 orders of magnitude in terms of energy consumption while providing the same error rate. Last but not least, the proposed compressor requires the smallest number of devices, thus it potentially requires the lowest chip real-estate. Finally, we proposed and validated, by means of micromagnetic simulations, an approximate 2-bit inputs multiplier (AMUL), which we evaluated and compared with the state-of-the-art counterparts. The AMUL energy consumption is at least 2.5x smaller than that of the state-of-the-art accurate SW designs and the 16 nm CMOS accurate and approximate designs. Moreover, the AMUL exhibits an error rate of 25%, compared to 38% for the approximate CMOS MUL, and requires at least 64% less chip real-estate.

# NON-BINARY SPIN WAVE COMPUTING APPROACH

- 8.1. CONVENTIONAL SPIN WAVE COMPUTING
- 8.2. Non-binary Spin Wave Computing
- 8.3. SIMULATION SETUP AND RESULTS
- 8.4. Performance Evaluation and Discussion
- 8.5. CONCLUSIONS

Most of the proposed designs [58], [59], [62], [80], [81], [97]–[110] make use of majority gates to develop Boolean algebra based SW circuits, whose construction requires gate fanout and cascading capabilities, numerous electric-to-SW domain conversions, and large external magnetic fields, as explained in Chapter 6. As such, the SW-based computation potential is not fully utilized and the ultra-low energy consumption promise is partially lost. In this chapter, we go beyond Boolean algebra and propose a non-binary SW computing paradigm that enables full SW circuit construction without requiring gate fanout and cascading, domain conversions, or large external fields. Subsequently, we leverage this computing paradigm by designing a non-binary spin wave adder, which we validate by means of micromagnetic simulations.
To gain more insight into the potential of the proposed adder, we assume a 2-bit adder implementation as discussion vehicle, evaluate its area, delay, and energy consumption, and compare it with the state of the art. This chapter's content is based on the following publication:

**A. N. Mahmoud**, F. Vanderveken, F. Ciubotaru, C. Adelmann, S. Hamdioui and S. Cotofana, *Non-binary Spin Wave Based Circuit Design*, accepted in IEEE Transactions on Circuits and Systems I (TCAS I), 2022, pp. 1-14.

Figure 8.1: a) SW Circuit Design Conventional Structure. b) Cascaded MAJ3 Gates. c) SW Waveform Analysis at $\{I_1I_2I_3I_4I_5I_6I_7\}$ ={0001101}.

#### 8.1. CONVENTIONAL SPIN WAVE COMPUTING

Figure 8.1a) presents the generic circuit structure for SW phase-based information encoding, which consists of three main parts. First, the binary inputs $I_1, I_2, \ldots, I_n$ are utilized to excite SWs with the same amplitude but different phases reflecting their values. Subsequently, these spin waves propagate through the waveguides and, within the intersection region CC, interfere constructively or destructively depending on their phases in order to emulate the functionality of the targeted combinational circuit, e.g., multiplexer, decoder, adder, or multiplier. Finally, the interference result is captured at the output O. To gain more insight into how such a circuit operates, let us consider the circuit in Figure 8.1b), which consists of three 3-input Majority gates (MAJ3) computing $O_1 = MAJ(I_1, I_2, I_3)$ and $O_2 = MAJ(MAJ(I_1, I_2, I_3), I_7, MAJ(I_4, I_5, I_6))$. Figure 8.1c) presents, as an example, the interference results for the input pattern $\{I_1 I_2 I_3 I_4 I_5 I_6 I_7\} = \{0001101\}$. Note that we make use of binary phase information encoding, i.e., logic 0/1 is represented by a spin wave with amplitude A and phase 0/$\pi$.
As can be observed from Figure 8.1c), $I_1I_2I_3$ interfere constructively in MAJ A, resulting in a spin wave with 3A amplitude and 0 phase, which further travels towards $O_1$ and MAJ C. However, the majority of its energy flows through WG I, because this is a straight waveguide connected to WG G, whereas the connection to WG H is bent. On the other hand, $I_4I_5I_6$ interfere constructively and destructively in MAJ B, resulting in a spin wave with amplitude A and phase $\pi$. Thus, MAJ C operates on the WG I SW (amplitude 3A minus the small portion that went to WG H, and phase 0), the WG J SW (amplitude A and phase $\pi$), and the WG K SW (amplitude A and phase $\pi$). While the expected MAJ C output in this case is logic 1 (two phase-$\pi$ SWs and one phase-0 SW), Figure 8.1c) indicates that the WG L SW has a phase of 0, which is wrong. This miscalculation is induced by the fact that the MAJ C input SWs have different amplitudes, and as such the $\approx 3A$ amplitude phase-0 SW illegitimately wins the voting process over the two amplitude-A phase-$\pi$ SWs. Correcting this problem requires WG G SW amplitude normalization, i.e., reduction from 3A to A, and SW energy loss prevention in situations like the one at WG G. These can be achieved by means of, e.g., domain conversion, directional coupling [352], and fanout solutions [77], [78], [347], [376], which induce significant area, delay, and energy consumption overheads. Given that the realization of practically relevant, non-toy SW circuits requires fanout and gate cascading capabilities, with their associated overheads, the investigation of computation paradigms that make better use of SW technology is of great interest, especially since most of the reported designs make use of majority gates to develop Boolean algebra based SW circuits [58], [59], [62], [70], [77]–[81], [97]–[101], [104]–[109], [352], [369], [377]–[379].
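The MAJ C failure can be reproduced with an idealized phasor model in which each SW is a complex amplitude, logic 0/1 map to phase 0/$\pi$, and a majority gate reads out the phase of the superposed wave. The sketch below ignores propagation losses and the small leakage into WG H; all names are hypothetical.

```python
import cmath

A = 1.0  # unit SW amplitude (normalized)


def sw(bit: int, amplitude: float = A) -> complex:
    """Phase encoding: logic 0 -> phase 0, logic 1 -> phase pi."""
    return amplitude * cmath.exp(1j * cmath.pi * bit)


def maj_readout(*waves: complex) -> int:
    """Ideal phase detection: logic 0 if the superposed wave has phase ~0, else logic 1."""
    return 0 if sum(waves).real > 0 else 1


def normalize(wave: complex) -> complex:
    """Rescale a gate output to amplitude A while keeping its phase."""
    return A * wave / abs(wave)


maj_a = sw(0) + sw(0) + sw(0)  # I1 I2 I3 = 0 0 0 -> amplitude 3A, phase 0
maj_b = sw(1) + sw(0) + sw(1)  # I4 I5 I6 = 1 0 1 -> amplitude A, phase pi
i7 = sw(1)                     # I7 = 1

wrong = maj_readout(maj_a, maj_b, i7)                        # 3A outvotes two A waves
right = maj_readout(normalize(maj_a), normalize(maj_b), i7)  # amplitudes equalized first
print(wrong, right)  # 0 1
```

Equalizing the gate-output amplitudes before the second majority stage restores the correct vote. This is the normalization that the conventional approach has to pay for in area, delay, and energy.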
Consequently, the SW-based computation potential is not fully utilized and the ultra-low energy consumption promise is partially lost. Following this line of reasoning, in this chapter we introduce a novel beyond-Boolean-algebra SW computation paradigm.

#### 8.2. Non-binary Spin Wave Computing

The traditional combinational circuit implementation starts with the truth table of an *n*-input Boolean function $f(I_1, I_2, ..., I_n)$, derives the expression of f as a sum of products (product of sums), and processes it to make the best use of the available universal set of Boolean gates, e.g., NAND, NOR, while minimizing the implementation cost and delay. The same approach is utilized for SW circuits, but in this case the universal gate set comprises Majority gates and inverters. While this is an attractive approach that benefits from the rather mature CMOS circuit design framework, it limits the utilization of the SW potential, as discussed in Section 2.1. In this section, we propose a way to break the Boolean algebra wall by implementing f not based on its $2^n$-entry truth table but on an (n+1)-entry one that expresses f as a function of $\sum_{i=1}^{n} I_i$. Such a description exists for a large class of practically relevant functions called (generalized) symmetric functions, which include, e.g., AND, OR, Parity, addition, and multiplication [380]-[382]. Following this paradigm in the SW domain requires two computation steps: (1) the calculation of $S = \sum_{i=1}^{n} I_i$, and (2) the evaluation of f as a function of S. Step (1) is straightforward if information encoding is done in SW amplitude (logic 0: no SW; logic 1: SW with unit amplitude A), as in this case the input SWs always interfere constructively, resulting in a SW with amplitude $A \sum_{i=1}^{n} I_i$. Step (2) is more intricate, as it requires a SW amplitude conversion process. For example, if f is the n-input parity function, the resulting SW amplitude lies in $\{0, A, \ldots, nA\}$ and f should be logic 1 if S is odd and logic 0 otherwise, which is what step (2) should perform.
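In Boolean terms, the two-step scheme means that a symmetric function only needs the input count S. The sketch below shows this for parity: it checks that the sum-based evaluation agrees with the full $2^n$-entry XOR truth table and prints the $(n+1)$-entry table indexed by S. It models the arithmetic only, not the SW amplitude conversion, and the function name is ours.

```python
from itertools import product


def parity_from_sum(bits) -> int:
    """Symmetric-function evaluation: step (1) sums the inputs, step (2) maps the sum to f."""
    s = sum(bits)  # step (1): constructive interference of amplitude-encoded inputs (units of A)
    return s % 2   # step (2): parity is 1 iff the sum is odd


n = 4
for bits in product((0, 1), repeat=n):
    xor = 0
    for b in bits:
        xor ^= b
    assert parity_from_sum(bits) == xor  # agrees with the 2**n-entry truth table

# The whole function is captured by an (n+1)-entry table indexed by the sum.
sum_table = {s: s % 2 for s in range(n + 1)}
print(sum_table)  # {0: 0, 1: 1, 2: 0, 3: 1, 4: 0}
```

AND and OR fit the same mold, with $f(S) = [S = n]$ and $f(S) = [S \geq 1]$, respectively.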
To gain more insight into stage (1), let us assume the structure in Figure 8.2, with an n-bit binary number $(I_1, I_2, \ldots, I_n)$ as input. Each Boolean input $I_j$, $j = 1, \ldots, n$, induces a SW with amplitude $AI_j2^j$, which results in the formation of a SW with amplitude $\sum_{j=1}^n AI_j2^j$, i.e., proportional to the decimal value of the input vector, at the output of the CC block. If we extend the structure to two n-bit inputs X and Y, the output SW amplitude is equal to $\sum_{j=1}^n A(X_j2^j+Y_j2^j)$, i.e., proportional to the result of the X+Y binary addition. In this way, the addition is completed without relying on any Boolean gate, as the output SW carries the addition result. What remains to be done is to obtain the binary representation of X+Y on n+1 bits via a non-binary to binary conversion process within stage (2). We note that the direct summation can also be applied to binary signed digit representations [383] if SW phase is also considered in the encoding, i.e., 0 corresponds to no SW and 1/-1 to a unit amplitude SW with phase 0/$\pi$, respectively.

Figure 8.2: Generic Non-binary SW Circuit Structure.

#### SW Non-binary to Binary Converter

The non-binary to binary converter, i.e., the NB/B in Figure 8.2, can be implemented by means of multiple waveguides closely spaced to each other. Given the ability of a Directional Coupler (DC) to route SW energy between its component waveguides, we make use of a number of specially tailored DCs to design the Non-Binary to Binary (NB/B) converter. Recall that DCs working in the linear regime split the input SW in half between the waveguides regardless of its amplitude, whereas DCs working in the non-linear regime, which can be designed using Equations (2.18) - (2.29), split the SW between the waveguides with a ratio that depends on the input SW amplitude. To clarify the NB/B converter concept, we instantiate the 3-bit converter presented in Figure 8.3.
In the Figure, I is the SW input with an amplitude from 0A to 7A, $O_1$, $O_2$, and $O_3$ are the outputs, and 9 directional couplers are needed to perform the correct NB to B conversion. To properly design the directional couplers, one needs to know when each output is 1 and 0, which is presented in Table 8.1 for the 3-bit converter in Figure 8.3: $O_3 = 1$ if the SW input amplitude is larger than 3A, and 0 otherwise; $O_2 = 1$ if the SW input amplitude is 2A, 3A, 6A, or 7A, and 0 otherwise; and $O_1 = 1$ if the SW input amplitude is 1A, 3A, 5A, or 7A, and 0 otherwise. Capturing $O_3$ seems straightforward, as its value obeys one condition only; thus, DC2 can be designed such that if the SW amplitude is larger than 3A, the SW moves to $O_3$, and nothing moves otherwise. However, by doing so, $O_1$ and $O_2$ cannot be captured correctly when they are 1 and the SW amplitude is larger than 3A, as the SW energy moves completely to $O_3$. Therefore, the input spin wave signal should be divided into two equal parts, which means that DC1 should work in the linear regime. After this split, $O_3 = 1$ if the SW amplitude is larger than 1.5A. Therefore, the second directional coupler must be designed with a threshold of 1.75A, which is the average of the cases 1.5A and 2A. If the spin wave amplitude is larger than 1.75A, the spin wave moves completely to the upper part (WG C) to be captured at $O_3$, and nothing moves to WG C otherwise. The $O_2$ value is determined by two conditions: $O_2 = 1$ if the spin wave amplitude is larger than 1A and less than 4A, or larger than 5A, as indicated in Table 8.1. To obtain its proper value, DC4 and DC5 need to be designed such that DC5 moves the SW energy in WG A completely to $O_2$ if the SW amplitude is larger than 1A (the SW energy in WG A is 0 if the SW amplitude is larger than 3A), and DC4 moves the SW energy in WG B completely to $O_2$ if the SW amplitude is larger than 5A, to meet the second condition.

Figure 8.3: 3-bit SW NB/B Converter.
However, by doing so, $O_1$ cannot be correctly computed, as no SW will be captured at $O_1$ when the SW amplitude equals 7A. Therefore, the non-binary spin wave signal in WG B should be divided into two equal parts to correctly detect $O_1$; thus, DC3 should work in the linear regime as a second splitter. Hence, to obtain $O_2 = 1$ if the spin wave amplitude is larger than 0.5A and less than 2A after the first splitter, DC5 must be designed with a threshold value of 0.75A, which is the average of 0.5A and 1A. The spin wave then moves completely to WG D if its amplitude is larger than 0.75A, and nothing moves to WG D otherwise. To obtain $O_2 = 1$ if the spin wave amplitude is larger than 1.25A after the splitters, DC4 must be designed with a threshold value of 1.375A, which is the average of the cases 1.25A and 1.5A. By doing this, a WG B spin wave with an amplitude less than 1.375A is not affected and no energy is transferred to WG D, and when the amplitude is larger than 1.375A, the spin wave is transferred to WG D. Finally, $O_1 = 1$ if the spin wave amplitude is 1A, 3A, 5A, or 7A, as presented in Table 8.1. From the above, a spin wave exists in WG A and reaches $O_1$ when the spin wave amplitude is less than 0.75A (after the splitters), which meets the first condition: $O_1 = 1$ when the SW amplitude is 1A. Also, the spin wave available in WG B reaches DC6 when its amplitude is less than 1.375A. Therefore, to meet the second condition, $O_1 = 1$ when the SW amplitude is 3A, DC6 must be designed with a threshold value of 0.625A, which is the average of the cases 0.5A and 0.75A, such that if the spin wave amplitude is larger than 0.625A, the spin wave moves completely to WG A, and nothing moves to WG A otherwise. In addition, DC7 must be designed with a threshold value of 0.875A, which is the average of the cases 0.75A and 1A, such that if the spin wave amplitude is larger than 0.875A, the spin wave moves completely to WG E, and nothing moves to WG E otherwise.
This is done to prevent the existence of a spin wave in WG A when the SW amplitude equals 2A or 4A, as $O_1$ must be 0 in these cases. Moreover, DC8 must be designed with a threshold value of 1.125A, which is the average of the cases 1A and 1.25A, such that if the spin wave amplitude is larger than 1.125A, it moves completely to WG A, and nothing moves to WG A otherwise. Finally, DC9 must be designed with a threshold value of 1.625A, which is the average of the cases 1.5A and 1.75A, such that if the spin wave amplitude is larger than 1.625A, it moves completely to WG A, and nothing moves to WG A otherwise. Thus, by designing the directional couplers with the aforementioned thresholds, the three outputs are correctly captured. Note that the aforementioned explanation is for the ideal case, without taking into consideration damping or the exact energy that remains in or moves to the other waveguide(s) in the directional couplers, but the operation principle remains the same. Additionally, the outputs are captured based on a thresholding condition, such that if the received spin wave amplitude is larger than a predefined threshold, it corresponds to logic 1, and it is logic 0 otherwise. The outputs should be placed as near as possible after the last directional coupler to minimize spin wave amplitude decay effects. This concept can be extended to an n-bit NB/B converter, in which case it requires N+1 directional couplers, where N is the number of 0 to 1 changes in the conversion table. The same way of thinking can be followed to determine the DC thresholds, and Equations (2.18) - (2.29) can be used to correctly design the directional couplers.

Table 8.1: 3-bit SW NB/B Converter Truth Table.

| I | $O_1$ | $O_2$ | $O_3$ |
|------------|-------|-------|-------|
| 0 | 0 | 0 | 0 |
| 1A | 1 | 0 | 0 |
| 2A | 0 | 1 | 0 |
| 3A | 1 | 1 | 0 |
| 4A | 0 | 0 | 1 |
| 5A | 1 | 0 | 1 |
| 6A | 0 | 1 | 1 |
| 7A | 1 | 1 | 1 |

Figure 8.4: Proposed SW Non-binary Adder.
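All non-linear DC thresholds in the 3-bit converter walkthrough follow one rule. Take the two adjacent input levels the coupler must separate, divide them by the linear split applied upstream (2 after DC1, 4 after DC1 and DC3), and average the results. The sketch below recomputes the seven thresholds under the same idealization as the walkthrough (no damping, complete energy transfer). The level pairs are read off the text, and `dc_threshold` is a hypothetical helper.

```python
def dc_threshold(level_low: int, level_high: int, split_factor: int) -> float:
    """Midpoint, in units of A, between two adjacent input levels after linear splitting."""
    return (level_low + level_high) / (2 * split_factor)


# (coupler, lower input level, upper input level, total linear split before the coupler)
COUPLERS = [
    ("DC2", 3, 4, 2),  # O3: separate 3A from 4A after DC1
    ("DC5", 1, 2, 2),  # O2, first condition, after DC1
    ("DC4", 5, 6, 4),  # O2, second condition, after DC1 and DC3
    ("DC6", 2, 3, 4),  # O1 steering in WG B
    ("DC7", 3, 4, 4),
    ("DC8", 4, 5, 4),
    ("DC9", 6, 7, 4),
]
thresholds = {name: dc_threshold(lo, hi, s) for name, lo, hi, s in COUPLERS}
print(thresholds)

# Table 8.1: O1, O2, O3 are the bits of the input amplitude expressed in units of A.
truth_table = {a: (a & 1, (a >> 1) & 1, (a >> 2) & 1) for a in range(8)}
```

The printed values are 1.75A, 0.75A, 1.375A, 0.625A, 0.875A, 1.125A, and 1.625A, matching the thresholds derived above.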
#### SW NON-BINARY ADDER

To better explain and illustrate our approach, we apply it to the design of a 2-bit adder, as depicted in Figure 8.4. The 2-bit binary inputs are transformed into SWs by means of the excitation cells $I_{11}$, $I_{12}$, $I_{21}$, and $I_{22}$, which take into account the position weights, i.e., $I_{11}$ and $I_{21}$ are excited with an amplitude of 1A, whereas $I_{12}$ and $I_{22}$ are excited with 2A. After excitation, the spin waves propagate through the waveguide and interfere constructively. The SW resulting from the interference is converted to binary by the proposed NB/B converter and captured at the outputs, as presented in Figure 8.4. The circuit dimensions, such as the distances between the excitation cells and the directional coupler dimensions, must be carefully chosen as described in [352] to ensure correct functionality. For instance, for SWs with the same phase to interfere constructively, the distances between the excitation cells must be $n \times \lambda$, i.e., $d_1 = d_2 = d_3 = n\lambda$ (where n = 1, 2, 3, ...). Since the maximum output of the 2-bit adder is 110, as can be observed in Table 8.2, we simplified the 3-bit NB/B converter in Figure 8.3 into the structure presented in Figure 8.4 to minimize delay and save area. Seven different directional couplers are used to convert the non-binary result of the adder to binary outputs.

Table 8.2: Non-binary SW Adder Outputs (columns: $I_{12}$, $I_{11}$, $I_{22}$, $I_{21}$, A1, A2, $O_1$, $O_2$, $O_3$).

The first directional coupler is designed based on the maximum number of outputs that can be logic 1 simultaneously. In this case, as can be seen from Table 8.2, at most two of the three outputs can be logic 1. Therefore, the non-binary spin wave signal should be divided into two equal parts to allow simultaneous spin wave propagation to two outputs.
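For the 2-bit adder, the reachable SW amplitudes and the post-splitter levels can be enumerated directly. The sketch below is an idealized model with lossless constructive superposition and an exact 50/50 linear split. It lists the post-splitter levels and checks that the six non-linear DC thresholds used for this adder (1.75A, 0.75A, 2.75A, 1.25A, 1.75A, and 2.25A for DC2 to DC7) all fall on midpoints between adjacent levels. The helper name is hypothetical.

```python
A = 1.0  # unit SW amplitude (normalized)


def adder_amplitude(i11: int, i12: int, i21: int, i22: int) -> float:
    """Constructive superposition with position weights: I11, I21 -> A; I12, I22 -> 2A."""
    return A * (i11 + i21) + 2 * A * (i12 + i22)


bits = (0, 1)
# Amplitudes after the 50/50 linear splitter (DC1), over all 16 input patterns.
levels = sorted({adder_amplitude(a, b, c, d) / 2
                 for a in bits for b in bits for c in bits for d in bits})

# Candidate thresholds: midpoints between adjacent post-splitter levels.
midpoints = [(lo + hi) / 2 for lo, hi in zip(levels, levels[1:])]

# Thresholds given in the text for the non-linear couplers of the 2-bit adder.
stated = {"DC2": 1.75, "DC3": 0.75, "DC4": 2.75, "DC5": 1.25, "DC6": 1.75, "DC7": 2.25}
assert all(t in midpoints for t in stated.values())
print(levels)
print(midpoints)
```

The maximum input amplitude is 6A (binary 110), which halves to 3A after DC1. This is why a single splitter followed by six thresholded couplers is sufficient here.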
Hence, the first directional coupler works in the linear regime and splits the energy of the spin wave into two equal parts, independently of the spin wave amplitude. Note that if a more complex adder is targeted, for which n outputs could simultaneously assume logic 1, the input spin wave energy has to be divided into n equal parts. The other six directional couplers work in the non-linear regime, such that there is an amplitude threshold for the energy transfer from one waveguide to another. The amplitude threshold is set individually for every coupler and can be determined by considering the amplitudes after the splitter, indicated in Table 8.2 columns A1 and A2, following the line of thinking explained in the previous subsection. The same operation principle and design steps are followed, but some thresholds differ, as only one splitter is used here, together with 6 directional couplers with the following thresholds: 1.75A for DC2, 0.75A for DC3, 2.75A for DC4, 1.25A for DC5, 1.75A for DC6, and 2.25A for DC7. Additionally, the output values are captured by means of thresholding, as previously explained.

#### **8.3.** SIMULATION SETUP AND RESULTS

To validate our proposal, we make use of the GPU-accelerated micromagnetic software MuMax3 [195], which solves the Landau-Lifshitz-Gilbert (LLG) equation. MuMax3 simulations require the specification of suitable parameters to describe the simulated structure and reflect the environment.
We used a $Fe_{60}Co_{20}B_{20}$ waveguide with a width of 30 nm and a thickness of 1 nm to test the proposed structure, in addition to the following parameters: saturation magnetization $M_s$=1.1 MA/m, perpendicular anisotropy constant $k_{ani}$=8.3 MJ/m³, exchange stiffness $A_{ex}$=18.5 pJ/m, and damping constant $\alpha = 2 \times 10^{-4}$ [346]. We determined the spin wave dispersion relation for these parameters, and for a wavelength of $\lambda$=200 nm, the spin wave frequency is f=14.03 GHz. Hence, the distances between the excitation cells $d_1$, $d_2$, and $d_3$ have to be 200 nm. Additionally, we used Equations (2.18) - (2.29) to determine the directional coupler dimensions. Based on the above parameters and equations, we obtained the following dimensions: $L_{w1}$=370 $\mu$m, $L_{w2}$=$L_{w3}$=$L_{w4}$=$L_{w5}$=$L_{w6}$=$L_{w7}$=2.55 $\mu$m, $DW_1$=50 nm, $DW_2$=15 nm, $DW_3$=30 nm, $DW_4$=10 nm, $DW_5$=11 nm, $DW_6$=13 nm, and $DW_7$=17 nm. Table 8.3 presents the normalized spin wave magnetization at the adder outputs $O_1$, $O_2$, and $O_3$ reported by MuMax3 for different input patterns.

| $I_{1,1}$ | $I_{1,2}$ | $I_{2,1}$ | $I_{2,2}$ | $O_1$ | $O_2$ | $O_3$ |
|-----------|-----------|-----------|-----------|-------|--------|-------|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 1 | 0.69 | 0.1 | 0.002 |
| 0 | 0 | 1 | 0 | 0.027 | 0.77 | 0.16 |
| 0 | 0 | 1 | 1 | 1 | 0.39 | 0.37 |
| 0 | 1 | 0 | 0 | 0.99 | 0.24 | 0.01 |
| 0 | 1 | 0 | 1 | 0.043 | 0.65 | 0.1 |
| 0 | 1 | 1 | 0 | 0.66 | 0.58 | 0.32 |
| 0 | 1 | 1 | 1 | 0.42 | 0.001 | 0.56 |
| 1 | 0 | 0 | 0 | 0.22 | 0.68 | 0.08 |
| 1 | 0 | 0 | 1 | 0.98 | 0.3 | 0.4 |
| 1 | 0 | 1 | 0 | 0.16 | 0.078 | 0.75 |
| 1 | 0 | 1 | 1 | 0.91 | 0.094 | 0.88 |
| 1 | 1 | 0 | 0 | 0.91 | 1 | 0.27 |
| 1 | 1 | 0 | 1 | 0.008 | 0.0006 | 0.79 |
| 1 | 1 | 1 | 0 | 0.84 | 0.03 | 0.8 |
| 1 | 1 | 1 | 1 | 0.045 | 0.56 | 1 |

Table 8.3: Normalized Non-binary Adder Outputs.
By inspecting the table, one can observe that the correct values can be obtained by defining an appropriate threshold for each of the three outputs. For $O_1$, the normalized threshold is obtained by averaging the normalized outputs of the cases $I_{1,1}I_{1,2}I_{2,1}I_{2,2}$ = 0111 and $I_{1,1}I_{1,2}I_{2,1}I_{2,2}$ = 1000, which yields 0.32. The normalized threshold for $O_2$ is set to 0.27 by averaging the values in Table 8.3 for the cases $I_{1,1}I_{1,2}I_{2,1}I_{2,2}$ = 0100 and $I_{1,1}I_{1,2}I_{2,1}I_{2,2}$ = 1001. The normalized threshold for $O_3$ is 0.48, obtained by averaging the normalized magnetization for the cases $I_{1,1}I_{1,2}I_{2,1}I_{2,2}$ = 1001 and $I_{1,1}I_{1,2}I_{2,1}I_{2,2}$ = 0111. As can be observed from the table, the 3-bit sum is correctly computed. $O_3$, $O_2$, and $O_1$ are logic 1 if their normalized magnetization exceeds 0.48, 0.27, and 0.32, respectively, and logic 0 otherwise.

| | CMOS [349] | SW [348] | SW |
|----------------------|----------------|----------------------|---------------------------|
| Technology | 16 nm CMOS | SW | SW |
| Implemented function | 2-bit adder | Standard 2-bit adder | Proposed non-binary adder |
| Energy (aJ) | 3777 | 317 | 101 |
| Delay (ns) | 0.071 | 23 | 23 |
| Utilized Device No. | 48 Transistors | 22 ME cells | 7 ME cells |

Table 8.4: Performance Comparison.

#### 8.4. Performance Evaluation and Discussion

To gain some insight into the practical implications of our proposal, we evaluate the energy, delay, and area of the proposed 2-bit adder. We compare them with those of conventional SW and 16 nm CMOS counterparts. We assume that the excitation and detection transducers are magnetoelectric (ME) cells operating at $V_{ME}$ = 119 mV, with a capacitance $C_{ME}$ = 1 fF and a 0.42 ns switching delay [348].
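The SW entries of Table 8.4 can be roughly reproduced from the ME-cell parameters just given. This relies on the assumption, stated next, that only the excitation and detection cells consume energy, so that $E = I \times C_{ME} \times V_{ME}^2$. The Python sketch below evaluates this for the 7 and 22 ME cells listed in the table. Treating the proposed adder's delay as the 22 ns propagation delay plus one 0.42 ns excitation switch and one 0.42 ns detection switch is our assumption, not a rule stated in the text. The helper names are hypothetical.

```python
"""Rough ME-cell energy and delay model for the SW adders of Table 8.4 (sketch)."""

C_ME = 1e-15      # F, ME cell capacitance [348]
V_ME = 0.119      # V, ME cell operating voltage [348]
T_SWITCH = 0.42   # ns, ME cell switching delay (also the excitation pulse length)
T_PROP = 22.0     # ns, SW propagation delay through the adder (MuMax3)


def me_energy_aj(num_cells):
    """Energy in aJ when only the ME cells consume energy: N * C_ME * V_ME^2."""
    return num_cells * C_ME * V_ME ** 2 * 1e18


def proposed_adder_delay_ns(propagation_ns=T_PROP):
    """Assumption: propagation delay plus one excitation and one detection switch."""
    return propagation_ns + 2 * T_SWITCH


proposed_aj = me_energy_aj(7)        # proposed non-binary adder: 7 ME cells
conventional_aj = me_energy_aj(22)   # Majority-gate based SW adder: 22 ME cells
ratio = conventional_aj / proposed_aj

if __name__ == "__main__":
    print(f"proposed: {proposed_aj:.1f} aJ, conventional: {conventional_aj:.1f} aJ")
    print(f"SW energy ratio: {ratio:.2f}x")
    print(f"proposed adder delay: {proposed_adder_delay_ns():.2f} ns")
    # The CMOS comparison uses the Table 8.4 entries (3777 aJ, 0.071 ns) directly.
    print(f"vs 16 nm CMOS: {3777 / 101:.1f}x less energy, {23 / 0.071:.0f}x slower")
```

The model gives about 99 aJ for the proposed adder, within about 2% of the 101 aJ listed in Table 8.4. Because both SW designs use the same ME cells, the energy ratio is exactly 22/7 ≈ 3.14, as reported. The 22.84 ns delay rounds to the tabulated 23 ns.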
Furthermore, we assume that spin waves consume negligible energy in the waveguide and directional couplers compared to the energy consumed by the excitation and detection cells [352]. This implies that the adder energy consumption is $I \times C_{ME} \times V_{ME}^2$, where $I$ is the number of excitation and detection cells. MuMax3 simulation results indicate that the spin wave propagation delay through the waveguide is 22 ns. We also assume that pulse signals are used for SW excitation. The energy consumption therefore depends only on the 0.42 ns applied pulse length and is independent of the overall adder delay. Note that, given the infancy of SW technology and its foreseeable developments, these assumptions might need to be revisited in the near future.

To compare with the conventional spin wave counterpart, we estimate the energy, delay, and number of devices of a 2-bit adder implementation based on SW Majority gates. We assume that the fanout and gate cascading solutions in [77], [352] are available. Fanout is assumed to be achieved without any delay overhead, while gate cascading induces a 22 ns delay overhead [77], [352]. To compare with a 16 nm CMOS 2-bit adder, which can be built from 3 AND gates, 1 OR gate, and 3 XOR gates, we make use of the energy, delay, and area estimates in [349].

Table 8.4 presents the evaluation results. While being 320x slower than the CMOS counterpart, the proposed SW non-binary adder reduces energy consumption by 37x. In addition, the table suggests that the conventional approach to implementing a 2-bit adder in the spin wave domain consumes 3.14x more energy than the proposed non-binary adder for the same delay. Furthermore, the proposed adder implementation requires the fewest devices.

#### **8.5.** CONCLUSIONS

In this chapter we introduced a novel computation paradigm that is not based on Boolean algebra and enables ultra-low energy SW-based computing without domain conversion.
Subsequently, we leveraged this computing paradigm by designing a non-binary spin wave adder, which we validated by means of micromagnetic simulation. To gain more insight into the potential of the proposed adder, we assumed a 2-bit adder implementation as discussion vehicle. We evaluated its area, delay, and energy consumption, and compared it with conventional SW and 16 nm CMOS counterparts. The results indicated that our proposal reduces energy consumption by factors of 3.14x and 37x compared with the conventional SW and 16 nm CMOS functionally equivalent designs, respectively. Furthermore, the proposed non-binary adder implementation requires the fewest devices. This indicates the potential of SW for realizing circuits and computation platforms with a smaller chip real-estate than the state of the art.

# INITIAL BENCHMARKING OF SPIN WAVE TECHNOLOGY

In the early stages of a novel technology's development, it is difficult to provide a comprehensive assessment of its potential capabilities and impact. Nevertheless, some preliminary estimates can be drawn and are certainly of great interest. In this chapter we follow this line of reasoning within the framework of the Spin Wave (SW) based computing paradigm. In particular, we are interested in assessing the technological development horizon that needs to be reached to unleash the full SW paradigm potential, such that SW circuits can outperform CMOS counterparts in terms of energy consumption. In view of the zero-power SW propagation through ferromagnetic waveguides, the overall SW circuit power consumption is determined by the power associated with SW generation and sensing by means of transducers. While current antenna-based transducers are clearly power hungry, recent developments indicate that magneto-electric (ME) cells have great potential for ultra-low power SW generation and sensing.
ME cells have only been proposed at the conceptual level, and no actual experimental demonstration has been reported. We therefore cannot evaluate the impact of their utilization on SW circuit energy consumption. In this chapter, we perform a reverse-engineering-like analysis to determine the ME delay and power consumption upper bounds that can place SW circuits in the leading position.

Figure 9.1: 32-bit Brent-Kung prefix adder based on AND, OR, and XOR gates [384].

#### 9.1. SW Transducer Power Upper Bound

As stated in the introduction, our goal is to determine the technological limits that need to be reached to unleash the full potential of the SW computing paradigm, such that magnonic circuits can outperform CMOS counterparts in terms of energy consumption. In view of the zero-power SW propagation through ferromagnetic waveguides, the overall magnonic circuit power consumption is determined by the power associated with SW generation and sensing by means of transducers. Thus, we focus our analysis on determining the acceptable transducer power consumption upper bounds that need to be achieved in future transducer implementations, e.g., ME cells.

For this study, we choose a 32-bit Brent-Kung prefix adder (BKA), presented in Figure 9.1, as discussion vehicle. We compute the maximum transducer power values that potentially enable a BKA SW implementation to outperform its 7 nm CMOS counterpart. We note that the Brent-Kung adder is a Parallel Prefix Adder (PPA) form of the Carry Look-Ahead adder (CLA). It exhibits structural regularity, low wiring congestion, and a reasonable area-performance ratio, which make it quite attractive for practical implementations [384].
To assess the representativeness of our choice, we also determined transducer power upper bound values for the following circuits:

- 32-bit Wallace Tree Multiplier;
- 32-bit Dadda Tree Multiplier;
- 64-bit Brent-Kung Adder;
- 64-bit Dadda Tree Multiplier;
- 4-operand 64-bit Han-Carlson adder;
- 4-operand 64-bit Carry Skip Adder;
- 32-bit Multiply Accumulate;
- 32-bit Divider;
- 17-bit Galois-Field Multiplier;
- 32-bit Cyclic Redundancy Check.

Our results indicate that the Brent-Kung Adder requires the lowest transducer upper bound (worst case). Our choice of discussion vehicle is therefore relevant for the purpose of this analysis.

Figure 9.2: 8-bit Brent-Kung prefix adder based on Majority gates [385].

#### **9.1.1.** Possible Implementations

We evaluate different 32-bit BKA SW implementations based on Majority gates and compare them with the 7 nm CMOS implementation. Note that all our SW circuits rely on the Majority-tailored implementation method introduced in [385], which is depicted in Figure 9.2 for the 8-bit BKA case. As previously mentioned, SW gate cascading is not straightforward [352]. To this end, we evaluated 32-bit BKA implementations built with:

- (i) ideal gate cascading (S1);
- (ii) normalizers after each logic gate (S2);
- (iii) normalizers and signal conversion back and forth between the electrical and spin wave domains (S3);
- (iv) an All-in-SW approach (S4).

The implementations combine several building blocks:

- fanout-enabled ladder-shaped Majority gates;
- programmable logic gates [77], [78], [376];
- triangle-shaped Majority gates [347];
- in-line Majority gates [386];
- normalizers (directional couplers) [352].

Regardless of the gate cascading method, the 32-bit BKA requires at least 98 transducers, as it has 65 inputs and 33 outputs. This minimum is reached by S1, which assumes direct SW gate cascading, i.e., no normalizers or signal conversion between the electrical and SW domains are required to build the adder. S1 thus provides the best possible, but practically unachievable, adder performance.
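The transducer bookkeeping behind these counts is simple enough to sketch. Assume, as for the 32-bit BKA, an n-bit adder with a carry-in and a carry-out. The primary inputs are then the two n-bit operands plus the carry-in, and the primary outputs are the n sum bits plus the carry-out. The hypothetical helper below reproduces the 65 + 33 = 98 transducers of S1. It also tallies the S2 carry-out circuits of Figure 9.4 discussed next. For that tally we assume, as an inference rather than a breakdown given in the text, that the reported 49 transducers are the 17 + 23 excitation transducers plus one detection transducer per carry-out C1 to C9.

```python
"""Transducer bookkeeping for the SW Brent-Kung adder (illustrative sketch)."""


def adder_io_transducers(n_bits):
    """Primary I/O transducer count of an n-bit adder with carry-in and carry-out."""
    inputs = 2 * n_bits + 1   # two n-bit operands plus the carry-in
    outputs = n_bits + 1      # n sum bits plus the carry-out
    return inputs, outputs, inputs + outputs


# S2 carry-out circuits of Figure 9.4 (assumed breakdown of the reported 49).
S2_EXCITATION = 17 + 23   # Figure 9.4 a) plus Figure 9.4 b)
S2_DETECTION = 9          # assumption: one detection transducer per carry-out C1..C9
S2_CARRY_TOTAL = S2_EXCITATION + S2_DETECTION
S3_CARRY_TOTAL = 43       # reported total for the normalizer + converter circuit

if __name__ == "__main__":
    print("32-bit BKA (inputs, outputs, total):", adder_io_transducers(32))
    print("S2 carry circuits:", S2_CARRY_TOTAL, "saved by S3:", S2_CARRY_TOTAL - S3_CARRY_TOTAL)
```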
S2, which provides practically achievable performance data, makes use of directional couplers to normalize SW gate outputs. Figure 9.3 presents its structure, which contains 1040 transducers as a result of gate replication induced by:

1) unavailability of SW gates with a fanout larger than 4;
2) unavailability of SW splitters and amplifiers;
3) layout limitations (waveguide crossovers are not allowed).

For example, Figure 9.4 presents the SW circuit for calculating the carry-outs C1, C2, C3, ..., C9. As can be observed from the figure, C1 to C7 are calculated using the SW circuit in Figure 9.4 a), which requires 17 excitation transducers. C8 and C9 are calculated using the SW circuit in Figure 9.4 b), which requires 23 excitation transducers, 9 of which are replicated because of fanout limitations.

S3 reduces the number of required replications. Figure 9.5 presents the SW circuit for calculating C1, C2, C3, ..., C9 using normalizers and domain conversion (SW to/from electrical). This implementation requires a total of 43 transducers, including excitation, intermediate, and detection transducers, whereas the S2 implementation requires 49. Thus, we save 6 transducers for the calculation of the first 9 carry-outs. However, as the cost of back-and-forth domain conversion is not yet available, the actual advantage of S3 cannot be accurately assessed.

The S4 implementation makes use of normalizers, splitters, and amplifiers, and allows line crossovers. Its structure, depicted in Figure 9.6, uses 98 transducers (65 excitation and 33 detection transducers), 72 splitters (directional couplers), and 72 amplifiers.

Figure 9.3: 32-bit SW Brent Kung Prefix Adder Using Normalizers Only.

#### 9.1.2. Brent-Kung Adder Choice

As stated in the introduction, we determined transducer power upper bound values for the ten circuits listed there (32-bit Wallace Tree Multiplier through 32-bit Cyclic Redundancy Check). The goal is to choose the worst-case circuit, i.e., the one that requires the lowest transducer upper bound. To this end, the SW implementations should exhibit an energy consumption $E_{SW} < E_{CMOS}$ in comparison with the 10 nm CMOS implementations in [387]. Based on this, we can determine the performance constraint that the transducer needs to fulfil.

To find the circuit with the lowest transducer upper bound, we evaluated the circuits using the most promising SW approach, i.e., the S4 implementation. To evaluate the delay of an S4 spin wave implementation of these circuits, we have to estimate their critical paths, evaluate their physical lengths, and determine the number of transducers they contain. We consider the ladder-shaped Majority gates (and their programmable logic gate versions) [77], [78], [376] and assume that the maximum propagation length per Majority gate is 336 nm. With these, we can evaluate the length of the input SW trajectories towards the outputs in each implementation. We estimated the following critical path lengths:

| Circuit | Critical path length |
|---------|----------------------|
| CRC32 | $8.33~\mu\text{m}$ |
| BKA64 | $8.33~\mu\text{m}$ |
| GFMUL | $9.07~\mu\text{m}$ |
| CSA464 | $9.33~\mu\text{m}$ |
| HCA464 | $9.33~\mu\text{m}$ |
| WTM32 | $13.7~\mu\text{m}$ |
| MAC32 | $14.7~\mu\text{m}$ |
| DTM32 | $11.7~\mu\text{m}$ |
| DIV32 | $45.7~\mu\text{m}$ |
| DTM64 | $12.7~\mu\text{m}$ |

To derive the actual critical path delay, the SW propagation speed is required. It equals the SW group velocity, which can be obtained from the slope of the material-specific dispersion relation.
Based on the critical path length and SW group velocity, we calculated the delay of the different circuit implementations based on a CoFeB waveguide, as this material provides the highest SW group velocity. In addition, we assumed a 0.42 ns transducer delay [348] and a 20 ns normalizer delay [352] for the separate elements. Based on this, we derive the following overall delays for the CRC32, BKA64, GFMUL, CSA464, HCA464, WTM32, MAC32, DTM32, DIV32, and DTM64, respectively: 25 ns, 25 ns, 27.2 ns, 28 ns, 28 ns, 41 ns, 44 ns, 35 ns, 137 ns, and 38 ns.

To proceed with the investigation of the SW circuits' energy consumption, we concentrate on power consumption estimation. SWs involve only the collective precession of electron spins and no charge transport, so we assume zero-power SW propagation through the waveguides. We can then estimate the energy consumption as $E_{SW} = TN \times PT \times Delay$, where:

- $TN$ is the number of transducers in the circuit implementation;
- $PT$ is the power consumed by one transducer;
- $Delay$ is the time during which the transducers are active.

To outperform CMOS, $E_{SW} < E_{CMOS}$ must hold. The transducer power consumption upper bound can therefore be determined as $PT = E_{CMOS}/(TN \times Delay)$. TN is determined by the circuit topology, and for each circuit we account one transducer per primary input and one per primary output. This results in 238, 220, 144, 5600, 144, 475, 3743, 3783, 3368, 8384, and 12697 transducers for the CRC32, BKA64, GFMUL, CSA464, HCA464, WTM32, MAC32, DTM32, DIV32, and DTM64, respectively. It was assumed that each amplifier consumes $\sqrt{n}$, where n is the amplification level.

The actual $E_{SW}$ value depends on the SW operation mode, which defines the Delay value in its evaluation expression:

- In Continuous Mode Operation (CMO) [377], the transducers are active as long as the SWs are propagating through the circuit, i.e., from SW excitation until output detection. Delay therefore equals the overall circuit delay, i.e., 25 ns, 25 ns, 27.2 ns, 28 ns, 28 ns, 41 ns, 44 ns, 35 ns, 137 ns, and 38 ns for the CRC32, BKA64, GFMUL, CSA464, HCA464, WTM32, MAC32, DTM32, DIV32, and DTM64, respectively.
- In Pulse Mode Operation (PMO) [377], transducers are active only for the very short period of time required to initiate their output, which we assume to be 0.42 ns for all circuit implementations.

Based on this reasoning, we determined the maximum allowable transducer power consumption PT for the CoFeB implementations of these circuits under the CMO and PMO scenarios, as presented in Table 9.1. As one can observe in the table, the Brent-Kung adder is the circuit with the lowest transducer upper bound in both CMO and PMO. Therefore, we choose to outline the BKA circuit and analyze it in detail based on the four implementations S1, S2, S3, and S4. We compare it with state-of-the-art 7 nm CMOS to determine the SW transducer upper bound that makes SW technology outperform 7 nm CMOS. The BKA outline, analysis, and comparison are presented in the following subsection.

| Circuit implementation | Maximum transducer power, CMO (nW) | Maximum transducer power, PMO (nW) |
|------------------------|------------------------------------|------------------------------------|
| CRC32 | 11 | 668 |
| BKA64 | 4.9 | 290 |
| GFMUL | 17.8 | 1144 |
| CSA464 | 7.5 | 502 |
| HCA464 | 13 | 852 |
| WTM32 | 13.5 | 1317 |
| MAC32 | 15.4 | 1611 |
| DTM32 | 16 | 1350 |
| DIV32 | 65 | 21300 |
| DTM64 | 17 | 1519 |

Table 9.1: Transducer Power Upper Bound.

#### 9.1.3. Transducer Power Upper Bound

To determine the transducer power consumption upper bound, we first need to estimate the power and delay of our reference, i.e., the CMOS 32-bit BKA.
For this we utilize a commercial state-of-the-art 7 nm FinFET technology with regular threshold voltage standard cells and the typical process corner ($V_{DD}$ = 0.7 V, T = 25°C). The adder was evaluated by means of Cadence simulation, which reported a power consumption of 2.58 $\mu$W and a delay of 1.033 ns. This translates to an energy consumption of approximately 2.67 fJ for the 7 nm CMOS 32-bit BKA. To outperform the 7 nm CMOS BKA, the SW implementations should exhibit an energy consumption $E_{SW} < E_{CMOS}$. Based on this, we can determine the performance constraint that the transducer needs to fulfil.

Figure 9.4: a) Carry-out1 to Carry-out7 Calculation using Normalizers only, b) Carry-out8 to Carry-out9 Calculation using Normalizers only.

To evaluate the delay of a spin wave implementation, we have to identify its critical path, evaluate its physical length, and determine the number of transducers it contains, as explained in the previous subsection. First, we note that the critical path encompasses 16 Majority gates for S1, S2, S3, and S4. Based on this, we derive the following critical path lengths: (i) $5.4\,\mu\mathrm{m}$ for S1, (ii) $50\,\mu\mathrm{m}$ for S2, (iii) $43\,\mu\mathrm{m}$ for S3, and (iv) $85\,\mu\mathrm{m}$ for S4.

Among the practically achievable implementations (S2-S4), S3 has the shortest critical path because it includes the fewest directional couplers. However, it does not have the shortest delay because of the domain conversion circuitry. S4 has the longest critical path because it makes use of amplifiers and splitters to avoid transducer replication. To derive the actual critical path delays from the critical path lengths and the SW group velocity, we calculated the delay of the 4 implementations based on a CoFeB waveguide. As stated in the previous subsection, this material provides the highest SW group velocity.
In addition, we assumed the following delays for the separate elements: a 0.42 ns transducer delay [348], a 20 ns normalizer delay [352], and a 0.03 ns delay for the converter peripheral circuitry [348]. Based on this, we derive the following overall delays: (i) 1.92 ns for S1, (ii) 14.3 ns for S2, (iii) 20 ns for S3, and (iv) 24.3 ns for S4. To proceed with the investigation of the SW adder energy consumption, we concentrate on power consumption estimation under the aforementioned assumptions.

Figure 9.5: Carry-out1 to Carry-out9 Calculation using Normalizers and Converters.

In this case, TN is determined by the circuit topology. For each design we account one transducer per primary input, one per primary output, and, where applicable, the number of repeaters or converters necessary to interconnect the gates forming the prefix adder circuit. This results in 98, 1040, 262, and 203 transducers for S1, S2, S3, and S4, respectively. It was assumed that each amplifier consumes $\sqrt{n}$, where n is the amplification level. As mentioned previously, the actual $E_{SW}$ value depends on the SW operation mode, which defines the Delay value in its evaluation expression:

- In Continuous Mode Operation (CMO) [377], the transducers are active as long as the SWs are propagating through the circuit, i.e., from SW excitation until output detection. Delay therefore equals the overall adder delay, i.e., 14.3 ns, 20 ns, and 24.3 ns for S2, S3, and S4, respectively.
- In Pulse Mode Operation (PMO) [377], transducers are active only for the very short period of time required to initiate their output, which we assume to be 0.42 ns for all implementations.

Based on this reasoning, we determined the maximum allowable transducer power consumption PT for the CoFeB implementations under the CMO and PMO scenarios, as presented in Table 9.2. As one can observe in the table, CMO puts high pressure on the transducer performance. PMO relaxes this constraint, by 1-2 orders of magnitude for the practically achievable implementations.
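The numbers behind Table 9.2 can be cross-checked with a few lines of Python. The sketch first computes the 7 nm CMOS reference energy as power times delay. It then evaluates $PT = E_{CMOS}/(TN \times Delay)$ for S2, S3, and S4, using the overall adder delay for CMO and the 0.42 ns pulse for PMO. It also checks our reading that the 5.4 μm ideal-case critical path is 16 Majority gates times the 336 nm per-gate length of Section 9.1.2. That reading is an assumption, not a derivation given in the text. Names such as `max_transducer_power_nw` and `table_9_2` are illustrative.

```python
"""Transducer power upper bounds for the 32-bit SW BKA (sketch of Table 9.2)."""

# 7 nm CMOS reference: energy = power x delay (about 2.67 fJ).
P_CMOS_W = 2.58e-6
D_CMOS_S = 1.033e-9
E_CMOS_J = P_CMOS_W * D_CMOS_S

# Assumption: the ideal (S1) critical path is 16 chained Majority gates,
# each with the 336 nm maximum propagation length.
S1_PATH_UM = 16 * 336 / 1000

PULSE_S = 0.42e-9  # PMO: transducers active only during the 0.42 ns pulse

# Implementation -> (number of transducers TN, overall adder delay in s).
IMPLEMENTATIONS = {
    "S2 (Normalizer)": (1040, 14.3e-9),
    "S3 (Normalizer and Converter)": (262, 20e-9),
    "S4 (Hybrid)": (203, 24.3e-9),
}


def max_transducer_power_nw(tn, active_time_s, energy_budget_j=E_CMOS_J):
    """Upper bound PT = E_CMOS / (TN * Delay), converted from W to nW."""
    return energy_budget_j / (tn * active_time_s) * 1e9


def table_9_2():
    """Return {implementation: (CMO bound, PMO bound)} in nW."""
    return {
        name: (max_transducer_power_nw(tn, delay), max_transducer_power_nw(tn, PULSE_S))
        for name, (tn, delay) in IMPLEMENTATIONS.items()
    }


if __name__ == "__main__":
    print(f"E_CMOS = {E_CMOS_J * 1e15:.3f} fJ, S1 path = {S1_PATH_UM:.3f} um")
    for name, (cmo, pmo) in table_9_2().items():
        print(f"{name}: CMO {cmo:.2f} nW, PMO {pmo:.1f} nW")
```

The recomputed bounds are about 0.18/6.1, 0.51/24.2, and 0.54/31.3 nW (CMO/PMO), agreeing with Table 9.2 to within rounding. The energy budget and transducer count are the same in both modes. The PMO/CMO ratio is therefore simply Delay/0.42 ns, roughly 34x to 58x for these three designs.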
Moreover, regardless of the operation mode, the hybrid-based implementation is the most energy-effective of the practically achievable implementations and allows for the highest PT value among them. Therefore, our preliminary evaluation indicates that the hybrid-based pulse mode operation approach potentially allows spin wave technology to outperform 7 nm CMOS. This assumes that transducers with a maximum power consumption of 31 nW are achievable.

Figure 9.6: 32-bit SW Brent Kung Prefix Adder Using Hybrid Approach.

| Implementation | Maximum transducer power, CMO (nW) | Maximum transducer power, PMO (nW) |
|----------------|------------------------------------|------------------------------------|
| Ideal Case (S1) | 17 | 64.9 |
| Normalizer (S2) | 0.18 | 6.1 |
| Normalizer and Converter (S3) | 0.51 | 24 |
| Hybrid (S4) | 0.54 | 31 |

Table 9.2: Transducer Power Upper Bound.

#### 9.2. CONCLUSIONS

In this chapter, we assessed the potential of magnonic circuits to outperform functionally equivalent CMOS counterparts in terms of energy consumption. Our analysis rests on the fact that SW propagation through ferromagnetic waveguides does not consume noticeable energy. SW circuit energy consumption is therefore determined by the energy spent by transducers to generate the input SWs and sense the output SWs. It has been suggested that magneto-electric (ME) cells would be capable of exciting and detecting SWs while consuming ultra-low power. However, they have not been experimentally demonstrated, and no figures of merit are available. Thus, instead of performing a traditional benchmarking, we carried out a reverse-engineering investigation. Its aim was to determine the ME power consumption upper bound that still makes magnonic circuits outperform CMOS counterparts. To this end, we assumed a 32-bit Brent-Kung prefix adder as discussion vehicle. We determined the maximum transducer power consumption that still makes the SW implementation outperform its 7 nm CMOS counterpart.
We evaluated different SW implementations that rely on conversion- or normalization-based gate cascading, under continuous or pulse SW generation scenarios. Our evaluations indicated that 31 nW is the maximum transducer power consumption for which the 32-bit Brent-Kung SW implementation outperforms its 7 nm CMOS counterpart in terms of energy. Moreover, we identified the challenges ahead towards the design and realization of energy-effective SW circuits and computation platforms.

### **CONCLUSIONS**

In this thesis, we enabled fanout in spin wave logic gates and gate cascading without domain conversion, which opens the way toward designing efficient circuits in the spin wave domain. In addition, we made use of spin wave characteristics to achieve parallelism and wavepipelining in the spin wave domain, which saved area and increased throughput. Furthermore, we introduced approximate computing in the SW domain and designed multiple approximate SW circuits. These saved considerable energy and area in error-tolerant applications. Moreover, we went beyond Boolean algebra and introduced a non-binary SW computing paradigm that enables full non-binary SW circuit design. We made use of it to design a SW non-binary adder. Finally, we determined the maximum transducer power consumption for which the SW implementation outperforms its 7 nm CMOS counterpart in terms of energy. This chapter summarizes the results of this dissertation. First, it summarizes the conclusions of each chapter. After that, it provides future research directions in SW technology.

#### **10.1. SUMMARY**

- **Chapter 1: Introduction** This chapter presented the motivation behind utilizing spin waves for circuit design, as CMOS downscaling has become increasingly difficult. It also presented the motivation to explore non-Boolean computing paradigms, which CMOS is not well suited to implement.
Then we explained the main strengths of spin waves: they consume ultra-low energy, have acceptable delay, and are highly scalable. After that, we briefly presented the state of the art. We explained its main drawbacks, which can be summarized as follows:
  - (i) lack of fanout capability;
  - (ii) disregard of the fact that spin wave gate cascading is not straightforward;
  - (iii) some designs were not scalable and consumed a large amount of energy.

  Next, we discussed the spin wave circuit design challenges. We identified conversion-free SW gate interconnection and fanout achievement as the main hurdles towards the realization of competitive magnonic circuits able to interoperate with or even replace CMOS counterparts. Then we formulated a general research question followed by 8 sub-questions to investigate the possibility of building scalable, energy efficient spin wave circuits. After that, we summarized the thesis contributions followed by the thesis organization.
- **Chapter 2: Background and State-of-the-art** In this chapter, we presented the spin wave based computing paradigm, its promise, and its associated challenges. To provide the necessary theoretical background, we first discussed SW creation as a collective spin excitation within a ferromagnetic material by means of an external magnetic field. Subsequently, we introduced the basic principles of SW based computing and possible ways for information encoding and processing. We demonstrated that SW interaction provides natural means for Majority gate and Inverter realizations, which together form a Universal Gate Set. Afterwards, we discussed the generic organization of any SW based gate. Finally, we provided an overview of state-of-the-art SW designs.
- **Chapter 3: Fanout Enabled Spin Wave Majority Gates** In this chapter, we first introduced novel ladder- and triangle-shaped spin wave Majority gate devices that can achieve a fanout of up to 4 and 2, respectively. We discussed how the ladder Majority gate can serve as a programmable logic gate and the triangle one as an XOR gate. The proposed designs were validated by means of OOMMF and MuMax3 micromagnetic simulations and compared with state-of-the-art spin wave and 16 nm CMOS counterparts. Our evaluation indicated that, while 14x slower than the CMOS counterpart, the proposed ladder and triangle gate structures provided 9x and 10.5x energy consumption reduction, respectively. Moreover, due to their fanout capabilities, they also provided 33% and 50% energy reduction, respectively, compared with state-of-the-art SW gates, without inducing any area or delay overhead.
- **Chapter 4: Spin Wave Data Parallelism** A novel *n*-bit data parallel spin wave logic gate was proposed in this chapter. To explain the proposed concept, we implemented 8-bit 2-input XOR and 3-input Majority gates and validated them by means of OOMMF. Further, we proposed an optimization algorithm to minimize the area overhead of the proposed multi-frequency gates. We demonstrated that the algorithm reduces the area by 30% and 41% for the XOR and MAJ gate implementations, respectively. Moreover, to assess the potential of our proposal, we compared the proposed multi-frequency gates with functionally equivalent scalar SW gate based implementations in terms of area, delay, and power consumption. The byte-based XOR and Majority gates required 4.47x and 4.16x less area than the conventional (scalar) implementations, respectively. This came at the expense of a 5% to 7% delay overhead and without any power consumption overhead.
Finally, we demonstrated that, for the current gate topology and materials, the maximum number of frequencies (gate parallelism) is 8 and 16 for phase- and threshold-based output detection, respectively.
- **Chapter 5: Spin Wave Wavepipeline** Chapter 5 presented the proposed SW 3-input Majority gate under continuous and pulse mode operation regimes and its validation by means of micromagnetic simulations. We evaluated the gate energy consumption, and our results indicated that Pulse Mode Operation (PMO) reduces the gate energy consumption by a factor of 18 compared with Continuous Mode Operation. In addition, we presented how PMO enables Wave Pipelining (WP) within SW circuits. We validated WP on a circuit of 4 cascaded 3-input Majority gates by means of micromagnetic simulations. Furthermore, we demonstrated that WP utilization improved the circuit throughput by 3.6x.
- **Chapter 6: Signal Renormalization** A novel energy efficient spin wave based Full Adder (FA) was proposed in this chapter. The FA was implemented using a Majority gate and 2 XOR gates. It uses two detection mechanisms: phase detection for the Carry-out output and threshold detection for the Sum output. The correct functionality of the FA was validated by means of micromagnetic simulations. It was evaluated against a direct SW gate based implementation and functionally equivalent designs in five state-of-the-art technologies: 22 nm CMOS, MTJ, SHE, DWM, and Spin-CMOS. The proposed FA consumed 22.5% and 43% less energy than the direct SW gate based implementation and 22 nm CMOS, respectively. It consumed more than 3 orders of magnitude less energy than the state-of-the-art MTJ, SHE, DWM, and Spin-CMOS based FAs. Also, the proposed FA needed more than 22% less area than all other designs. Subsequently, we proposed a novel 4-2 Spin Wave (SW) compressor and validated it by means of micromagnetic simulation.
The proposed compressor was assessed and compared with state-of-the-art SW, 22 nm CMOS, Magnetic Tunnel Junction (MTJ), Domain Wall Motion (DWM), and Spin-CMOS designs. It consumed 2.5x less energy than its 22 nm CMOS counterpart. In addition, it outperformed the MTJ, DWM, and Spin-CMOS designs by at least 3 orders of magnitude. Moreover, it consumed 1.25x less energy than the conventional SW compressor. Furthermore, it achieved the smallest chip real-estate. Finally, we introduced a directional coupler based SW amplitude renormalization method, which allows for conversion-free, energy effective gate cascading. We presented three complex gates, which cover the most common situations encountered in logic circuit implementations, and a 2-bit input spin wave multiplier. All of them were validated by means of micromagnetic simulations. Our results indicated that they are energy effective and potentially open the road towards the full utilization of SW paradigm capabilities and the development of SW-only circuits. In particular, for the complex gates our method provided 20%-33% energy savings compared with conversion-based equivalent designs. The proposed SW multiplier required 6.25× and 31% less energy than the 16 nm CMOS and conversion-based SW counterparts, respectively.
- **Chapter 7: Spin Wave Approximate Computing** We proposed a novel approximate energy efficient spin wave based Full Adder (AFA) and validated it by means of micromagnetic simulations. It was then evaluated and compared with state-of-the-art counterparts. The AFA saved 43% and 33% energy compared with the state-of-the-art SW and 7 nm CMOS designs, respectively, and 69% and 44% compared with accurate and approximate 45 nm CMOS, respectively. In addition, it consumed more than 2 orders of magnitude less energy than the accurate SHE FA and the accurate and approximate DWM, MTJ, and Spin-CMOS FAs.
Moreover, it achieved the same error rate as the approximate 45 nm CMOS and Spin-CMOS FAs, whereas it exhibited a 50% lower error rate than the approximate DWM FA, and it required at least 29% less chip real-estate than the other state-of-the-art designs. In addition, we introduced a Spin Wave (SW) based 4:2 approximate compressor, which consists of 3-input and 5-input Majority gates. We reported the design of approximate circuits without directional couplers, which are essential to normalize gate outputs when cascading them in accurate circuit designs. We validated the proposed compressor by means of micromagnetic simulations and compared it with the state-of-the-art SW, 22 nm CMOS, 45 nm CMOS, and Spin-CMOS counterparts. The evaluation results indicated that the proposed 4:2 compressor saved 31.5% energy in comparison with the accurate SW compressor and had the same energy consumption and error rate as the approximate compressor with directional couplers, but with a 3x smaller delay. Moreover, it consumed 14% less energy while having a 17% lower error rate when compared with the approximate 45 nm CMOS counterpart. Furthermore, it outperformed the approximate Spin-CMOS based compressor by 3 orders of magnitude in terms of energy consumption while providing the same error rate. Last but not least, the proposed compressor required the smallest number of devices; thus, it potentially requires the lowest chip real-estate. Finally, we proposed and validated, by means of micromagnetic simulations, an approximate 2-bit input multiplier (AMUL), which was evaluated and compared with its state-of-the-art counterparts. The AMUL energy consumption was at least 2.5x smaller than that of state-of-the-art accurate SW designs and of 16 nm CMOS accurate and approximate designs. Moreover, the AMUL exhibited an error rate of 25%, compared with 38% for the approximate CMOS MUL, and required at least 64% less chip real-estate.
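The interference principle behind the phase-detected Majority gates and threshold-detected XOR gates summarized for Chapters 6 and 7 can be illustrated with an idealized logical model. The sketch below assumes lossless, equal-amplitude waves, with logic 0 and logic 1 encoded as phases 0 and π, and it cascades the two XOR gates by re-encoding each output as a bit (i.e., a conversion-based cascade); it is not a micromagnetic simulation. All function names are hypothetical, and the 4:2 compressor shown is the conventional two-full-adder arrangement, not the optimized compressors proposed in this thesis.

```python
import cmath
import itertools
import math


def encode(bit: int) -> complex:
    """Hypothetical phase encoding: logic 0 -> phase 0, logic 1 -> phase pi (unit amplitude)."""
    return cmath.exp(1j * math.pi * bit)


def maj3_phase(a: int, b: int, c: int) -> int:
    """3-input Majority by phase detection of the superposed waves."""
    out = encode(a) + encode(b) + encode(c)  # real part is +3, +1, -1 or -3
    return 1 if out.real < 0 else 0  # phase pi (negative real part) -> logic 1


def xor2_threshold(a: int, b: int, threshold: float = 1.0) -> int:
    """2-input XOR by threshold detection of the interference amplitude."""
    amplitude = abs(encode(a) + encode(b))  # about 2 if inputs are equal, about 0 if they differ
    return 1 if amplitude < threshold else 0


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """FA from one MAJ3 (Carry-out, phase detection) and two XORs (Sum, threshold detection)."""
    s = xor2_threshold(xor2_threshold(a, b), cin)  # first XOR output re-encoded as a bit
    cout = maj3_phase(a, b, cin)
    return s, cout


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """Textbook 4:2 compressor from two cascaded full adders; returns (sum, carry, cout)."""
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


# Exhaustive checks against integer arithmetic.
for a, b, cin in itertools.product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert a + b + cin == s + 2 * cout

for bits in itertools.product((0, 1), repeat=5):
    s, carry, cout = compressor_4_2(*bits)
    assert sum(bits) == s + 2 * (carry + cout)

print("full adder and 4:2 compressor truth tables verified")
```

The XOR threshold of 1.0 is an arbitrary choice between the constructive (amplitude 2) and destructive (amplitude 0) cases; in a real device it would depend on waveguide losses and detector placement.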
**Chapter 8: Non-binary Computing.** In this chapter we introduced a novel computation paradigm that is not based on Boolean algebra, which enables domain-conversion-free, ultra-low-energy SW-based computing. Subsequently, we leveraged this computing paradigm by designing a non-binary spin wave adder, which we validated by means of micromagnetic simulations. To gain more insight into the potential of the proposed adder, we used a 2-bit adder implementation as a discussion vehicle, evaluated its area, delay, and energy consumption, and compared it with conventional SW and 16 nm CMOS counterparts. The results indicated that our proposal reduced the energy consumption by factors of 3.14 and 37 when compared with the conventional SW and 16 nm CMOS functionally equivalent designs, respectively. Furthermore, the proposed non-binary adder implementation required the smallest number of devices, which indicates the SW potential for realizing circuits and computation platforms with a smaller chip real-estate than the state of the art.

**Chapter 9: Initial Benchmarking of Spin Wave Technology.** In this chapter, we assessed the potential of Magnonic circuits to outperform functionally equivalent CMOS counterparts in terms of energy consumption. We based our analysis on the fact that the energy consumption of SW circuits is determined by the energy spent by transducers to generate the input SWs and sense the output SWs, as SW propagation through ferromagnetic waveguides does not consume noticeable energy. While it has been suggested that magneto-electric (ME) cells would be capable of exciting and detecting SWs while consuming ultra-low power, they have not been experimentally demonstrated and no figures of merit are available. Thus, instead of performing a traditional benchmarking, we carried out a reverse engineering investigation in an attempt to determine the ME power consumption upper bound that still makes Magnonic circuits outperform their CMOS counterparts.
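This reverse-engineering argument reduces to a simple energy balance. Assuming, as stated above, that only the transducers consume energy, an SW implementation with TN transducers, each drawing power PT for an active time $t$, needs $E_{SW} = TN \cdot PT \cdot t$; requiring $E_{SW} \le E_{CMOS}$ gives the upper bound $PT_{max} = E_{CMOS} / (TN \cdot t)$. The sketch below encodes only this balance. The function names and all numbers are hypothetical placeholders rather than values from this thesis, and $t$ is an assumed lumped parameter: in continuous-mode operation it would roughly span the full circuit delay, whereas in pulse mode it would be closer to the pulse duration.

```python
import math


def sw_circuit_energy(n_transducers: int, p_transducer_w: float, active_time_s: float) -> float:
    """Energy (J) of an SW circuit when only the transducers consume power: E = TN * PT * t."""
    return n_transducers * p_transducer_w * active_time_s


def max_transducer_power(e_cmos_j: float, n_transducers: int, active_time_s: float) -> float:
    """Largest per-transducer power PT (W) for which the SW circuit still matches the CMOS energy."""
    if n_transducers <= 0 or active_time_s <= 0:
        raise ValueError("transducer count and active time must be positive")
    return e_cmos_j / (n_transducers * active_time_s)


# Placeholder numbers for illustration only (not results from this thesis):
e_cmos = 1e-12   # hypothetical CMOS energy per operation: 1 pJ
tn = 1000        # hypothetical number of transducers
t_active = 1e-9  # hypothetical time each transducer is driven: 1 ns

pt_max = max_transducer_power(e_cmos, tn, t_active)
print(f"PT must stay below {pt_max * 1e6:.3f} uW")
assert math.isclose(sw_circuit_energy(tn, pt_max, t_active), e_cmos)
```

Because the bound scales inversely with both TN and $t$, reducing the transducer count (e.g., through normalization-based instead of conversion-based cascading) or the active time (pulse instead of continuous excitation) directly relaxes the power budget per transducer.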
To this end, we assumed a 32-bit Brent-Kung prefix adder as a discussion vehicle and determined the maximum transducer power consumption that still makes the SW implementation outperform its 7 nm CMOS counterpart. We evaluated different SW implementations that rely on conversion- or normalization-based gate cascading, under continuous or pulse SW generation scenarios. Our evaluations indicated that 31 nW is the maximum transducer power consumption for which the 32-bit Brent-Kung SW implementation outperforms its 7 nm CMOS counterpart in terms of energy. Moreover, we identified the challenges ahead towards the design and realization of energy-effective SW circuits and computation platforms.

#### **10.2.** FUTURE RESEARCH DIRECTIONS

Nevertheless, several obstacles still exist on the road towards the realization of competitive spin wave computing systems. In the following, we present our view on the most critical hurdles. For some of these obstacles, potential solutions have been proposed but still need to be demonstrated and properly assessed in terms of energy and delay overhead, while others have received less attention in the research literature so far.

**Interconnect.** To fulfil the SW promise and build magnonic circuits, effective solutions are required for normalization, fanout, splitting, amplification, waveguide crossings, and multi-layer designs. Although the normalization problem was solved in this thesis, more efficient directional couplers or other solutions would be beneficial. In addition, a fanout of 4 was realised in this thesis for Majority gates and programmable logic gates, which is sufficient for many circuits, but a larger fanout capability would further reduce gate replication in circuits. Although fanout was enabled at the gate level, which benefits SW circuits, fanout capability at the circuit level is still needed. This could potentially be achieved by adding a splitter and an amplifier that amplifies the SW amplitude to $nA$ before the splitter.
However, efficient splitters and amplifiers have yet to be developed experimentally. Although a directional coupler (DC) can split the SW amplitude by a factor of 2, other devices that can split the SW by a factor of $n$ are highly desirable. In some cases, the SW amplitude must be reduced by a factor of 3, 4, 5, or 6, which is more difficult to realise with a DC. In addition, enabling two waveguides to cross is of great interest, as it would allow building SW circuits without any conversion or replication. Furthermore, enabling multi-layer designs helps to optimize SW circuits in terms of area, as it gives designers more freedom.

**Transducer efficiency.** A major limitation for all applications of spin waves at the nanoscale is the energy efficiency of spin-wave generation and detection. While large mm-scale antennas and magnetic waveguides can efficiently transfer electrical energy into ferromagnetic resonance and the spin-wave system, the radiated power and the efficiency decrease with the magnetic excitation volume. Hence, energy-efficient nanoscale spin-wave transducers are still lacking. From a systems point of view, the relevant energy is the external electric energy needed to excite spin waves, not the energy of the spin waves themselves. Therefore, the transducer efficiency is a key property for ultralow-power applications of spin-wave computing systems. Magnetoelectric transducers currently appear to be the most promising. However, energy-efficient spin-wave excitation by magnetoelectric transducers has not been demonstrated experimentally yet. Moreover, research on magnetoelectric devices at the nanoscale and at GHz frequencies is only starting. The physics of the magnetoelectric coupling in nanoscale spin-wave transducers is not yet fully established and is expected to be complicated by the complex acoustic response of "real" non-ideal devices [388].
Here, a major breakthrough would be the demonstration of a scaled (or scalable) efficient spin-wave transducer based on a magnetoelectric compound material. Efficient spin-wave detection is also still challenging. As for generation, the microwave power induced in an antenna decreases with the magnetic volume underneath. To efficiently convert the result of a spin-wave computation to a CMOS-compatible signal, the transducer should ideally generate output signals of about $100\,\mathrm{mV}$. Such large signals have typically been an issue for many spintronic logic technologies. Magnetoelectric transducers may provide a potential solution, but the coupling of spin waves to strain and acoustic oscillations in realistic devices has not yet been studied in detail. The demonstration of $\gg 1\,\mathrm{mV}$ output signals in magnetoelectric transducers would certainly be a major breakthrough for spin-wave-based computing as well as for other potential applications.

**Device scaling.** As mentioned above, scaling the magnetic volume in a spin-wave device reduces the efficiency of transducers, both for generation and for detection. Scaling device dimensions also has repercussions on the properties of the spin waves themselves. Narrow waveguides exhibit strong internal dipolar magnetic fields due to shape anisotropy. The magnetization is thus preferentially aligned along the waveguide, which means that scaled devices typically operate with backward-volume spin waves. A distinct advantage of this geometry can be the "self-biasing" due to the strong anisotropy field, which does not require external magnetic bias fields. By contrast, the excitation of surface waves requires large external fields to rotate the magnetization transverse to the waveguide, which may not be practical. Device scaling also has strong repercussions on the spin-wave group velocity: reducing the waveguide thickness diminishes the group velocity.
Smaller devices also require backward-volume spin waves with shorter wavelengths, which have complex effects on the group velocity. Reaching the exchange regime can be advantageous since it reduces the anisotropy of the spin-wave dispersion and increases the group velocity. However, the high frequencies of exchange spin waves in large-$M_\mathrm{S}$ ferromagnetic materials may impose severe conditions on mixed-signal periphery circuits. The benchmarking of hybrid spin-wave–CMOS systems [389] has indicated that the possibility to design compact majority gates can lead to significant area gains with respect to CMOS circuits. In practice, the benchmark suggests that competitive areas can already be achieved for characteristic dimensions (*i.e.* waveguide width) of the spin-wave circuit of about 50 nm. Such dimensions have recently been reached experimentally [182]. This indicates that scaling the spin-wave wavelength and the device dimensions should not be a major roadblock. However, the scalability of spin-wave devices may ultimately be limited by other effects, such as dipolar crosstalk or transducer efficiency [390].

**High-throughput computation.** To date, experimental spin-wave logic gates have been operated in the frequency domain using vector network analyzers. In real applications, however, the devices have to be operated in the time domain. For cascading by nanomagnets, clocking schemes enable time-domain operation, but they still remain to be developed and benchmarked. Moreover, input-output isolation may be a challenge for such schemes. All-spin-wave cascading schemes may require the use of spin-wave packets or solitons. While the time-domain response of spin-wave transmission can be studied via the Fourier transform of the spectral response, the excitation, interference, dephasing, and detection of wave packets are not fully understood and remain to be studied experimentally.
Electric crosstalk between transducers is a major issue for nanoscale spin-wave devices due to the low efficiency of spin-wave generation and detection. More efficient transducers may facilitate such experiments. A major breakthrough would be a time-resolved spin-wave transmission experiment with phase sensitivity. Note that high-throughput applications require single-pulse operation.

**CMOS periphery circuits.** In hybrid spin-wave–CMOS systems, spin-wave circuits are embedded in mixed-signal CMOS-based periphery circuits that provide a link with cache/memory and input/output devices. However, only very few studies have been reported on concrete periphery circuits [65], [201], [348], [391]. The design of periphery circuits is currently hindered by the lack of equivalent circuit models for spin-wave devices and transducers. The development of calibrated compact models [392] for a complete set of spin-wave devices and transducers is thus a key first step towards the development of low-power periphery circuits and complete hybrid systems. This is an important *conditio sine qua non* for an accurate benchmark of the performance of hybrid spin-wave–CMOS systems and ultimately for a final assessment of their potential in commercial applications.

**New materials.** Spin-wave computing is also an interesting field for materials scientists. Many spin-wave experiments have been performed using single-crystal YIG. Epitaxy of high-quality YIG on Si (100) has not been demonstrated, and thus YIG is incompatible with integration alongside CMOS. Ferromagnetic metals, such as CoFeB or permalloy, are routinely integrated in MRAM memory cells and are compatible with Si technology. Nonetheless, insulating ferrites remain an interesting alternative since they typically show lower losses at microwave frequencies. However, thin ferrite films with low damping that can be co-integrated with Si-based CMOS still have to be demonstrated.
Magnetoelectric compound materials are also a fascinating research field in materials science. Challenges include the combination of Pb-free high-performance piezoelectrics and ferromagnets with large magnetostriction coefficients and low damping. In particular, the piezoelectric response at GHz frequencies is often limited by dielectric and ferroelectric relaxation, although some progress has recently been reported [393].

The above discussion indicates that many obstacles still exist before spin-wave technology can lead to competitive computing systems. However, this thesis clearly established the promise of such a technology for ultralow-power electronics. The large-scale effort in magnonic research will certainly advance the state of the art further in the near future. Hence, one can anticipate that spin-wave circuits will become a reality in the next decade. The remaining obstacles relate to their embedding into the CMOS periphery, including transduction. This field requires close collaboration between researchers in spin-wave physics as well as in device and circuit design. Physics-based compact models of spin-wave devices and transducers [392] may enable circuit simulation, periphery design, and ultimately the refinement of the benchmarking procedure to substantiate the promise of spin-wave technology.
## **NOMENCLATURE**

### **Symbols**

| Symbol | Description |
|---|---|
| $\mu$ | Net magnetic dipole moment |
| $V$ | Volume |
| $\tau$ | Lifetime |
| *Pd* | Propagation distance |
| $\lambda$ | Wavelength |
| $\mu_0$ | Vacuum permeability |
| $H$ | Magnetic field strength |
| $H_{ext}$ | External magnetic field |
| $\mathcal{E}_Z$ | Zeeman energy density |
| $H_d$ | Demagnetization field |
| $u$ | Easy axis |
| $\zeta$ | Magnetization direction |
| $K$ | Anisotropy constant |
| $H_{eff}$ | Effective magnetic field |
| $A_{ex}$ | Exchange stiffness constant |
| $\Delta$ | Laplace operator |
| $\lambda_{ex}$ | Exchange constant |
| $\gamma$ | Gyromagnetic ratio |
| $M_0$ | Static magnetization component |
| $\omega$ | Angular frequency |
| $k$ | Wavenumber |
| $H_0$ | Static component of the effective magnetic field |
| $h$ | Dynamic component of the effective magnetic field |
| $v_p$ | Phase speed |
| $d$ | Thickness |
| $n$ | Mode number |
| | Angle between the magnetization and the normal to the waveguide |
| $\theta_m$ | Angle between the magnetization and the longitudinal waveguide axis |
| $\delta$ | Spin-wave attenuation length |
| $M_z$ | Out-of-plane component of magnetization |
| $F$ | Frequency |
| | Amplitude |
| $f_o(k_x)$ | Isolated spin wave waveguide dispersion relation |
| $f_{s,as}(k_x)$ | Symmetric and asymmetric dispersion relations for spin waves in coupled waveguides |
| $w$ | Waveguide width |
| $\hat{F}_{k_x}$ | Tensor Fourier transform of the spin wave profile across the waveguide width |
| $\sigma$ | |
| $\tilde{w}$ | Normalized mode profile constant |
| $L_c$ | Coupling length |
| $L_w$ | Length of the coupled waveguide |
| $T_{k_x}$ | Spin wave nonlinear frequency shift |
| DW | Distance between waveguides |
| TN | Number of transducers in the implementation |
| *PT* | Power consumed by one transducer |
| $E$ | Energy |

### **Acronyms**

| Acronym | Meaning |
|---|---|
| CMOS | Complementary Metal-Oxide-Semiconductor |
| FinFET | Fin-shaped Field-Effect Transistor |
| Cu | Copper |
| IRDS | International Roadmap for Devices and Systems |
| SW | Spin Wave |
| MHM | Magnonic Holographic Memory |
| IC | Integrated Circuit |
| OOMMF | Object Oriented Micromagnetic Framework |
| MuMax | GPU-accelerated micromagnetic simulation program |
| MTJ | Magnetic Tunnel Junction |
| DWM | Domain Wall Motion |
| SHE | Spin Hall Effect |
| BKA | Brent-Kung Adder |
| *LLG* | Landau–Lifshitz–Gilbert |
| SSW | Surface Spin Waves |
| BVSW | Backward Volume Spin Waves |
| FVSW | Forward Volume Spin Waves |
| YIG | Yttrium Iron Garnet |
| Py | Permalloy |
| CoFeB | Cobalt Iron Boron |
| STT | Spin Transfer Torque |
| SOT | Spin Orbit Torque |
| ME | Magneto Electric |
| VCMA | Voltage Controlled Magnetic Anisotropy |
| AC | Alternating Current |
| DC | Direct Current |
| FR | Functional Region |
| BLS | Brillouin Light Scattering |
| SPEELS | Spin-Polarized Electron Energy Loss Spectroscopy |
| WP | Wave Pipeline |
| FA | Full Adder |
| MUL | Multiplier |
| AFA | Approximate Full Adder |
| AMUL | Approximate Multiplier |
| $C_{in}$ | Carry-in |
| $C_o$ | Carry-out |
| S | Sum |
| O | Output |
| SRAM | Static Random Access Memory |
| MAJ3 | 3-input Majority Gate |
| PLG | Programmable Logic Gate |
| PD | Phase Detection |
| TD | Threshold Detection |
| PMA | Perpendicular Magnetic Anisotropy |
| MSA | Magnetization Spinning Angle |
| FO2 | Fanout of 2 |
| FO4 | Fanout of 4 |
| FFT | Fast Fourier Transform |
| CMO | Continuous Mode Operation |
| PMO | Pulse Mode Operation |
| WG | Waveguide |
| PPA | Parallel Prefix Adder |
| CLA | Carry-Lookahead Adder |
| MRAM | Magnetic Random Access Memory |

# **CURRICULUM VITÆ**

## Abdulqader Mahmoud

Born 14-06-1992 in Kalba, United Arab Emirates (UAE).

#### **EDUCATION**

| Period | Degree |
|---|---|
| 2018–2022 | Ph.D. degree in Computer Engineering<br>Delft University of Technology (TU Delft), Delft, The Netherlands<br>Thesis: Spin Wave Circuit Design<br>Promotors: Said Hamdioui, Sorin Cotofana |
| 2015–2017 | M.Sc. degree in Electrical and Computer Engineering<br>Khalifa University, Abu Dhabi, UAE<br>Thesis: Design and Optimization of Charge Pump for Energy Harvesting Applications<br>Supervisors: Baker Mohammad, Mohammed ElNaggar, Hani Saleh |
| 2010–2015 | B.Sc. degree in Electrical and Electronics Engineering<br>Khalifa University, Sharjah, UAE |

#### **AWARDS**

| Year | Award |
|---|---|
| 2017 | "Best in Session Award" at SRC Techcon 2017 |
| 2013 | Outstanding Student Award, Khalifa University, UAE |
| 2012 | Outstanding Student Award, Khalifa University, UAE |
| 2011 | Freshman Tutor Award, Khalifa University, UAE |
| 2010 | Outstanding Student Award at UAE Secondary Schools |

## LIST OF PUBLICATIONS

#### INTERNATIONAL CONFERENCES

- 8. **A. Mahmoud**, N. Cucu-Laurenciu, F. Vanderveken, F. Ciubotaru, C. Adelmann, S. Cotofana, and S. Hamdioui, *Would Magnonic Circuits Outperform CMOS Counterparts?*, In Proceedings of the Great Lakes Symposium on VLSI 2022 (GLSVLSI '22), June 6–8, 2022, Irvine, CA, USA. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3526241.3530368
- 7. **A. Mahmoud**, F. Vanderveken, F. Ciubotaru, C. Adelmann, S. Cotofana and S. Hamdioui, *Spin Wave Based 4-2 Compressor*, 2021 28th IEEE International Conference on Electronics, Circuits, and Systems (ICECS), 2021, pp. 1-4, doi: 10.1109/ICECS53924.2021.9665499.
- 6. **A. Mahmoud**, F. Vanderveken, F. Ciubotaru, C. Adelmann, S. Cotofana and S. Hamdioui, *Spin Wave Based Full Adder*, 2021 IEEE International Symposium on Circuits and Systems (ISCAS), 2021, pp. 1-5.
- 5. **A. Mahmoud**, F. Vanderveken, C. Adelmann, F. Ciubotaru, S. Hamdioui and S. Cotofana, *Achieving Wave Pipelining in Spin Wave Technology*, 2021 22nd International Symposium on Quality Electronic Design (ISQED), 2021, pp. 54-59.
- 4. **A. Mahmoud**, C. Adelmann, F. Vanderveken, S. Cotofana, F. Ciubotaru and S. Hamdioui, *Fan-out of 2 Triangle Shape Spin Wave Logic Gates*, 2021 Design, Automation and Test in Europe Conference and Exhibition (DATE), 2021, pp. 948-953.
- 3. **A. Mahmoud**, F. Vanderveken, C. Adelmann, F. Ciubotaru, S. Hamdioui and S. Cotofana, *4-output Programmable Spin Wave Logic Gate*, 2020 IEEE 38th International Conference on Computer Design (ICCD), 2020, pp.
332-335.
- 2. **A. Mahmoud**, F. Vanderveken, C. Adelmann, F. Ciubotaru, S. Cotofana and S. Hamdioui, *2-Output Spin Wave Programmable Logic Gate*, 2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2020, pp. 60-65.
- 1. **A. Mahmoud**, F. Vanderveken, F. Ciubotaru, C. Adelmann, S. Cotofana and S. Hamdioui, *n-bit Data Parallel Spin Wave Logic Gate*, 2020 Design, Automation and Test in Europe Conference and Exhibition (DATE), 2020, pp. 642-645.

#### **INTERNATIONAL JOURNALS**

- 7. **A. N. Mahmoud**, F. Vanderveken, F. Ciubotaru, C. Adelmann, S. Hamdioui, and S. Cotofana, *Non-binary Spin Wave Based Circuit Design*, in IEEE Transactions on Circuits and Systems I: Regular Papers, 2022.
- 6. **A. N. Mahmoud**, F. Vanderveken, F. Ciubotaru, C. Adelmann, S. Hamdioui and S. Cotofana, *A Spin Wave-Based Approximate 4:2 Compressor: Seeking the most energy-efficient digital computing paradigm*, in IEEE Nanotechnology Magazine, vol. 16, no. 1, pp. 47-56, Feb. 2022, doi: 10.1109/MNANO.2021.3126095.
- 5. **A. Mahmoud**, F. Vanderveken, F. Ciubotaru, C. Adelmann, S. Hamdioui and S. Cotofana, *Spin Wave Based Approximate Computing*, in IEEE Transactions on Emerging Topics in Computing, doi: 10.1109/TETC.2021.3136299.
- 4. **A. N. Mahmoud**, F. Vanderveken, C. Adelmann, F. Ciubotaru, S. Hamdioui and S. Cotofana, *Multifrequency Data Parallel Spin Wave Logic Gates*, in IEEE Transactions on Magnetics, vol. 57, no. 5, pp. 1-12, May 2021, Art no. 3401012.
- 3. **A. Mahmoud**, F. Ciubotaru, F. Vanderveken, A. V. Chumak, S. Hamdioui, C. Adelmann, and S. Cotofana, *Introduction to spin wave computing*, Journal of Applied Physics **128**, 161101 (2020).
- 2. **A. N. Mahmoud**, F. Vanderveken, C. Adelmann, F. Ciubotaru, S. Cotofana and S. Hamdioui, *Spin Wave Normalization Toward All Magnonic Circuits*, in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 68, no. 1, pp. 536-549, Jan. 2021.
- 1. **A. Mahmoud**, F. Vanderveken, C. Adelmann, F. Ciubotaru, S. Hamdioui, and S.
Cotofana, *Fan-out enabled spin wave majority gate*, AIP Advances **10**, 035119 (2020).

## **BIBLIOGRAPHY**

- [1] R. L. Boylestad and L. Nashelsky, *Electronic Devices and Circuit Theory*, 11th ed. Harlow: Pearson, 2014.
- [2] S. M. Sze and K. K. Ng, *Physics of Semiconductor Devices*, 3rd ed. Hoboken: Wiley, 2006.
- [3] B. Lojek, *History of Semiconductor Engineering*. Berlin, Heidelberg: Springer, 2007.
- [4] E.-H. Jiang and W.-B. Jiang, "Theory of expansion Boolean algebra and its applications in CMOS VLSI digital systems", *Circuits Syst. Signal Process.*, vol. 38, no. 12, pp. 5817–5838, 2019.
- [5] J. Uyemura, *CMOS Logic Circuit Design*. New York: Kluwer, 2007.
- [6] R. H. Dennard, F. H. Gaensslen, V. L. Rideout, E. Bassous, and A. R. LeBlanc, "Design of ion-implanted MOSFET's with very small physical dimensions", *IEEE J. Solid-State Circuits*, vol. 9, no. 5, pp. 256–268, 1974.
- [7] G. E. Moore, "Cramming More Components Onto Integrated Circuits", *Proc. IEEE*, vol. 86, no. 1, pp. 82–85, 1998.
- [8] International Technology Roadmap for Semiconductors, http://www.itrs2.net, accessed 31-May-2020.
- [9] M. Bohr, "Moore's Law in the innovation era", in *Design for Manufacturability through Design-Process Integration V*, M. L. Rieger, Ed., International Society for Optics and Photonics, vol. 7974, SPIE, 2011, pp. 9–16.
- [10] K. J. Kuhn, "Considerations for Ultimate CMOS Scaling", *IEEE Trans. Electron Devices*, vol. 59, no. 7, pp. 1813–1828, 2012.
- [11] S. P. Murarka and S. W. Hymes, "Copper metallization for ULSI and beyond", *Crit. Rev. Solid State Mater. Sci.*, vol. 20, no. 2, pp. 87–124, 1995.
- [12] M. Bohr, R. Chau, T. Ghani, and K. Mistry, "The High-k Solution", *IEEE Spectrum*, vol. 44, no. 10, pp. 29–35, 2007.
- [13] C. Auth, C. Allen, A. Blattner, *et al.*, "A 22 nm high performance and low-power CMOS technology featuring fully-depleted tri-gate transistors, self-aligned contacts and high density MIM capacitors", in *2012 IEEE Symp.
VLSI Technol.*, IEEE, 2012, pp. 131–132.
- [14] M. M. Waldrop, "The chips are down for Moore's law", *Nature News*, vol. 530, no. 7589, p. 144, 2016.
- [15] D. Mamaluy and X. Gao, "The fundamental downscaling limit of field effect transistors", *Appl. Phys. Lett.*, vol. 106, no. 19, p. 193503, 2015.
- [16] B. Hoefflinger, *Chips 2020: A Guide to the Future of Nanoelectronics*. Berlin, Heidelberg: Springer, 2012.
- [17] N. Z. Haron and S. Hamdioui, "Why is CMOS scaling coming to an END?", in *2008 3rd Intern. Design Test Workshop*, IEEE, 2008, pp. 98–103.
- [18] *International Roadmap for Devices and Systems*, https://irds.ieee.org, accessed 25-May-2020.
- [19] D. G. Feitelson, *Optical Computing*. Cambridge: MIT Press, 1988.
- [20] N. Streibl, K.-H. Brenner, A. Huang, *et al.*, "Digital optics", *Proc. IEEE*, vol. 77, no. 12, pp. 1954–1969, 1989.
- [21] J. A. Hutchby, G. I. Bourianoff, V. V. Zhirnov, and J. E. Brewer, "Extending the road beyond CMOS", *IEEE Circuits Devices Mag.*, vol. 18, no. 2, pp. 28–41, 2002.
- [22] G. Bourianoff, "The future of nanocomputing", *Computer*, vol. 36, no. 8, pp. 44–53, 2003.
- [23] G. I. Bourianoff, P. A. Gargini, and D. E. Nikonov, "Research directions in beyond CMOS computing", *Solid-State Electron.*, vol. 51, no. 11-12, pp. 1426–1431, 2007.
- [24] D. A. B. Miller, "Are optical transistors the logical next step?", *Nature Photon.*, vol. 4, no. 1, pp. 3–5, 2010.
- [25] K. Bernstein, R. K. Cavin, W. Porod, A. Seabaugh, and J. Welser, "Device and architecture outlook for beyond CMOS switches", *Proc. IEEE*, vol. 98, no. 12, pp. 2169–2184, 2010.
- [26] D. E. Nikonov and I. A. Young, "Overview of beyond-CMOS devices and a uniform methodology for their benchmarking", *Proc. IEEE*, vol. 101, no. 12, pp. 2498–2533, 2013.
- [27] D. E. Nikonov and I. A. Young, "Benchmarking spintronic logic devices based on magnetoelectric oxides", *J. Mater. Res.*, vol. 29, no. 18, p. 2109, 2014.
- [28] D. E. Nikonov and I. A.
Young, "Benchmarking of beyond-CMOS exploratory devices for logic integrated circuits", *IEEE Journal on Exploratory Solid-State Computational Devices and Circuits*, vol. 1, pp. 3–11, 2015.
- [29] T. M. Mitchell, *Machine Learning*. Boston: McGraw-Hill, 1997.
- [30] K. P. Murphy, *Machine Learning: A Probabilistic Perspective*. Cambridge, London: MIT Press, 2012.
- [31] E. Alpaydin, *Introduction to Machine Learning*, 4th ed. Cambridge, London: MIT Press, 2020.
- [32] K. Boucart and A. M. Ionescu, "Double-gate tunnel FET with high-κ gate dielectric", *IEEE Trans. Electron Devices*, vol. 54, no. 7, pp. 1725–1733, 2007.
- [33] D. S. Jeong, K. M. Kim, S. Kim, B. J. Choi, and C. S. Hwang, "Memristors for Energy-Efficient New Computing Paradigms", *Adv. Electron. Mater.*, vol. 2, no. 9, p. 1600090, 2016.
- [34] Y. Li, Z. Wang, R. Midya, Q. Xia, and J. J. Yang, "Review of memristor devices in neuromorphic computing: Materials sciences and device challenges", *J. Phys. D: Appl. Phys.*, vol. 51, no. 50, p. 503002, 2018.
- [35] S. A. Wolf, D. D. Awschalom, R. A. Buhrman, *et al.*, "Spintronics: A Spin-Based Electronics Vision for the Future", *Science*, vol. 294, no. 5546, pp. 1488–1495, 2001.
- [36] C. Felser and G. H. Fecher, *Spintronics: From Materials to Devices*. Dordrecht, Heidelberg: Springer, 2013.
- [37] S. Bandyopadhyay and M. Cahay, *Introduction to Spintronics*, 2nd ed. Boca Raton: CRC Press, 2015.
- [38] Y. Xu, D. D. Awschalom, and J. Nitta, Eds., *Handbook of Spintronics*. Dordrecht: Springer, 2016.
- [39] D. Sander, S. O. Valenzuela, D. Makarov, *et al.*, "The 2017 Magnetism Roadmap", *J. Phys. D: Appl. Phys.*, vol. 50, no. 36, p. 363001, 2017.
- [40] B. Dieny, I. L. Prejbeanu, K. Garello, *et al.*, "Opportunities and challenges for spintronics in the microelectronic industry", *Nature Electron.*, vol. 3, p. 446, 2020.
- [41] J. Atulasimha and S. Bandyopadhyay, Eds., *Nanomagnetic and Spintronic Devices for Energy-Efficient Memory and Computing*.
Chichester: Wiley, 2016.
- [42] S. Manipatruni, D. E. Nikonov, and I. A. Young, "Beyond CMOS computing with spin and polarization", *Nature Phys.*, vol. 14, no. 4, p. 338, 2018.
- [43] I. Žutić, J. Fabian, and S. Das Sarma, "Spintronics: Fundamentals and applications", *Rev. Mod. Phys.*, vol. 76, no. 2, pp. 323–410, 2004.
- [44] A. A. Khajetoorians, J. Wiebe, B. Chilian, and R. Wiesendanger, "Realizing all-spin-based logic operations atom by atom", *Science*, vol. 332, no. 6033, pp. 1062–1064, 2011.
- [45] S. Datta and B. Das, "Electronic analog of the electro-optic modulator", *Appl. Phys. Lett.*, vol. 56, no. 7, pp. 665–667, 1990.
- [46] K. C. Hall and M. E. Flatté, "Performance of a spin-based insulated gate field effect transistor", *Appl. Phys. Lett.*, vol. 88, no. 16, p. 162503, 2006.
- [47] H. Dery, P. Dalal, Ł. Cywiński, and L. J. Sham, "Spin-based logic in semiconductors for reconfigurable large-scale circuits", *Nature*, vol. 447, no. 7144, pp. 573–576, 2007.
- [48] R. P. Cowburn and M. E. Welland, "Room Temperature Magnetic Quantum Cellular Automata", *Science*, vol. 287, no. 5457, pp. 1466–1468, 2000.
- [49] A. Ney, C. Pampuch, R. Koch, and K. H. Ploog, "Programmable computing with a single magnetoresistive element", *Nature*, vol. 425, no. 6957, pp. 485–487, 2003.
- [50] A. Imre, G. Csaba, L. Ji, A. Orlov, G. H. Bernstein, and W. Porod, "Majority Logic Gate for Magnetic Quantum-Dot Cellular Automata", *Science*, vol. 311, no. 5758, pp. 205–208, Jan. 2006.
- [51] B. Behin-Aein, D. Datta, S. Salahuddin, and S. Datta, "Proposal for an all-spin logic device with built-in memory", *Nature Nanotechnol.*, vol. 5, no. 4, pp. 266–270, 2010.
- [52] S. Manipatruni, D. E. Nikonov, C.-C. Lin, *et al.*, "Scalable energy-efficient magnetoelectric spin–orbit logic", *Nature*, vol. 565, no. 7737, p. 35, 2019.
- [53] D. A. Allwood, G. Xiong, C. C. Faulkner, D. Atkinson, D. Petit, and R. P. Cowburn, "Magnetic Domain-Wall Logic", *Science*, vol. 309, no. 5741, pp.
1688–1692, 2005.
- [54] P. Xu, K. Xia, C. Gu, L. Tang, H. Yang, and J. Li, "An all-metallic logic gate based on current-driven domain wall motion", *Nature Nanotechnol.*, vol. 3, no. 2, pp. 97–100, 2008.
- [55] D. E. Nikonov, G. I. Bourianoff, and T. Ghani, "Proposal of a Spin Torque Majority Gate Logic", *IEEE Electron Device Lett.*, vol. 32, no. 8, 2011.
- [56] X. Zhang, M. Ezawa, and Y. Zhou, "Magnetic skyrmion logic gates: Conversion, duplication and merging of skyrmions", *Sci. Rep.*, vol. 5, p. 9400, 2015.
- [57] K. Koumpouras, D. Yudin, C. Adelmann, A. Bergman, O. Eriksson, and M. Pereiro, "A majority gate with chiral magnetic solitons", *J. Phys.: Cond. Matter*, vol. 30, no. 37, p. 375801, 2018.
- [58] M. P. Kostylev, A. A. Serga, T. Schneider, B. Leven, and B. Hillebrands, "Spin-wave logical gates", *Appl. Phys. Lett.*, vol. 87, no. 15, p. 153501, 2005.
- [59] A. Khitun and K. L. Wang, "Non-volatile magnonic logic circuits engineering", *J. Appl. Phys.*, vol. 110, no. 3, p. 034306, 2011.
- [60] A. V. Chumak, V. I. Vasyuchka, A. A. Serga, and B. Hillebrands, "Magnon spintronics", *Nature Phys.*, vol. 11, p. 453, 2015.
- [61] A. V. Chumak, A. A. Serga, and B. Hillebrands, "Magnon transistor for all-magnon data processing", *Nature Commun.*, vol. 5, p. 4700, 2014.
- [62] K.-S. Lee and S.-K. Kim, "Conceptual design of spin wave logic gates based on a Mach–Zehnder-type spin wave interferometer for universal logic functions", *J. Appl. Phys.*, vol. 104, no. 5, p. 053909, 2008.
- [63] O. Zografos, M. Manfrini, A. Vaysset, *et al.*, "Exchange-driven Magnetic Logic", *Sci. Rep.*, vol. 7, p. 12154, 2017.
- [64] I. P. Radu, O. Zografos, A. Vaysset, *et al.*, "Spintronic majority gates", in *2015 IEEE International Electron Devices Meeting (IEDM)*, IEEE, 2015, p. 32.5.
- [65] O. Zografos, A. Vaysset, B. Sorée, and P. Raghavan, "Spin-Based Majority Computation", in *Beyond-CMOS Technologies for Next Generation Computer Design*, R. O. Topaloglu and H.-S. P.
Wong, Eds., Cham: Springer, 2019, ch. 7, pp. 231–262.
- [66] D. Hampel and R. O. Winder, "Threshold logic", *IEEE Spectrum*, vol. 8, no. 5, pp. 32–39, 1971.
- [67] L. Amarù, P.-E. Gaillardon, and G. De Micheli, "Majority-Inverter Graph: A Novel Data-Structure and Algorithms for Efficient Logic Optimization", *Proc. 51st Design Automation Conf. (DAC)*, pp. 1–6, 2014.
- [68] L. Amarù, P.-E. Gaillardon, S. Mitra, and G. De Micheli, "New Logic Synthesis as Nanotechnology Enabler", *Proc. IEEE*, vol. 103, no. 11, pp. 2168–2195, 2015.
- [69] S. Agarwal, G. Burr, A. Chen, *et al.*, "International roadmap of devices and systems 2017 edition: Beyond CMOS chapter", Sandia National Lab. (SNL-NM), Albuquerque, NM (United States), Tech. Rep., 2018.
- [70] A. N. Mahmoud, F. Vanderveken, C. Adelmann, F. Ciubotaru, S. Hamdioui, and S. Cotofana, "Multifrequency data parallel spin wave logic gates", *IEEE Transactions on Magnetics*, vol. 57, no. 5, pp. 1–12, 2021.
- [71] D. D. Stancil and A. Prabhakar, *Spin Waves*. New York: Springer, 2009.
- [72] F. Bloch, "Zur Theorie des Ferromagnetismus", *Z. Phys.*, vol. 61, no. 3, p. 206, 1930.
- [73] C. Herring and C. Kittel, "On the Theory of Spin Waves in Ferromagnetic Media", *Phys. Rev.*, vol. 81, no. 5, p. 869, 1951.
- [74] A. Mahmoud, F. Ciubotaru, F. Vanderveken, *et al.*, "Introduction to spin wave computing", *Journal of Applied Physics*, vol. 128, no. 16, p. 161 101, 2020. eprint: https://doi.org/10.1063/5.0019328.
- [75] V. V. Kruglyak, S. O. Demokritov, and D. Grundler, "Magnonics", *J. Phys. D: Appl. Phys.*, vol. 43, no. 26, p. 264 001, 2010.
- [76] A. V. Chumak, A. A. Serga, and B. Hillebrands, "Magnonic crystals for data processing", *Journal of Physics D: Applied Physics*, vol. 50, no. 24, p. 244 001, 2017.
- [77] A. Mahmoud, F. Vanderveken, C. Adelmann, F. Ciubotaru, S. Hamdioui, and S. Cotofana, "Fan-out enabled spin wave majority gate", *AIP Adv.*, vol. 10, no. 3, p. 035 119, 2020.
- [78] A. Mahmoud, F.
Vanderveken, C. Adelmann, F. Ciubotaru, S. Cotofana, and S. Hamdioui, "2-output spin wave programmable logic gate", in *2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)*, 2020, pp. 60–65.
- [79] A. Mahmoud, F. Vanderveken, F. Ciubotaru, C. Adelmann, S. Cotofana, and S. Hamdioui, "N-bit data parallel spin wave logic gate", in *2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)*, 2020, pp. 642–645.
- [80] A. Khitun, "Multi-frequency magnonic logic circuits for parallel data processing", *Journal of Applied Physics*, vol. 111, no. 5, p. 054 307, 2012. eprint: https://doi.org/10.1063/1.3689011.
- [81] T. Schneider, A. A. Serga, B. Leven, B. Hillebrands, R. L. Stamps, and M. P. Kostylev, "Realization of spin-wave logic gates", *Appl. Phys. Lett.*, vol. 92, no. 2, p. 022505, 2008.
- [82] I. A. Ustinova, A. A. Nikitin, A. B. Ustinov, B. A. Kalinikos, and E. Lähderanta, "Logic gates based on multiferroic microwave interferometers", in *2017 11th International Workshop on the Electromagnetic Compatibility of Integrated Circuits (EMCCompo)*, Jul. 2017, pp. 104–107.
- [83] A. Khitun and K. L. Wang, "Nano scale computational architectures with spin wave bus", *Superlatt. Microstruct.*, vol. 38, no. 3, pp. 184–200, 2005.
- [84] Y. Wu, M. Bao, A. Khitun, J.-Y. Kim, A. Hong, and K. L. Wang, "A three-terminal spin-wave device for logic applications", *Journal of Nanoelectronics and Optoelectronics*, vol. 4, no. 3, pp. 394–397, Dec. 2009.
- [85] A. Khitun, D. E. Nikonov, M. Bao, K. Galatsis, and K. L. Wang, "Feasibility study of logic circuits with a spin wave bus", *Nanotechnol.*, vol. 18, no. 46, p. 465 202, 2007.
- [86] A. Khitun, M. Bao, Y. Wu, *et al.*, "Spin wave logic circuit on silicon platform", in *Fifth International Conference on Information Technology: New Generations (ITNG 2008)*, Apr. 2008, pp. 1107–1110.
- [87] B. Rana and Y. Otani, "Voltage-controlled reconfigurable spin-wave nanochannels and logic devices", *Phys. Rev. Applied*, vol. 9, p.
014 033, no. 1, Jan. 2018.
- [88] A. Chumak, A. Serga, and B. Hillebrands, "Magnon transistor for all-magnon data processing", *Nature Communications*, vol. 5, 2014.
- [89] S. Klingler, P. Pirro, T. Brächer, B. Leven, B. Hillebrands, and A. V. Chumak, "Design of a spin-wave majority gate employing mode selection", *Appl. Phys. Lett.*, vol. 105, no. 15, p. 152 410, 2014.
- [90] S. Klingler, P. Pirro, T. Brächer, B. Leven, B. Hillebrands, and A. V. Chumak, "Spin-wave logic devices based on isotropic forward volume magnetostatic waves", *Appl. Phys. Lett.*, vol. 106, no. 21, p. 212 406, 2015.
- [91] O. Zografos, S. Dutta, M. Manfrini, *et al.*, "Non-volatile spin wave majority gate at the nanoscale", *AIP Advances*, vol. 7, no. 5, p. 056 020, 2017.
- [92] K. Nanayakkara, A. Anferov, A. P. Jacob, S. J. Allen, and A. Kozhanov, "Cross junction spin wave logic architecture", *IEEE Trans. Magn.*, vol. 50, no. 11, p. 3 402 204, 2014.
- [93] T. Fischer, M. Kewenig, D. A. Bozhko, *et al.*, "Experimental prototype of a spin-wave majority gate", *Appl. Phys. Lett.*, vol. 110, no. 15, p. 152 401, 2017.
- [94] P. Shabadi, A. Khitun, P. Narayanan, *et al.*, "Towards logic functions as the device", in *2010 IEEE/ACM Intern. Symp. Nano. Arch.*, 2010, pp. 11–16.
- [95] T. Fischer, M. Kewenig, D. A. Bozhko, *et al.*, "Experimental prototype of a spin-wave majority gate", *Applied Physics Letters*, vol. 110, no. 15, p. 152401, 2017. eprint: https://doi.org/10.1063/1.4979840.
- [96] F. Ciubotaru, G. Talmelli, T. Devolder, *et al.*, "First experimental demonstration of a scalable linear majority gate based on spin waves", in *2018 IEEE International Electron Devices Meeting (IEDM)*, Dec. 2018, pp. 36.1.1–36.1.4.
- [97] A. Khitun, "Magnonic holographic devices for special type data processing", *Journal of Applied Physics*, vol. 113, no. 16, p. 164 503, 2013. eprint: https://doi.org/10.1063/1.4802656.
- [98] F. Gertz, A. Kozhevnikov, Y. Filimonov, and A.
Khitun, "Magnonic holographic memory", *IEEE Transactions on Magnetics*, vol. 51, no. 4, pp. 1–5, Apr. 2015.
- [99] A. Kozhevnikov, F. Gertz, G. Dudko, Y. Filimonov, and A. Khitun, "Pattern recognition with magnonic holographic memory device", *Applied Physics Letters*, vol. 106, no. 14, p. 142 409, 2015. eprint: https://doi.org/10.1063/1.4917507.
- [100] F. Gertz, A. Kozhevnikov, Y. Filimonov, D. Nikonov, and A. Khitun, "Magnonic holographic memory: From proposal to device", *IEEE Journal on Exploratory Solid-State Computational Devices and Circuits*, vol. 1, pp. 67–75, Dec. 2015.
- [101] F. Gertz, A. V. Kozhevnikov, Y. Filimonov, and A. Khitun, "Magnonic holographic read-only memory", *IEEE Magnetics Letters*, vol. 7, pp. 1–4, 2016.
- [102] S. Khasanvis, M. Rahman, S. N. Rajapandian, and C. A. Moritz, "Wave-based multi-valued computation framework", in *2014 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)*, Jul. 2014, pp. 171–176.
- [103] G. Csaba, A. Papp, and W. Porod, "Spin-wave based realization of optical computing primitives", *Journal of Applied Physics*, vol. 115, no. 17, p. 17C741, 2014. eprint: https://doi.org/10.1063/1.4868921.
- [104] P. Shabadi, S. N. Rajapandian, S. Khasanvis, and C. A. Moritz, "Design of spin wave functions-based logic circuits", *SPIN*, vol. 2, no. 3, p. 1240 006, 2012.
- [105] K. Vogt, F. Y. Fradin, J. E. Pearson, *et al.*, "Realization of a spin-wave multiplexer", *Nature Commun.*, vol. 5, p. 3727, 2014.
- [106] M. Balynsky, D. Gutierrez, H. Chiang, *et al.*, "Parallel data processing with magnonic holographic co-processor", in *2016 IEEE International Conference on Rebooting Computing (ICRC)*, Oct. 2016, pp. 1–4.
- [107] A. Khitun, "Magnonic holographic co-processor: An approach to energy-efficient complementary logic circuitry", in *2015 Fourth Berkeley Symposium on Energy Efficient Electronic Systems (E3S)*, Oct. 2015, pp. 1–3.
- [108] F. Gertz, A. Kozhevnikov, Y.
Khivintsev, *et al.*, "Parallel read-out and database search with magnonic holographic memory", *IEEE Trans. Magn.*, vol. 52, no. 7, p. 3 401 304, 2016.
- [109] O. Zografos, A. D. Meester, E. Testa, *et al.*, "Wave pipelining for majority-based beyond-CMOS technologies", in *Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017*, Mar. 2017, pp. 1306–1311.
- [110] T. Brächer and P. Pirro, "An analog magnon adder for all-magnonic neurons", *Journal of Applied Physics*, vol. 124, no. 15, p. 152119, 2018. eprint: https://doi.org/10.1063/1.5042417.
- [111] M. Balynskiy, H. Chiang, D. Gutierrez, A. Kozhevnikov, Y. Filimonov, and A. Khitun, "Reversible magnetic logic gates based on spin wave interference", *J. Appl. Phys.*, vol. 123, no. 14, p. 144501, 2018.
- [112] A. Khitun, "Parallel database search and prime factorization with magnonic holographic memory devices", *J. Appl. Phys.*, vol. 118, no. 24, p. 243 905, 2015.
- [113] M. Rahman, S. Khasanvis, J. Shi, and C. A. Moritz, "Wave interference functions for neuromorphic computing", *IEEE Transactions on Nanotechnology*, vol. 14, no. 4, pp. 742–750, Jul. 2015.
- [114] W. Burleson, M. Ciesielski, F. Klass, and W. Liu, "Wave-pipelining: A tutorial and research survey", *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 6, no. 3, pp. 464–474, 1998.
- [115] L. W. Cotten, "Maximum-rate pipeline systems", in *Proceedings of the May 14–16, 1969, Spring Joint Computer Conference*, ser. AFIPS '69 (Spring), Boston, Massachusetts: Association for Computing Machinery, 1969, pp. 581–586.
- [116] S. Mittal, "A survey of techniques for approximate computing", *ACM Comput. Surv.*, vol. 48, no. 4, Mar. 2016.
- [117] H. Aghasi, R. M. Iraei, A. Naeemi, and E. Afshari, "Smart detector cell: A scalable all-spin circuit for low power non-Boolean pattern recognition", *IEEE Transactions on Nanotechnology*, vol. 15, no. 3, pp. 356–366, 2016.
- [118] M. Niemier, G. Csaba, A.
Dingler, *et al.*, "Boolean and non-Boolean nearest neighbor architectures for out-of-plane nanomagnet logic", in *2012 13th International Workshop on Cellular Nanoscale Networks and their Applications*, 2012, pp. 1–6.
- [119] A. R. Trivedi, R. Pandey, H. Liu, S. Datta, and S. Mukhopadhyay, "Gate/source overlapped heterojunction tunnel FET for non-Boolean associative processing with plasticity", in *2015 IEEE International Electron Devices Meeting (IEDM)*, 2015, pp. 17.8.1–17.8.4.
- [120] S. Bandyopadhyay, "Straintronics: Digital and analog electronics with strain-switched nanomagnets", *IEEE Open Journal of Nanotechnology*, vol. 1, pp. 57–64, 2020.
- [121] S. Chakradhar, S. Rotherweiler, and V. Agrawal, "Redundancy removal and test generation for circuits with non-Boolean primitives", *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 16, no. 11, pp. 1370–1377, 1997.
- [122] H. Danan, A. Herr, and A. J. P. Meyer, "New determinations of the saturation magnetization of nickel and iron", *J. Appl. Phys.*, vol. 39, p. 669, 1968.
- [123] Y. K. Kim and M. Oliveria, "Magnetic properties of sputtered Fe thin films: Processing and thickness dependence", *J. Appl. Phys.*, vol. 74, p. 1233, 1993.
- [124] M. Kin, H. Kura, M. Tanaka, Y. Hayashi, J. Hasaegawa, and T. Ogawa, "Improvement of saturation magnetization of Fe nanoparticles by post-annealing in a hydrogen gas atmosphere", *Journal of Applied Physics*, vol. 117, no. 17, p. 17E714, 2015. eprint: https://doi.org/10.1063/1.4919050.
- [125] P. Vavassori, D. Bisero, F. Carace, *et al.*, "Interplay between magnetocrystalline and configurational anisotropies in Fe(001) square nanostructures", *Phys. Rev. B*, vol. 72, p. 054 405, 2015.
- [126] J. Bishop, "The shape, energy, eddy current loss, and relaxation damping of magnetic domain walls in glassy iron wire", *IEEE Trans. Magn.*, vol. 13, pp. 1638–1645, 1977.
- [127] M. J. Aus, C. Cheung, B. Szpunar, U. Erb, and J.
Szpunar, "Saturation magnetization of porosity-free nanocrystalline cobalt", *J. Mater. Sci.*, vol. 17, p. 1949, 1998.
- [128] S. Vernon, S. Lindsay, and M. Stearns, "Brillouin scattering from thermal magnons in a thin Co film", *Phys. Rev. B*, vol. 29, p. 4439, 1984.
- [129] A. Michels, J. Weissmüller, A. Wiedenmann, J. S. Pedersen, and J. Barker, "Measuring the exchange-stiffness constant of nanocrystalline solids by elastic small-angle neutron scattering", *Phil. Mag.*, vol. 80, p. 785, 2000.
- [130] M. A. W. Schoen, D. Thonig, M. L. Schneider, *et al.*, "Ultra-low magnetic damping of a metallic ferromagnet", *Nature Phys.*, vol. 12, p. 839, 2016.
- [131] B. Heinrich, J. Cochran, M. Kowalewski, *et al.*, "Magnetic anisotropies and exchange coupling in ultrathin fcc Co(001) structures", *Phys. Rev. B*, vol. 44, pp. 9348–9361, 1991.
- [132] L. Sun, Y. Hao, C.-L. Chien, and P. C. Searson, "Tuning the properties of magnetic nanowires", *IBM J. Res. Devel.*, vol. 49, pp. 79–102, 2005.
- [133] P. Talagala, P. S. Fodor, D. Haddad, *et al.*, "Determination of magnetic exchange stiffness and surface anisotropy constants in epitaxial Ni1-xCox(100) films", *Phys. Rev. B*, vol. 66, p. 144 426, 2002.
- [134] A. Michels, "Exchange-stiffness constant in cold-worked and nanocrystalline Ni measured by elastic small-angle neutron scattering", *J. Appl. Phys.*, vol. 87, p. 5953, 2000.
- [135] J. Walowski, M. D. Kaufmann, B. Lenk, C. Hamann, J. McCord, and M. Münzenberg, "Intrinsic and non-local Gilbert damping in polycrystalline nickel studied by Ti:sapphire laser fs spectroscopy", *J. Phys. D: Appl. Phys.*, vol. 41, p. 16, 2008.
- [136] A. A. Serga, A. V. Chumak, and B. Hillebrands, "YIG magnonics", *J. Phys. D: Appl. Phys.*, vol. 43, p. 264 002, 2010.
- [137] V. Cherepanov, I. Kolokolov, and V. L'vov, "The saga of YIG: Spectra, thermodynamics, interaction and relaxation of magnons in a complex magnet", *Phys. Rep.*, vol. 229, p. 81, 1993.
- [138] H. L.
Glass, "Ferrite films for microwave and millimeter-wave devices", *Proc. IEEE*, vol. 76, p. 151, 1988.
- [139] S. Geller and M. A. Gilleo, "Structure and ferrimagnetism of yttrium and rare-earth-iron garnets", *Acta Crystallogr.*, vol. 10, p. 239, 1957.
- [140] S. Klingler, A. Chumak, T. Mewes, *et al.*, "Measurements of the exchange stiffness of YIG films using broadband ferromagnetic resonance techniques", *J. Phys. D: Appl. Phys.*, vol. 48, p. 015 001, 2015.
- [141] C. R. Serrao, J. R. Sahu, K. Ramesha, and C. N. R. Rao, "Magnetoelectric effect in rare earth ferrites, LnFe2O4", *Journal of Applied Physics*, vol. 104, no. 1, p. 016102, 2008. eprint: https://doi.org/10.1063/1.2946455.
- [142] P. Pirro, T. Brächer, A. V. Chumak, *et al.*, "Spin-wave excitation and propagation in microstructured waveguides of yttrium iron garnet/Pt bilayers", *Appl. Phys. Lett.*, vol. 104, p. 012 402, 2014.
- [143] C. Hahn, V. V. Naletov, G. de Loubens, *et al.*, "Measurement of the intrinsic damping constant in individual nanodisks of Y3Fe5O12 and Y3Fe5O12|Pt", *Appl. Phys. Lett.*, vol. 104, p. 152 410, 2014.
- [144] C. Dubs, O. Surzhenko, R. Linke, A. Danilewsky, U. Brückner, and J. Dellith, "Submicrometer yttrium iron garnet LPE films with low ferromagnetic resonance losses", *J. Phys. D: Appl. Phys.*, vol. 50, p. 204 005, 2017.
- [145] Y. Sun, Y. Song, H. Chang, *et al.*, "Growth and ferromagnetic resonance properties of nanometer-thick yttrium iron garnet films", *Appl. Phys. Lett.*, vol. 101, p. 152 405, 2012.
- [146] H. Yu, O. d'Allivy Kelly, V. Cros, *et al.*, "Magnetic thin-film insulator with ultra-low spin wave damping for coherent nanomagnonics", *Sci. Rep.*, vol. 4, p. 6848, 2014.
- [147] M. C. Onbasli, A. Kehlberger, D. H. Kim, *et al.*, "Pulsed laser deposition of epitaxial yttrium iron garnet films with low Gilbert damping and bulk-like magnetization", *APL Mater.*, vol. 2, p. 106 102, 2014.
- [148] T. Liu, H. Chang, V.
Vlaminck, *et al.*, "Ferromagnetic resonance of sputtered yttrium iron garnet nanometer films", *J. Appl. Phys.*, vol. 115, p. 17A501, 2014.
- [149] V. E. Demidov and S. O. Demokritov, "Magnonic waveguides studied by microfocus Brillouin light scattering", *IEEE Trans. Magn.*, vol. 51, p. 0800215, 2015.
- [150] S. S. Kalarickal, P. Krivosik, M. Wu, *et al.*, "Ferromagnetic resonance linewidth in metallic thin films: Comparison of measurement methods", *J. Appl. Phys.*, vol. 99, p. 093 909, 2006.
- [151] T. Sebastian, K. Schultheiss, B. Obry, B. Hillebrands, and H. Schultheiss, "Microfocused Brillouin light scattering: Imaging spin waves at the nanoscale (review paper)", *Front. Phys.*, vol. 3, p. 35, 2015.
- [152] C. E. Patton, "Linewidth and relaxation processes for the main resonance in the spin-wave spectra of 59 Ni–Fe alloy films", *J. Appl. Phys.*, vol. 39, p. 3060, 1968.
- [153] A. Brunsch, "Magnetic properties and corrosion resistance of (CoFeB)100-xCrx thin films", *J. Appl. Phys.*, vol. 50, p. 7603, 1979.
- [154] X. Liu, W. Zhang, M. J. Carter, and G. Xiao, "Ferromagnetic resonance and damping properties of CoFeB thin films as free layers in MgO-based magnetic tunnel junctions", *J. Appl. Phys.*, vol. 110, p. 033 910, 2011.
- [155] A. Conca, J. Greser, T. Sebastian, *et al.*, "Low spin-wave damping in amorphous Co40Fe40B20 thin films", *J. Appl. Phys.*, vol. 113, p. 213 909, 2013.
- [156] C. Liu, C. Mewes, M. Chshiev, T. Mewes, and W. Butler, "Origin of low Gilbert damping in half metal", *Appl. Phys. Lett.*, vol. 95, p. 022 509, 2009.
- [157] S. Trudel, O. Gaier, J. Hamrle, and B. Hillebrands, "Magnetic anisotropy, exchange and damping in cobalt-based full-Heusler compounds: An experimental review", *J. Phys. D: Appl. Phys.*, vol. 43, p. 193 001, 2010.
- [158] M. Oogane, T. Kubota, Y. Kota, *et al.*, "Gilbert magnetic damping constant of epitaxially grown Co-based Heusler alloy thin films", *Appl. Phys. Lett.*, vol. 96, p. 252 501, 2010.
- [159] T.
Sebastian, Y. Ohdaira, T. Kubota, *et al.*, "Low-damping spin-wave propagation in a micro-structured Co2Mn0.6Fe0.4Si Heusler waveguide", *Appl. Phys. Lett.*, vol. 100, p. 112 402, 2012.
- [160] F. Brailsford, *Physical Principles of Magnetism*. London, New York: Van Nostrand, 1966.
- [161] A. G. Gurevich and G. A. Melkov, *Magnetization Oscillations and Waves*. Boca Raton: CRC Press, 1996.
- [162] S. Chikazumi, *Physics of Ferromagnetism*. Oxford, New York: Oxford University Press, 1997.
- [163] L. Landau and E. Lifshitz, "On the theory of the dispersion of magnetic permeability in ferromagnetic bodies", *Phys. Z. Sowjet.*, vol. 8, pp. 101–114, 1935.
- [164] T. L. Gilbert, "A phenomenological theory of damping in ferromagnetic materials", *IEEE Trans. Magn.*, vol. 40, no. 6, pp. 3443–3449, 2004.
- [165] B. A. Kalinikos and A. N. Slavin, "Theory of dipole-exchange spin wave spectrum for ferromagnetic films with mixed exchange boundary conditions", *J. Phys. C: Solid State Phys.*, vol. 19, no. 35, p. 7013, 1986.
- [166] K. Y. Guslienko and A. N. Slavin, "Boundary conditions for magnetization in magnetic nano-elements", *Phys. Rev. B*, vol. 72, p. 014463, 2005.
- [167] Q. Wang, B. Heinz, R. Verba, *et al.*, "Spin pinning and spin-wave dispersion in nanoscopic ferromagnetic waveguides", *Phys. Rev. Lett.*, vol. 122, p. 247 202, 2019.
- [168] J. R. Eshbach and R. W. Damon, "Surface magnetostatic modes and surface spin waves", *Phys. Rev.*, vol. 118, no. 5, pp. 1208–1210, 1960.
- [169] M. G. Cottam, *Linear and Nonlinear Spin Waves in Magnetic Films and Superlattices*. Singapore: World Scientific, 1994.
- [170] P. E. Wigen, *Nonlinear Phenomena and Chaos in Magnetic Materials*. World Scientific, 1994. eprint: https://www.worldscientific.com/doi/pdf/10.1142/1686.
- [171] S. O. Demokritov and A. N. Slavin, Eds., *Magnonics: From Fundamentals to Applications*. Berlin, Heidelberg: Springer, 2013.
- [172] H.
Suhl, "The Nonlinear Behavior of Ferrites at High Microwave Signal Levels", *Proc. IRE*, vol. 44, no. 10, pp. 1270–1284, 1956.
- [173] H. Suhl, "The theory of ferromagnetic resonance at high signal powers", *J. Phys. Chem. Solids*, vol. 1, no. 4, pp. 209–227, 1957.
- [174] V. Zakharov, V. L'vov, and S. Starobinets, "Spin-wave turbulence beyond the parametric excitation threshold", *Sov. Phys.-Usp.*, vol. 17, no. 4, p. 896, 1975.
- [175] V. L'vov, *Wave Turbulence Under Parametric Excitation*. Berlin, Heidelberg: Springer, 1994.
- [176] P. Krivosik and C. E. Patton, "Hamiltonian formulation of nonlinear spin-wave dynamics: Theory and applications", *Phys. Rev. B*, vol. 82, p. 184 428, 2010.
- [177] A. A. Serga, A. V. Chumak, A. André, *et al.*, "Parametrically Stimulated Recovery of a Microwave Signal Stored in Standing Spin-Wave Modes of a Magnetic Film", *Phys. Rev. Lett.*, vol. 99, no. 22, p. 227 202, 2007.
- [178] Q. Wang, M. Kewenig, M. Schneider, *et al.*, "Realization of a nanoscale magnonic directional coupler for all-magnon circuits", *Nature Electron.*, 2020, in press.
- [179] P. Pirro, T. Sebastian, T. Brächer, *et al.*, "Non-Gilbert-damping Mechanism in a Ferromagnetic Heusler Compound Probed by Nonlinear Spin Dynamics", *Phys. Rev. Lett.*, vol. 113, no. 22, p. 227 601, 2014.
- [180] T. Brächer, P. Pirro, and B. Hillebrands, "Parallel pumping for magnon spintronics: Amplification and manipulation of magnon spin currents on the micron-scale", *Phys. Rep.*, vol. 699, p. 1, 2017.
- [181] C. L. Ordóñez-Romero, B. A. Kalinikos, P. Krivosik, W. Tong, P. Kabos, and C. E. Patton, "Three-magnon splitting and confluence processes for spin-wave excitations in yttrium iron garnet films: Wave vector selective Brillouin light scattering measurements and analysis", *Phys. Rev. B*, vol. 79, no. 14, p. 144 428, 2009.
- [182] B. Heinz, T. Brächer, M.
Schneider, *et al.*, "Propagation of Spin-Wave Packets in Individual Nanosized Yttrium Iron Garnet Magnonic Conduits", *Nano Lett.*, vol. 20, no. 6, pp. 4220–4227, 2020.
- [183] M. B. Jungfleisch, A. V. Chumak, A. Kehlberger, *et al.*, "Thickness and power dependence of the spin-pumping effect in Y3Fe5O12/Pt heterostructures measured by the inverse spin Hall effect", *Phys. Rev. B*, vol. 91, no. 13, p. 134 407, 2015.
- [184] M. Mohseni, Q. Wang, B. Heinz, *et al.*, "Controlling of nonlinear relaxation of quantized magnons in nano-devices", *arXiv:2006.03400*, 2020.
- [185] D. A. Patterson and J. L. Hennessy, *Computer Organization and Design: The Hardware/Software Interface*, 4th ed. Waltham: Morgan Kaufmann, 2011.
- [186] A. Khitun, M. Bao, and K. L. Wang, "Magnetic cellular nonlinear network with spin wave bus for image processing", *Superlatt. Microstruct.*, vol. 47, no. 3, p. 464, 2010.
- [187] F. Macià, A. D. Kent, and F. C. Hoppensteadt, "Spin-wave interference patterns created by spin-torque nano-oscillators for memory and computation", *Nanotechnol.*, vol. 22, no. 9, p. 095 301, 2011.
- [188] H. Arai and H. Imamura, "Neural-network computation using spin wave coupled spin torque oscillators", *Phys. Rev. Appl.*, vol. 10, no. 2, p. 024 040, 2018.
- [189] R. Nakane, G. Tanaka, and A. Hirose, "Reservoir computing with spin waves excited in a garnet film", *IEEE Access*, vol. 6, pp. 4462–4469, 2018.
- [190] G. Tanaka, T. Yamane, J. B. Héroux, *et al.*, "Recent advances in physical reservoir computing: A review", *Neural Networks*, vol. 115, p. 100, 2019.
- [191] S. Watt and M. Kostylev, "Reservoir computing using a spin wave delay line active ring resonator based on yttrium iron garnet film", *Phys. Rev. Appl.*, vol. 13, no. 3, p. 034 057, 2020.
- [192] M. Collet, O. Gladii, M. Evelt, *et al.*, "Spin-wave propagation in ultra-thin YIG based waveguides", *Appl. Phys. Lett.*, vol. 110, no. 9, p. 092 408, 2017.
- [193] M. Born and E.
Wolf, *Principles of Optics*, 7th ed. Cambridge, New York: Cambridge University Press, 1999.
- [194] M. J. Donahue and D. G. Porter, "OOMMF User's Guide, Version 1.0", *Interagency Report NISTIR 6376*, Sep. 1999.
- [195] A. Vansteenkiste, J. Leliaert, M. Dvornik, M. Helsen, F. Garcia-Sanchez, and B. Van Waeyenberge, "The design and verification of MuMax3", *AIP Advances*, vol. 4, no. 10, p. 107 133, 2014. eprint: https://doi.org/10.1063/1.4899186.
- [196] A. Khitun, M. Bao, and K. L. Wang, "Spin wave magnetic nanofabric: A new approach to spin-based logic circuitry", *IEEE Transactions on Magnetics*, vol. 44, no. 9, pp. 2141–2152, Sep. 2008.
- [197] T. Brächer, F. Heussner, P. Pirro, *et al.*, "Phase-to-intensity conversion of magnonic spin currents and application to the design of a majority gate", *Sci. Rep.*, vol. 6, p. 38 235, 2016.
- [198] K. Ganzhorn, S. Klingler, T. Wimmer, *et al.*, "Magnon-based logic in a multi-terminal YIG/Pt nanostructure", *Appl. Phys. Lett.*, vol. 109, no. 2, p. 022 405, 2016.
- [199] L. Amarù, P.-E. Gaillardon, and G. De Micheli, "Boolean logic optimization in majority-inverter graphs", in *2015 52nd ACM/EDAC/IEEE Design Automation Conf. (DAC)*, IEEE, 2015, pp. 1–6.
- [200] L. Amarù, P.-E. Gaillardon, and G. De Micheli, "Majority inverter graph: A new paradigm for logic optimization", *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 35, no. 5, pp. 806–819, May 2016.
- [201] O. Zografos, P. Raghavan, L. Amarù, *et al.*, "System-level assessment and area evaluation of spin wave logic circuits", in *2014 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)*, IEEE, 2014, pp. 25–30.
- [202] I. P. Radu, O. Zografos, A. Vaysset, *et al.*, "Overview of spin-based majority gates and interconnect implications", in *2016 IEEE International Interconnect Technology Conference / Advanced Metallization Conference (IITC/AMC)*, 2016, pp. 51–52.
- [203] R. Lucas, M. Fossorier, Y. Kou, and S.
Lin, "Iterative decoding of one step majority logic deductible codes based on belief propagation", *IEEE Trans. Commun.*, vol. 48, no. 6, pp. 931–937, 2000.
- [204] R. Palanki, M. Fossorier, and J. S. Yedidia, "Iterative decoding of multiple step majority logic decodable codes", *IEEE Trans. Commun.*, vol. 55, no. 6, pp. 1099–1102, 2007.
- [205] H. Wei, Z. Wang, X. Tian, M. Käll, and H. Xu, "Cascaded logic gates in nanophotonic plasmon networks", *Nature Commun.*, vol. 2, p. 387, 2011.
- [206] Y. Fu, X. Hu, C. Lu, S. Yue, H. Yang, and Q. Gong, "All-Optical Logic Gates Based on Nanoscale Plasmonic Slot Waveguides", *Nano Lett.*, vol. 12, no. 11, p. 5784, 2012.
- [207] S. Lal, J. H. Hafner, N. J. Halas, S. Link, and P. Nordlander, "Noble Metal Nanowires: From Plasmon Waveguides to Passive and Active Devices", *Acc. Chem. Res.*, vol. 45, no. 11, p. 1887, 2012.
- [208] S. Dutta, O. Zografos, S. Gurunarayanan, *et al.*, "Proposal for nanoscale cascaded plasmonic majority gates for non-Boolean computation", *Sci. Rep.*, vol. 7, p. 17866, 2017.
- [209] M. Maldovan, "Sound and heat revolutions in phononics", *Nature*, vol. 503, p. 209, 2013.
- [210] S. R. Sklan, "Splash, pop, sizzle: Information processing with phononic computing", *AIP Adv.*, vol. 5, no. 5, p. 053 302, 2015.
- [211] A. Khitun, D. E. Nikonov, M. Bao, K. Galatsis, and K. L. Wang, "Efficiency of Spin-Wave Bus for Information Transmission", *IEEE Trans. Electron Devices*, vol. 54, no. 12, p. 3418, 2007.
- [212] S. Rakheja, A. Ceyhan, and A. Naeemi, "Interconnect consideration", in *CMOS and Beyond: Logic Switches for Terascale Integrated Circuits*, T.-J. K. Liu and K. Kuhn, Eds., Cambridge: Cambridge University Press, 2015, ch. 15, p. 381.
- [213] S. Dutta, S.-C. Chang, N. Kani, *et al.*, "Non-volatile Clocked Spin Wave Interconnect for Beyond-CMOS Nanomagnet Pipelines", *Sci. Rep.*, vol. 5, p. 9861, 2015.
- [214] F. Gertz, A. V. Kozhevnikov, Y. A. Filimonov, D. E. Nikonov, and A.
Khitun, "Magnonic Holographic Memory: From Proposal to Device", *IEEE J. Explor. Solid-State Computat.*, vol. 1, pp. 67–75, 2015.
- [215] A. K. Sharma, *Advanced Semiconductor Memories: Architectures, Designs, and Applications*. Piscataway, Hoboken: Wiley–IEEE Press, 2002.
- [216] S. Hong, O. Auciello, and D. Wouters, Eds., *Emerging Non-Volatile Memories*. New York: Springer, 2014.
- [217] Y. Zhang, T. Yu, J.-L. Chen, *et al.*, "Antenna design for propagating spin wave spectroscopy in ferromagnetic thin films", *Journal of Magnetism and Magnetic Materials*, vol. 450, pp. 24–28, 2018, Perspectives on magnon spintronics.
- [218] J. Slonczewski, "Current-driven excitation of magnetic multilayers", *Journal of Magnetism and Magnetic Materials*, vol. 159, no. 1, pp. L1–L7, 1996.
- [219] L. Berger, "Emission of spin waves by a magnetic multilayer traversed by a current", *Phys. Rev. B*, vol. 54, no. 13, pp. 9353–9358, Oct. 1996.
- [220] M. Tsoi, A. G. M. Jansen, J. Bass, *et al.*, "Excitation of a magnetic multilayer by an electric current", *Phys. Rev. Lett.*, vol. 80, no. 19, pp. 4281–4284, May 1998.
- [221] J. E. Hirsch, "Spin Hall effect", *Phys. Rev. Lett.*, vol. 83, no. 9, pp. 1834–1837, Aug. 1999.
- [222] I. Radu, "Spin logic options for beyond or along CMOS", *IEEE Semiconductor Interface Specialists Conference (SISC)*, 2015.
- [223] S. Cherepov, P. Khalili Amiri, J. G. Alzate, *et al.*, "Electric field induced spin wave generation using multiferroic magnetoelectric cells", *Applied Physics Letters*, vol. 104, no. 8, p. 082 403, 2014. eprint: https://doi.org/10.1063/1.4865916.
- [224] R. Verba, M. Carpentieri, G. Finocchio, V. Tiberkevich, and A. Slavin, "Excitation of spin waves in an in-plane-magnetized ferromagnetic nanowire using voltage-controlled magnetic anisotropy", *Phys. Rev. Applied*, vol. 7, no. 6, p. 064 023, Jun. 2017.
- [225] C. Felser and A. Hirohata, *Heusler Alloys: Properties, Growth, Applications*. Springer, 2015.
- [226] C. J.
Palmstrøm, "Heusler compounds and spintronics", *Prog. Crystal Growth Charact. Mater.*, vol. 62, no. 2, pp. 371–397, 2016.
- [227] L. Wollmann, A. K. Nayak, S. S. Parkin, and C. Felser, "Heusler 4.0: Tunable Materials", *Ann. Rev. Mater. Res.*, vol. 47, no. 1, pp. 247–270, 2017.
- [228] M. Balinskiy, A. C. Chavez, A. Barra, H. Chiang, G. P. Carman, and A. Khitun, "Magnetoelectric spin wave modulator based on synthetic multiferroic structure", *Scientific Reports*, 2018.
- [229] M. Balynsky, A. Kozhevnikov, Y. Khivintsev, *et al.*, "Magnonic interferometric switch for multi-valued logic circuits", *J. Appl. Phys.*, vol. 121, no. 2, p. 024 504, 2017.
- [230] A. Khitun, D. E. Nikonov, and K. L. Wang, "Magnetoelectric spin wave amplifier for spin wave logic circuits", *J. Appl. Phys.*, vol. 106, no. 12, p. 123 909, 2009.
- [231] M. Bao, A. Khitun, J. Lee, A. P. Jacob, and K. L. Wang, "A magnetic amplifier for amplifying spin-wave signal", in *2009 Device Research Conference*, Jun. 2009, pp. 45–46.
- [232] P. Chowdhury, P. Dhagat, and A. Jander, "Parametric amplification of spin waves using acoustic waves", *IEEE Trans. Magn.*, vol. 51, no. 11, p. 1300 904, 2015.
- [233] M. Bao, K. Wong, A. Khitun, and K. L. Wang, "Nonreciprocal amplification of spin-wave signals", in *68th Device Research Conference*, Jun. 2010, pp. 89–90.
- [234] Z. Haghshenasfard and M. G. Cottam, "Parallel pumping of spin waves for ferromagnetic nanowires and nanotubes with circular cross sections", *IEEE Magn. Lett.*, vol. 7, pp. 1–5, 2016.
- [235] T. Brächer, F. Heussner, P. Pirro, *et al.*, "Time- and power-dependent operation of a parametric spin-wave amplifier", *Appl. Phys. Lett.*, vol. 105, no. 23, p. 232 409, 2014.
- [236] T. Meyer, T. Brächer, F. Heussner, *et al.*, "Realization of a spin-wave switch based on the spin-transfer-torque effect", *IEEE Magn. Lett.*, vol. 9, pp. 1–5, 2018.
- [237] R. Verba, M. Carpentieri, G. Finocchio, V. Tiberkevich, and A.
Slavin, "Amplification and stabilization of large-amplitude propagating spin waves by parametric pumping", *Appl. Phys. Lett.*, vol. 112, no. 4, p. 042 402, 2018.
- [238] D. S. Deng, X. F. Jin, and R. Tao, "Magnon energy gap in a periodic anisotropic magnetic superlattice", *Phys. Rev. B*, vol. 66, no. 10, p. 104 435, Sep. 2002.
- [239] M. P. Kostylev, A. A. Stashkevich, and N. A. Sergeeva, "Collective magnetostatic modes on a one-dimensional array of ferromagnetic stripes", *Phys. Rev. B*, vol. 69, no. 6, p. 064 408, Feb. 2004.
- [240] G. Gubbiotti, S. Tacchi, M. Madami, *et al.*, "Collective spin waves in a bicomponent two-dimensional magnonic crystal", *Appl. Phys. Lett.*, vol. 100, no. 16, p. 162 407, 2012.
- [241] D. Kumar, J. Klos, M. Krawczyk, and A. Barman, "Magnonic band structure, complete bandgap, and collective spin wave excitation in nanoscale two dimensional magnonic crystals", *J. Appl. Phys.*, vol. 115, no. 4, p. 043 917, 2014.
- [242] M. Krawczyk and H. Puszkarski, "Plane wave theory of three dimensional magnonic crystals", *Phys. Rev. B*, vol. 77, no. 5, p. 054 437, 2008.
- [243] J. Romero Vivas, S. Mamica, M. Krawczyk, and V. V. Kruglyak, "Investigation of spin wave damping in three-dimensional magnonic crystals using the plane wave method", *Phys. Rev. B*, vol. 86, no. 14, p. 144 417, 2012.
- [244] G. Gubbiotti, A. Sadovnikov, E. Beginin, *et al.*, "Magnonic band structure in vertical meander-shaped CoFeB thin films", *arXiv:2007.13707*, 2020.
- [245] M. Krawczyk and D. Grundler, "Review and prospects of magnonic crystals and devices with reprogrammable band structure", *J. Phys.: Cond. Matter*, vol. 26, no. 12, p. 123 202, 2014.
- [246] A. V. Chumak, "Fundamentals of magnon-based computing", *arXiv:1901.08934*, 2019.
- [247] A. Hoffmann, "Spin Hall effects in metals", *IEEE Transactions on Magnetics*, vol. 49, no. 10, pp. 5172–5193, Oct. 2013.
- [248] R. Verba, G. Melkov, V. Tiberkevich, and A.
Slavin, "Collective spin-wave excitations in a two-dimensional array of coupled magnetic nanodots", *Phys. Rev. B*, vol. 85, p. 014 427, 1 Jan. 2012. - [249] M. Beleggia, S. Tandon, Y. Zhu, and M. De Graef, "On the magnetostatic interactions between nanoparticles of arbitrary shape", *Journal of Magnetism and Magnetic Materials*, vol. 278, no. 1, pp. 270–284, 2004. - [250] Q. Wang, P. Pirro, R. Verba, A. Slavin, B. Hillebrands, and A. V. Chumak, "Reconfigurable nanoscale spin-wave directional coupler", *Science Advances*, vol. 4, no. 1, 2018. eprint: https://advances.sciencemag.org/content/4/1/e1701517.full.pdf. - [251] Q. Wang, R. Verba, T. Brächer, P. Pirro, and A. V. Chumak, "Integrated magnonic half-adder", *arXiv:1902.02855*, 2019. - [252] A. Sadovnikov, E. Beginin, S. Sheshukova, D. Romanenko, Y. Sharaevskii, and S. Nikitov, "Directional multimode coupler for planar magnonics: Side-coupled magnetic stripes", *Applied Physics Letters*, vol. 107, no. 20, p. 202 405, 2015. eprint: https://doi.org/10.1063/1.4936207. - [253] H. G. Bauer, P. Majchrak, T. Kachel, C. H. Back, and G. Woltersdorf, "Nonlinear spin-wave excitations at low magnetic bias fields", *Nature Communications*, vol. 6, 2015. - [254] A. Sadovnikov, S. Odintsov, E. Beginin, S. Sheshukova, Y. Sharaevskii, and S. Nikitov, "Toward nonlinear magnonics: Intensity-dependent spin-wave switching in insulating side-coupled magnetic stripes", *Phys. Rev. B*, vol. 96, p. 144 428, 14 Oct. 2017. - [255] R. Verba, M. Carpentieri, G. Finocchio, V. Tiberkevich, and A. Slavin, "Excitation of propagating spin waves in ferromagnetic nanowires by microwave voltage-controlled magnetic anisotropy", *Scientific Reports*, vol. 6, 2016. - [256] P. Krivosik and C. E. Patton, "Hamiltonian formulation of nonlinear spin-wave dynamics: Theory and applications", *Phys. Rev. B*, vol. 82, p. 184 428, 18 Nov. 2010. - [257] H. G. Bauer, P. Majchrak, T. Kachel, C. H. Back, and G.
Woltersdorf, "Nonlinear spin-wave excitations at low magnetic bias fields", *Nature Commun.*, vol. 6, p. 8274, 2015. - [258] B. A. Kalinikos and A. B. Ustinov, "Nonlinear spin waves in magnetic films and structures: Physics and devices", *Solid State Phys.*, vol. 64, p. 193, 2013. - [259] M. Balinskiy, H. Chiang, and A. Khitun, "Realization of spin wave switch for data processing", *AIP Advances*, vol. 8, no. 5, p. 056 628, 2018. eprint: https://doi.org/10.1063/1.5004992. - [260] Q. Wang, P. Pirro, R. Verba, A. Slavin, B. Hillebrands, and A. V. Chumak, "Reconfigurable nanoscale spin-wave directional coupler", *Sci. Adv.*, vol. 4, no. 1, e1701517, 2018. - [261] S. V. Vasiliev, V. V. Kruglyak, M. L. Sokolovskii, and A. N. Kuchko, "Spin wave interferometer employing a local nonuniformity of the effective magnetic field", *J. Appl. Phys.*, vol. 101, no. 11, p. 113 919, 2007. - [262] V. E. Demidov, S. Urazhdin, and S. O. Demokritov, "Control of spin-wave phase and wavelength by electric current on the microscopic scale", *Appl. Phys. Lett.*, vol. 95, no. 26, p. 262 509, 2009. - [263] T. Liu and G. Vignale, "Electric Control of Spin Currents and Spin-Wave Logic", *Phys. Rev. Lett.*, vol. 106, no. 24, p. 247 203, 2011. - [264] B. Rana and Y. Otani, "Towards magnonic devices based on voltage-controlled magnetic anisotropy", *Commun. Phys.*, vol. 2, no. 1, p. 90, 2019. - [265] N. Kanazawa, T. Goto, K. Sekiguchi, *et al.*, "The role of Snell's law for a magnonic majority gate", *Sci. Rep.*, vol. 7, no. 1, p. 7898, 2017. - [266] F. Ciubotaru, G. Talmelli, T. Devolder, *et al.*, "First experimental demonstration of a scalable linear majority gate based on spin waves", in *2018 IEEE International Electron Devices Meeting (IEDM)*, IEEE, 2018, pp. 36.1.1–36.1.4. - [267] N. Sato, K. Sekiguchi, and Y. Nozaki, "Electrical Demonstration of Spin-Wave Logic Operation", *Appl. Phys. Express*, vol. 6, no. 6, p. 063 001, 2013. - [268] G. Talmelli, T. Devolder, N. 
Träger, *et al.*, "Reconfigurable nanoscale spin wave majority gate with frequency-division multiplexing", *arXiv:1908.02546*, 2019. - [269] B. Divinskiy, V. E. Demidov, S. Urazhdin, R. Freeman, A. B. Rinkevich, and S. O. Demokritov, "Excitation and Amplification of Spin Waves by Spin–Orbit Torque", *Adv. Mater.*, vol. 30, no. 33, p. 1802837, 2018. - [270] F. Ciubotaru, A. A. Serga, B. Leven, B. Hillebrands, and L. Lopez-Diaz, "Mechanisms of nonlinear spin-wave emission from a microwave-driven nanocontact", *Phys. Rev. B*, vol. 84, p. 144 424, 2011. - [271] A. V. Sadovnikov, C. S. Davies, S. V. Grishin, *et al.*, "Magnonic beam splitter: The building block of parallel magnonic circuitry", *Appl. Phys. Lett.*, vol. 106, no. 19, p. 192 406, 2015. - [272] C. S. Davies, A. V. Sadovnikov, S. V. Grishin, Y. P. Sharaevsky, S. A. Nikitov, and V. V. Kruglyak, "Field-controlled phase-rectified magnonic multiplexer", *IEEE Trans. Magn.*, vol. 51, no. 11, p. 1, 2015. - [273] F. Heussner, M. Nabinger, T. Fischer, *et al.*, "Frequency-Division Multiplexing in Magnonic Logic Networks Based on Caustic-Like Spin-Wave Beams", *Phys. Status Solidi RRL*, vol. 12, no. 12, p. 1 800 409, 2018. - [274] F. Heussner, G. Talmelli, M. Geilen, *et al.*, "Experimental Realization of a Passive Gigahertz Frequency-Division Demultiplexer for Magnonic Logic Networks", *Phys. Status Solidi RRL*, vol. 14, p. 1 900 695, 2020. - [275] T. Schneider, A. A. Serga, A. V. Chumak, *et al.*, "Nondiffractive Subwavelength Wave Beams in a Medium with Externally Controlled Anisotropy", *Phys. Rev. Lett.*, vol. 104, no. 19, p. 197 203, 2010. - [276] Y. Khivintsev, M. Ranjbar, D. Gutierrez, *et al.*, "Prime factorization using magnonic holographic devices", *Journal of Applied Physics*, vol. 120, no. 12, p. 123 901, 2016. eprint: https://doi.org/10.1063/1.4962740. - [277] D. Miller, "Device Requirements for Optical Interconnects to Silicon Chips", *Proc. IEEE*, vol. 97, no. 7, pp. 1166–1185, 2009. - [278] D.
Gutierrez, H. Chiang, T. Bhowmick, *et al.*, "Magnonic holographic imaging of magnetic microstructures", *Journal of Magnetism and Magnetic Materials*, vol. 428, pp. 348–356, 2017. - [279] C. Pan and A. Naeemi, "An expanded benchmarking of beyond-CMOS devices based on Boolean and neuromorphic representative circuits", *IEEE J. Explor. Solid-State Computat.*, vol. 3, pp. 101–110, 2017. - [280] N. Locatelli, V. Cros, and J. Grollier, "Spin-torque building blocks", *Nature Materials*, vol. 13, 2014. - [281] G. Gubbiotti, Ed., *Three-Dimensional Magnonics: Layered, Micro- and Nanostructures*. Singapore: Jenny Stanford Publishing, 2019. - [282] S. Sangiao, C. Magén, D. Mofakhami, G. de Loubens, and J. M. De Teresa, "Magnetic properties of optimized cobalt nanospheres grown by focused electron beam induced deposition (FEBID) on cantilever tips", *Beilstein J. Nanotechnol.*, vol. 8, no. 1, pp. 2106–2115, 2017. - [283] O. V. Dobrovolskiy, R. Sachser, S. A. Bunyaev, D. Navas, *et al.*, "Spin-Wave Phase Inverter upon a Single Nanodefect", *ACS Appl. Mater. Interf.*, vol. 11, no. 19, pp. 17654–17662, 2019. - [284] M. Huth, F. Porrati, and O. V. Dobrovolskiy, "Focused electron beam induced deposition meets materials science", *Microelectron. Engin.*, vol. 185–186, pp. 9–28, 2018. - [285] P. Fischer, D. Sanz-Hernández, R. Streubel, and A. Fernández-Pacheco, "Launching a new dimension with 3D magnetic nanostructures", *APL Mater.*, vol. 8, no. 1, p. 010 701, 2020. - [286] L. Flajšman, K. Wagner, M. Vanatka, *et al.*, "Zero-field propagation of spin waves in waveguides prepared by focused ion beam direct writing", *Phys. Rev. B*, vol. 101, no. 1, p. 014 436, 2020. - [287] O. V. Dobrovolskiy, M. Kompaniiets, R. Sachser, F. Porrati, C. Gspan, H. Plank, and M. Huth, "Tunable magnetism on the lateral mesoscale by post-processing of Co/Pt heterostructures", *Beilstein J. Nanotechnol.*, vol. 6, no. 1, pp. 1082–1090, 2015. - [288] M. Kompaniiets, O. V. Dobrovolskiy, C.
Neetzel, *et al.*, "Long-range superconducting proximity effect in polycrystalline Co nanowires", *Appl. Phys. Lett.*, vol. 104, no. 5, p. 052 603, 2014. - [289] K.-R. Jeon, C. Ciccarelli, H. Kurebayashi, *et al.*, "Effect of Meissner screening and trapped magnetic flux on magnetization dynamics in thick Nb/Ni80Fe20/Nb trilayers", *Phys. Rev. Appl.*, vol. 11, no. 1, p. 014 061, 2019. - [290] I. A. Golovchanskiy, N. N. Abramov, V. S. Stolyarov, *et al.*, "Ferromagnet/superconductor hybridization for magnonic applications", *Adv. Funct. Mater.*, vol. 28, no. 33, p. 1 802 375, 2018. - [291] O. V. Dobrovolskiy, R. Sachser, T. Brächer, *et al.*, "Magnon–fluxon interaction in a ferromagnet/superconductor heterostructure", *Nature Phys.*, vol. 15, no. 5, pp. 477–482, 2019. - [292] A. Shekhter, L. N. Bulaevskii, and C. D. Batista, "Vortex Viscosity in Magnetic Superconductors Due to Radiation of Spin Waves", *Phys. Rev. Lett.*, vol. 106, no. 3, p. 037 001, 2011. - [293] O. V. Dobrovolskiy, D. Y. Vodolazov, F. Porrati, *et al.*, "Ultra-fast vortex motion in dirty Nb-C superconductor with a close-to-perfect edge barrier", *arXiv:2002.08403*, 2020. - [294] D. Lachance-Quirion, S. P. Wolski, Y. Tabuchi, S. Kono, K. Usami, and Y. Nakamura, "Entanglement-based single-shot detection of a single magnon with a superconducting qubit", *Science*, vol. 367, no. 6476, pp. 425–428, 2020. - [295] H. Huebl, C. W. Zollitsch, J. Lotze, *et al.*, "High Cooperativity in Coupled Microwave Resonator Ferrimagnetic Insulator Hybrids", *Phys. Rev. Lett.*, vol. 111, no. 12, p. 127 003, 2013. - [296] Y. Tabuchi, S. Ishino, T. Ishikawa, R. Yamazaki, K. Usami, and Y. Nakamura, "Hybridizing Ferromagnetic Magnons and Microwave Photons in the Quantum Limit", *Phys. Rev. Lett.*, vol. 113, no. 8, p. 083 603, 2014. - [297] X. Zhang, C.-L. Zou, L. Jiang, and H. X. Tang, "Strongly Coupled Magnons and Cavity Microwave Photons", *Phys. Rev. Lett.*, vol. 113, no. 15, p. 156 401, 2014. - [298] R. G. E.
Morris, A. F. van Loo, S. Kosen, and A. D. Karenowska, "Strong coupling of magnons in a YIG sphere to photons in a planar superconducting resonator in the quantum limit", *Sci. Rep.*, vol. 7, no. 1, p. 11 511, 2017. - [299] M. Pfirrmann, I. Boventer, A. Schneider, *et al.*, "Magnons at low excitations: Observation of incoherent coupling to a bath of two-level systems", *Phys. Rev. Res.*, vol. 1, no. 3, p. 032 023, 2019. - [300] S. Kosen, A. F. van Loo, D. A. Bozhko, L. Mihalceanu, and A. D. Karenowska, "Microwave magnon damping in YIG films at millikelvin temperatures", *APL Mater.*, vol. 7, no. 10, p. 101 120, 2019. - [301] Y. Li, W. Zhang, V. Tyberkevych, W.-K. Kwok, A. Hoffmann, and V. Novosad, "Hybrid magnonics: Physics, circuits and applications for coherent information processing", *arXiv:2006.16158*, 2020. - [302] D. A. Bozhko, A. A. Serga, P. Clausen, *et al.*, "Supercurrent in a room-temperature Bose–Einstein magnon condensate", *Nature Phys.*, vol. 12, p. 1057, 2016. - [303] A. A. Serga, V. S. Tiberkevich, C. W. Sandweg, *et al.*, "Bose–Einstein condensation in an ultra-hot gas of pumped magnons", *Nature Commun.*, vol. 5, p. 3452, 2014. - [304] M. Schneider, T. Brächer, D. Breitbach, *et al.*, "Bose–Einstein condensation of quasiparticles by rapid cooling", *Nature Nanotechnol.*, vol. 15, no. 6, p. 457, 2020. - [305] D. A. Bozhko, A. A. Serga, P. Clausen, *et al.*, "Supercurrent in a room-temperature Bose–Einstein magnon condensate", *Nature Phys.*, vol. 12, no. 11, p. 1057, Nov. 2016. - [306] D. Suess, A. Bachleitner-Hofmann, A. Satz, *et al.*, "Topologically protected vortex structures for low-noise magnetic sensors with high linear range", *Nature Electron.*, vol. 1, no. 6, pp. 362–370, 2018. - [307] C. Zheng, K. Zhu, S. Cardoso de Freitas, *et al.*, "Magnetoresistive Sensor Development Roadmap (Non-Recording Applications)", *IEEE Trans. Magn.*, vol. 55, no. 4, pp. 1–30, 2019. - [308] M. Inoue, A. Baryshev, H.
Takagi, *et al.*, "Investigating the use of magnonic crystals as extremely sensitive magnetic field sensors at room temperature", *Appl. Phys. Lett.*, vol. 98, no. 13, p. 132 511, 2011. - [309] P. Talbot *et al.*, "Electromagnetic sensors based on magnonic crystals for applications in the fields of biomedical and NDT", *Proc. Engin.*, vol. 120, pp. 1241–1244, 2015. - [310] P. J. Metaxas, M. Sushruth, R. A. Begley, *et al.*, "Sensing magnetic nanoparticles using nano-confined ferromagnetic resonances in a magnonic crystal", *Appl. Phys. Lett.*, vol. 106, p. 232 406, 2015. - [311] Y. Cao and P. Yan, "Exceptional magnetic sensitivity of PT-symmetric cavity magnon polaritons", *Phys. Rev. B*, vol. 99, p. 214 415, 2019. - [312] S. Atalay, A. O. Kaya, V. S. Kolat, H. Gencer, and T. Izgi, "One-dimensional magnonic crystal for magnetic field sensing", *Journal of Superconductivity and Novel Magnetism*, vol. 28, no. 7, pp. 2071–2075, Jul. 2015. - [313] R. G. Kryshtal and A. V. Medved, "Surface acoustic wave in yttrium iron garnet as tunable magnonic crystals for sensors and signal processing applications", *Applied Physics Letters*, vol. 100, no. 19, p. 192 410, 2012. eprint: https://doi.org/10.1063/1.4714507. - [314] M. Balynsky, D. Gutierrez, H. Chiang, *et al.*, "A magnetometer based on a spin wave interferometer", in *Scientific Reports*, 2017. - [315] D. M. Pozar, *Microwave Engineering*, 4th ed. Hoboken: Wiley, 2012. - [316] V. Harris, "Modern microwave ferrites", *IEEE Trans. Magn.*, vol. 48, pp. 1075–1104, 2012. - [317] J. M. Owens, J. H. Collins, and R. L. Carter, "System applications of magnetostatic wave devices", *Circuits, Systems and Signal Processing*, vol. 4, no. 1, pp. 317–334, Mar. 1985. - [318] J. Helszajn, *YIG Resonators and Filters*. Chichester: Wiley, 1985. - [319] W. Ishak, "Magnetostatic wave technology: A review", *Proc. IEEE*, vol. 76, no. 2, pp. 171–187, 1988. - [320] J. D.
Adam, "Analog signal processing with microwave magnetics", *Proceedings of the IEEE*, vol. 76, no. 2, pp. 159–170, Feb. 1988. - [321] H. Tanbakuchi, D. Nicholson, B. Kunz, and W. Ishak, "Magnetically tunable oscillators and filters", *IEEE Trans. Magn.*, vol. 25, no. 5, pp. 3248–3253, 1989. - [322] J. S. McLean, "A re-examination of the fundamental limits on the radiation Q of electrically small antennas", *IEEE Trans. Antennas Propag.*, vol. 44, p. 672, 1996. - [323] H. A. Wheeler, "Fundamental limitations of small antennas", *Proc. IEEE*, vol. 35, pp. 1479–1484, 1947. - [324] J. C.-E. Sten, A. Hujanen, and P. K. Koivisto, "Quality factor of an electrically small antenna radiating close to a conducting plane", *IEEE Trans. Antennas Propag.*, vol. 49, pp. 829–837, 2001. - [325] D. M. Pozar, "Microstrip antennas", *Proc. IEEE*, vol. 80, pp. 79–81, 1992. - [326] Z. Yao, Y. Wang, S. Keller, and G. P. Carman, "Bulk acoustic wave-mediated multiferroic antennas: Architecture and performance bound", *IEEE Trans. Antennas Propag.*, vol. 63, pp. 3335–3344, 2015. - [327] J. P. Domann and G. P. Carman, "Strain powered antennas", *J. Appl. Phys.*, vol. 121, p. 044 905, 2017. - [328] J. Xu, C. M. Leung, X. Zhuang, *et al.*, "A low frequency mechanical transmitter based on magnetoelectric heterostructures operated at their resonance frequency", *Sensors*, vol. 19, E853, 2019. - [329] R. L. Kubena, X. Pang, K. G. Lee, Y. K. Yong, and W. S. Wall, "Wide-band multiferroic quartz MEMS antennae", *J. Phys. Conf. Ser.*, vol. 1407, p. 012 026, 2019. - [330] T. Nan, H. Lin, Y. Gao, *et al.*, "Acoustically actuated ultra-compact NEMS magnetoelectric antennas", *Nature Commun.*, vol. 8, p. 1, 2017. - [331] R. V. Petrov, A. S. Tatarenko, S. Pandey, G. Srinivasan, J. V. Mantese, and R. Azadegan, "Miniature antenna based on magnetoelectric composites", *Electron. Lett.*, vol. 44, pp. 506–508, 2008. - [332] M. Manteghi and A. A. Y.
Ibraheem, "On the study of the near-fields of electric and magnetic small antennas in lossy media", *IEEE Trans. Antennas Propag.*, vol. 62, p. 6491, 2014. - [333] T. Seifert, S. Jaiswal, U. Martens, *et al.*, "Efficient metallic spintronic emitters of ultrabroadband terahertz radiation", *Nature Photon.*, vol. 10, no. 7, pp. 483–488, 2016. - [334] T. Jungwirth, X. Marti, P. Wadley, and J. Wunderlich, "Antiferromagnetic spintronics", *Nature Nanotechnol.*, vol. 11, no. 3, pp. 231–241, 2016. - [335] V. Baltz, A. Manchon, M. Tsoi, T. Moriyama, T. Ono, and Y. Tserkovnyak, "Antiferromagnetic spintronics", *Rev. Mod. Phys.*, vol. 90, p. 015 005, 2018. - [336] C. Kittel, "Theory of antiferromagnetic resonance", *Phys. Rev.*, vol. 82, p. 565, 1951. - [337] T. Kampfrath, A. Sell, G. Klatt, *et al.*, "Coherent terahertz control of antiferromagnetic spin waves", *Nature Photon.*, vol. 5, no. 1, pp. 31–34, 2011. - [338] D. Bossini, S. Dal Conte, Y. Hashimoto, *et al.*, "Macrospin dynamics in antiferromagnets triggered by sub-20 femtosecond injection of nanomagnons", *Nature Commun.*, vol. 7, no. 1, p. 10 645, 2016. - [339] K. Grishunin, T. Huisman, G. Li, *et al.*, "Terahertz Magnon-Polaritons in TmFeO3", *ACS Photon.*, vol. 5, no. 4, pp. 1375–1380, 2018. - [340] S. S. Dhillon, M. S. Vitiello, E. H. Linfield, *et al.*, "The 2017 terahertz science and technology roadmap", *J. Phys. D: Appl. Phys.*, vol. 50, no. 4, p. 043 001, 2017. - [341] K. Zakeri, "Terahertz magnonics: Feasibility of using terahertz magnons for information processing", *Physica C*, vol. 549, pp. 164–170, 2018. - [342] A. V. Kimel, A. Kirilyuk, A. Tsvetkov, R. V. Pisarev, and T. Rasing, "Laser-induced ultrafast spin reorientation in the antiferromagnet TmFeO3", *Nature*, vol. 429, no. 6994, pp. 850–853, 2004. - [343] B. G. Park, J. Wunderlich, X. Martí, *et al.*, "A spin-valve-like magnetoresistance of an antiferromagnet-based tunnel junction", *Nature Mater.*, vol. 10, no. 5, pp. 347–351, 2011. - [344] P.
Wadley, B. Howells, J. Železný, *et al.*, "Electrical switching of an antiferromagnet", *Science*, vol. 351, no. 6273, pp. 587–590, 2016. - [345] S. Y. Bodnar, L. Šmejkal, I. Turek, *et al.*, "Writing and reading antiferromagnetic Mn2Au by Néel spin-orbit torques and large anisotropic magnetoresistance", *Nature Commun.*, vol. 9, no. 1, p. 348, 2018. - [346] T. Devolder, J.-V. Kim, F. Garcia-Sanchez, *et al.*, "Time-resolved spin-torque switching in MgO-based perpendicularly magnetized tunnel junctions", *Phys. Rev. B*, vol. 93, p. 024 420, 2 Jan. 2016. - [347] A. Mahmoud, C. Adelmann, F. Vanderveken, S. Cotofana, F. Ciubotaru, and S. Hamdioui, "Fan-out of 2 triangle shape spin wave logic gates", in *2021 Design, Automation & Test in Europe Conference & Exhibition (DATE)*, 2021, pp. 948–953. - [348] O. Zografos, B. Sorée, A. Vaysset, *et al.*, "Design and benchmarking of hybrid CMOS-spin wave device circuits compared to 10 nm CMOS", in *2015 IEEE 15th International Conference on Nanotechnology (IEEE-NANO)*, IEEE, 2015, pp. 686–689. - [349] Y. Chen, A. Sangai, M. Gholipour, and D. Chen, "Schottky-barrier-type graphene nano-ribbon field-effect transistors: A study on compact modeling, process variation, and circuit performance", in *2013 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)*, Jul. 2013, pp. 82–88. - [350] L. W. Cotten, "Circuit implementation of high-speed pipeline systems", in *Proceedings of the November 30–December 1, 1965, Fall Joint Computer Conference, Part I*, ser. AFIPS '65 (Fall, part I), Las Vegas, Nevada: Association for Computing Machinery, 1965, pp. 489–504. - [351] M. J. Flynn, "Very high-speed computing systems", *Proceedings of the IEEE*, vol. 54, no. 12, pp. 1901–1909, 1966. - [352] A. N. Mahmoud, F. Vanderveken, C. Adelmann, F. Ciubotaru, S. Cotofana, and S. Hamdioui, "Spin wave normalization toward all magnonic circuits", *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 68, no. 1, pp. 536–549, 2021.
- [353] S. Cherepov, P. Khalili Amiri, J. G. Alzate, *et al.*, "Electric-field-induced spin wave generation using multiferroic magnetoelectric cells", *Applied Physics Letters*, vol. 104, no. 8, p. 082 403, 2014. eprint: https://doi.org/10.1063/1.4865916. - [354] F. Ciubotaru, T. Devolder, M. Manfrini, C. Adelmann, and I. P. Radu, "All electrical propagating spin wave spectroscopy with broadband wavevector capability", *Applied Physics Letters*, vol. 109, no. 1, p. 012 403, 2016. eprint: https://doi.org/10.1063/1.4955030. - [355] G. Talmelli, F. Ciubotaru, K. Garello, *et al.*, "Spin-wave emission by spin-orbit-torque antennas", *Phys. Rev. Applied*, vol. 10, p. 044 060, 4 Oct. 2018. - [356] B. A. Kalinikos and A. N. Slavin, "Theory of dipole-exchange spin wave spectrum for ferromagnetic films with mixed exchange boundary conditions", *Journal of Physics C: Solid State Physics*, vol. 19, no. 35, pp. 7013–7033, Dec. 1986. - [357] T. F. Canan *et al.*, "Ultracompact and low-power logic circuits via workfunction engineering", *IEEE Journal on Exploratory Solid-State Computational Devices and Circuits*, vol. 5, no. 2, pp. 94–102, 2019. - [358] S. Matsunaga, J. Hayakawa, S. Ikeda, *et al.*, "Fabrication of a nonvolatile full adder based on logic-in-memory architecture using magnetic tunnel junctions", *Applied Physics Express*, vol. 1, p. 091 301, Aug. 2008. - [359] H. Cai, Y. Wang, L. A. De Barros Naviner, and W. Zhao, "Robust ultra-low power non-volatile logic-in-memory circuits in FD-SOI technology", *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 64, no. 4, pp. 847–857, 2017. - [360] A. Roohi, R. Zand, D. Fan, and R. F. DeMara, "Voltage-based concatenatable full adder using spin Hall effect switching", *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 36, no. 12, pp. 2134–2138, 2017. - [361] A. Roohi, R. Zand, and R. F.
DeMara, "A tunable majority gate-based full adder using current-induced domain wall nanomagnets", *IEEE Transactions on Magnetics*, vol. 52, no. 8, pp. 1–7, 2016. - [362] S. Angizi, H. Jiang, R. F. DeMara, J. Han, and D. Fan, "Majority-based spin-CMOS primitives for approximate computing", *IEEE Transactions on Nanotechnology*, vol. 17, no. 4, pp. 795–806, 2018. - [363] M. Mewada *et al.*, "Transmission gate and hybrid CMOS full adder characterization and power-delay product estimation based on mathematical model", *Procedia Computer Science*, vol. 171, pp. 999–1008, 2020, Third International Conference on Computing and Network Communications (CoCoNet'19). - [364] B. Parhami, *Computer Arithmetic: Algorithms and Hardware Designs*, 2nd ed. Oxford University Press, 2009. - [365] A. Momeni, J. Han, P. Montuschi, and F. Lombardi, "Design and analysis of approximate compressors for multiplication", *IEEE Trans. on Computers*, vol. 64, no. 4, pp. 984–994, 2015. - [366] J. Mori *et al.*, "A 10 ns 54×54 b parallel structured full array multiplier with 0.5 μm CMOS technology", *IEEE Journal of Solid-State Circuits*, vol. 26, no. 4, pp. 600–606, 1991. - [367] M. Kumar and J. Nath, "Design of an energy efficient 4-2 compressor", *IOP Conference Series: Materials Science and Engineering*, vol. 225, p. 012 136, Aug. 2017. - [368] F. Ranjbar, Y. Forghani, and D. Bahrepour, "High performance 8-bit approximate multiplier using novel 4:2 approximate compressors for fast image processing", *International Journal of Integrated Engineering*, vol. 10, no. 1, 2018. - [369] A. Mahmoud, F. Vanderveken, F. Ciubotaru, C. Adelmann, S. Cotofana, and S. Hamdioui, "Spin wave based full adder", in *2021 IEEE International Symposium on Circuits and Systems (ISCAS)*, 2021, pp. 1–5. - [370] A. Arasteh *et al.*, "An energy and area efficient 4:2 compressor based on FinFETs", *Integration*, vol. 60, pp. 224–231, 2018. - [371] G. K.
Wallace, "The JPEG still picture compression standard", *IEEE Trans. on Consumer Electronics*, vol. 38, no. 1, pp. xviii–xxxiv, 1992. - [372] T. F. Canan, S. Kaya, A. Karanth, and A. Louri, "Ultracompact and low-power logic circuits via workfunction engineering", *IEEE Journal on Exploratory Solid-State Computational Devices and Circuits*, vol. 5, no. 2, pp. 94–102, 2019. - [373] V. Gupta, D. Mohapatra, S. P. Park, A. Raghunathan, and K. Roy, "IMPACT: Imprecise adders for low-power approximate computing", in *IEEE/ACM International Symposium on Low Power Electronics and Design*, 2011, pp. 409–414. - [374] K. Manikantta Reddy *et al.*, "Design and analysis of multiplier using approximate 4-2 compressor", *AEU International Journal of Electronics and Communications*, vol. 107, pp. 89–97, 2019. - [375] P. Kulkarni *et al.*, "Trading accuracy for power with an underdesigned multiplier architecture", in *2011 24th International Conference on VLSI Design*, 2011, pp. 346–351. - [376] A. Mahmoud, F. Vanderveken, C. Adelmann, F. Ciubotaru, S. Hamdioui, and S. Cotofana, "4-output programmable spin wave logic gate", in *2020 IEEE 38th International Conference on Computer Design (ICCD)*, 2020, pp. 332–335. - [377] A. Mahmoud, F. Vanderveken, C. Adelmann, F. Ciubotaru, S. Hamdioui, and S. Cotofana, "Achieving wave pipelining in spin wave technology", in *2021 22nd International Symposium on Quality Electronic Design (ISQED)*, 2021, pp. 54–59. - [378] A. Mahmoud, F. Vanderveken, F. Ciubotaru, C. Adelmann, S. Hamdioui, and S. Cotofana, "Spin wave based approximate computing", *IEEE Trans. on Emerging Topics in Computing*, pp. 1–1, 2021. - [379] A. N. Mahmoud, F. Vanderveken, F. Ciubotaru, C. Adelmann, S. Hamdioui, and S. Cotofana, "A spin wave-based approximate 4:2 compressor: Seeking the most energy-efficient digital computing paradigm", *IEEE Nanotechnology Magazine*, pp. 2–11, 2021. - [380] S. Cotofana and S.
Vassiliadis, "Signed digit addition and related operations with threshold logic", *IEEE Trans. Computers*, vol. 49, pp. 193–207, 2000. - [381] ——, "Periodic symmetric functions, serial addition, and multiplication with neural networks", *IEEE Transactions on Neural Networks*, vol. 9, no. 6, pp. 1118–1128, 1998. - [382] ——, "Periodic symmetric functions with feed-forward neural networks", in *NEURAP'95/96 Neural Networks and their Applications*, 1995, pp. 215–221. - [383] N. M. Ebeid and M. Hasan, "On binary signed digit representations of integers", *Designs, Codes and Cryptography*, vol. 42, pp. 43–65, 2007. - [384] M. Naik V, "Performance analysis of parallel prefix adder", *International Journal of Electrical Electronics and Data Communication*, vol. 3, Jul. 2015. - [385] V. Pudi *et al.*, "Majority logic formulations for parallel adder designs at reduced delay and circuit complexity", *IEEE Transactions on Computers*, vol. 66, no. 10, pp. 1824–1830, 2017. - [386] F. Ciubotaru, G. Talmelli, T. Devolder, *et al.*, "First experimental demonstration of a scalable linear majority gate based on spin waves", in *2018 IEEE International Electron Devices Meeting (IEDM)*, 2018, pp. 36.1.1–36.1.4. - [387] O. Zografos, B. Sorée, A. Vaysset, *et al.*, "Design and benchmarking of hybrid CMOS-spin wave device circuits compared to 10 nm CMOS", in *2015 IEEE 15th International Conference on Nanotechnology (IEEE-NANO)*, 2015, pp. 686–689. - [388] D. Tierno, F. Ciubotaru, R. Duflou, M. Heyns, I. P. Radu, and C. Adelmann, "Strain coupling optimization in magnetoelectric transducers", *Microelectron. Engin.*, vol. 187–188, pp. 144–147, 2018. - [389] S. Dutta, R. Iraei, C. Pan, *et al.*, "Impact of spintronics transducers on the performance of spin wave logic circuit", in *2016 IEEE 16th Intern. Conf. on Nano. (IEEE-NANO)*, 2016, pp. 990–993. - [390] S. Dutta, D. Nikonov, S. Manipatruni, I. A. Young, and A.
Naeemi, "Compact physical model for crosstalk in spin-wave interconnects", *IEEE Transactions on Electron Devices*, vol. 62, no. 11, pp. 3863–3869, 2015. - [391] E. Egel, C. Meier, G. Csaba, and S. Breitkreutz-von Gamm, "Design of a CMOS integrated on-chip oscilloscope for spin wave characterization", *AIP Adv.*, vol. 7, no. 5, p. 056 016, 2017. - [392] S. Dutta, D. E. Nikonov, S. Manipatruni, I. A. Young, and A. Naeemi, "SPICE circuit modeling of PMA spin wave bus excited using magnetoelectric effect", *IEEE Transactions on Magnetics*, vol. 50, no. 9, pp. 1–11, Sep. 2014. - [393] D. Tierno, M. Dekkers, P. Wittendorp, *et al.*, "Microwave characterization of Ba-substituted PZT and ZnO thin films", *IEEE Trans. Ultrason. Ferroelectr. Freq. Control*, vol. 65, no. 5, pp. 881–888, May 2018.