

**Delft University of Technology** 

## A Self-Matching Complementary-Reference Sensing Scheme for High-Speed and Reliable Toggle Spin Torque MRAM

Wang, Jinkai; Lian, Chenyu; Bai, Yining; Wang, Guanda; Zhang, Zhizhong; Zheng, Zhenyi; Zhang, Kun; Cotofana, Sorin; Zhang, Yue; More Authors

**DOI** 10.1109/TCSI.2020.3020137

**Publication date** 2020

**Document Version** Final published version

**Published in** IEEE Transactions on Circuits and Systems I: Regular Papers

## Citation (APA)

Wang, J., Lian, C., Bai, Y., Wang, G., Zhang, Z., Zheng, Z., Zhang, K., Cotofana, S., Zhang, Y., & More Authors (2020). A Self-Matching Complementary-Reference Sensing Scheme for High-Speed and Reliable Toggle Spin Torque MRAM. *IEEE Transactions on Circuits and Systems I: Regular Papers*, *67*(12), 4247-4258. Article 9190048. https://doi.org/10.1109/TCSI.2020.3020137


# A Self-Matching Complementary-Reference Sensing Scheme for High-Speed and Reliable Toggle Spin Torque MRAM

Jinkai Wang, Student Member, IEEE, Chenyu Lian, Yining Bai, Guanda Wang, Graduate Student Member, IEEE, Zhizhong Zhang, Student Member, IEEE, Zhenyi Zheng, Graduate Student Member, IEEE, Lei Chen, Kelian Lin, Kun Zhang, Member, IEEE, Youguang Zhang, Xiulong Wu, Member, IEEE, Sorin Cotofana, Fellow, IEEE, and Yue Zhang, Senior Member, IEEE

Abstract-While spintronic memories such as spin transfer torque magnetic random access memory (STT-MRAM) have shown great potential for building next-generation memory thanks to their attractive characteristics, their relatively large write latency and deficient read mechanism hinder their application to emerging concepts such as in-memory-processing and neuromorphic computing. A toggle spin torque (TST) MRAM combining STT and spin orbit torque (SOT) has recently been proposed to alleviate the write issue. However, a sensing scheme that balances reliability and speed has not yet been addressed. In this paper, we propose a self-matching complementary-reference (SMCR) sensing scheme, which provides not only a maximum sensing margin (SM) but also a high-speed read operation. Applying it to TST-MRAM yields advantageous performance in both write and read processes. To validate the functionality of our proposal, we design and evaluate an 8Kb TST-MRAM array, in which a read delay of 1 ns and a read bit error rate (BER) of  $1.02 \times 10^{-13}$  are achieved. Moreover, when operated at a 0.8 V supply voltage, it reduces the read access energy by 7.5% and 20% compared with conventional voltage sensing and dynamic reference sensing schemes, respectively.

Manuscript received July 16, 2020; accepted August 20, 2020. Date of publication September 9, 2020; date of current version December 1, 2020. This work was supported in part by the National Natural Science Foundation of China under Grant 61971024 and Grant 51901008, in part by the Young Elite Scientist Sponsorship Program by CAST under Grant 2017QNRC001, in part by the International Mobility Project under Grant B16001, and in part by the National Key Technology Program of China under Grant 2017ZX01032101. This article was recommended by Associate Editor W. Liu. (*Corresponding author: Yue Zhang.*)

Jinkai Wang, Guanda Wang, Zhizhong Zhang, Zhenyi Zheng, Lei Chen, Kelian Lin, Kun Zhang, and Youguang Zhang are with the Fert Beijing Institute, School of Microelectronics, Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100191, China.

Chenyu Lian, Yining Bai, and Yue Zhang are with the Fert Beijing Institute, School of Microelectronics, Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100191, China, and also with the Nanoelectronics Science and Technology Center, Hefei Innovation Research Institute, Beihang University, Hefei 230013, China (e-mail: yz@buaa.edu.cn).

Xiulong Wu is with the School of Electronics and Information Engineering, Anhui University, Hefei 230601, China.

Sorin Cotofana is with the Department of Quantum and Computer Engineering, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, 2628 Delft, The Netherlands.

Color versions of one or more of the figures in this article are available online at https://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSI.2020.3020137

*Index Terms*—MRAM, toggle spin torque, sensing margin, self-matching, read bit error.

### I. INTRODUCTION

Spin transfer torque magnetic random access memory (STT-MRAM) is widely considered a high-potential candidate for future memory system designs, thanks to its intrinsic non-volatility, high endurance, low power consumption and good compatibility with existing complementary metal oxide semiconductor (CMOS) technology [1]-[3]. For example, many STT-MRAM based cache designs have been proposed to address the large bit cell size and static power dissipation of static RAM (SRAM) [4]–[6]. Recently, STT-MRAM has also been gaining popularity in burgeoning big-data-driven applications. Various emerging computing concepts based on spintronic devices, such as in-memory-processing and neuromorphic computing, have recently been proposed to overcome the von Neumann bottleneck and the "memory wall" [7]–[10].

As shown in Fig. 1(a), STT-MRAM requires a rather long magnetization switching time, and its write mechanism may degrade write endurance (e.g. 10<sup>12</sup> cycles) [11]-[13]; both have become obstacles to the further development of MRAM based computing concepts. Spin-orbit torque MRAM (SOT-MRAM) solves these two drawbacks by using a three-terminal device structure, as shown in Fig. 1(b) [14]-[16]. However, because the SOT effect is symmetric with respect to the perpendicular direction, it cannot achieve deterministic switching of a magnetic tunnel junction (MTJ) with perpendicular magnetic anisotropy (PMA). An additional magnetic field is normally required, which is highly undesirable for practical integration [17]–[19]. Some field-free solutions have thus been proposed, such as lateral structural asymmetry, antiferromagnetic (AFM) metal, the voltage-controlled magnetic anisotropy (VCMA) effect, etc. Recently, toggle spin torque MRAM (TST-MRAM) has been proposed to improve the write operation and realize field-free switching [20], [21], as depicted in Fig. 1(c). The write operation of TST-MRAM makes use of two currents: (i) one flowing through the heavy metal via the  $T_{HM}$  transistor; (ii) the other injected into the MTJ via the  $T_{MTJ}$  transistor, whose flow direction determines the MTJ state,




Fig. 1. (a) STT-MRAM. (b) SOT-MRAM. (c) TST-MRAM. (d) Principle of write operation for TST-MRAM. (e) Schematic diagram of TST-MRAM bit cell.

i.e., flowing from the free layer (FL) to the pinned layer (PL) to write '0' (low resistance,  $R_L$ ) and from PL to FL to write '1' (high resistance,  $R_H$ ), as shown in Fig. 1(d). Fig. 1(e) shows the schematic diagram of a standard 2T1R bit cell in TST-MRAM. Although the integration density of TST-MRAM cannot match that of two-terminal devices and its write control is relatively complex, theoretical studies and experimental data confirm that TST-MRAM provides lower switching latency and smaller energy dissipation than STT-MRAM and SOT-MRAM [21].

However, as the TST-MRAM read operation still adopts the mainstream approach of STT-MRAM, the read current can unintentionally flip the magnetization during a read because of thermal fluctuations [22]. In addition, the part of the current flowing through the heavy metal layer under the MTJ can also exert a torque on the FL [23], increasing the flipping probability during the read operation. A small read current helps reduce this read disturbance but results in a lower sensing margin (SM) in the sensing scheme. Hence, the tradeoff between read disturbance and SM makes the design of the sensing scheme more challenging. Moreover, SM shrinks with feature size downscaling and device process variations, which can seriously affect the read operation and even cause read errors. Improving the tunneling magnetoresistance ratio (TMR) of the MTJ is an effective way to increase SM, but the TMR value is only about 60% to 300% at room temperature owing to the limitations of materials, structures and process technologies [24].

To alleviate the above-mentioned issues, various sensing schemes have been introduced in [25]–[31]. The authors of [26] proposed a data-cell-variation-tolerant dual-mode sensing method to improve the fundamental read reliability while mitigating performance and energy overheads; however, its read access time remains at around 8 ns. A time-based sensing scheme was introduced in [27], which converts the bit line voltage into the time domain to discriminate the datum in the bit cell. This effectively improves read reliability, but the complex timing schedule extends the read access time. In [28], a continuous-recording-and-enhancement voltage (CREV) sense amplifier (SA) tolerates a smaller *SM* while bringing the read latency down to 1.3 ns at a 0.9 V supply voltage. However, it requires two MTJs with complementary states to store one bit, which complicates writing and reduces the storage capacity. In addition to the above approaches, the *SM* can be widened by a single-cap offset-cancelled (SCOC) SA [29] or by dynamically adjusting the reference voltage [30]. However, these methods sacrifice read access time, because their reference voltage generation and dispatch processes are more complex than those of conventional schemes.

In this paper, we propose a self-matching complementary-reference (SMCR) sensing scheme to effectively reduce the read operation latency while improving the reliability of TST-MRAM. Beyond the efficient write operation, the SMCR scheme directly matches a reference voltage complementing the sensed voltage  $(V_S)$  from two candidate reference voltages  $(V_{RL} \text{ and } V_{RH})$  instead of performing multiple comparisons, which reduces the sensing complexity and the read access time. Besides, as the voltage difference between  $V_S$  and the matched reference voltage is almost equal to  $V_{RH}$-$V_{RL}$, more reliable data sensing with a large SM can be realized compared with the mainstream single-reference scheme. To validate the functionality of the SMCR sensing scheme, we design and evaluate an 8Kb TST-MRAM array. The read access and reliability performance are analysed by both theoretical calculations and simulations. Monte Carlo simulations are also carried out to demonstrate the influence of the manufacturing process on SM. Compared with state-of-the-art sensing schemes, the proposed SMCR scheme offers a shorter read delay (1 ns), a lower read bit error rate (BER)  $(1.02 \times 10^{-13})$  and a lower read energy.

The remainder of this paper is organized as follows: Section II describes the SMCR scheme with its operation principle and the simulation framework. Read access time and reliability of the SMCR scheme are analysed in Section III. Section IV presents simulation results and compares our proposal with previously reported MRAM sensing schemes. Conclusions are presented in Section V.

### II. SMCR SCHEME AND 8KB TST-MRAM ARRAY

In the conventional MRAM sensing scheme using a single reference resistance  $R_{ref}$ , the value of  $R_{ref}$  is normally the average of  $R_L$  and  $R_H$ . The reference voltage  $V_{ref}$  can thus be obtained as  $(V_H + V_L)/2$ , where  $V_H$  ( $V_L$ ) is the sensed voltage when the MTJ in bit cell is  $R_H$  ( $R_L$ ). SM is defined by the maximum deviation between the sensed voltage and the reference voltage, which can quantify the reliability of voltage sensing scheme [25]. Hence, the SM of conventional MRAM is invariably given as

$$SM = V_H - V_{ref} = V_{ref} - V_L = (V_H - V_L)/2 \tag{1}$$

In the proposed SMCR sensing scheme,  $V_{ref}$  can be dynamically adjusted to  $V_{RL}$  or  $V_{RH}$  according to the  $V_S$  value to enlarge *SM*. Here,  $V_{RL} = V_L$  and  $V_{RH} = V_H$ . Consequently, *SM* of SMCR sensing scheme can be expressed as

$$SM = |V_S - V_{ref}| = V_H - V_{RL} = V_{RH} - V_L = V_H - V_L \qquad (2)$$
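As a small numeric illustration of Eqs. (1) and (2), the Python sketch below evaluates both margins for one pair of sensed voltages. The values of `V_H` and `V_L` are hypothetical and chosen only for illustration; they are not taken from the paper's simulations.

```python
def sm_conventional(v_h: float, v_l: float) -> float:
    """Eq. (1): margin against a single mid-point reference V_ref = (V_H + V_L) / 2."""
    v_ref = (v_h + v_l) / 2.0
    return v_h - v_ref


def sm_smcr(v_h: float, v_l: float) -> float:
    """Eq. (2): margin against the complementary reference (V_RL = V_L, V_RH = V_H)."""
    return v_h - v_l


# Hypothetical sensed voltages in volts (assumed, not from the paper).
V_H, V_L = 0.55, 0.45
print(f"conventional SM = {sm_conventional(V_H, V_L) * 1e3:.0f} mV")
print(f"SMCR SM         = {sm_smcr(V_H, V_L) * 1e3:.0f} mV")
```

For the same $V_H$ and $V_L$, the margin of Eq. (2) is twice that of Eq. (1), which is the ideal gain before offsets and variations are taken into account.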



Fig. 2. (a) Schematic of the proposed SMCR sensing scheme. (b) Timing sequences of SMCR sensing scheme.

#### A. SMCR Scheme Design and Timing Analysis

Fig. 2(a) presents the schematic of the SMCR sensing scheme, which mainly comprises the reference voltage generation circuit and the SMCR-SA. Note that dual-reference bit cells are added to a column and connected to the reference bit lines (i.e.  $R_HBL$ and  $R_LBL$ ); the storage bit cells are connected to the read bit line (i.e. RBL). Since the dual-reference cells have the same structure as the storage bit cells and are located in the same column, the resistive and capacitive variations are well tracked and compensated while reading '0' and '1' [29]. As more transistors are connected to RBL than to  $R_HBL$ and  $R_LBL$, the parasitic capacitance of RBL is larger than those of  $R_HBL$  and  $R_LBL$. To eliminate the large difference



Fig. 3. Waveforms of RBL,  $R_HBL$ ,  $R_LBL$  and SMCR-SA output in SMCR sensing scheme. (a) Read '0'. (b) Read '1'. (c) SMCR-SA output when reading '0'. (d) SMCR-SA output when reading '1'.

of parasitic capacitances, two load capacitances, C0 and C1, are added on  $R_HBL$  and  $R_LBL$. Note that the sizes of the two load capacitances depend on the parasitic capacitance of RBL. Fig. 2(b) presents the waveforms and timing sequences illustrating the basic operation principle of the SMCR scheme, which includes three phases: (1) bit line pre-charging, (2) voltage difference formation and (3) SMCR-SA amplification.

At the beginning of the SMCR read operation, the three bit lines RBL, R<sub>H</sub>BL and R<sub>L</sub>BL are pre-charged to the read supply voltage VDDR, while transistors T0 and T1 equalize them so that they reach the same voltage level at the end of phase (1). At the same time, the source line (SL) is connected to ground by activating the Dis signal, and the RD signal is set to VDDR to connect RBL, R<sub>H</sub>BL and R<sub>L</sub>BL to the inputs of the proposed SMCR-SA, i.e. INR, INRH and INRL. Phase (2) is initiated by setting the PRE signal to VDDR, which turns off the charging circuit while RBL, R<sub>H</sub>BL and R<sub>L</sub>BL hold their voltages owing to the bit line parasitic capacitances. When WL0/WL1/WL\<i\> are activated to turn on the transistors in the dual-reference bit cells and the datum bit cell, the RBL, R<sub>H</sub>BL and R<sub>L</sub>BL voltages start to decrease at rates that depend on the discharge channel parameters. After a period of time, the voltages of the three bit lines diverge noticeably owing to the resistance differences. We carry out Monte Carlo simulations with 10<sup>5</sup> samples to demonstrate the time-dependent variations of RBL, R<sub>H</sub>BL, R<sub>L</sub>BL and the SMCR-SA output. In these Monte Carlo simulations, the CMOS process deviation and the MTJ resistance follow Gaussian distributions with 5% variability [30], [32]. As depicted in Fig. 3(a) and (b), when the voltage difference reaches about 100 mV, the SMCR-SA is activated by the signal SAE. This voltage difference is defined as the SM of the SMCR sensing scheme; it should be large enough to overcome intrinsic PVT variation and the degraded flipping ability of the latch (composed of P0, P1, N4 and N5) and to satisfy the required amplification accuracy [33]-[35].

In the SMCR sensing scheme, the SMCR-SA implements the dynamic adjustment of  $V_{ref}$. The N1 and N3 gate voltages are  $V_{RL}$  and  $V_{RH}$, respectively, and the N0 and N2 gate voltages are both  $V_{S}$. Note that the nodes OUT and OUTB have been pre-charged to VDDR by the end of phase (1). After the bit line voltages have dropped for a period of time in phase (2), the SMCR-SA is enabled by activating the signal SAE and phase (3) starts. The OUT and OUTB voltages then discharge through N6, but at different rates because the gate voltages of the transistors on the corresponding discharge paths differ. Assume that the read datum is '0', so that  $V_S$  is almost equal to  $V_{RL}$. In this case,  $V_S$  and  $V_{RH}$  form a pair of complementary voltages, which means that  $V_{ref}$  is adjusted to  $V_{RH}$. N0 and N1 have the same gate voltage, and the gate voltage of N3 is greater than that of N2. As a result, the current through N3 is larger than that through N2, which causes the OUTB voltage to decrease first to  $VDDR - |V_{THP}|$  ( $V_{THP}$  is the PMOS threshold voltage). This turns on P0 and initiates the charging of OUT. As OUT still discharges faster than it charges at that moment, its voltage continues to drop to  $VDDR - |V_{THP}|$. The OUTB voltage likewise continues to drop below  $V_{THN}$  ( $V_{THN}$  is the NMOS threshold voltage); N4 then turns off and the OUT voltage quickly rises through charging via P0 [33]. Subsequently, through the feedback of the back-to-back inverters (i.e. P0, P1, N4 and N5), the OUTB voltage decreases and the OUT voltage rises until they reach the '0' and '1' levels, respectively, as exhibited in Fig. 3(c). Similarly, when the read datum is '1',  $V_S$  is almost the same as  $V_{RH}$.  $V_S$  and  $V_{RL}$  then form a pair of complementary voltages, which means that  $V_{ref}$  is adjusted to  $V_{RL}$. N3 and N2 have the same gate voltage, and the gate voltage of N0 is greater than that of N1. Fig. 3(d) shows the output of the SMCR-SA in this case, in which OUTB eventually reaches '1'. The above analysis demonstrates the self-matching capability of the SMCR scheme, which essentially compares  $V_{RH}$-$V_S$ with  $V_S$-$V_{RL}$  directly and selects the larger of the two. However, achieving this self-matching capability degrades the flipping ability of the latch because of the added sampling transistors N0 and N1. To alleviate this degradation, we set a sufficiently large input voltage difference and increase the size of the transistors in the SMCR-SA. Compared to a conventional voltage SA, the input voltage difference and the area of the SMCR-SA increase by 20 mV and 0.11  $\mu$m<sup>2</sup>, respectively.
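The self-matching decision described above can be summarized by a purely behavioral Python sketch. It only mirrors the comparison of $V_{RH} - V_S$ with $V_S - V_{RL}$; it does not model the transistor-level latch, and the function name `smcr_sense` and the example voltages are hypothetical.

```python
from typing import Tuple


def smcr_sense(v_s: float, v_rh: float, v_rl: float) -> Tuple[int, str, float]:
    """Behavioral model of self-matching: pick the reference farther from V_S.

    Returns (datum, name of the matched reference, resulting margin in volts).
    """
    diff_high = v_rh - v_s  # large when the cell stores '0' (V_S close to V_RL)
    diff_low = v_s - v_rl   # large when the cell stores '1' (V_S close to V_RH)
    if diff_high > diff_low:
        return 0, "V_RH", diff_high
    return 1, "V_RL", diff_low


# Hypothetical bit line voltages in volts (assumed, not from the paper).
print(smcr_sense(v_s=0.46, v_rh=0.55, v_rl=0.45))  # V_S near V_RL, datum '0'
print(smcr_sense(v_s=0.54, v_rh=0.55, v_rl=0.45))  # V_S near V_RH, datum '1'
```

In the circuit this selection is not an explicit comparison step; it emerges from the different discharge rates of OUT and OUTB, so the sketch should be read as a functional summary only.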

#### B. Design of the 8Kb TST-MRAM

Fig. 4(a) presents a detailed block diagram of an 8Kb TST-MRAM array, which contains 8 blocks, each with 128×8 cells. The array also includes write and read drivers, row and column decoders, a level converter and a local timing control. To enhance parallelism and avoid interference between columns, each column is designed with an independent SMCR structure. In a read operation, a block of TST-MRAM can simultaneously read data from 8 bit cells. In addition, the level converter converts the write supply voltage VDDW (1.5 V) to VDDR (0.8 V), which not only saves energy during write and read operations but also alleviates read disturbance owing to the smaller current at low voltage.



Fig. 4. (a) Detailed block diagram of an 8Kb TST-MRAM array with SMCR sensing scheme. (b) Schematic of the internal organization of a block.

The internal organization of the array is shown in Fig. 4(b). In the write operation, four control signals (WPA, WPB, WNA and WNB) are generated by the RD/WR driver to control two PMOS and two NMOS transistors, respectively. Following the timing sequence plotted in Fig. 5, for writing a datum '1', WPA drops to 0 V to turn on the connected PMOS transistor, and WNA rises to VDDW to turn on the connected NMOS transistor. The PMOS and NMOS transistors controlled by the WPB and WNB signals are turned off. The signals



Fig. 5. Timing sequences of write and read operations for TST-MRAM array.

WWL\<i\> and WL\<i\> are activated according to the aforementioned switching mechanism of TST-MRAM. Similarly, for writing a datum '0', WPB and WNA are set to zero, and WPA and WNB are set to VDDW. WWL\<i\> and WL\<i\> are then configured to induce reverse write currents flowing through the heavy metal and the MTJ, respectively. The write operation is also performed on the dual-reference cells to initialize their states. In the read operation, the two PMOS and two NMOS transistors controlled by WPA, WPB, WNA and WNB are all turned off, while the read control signals follow the timing sequence in Fig. 2(b).

Fig. 6 shows the transient simulation results of the entire write/read process for 1 bit among the 8 bits in a block. By properly applying the SOT and STT currents, the state of the MTJ can be switched from '0' to '1' and then back to '0', as shown in Fig. 6(a). Here, the maximum SOT current is 885  $\mu$A for writing '1' or 979  $\mu$A for writing '0', and the maximum STT current is 121  $\mu$A for writing '1' or 266  $\mu$A for writing '0', as indicated in Fig. 6(b) and (c). Note that, although the SOT current flowing in the heavy metal is relatively large, the total power consumption is effectively reduced because this current flows only for a short time, e.g. 1 ns.

The waveforms of the bit lines, namely RBL,  $R_HBL$  and  $R_LBL$ , are displayed in Fig. 6(d), (e) and (f). It is found that RBL provides different voltage levels according to the varied state of MTJ. In the read operation, the different voltage drops of these three bit lines can be observed. Based on them, the expected *SM* can also be obtained. Fig. 6(g), (h) and (i) illustrate the waveforms of the nodes OUTB, OUT and Output of a column in the array. The datum stored in the TST-MRAM array can successfully be read out within 1 ns.

### III. THEORETICAL ANALYSES OF DELAY AND RELIABILITY

#### A. Delay Analysis

During the voltage difference formation phase, the discharges of  $R_HBL$ ,  $R_LBL$  and RBL can be captured by the RC



Fig. 6. Transient simulation results of TST-MRAM with SMCR sensing scheme. (a) MTJ states. (b) SOT current. (c) STT and read currents. (d)-(f) Waveforms of R<sub>H</sub>BL, R<sub>L</sub>BL and RBL. (g)-(h) Waveforms of OUT and OUTB of SMCR-SA output nodes. (i) Output of a column in the array. ① SOT write phase. ② SOT and STT write phase. ③ SOT write phase. ④ Bit lines and SMCR-SA pre-charge phase. ⑤ Voltage difference formation phase. ⑥ SMCR-SA amplification phase.

circuit model. The bit line voltage change can be expressed as

$$V_t = V_0 e^{-\frac{T_{dis}}{RC}} \tag{3}$$

where  $T_{dis}$  is the discharge time and  $V_0$  is the initial voltage of the bit line. Note that the following discussion considers the case of reading a datum '0', in which  $V_S$  and  $V_{RH}$  form a pair of complementary voltages. For the RC circuit model, *R* is the total resistance of the discharge channel, including the MTJ resistance, the wire parasitic resistance ( $R_{WR}$ )



Fig. 7. Dependence of  $T_{dis}$  on  $\Delta V$  with different  $V_0$  and parasitic capacitances.

and the equivalent resistance of all transistors on the discharge channel ( $R_{TR}$ ). Similarly, *C* includes the parasitic capacitances of the wire, the MTJ and all transistors on the discharge channel. In the simulation, according to the technical documents of the CMOS and TST-MRAM processes,  $R_{TR}$,  $R_{WR}$  and  $R_L$  are 0.05 k$\Omega$, 0.72 k$\Omega$  and 3.98 k$\Omega$, respectively. As the capacitance in a memory array increases with the number of bit cells, we evaluate  $T_{dis}$  for different capacitances, as shown in Fig. 7.

For ease of the calculation, we use the following notations: total resistance and capacitance on R<sub>H</sub>BL discharge channel are  $R_{RH}$  and  $C_{RH}$ , respectively, and total resistance and capacitance on R<sub>L</sub>BL discharge channel are  $R_{RL}$  and  $C_{RL}$ , respectively, and total resistance and capacitance on RBL are  $R_R$  and  $C_R$ , respectively. The voltage difference ( $\Delta V$ ) between  $V_{RH}$  and  $V_S$  can be expressed as

$$\Delta V = V_{RH} - V_S = V_0 e^{-\frac{T_{dis}}{R_{RH}C_{RH}}} - V_0 e^{-\frac{T_{dis}}{R_RC_R}} \tag{4}$$

indicating that the magnitude of  $\Delta V$  is related to  $T_{dis}$. Because the dual-reference cells share the structure and the column of the storage cells, the only factor that differs among the discharge channels is expected to come from the MTJs. Since the parasitic capacitance of an MTJ is mainly determined by its feature size, we can assume that the parasitic capacitance is constant for MTJs with the same feature size. The capacitance relationship of the three discharge channels can be given as

$$C = C_{RH} = C_R = C_{RL} \tag{5}$$

For reading the datum '0', the MTJ connected to RBL is in the low-resistance state. Using the *TMR* ratio, defined as  $TMR = (R_H - R_L)/R_L$,  $R_{RH}$  and  $R_R$  can be expressed as

$$R_{RH} = R_{TR} + R_{WR} + (1 + TMR)R_L \tag{6}$$

$$R_R = R_{TR} + R_{WR} + R_L \tag{7}$$

Therefore, Eq. (4) can be rewritten as

$$\Delta V = V_0 \left( e^{-\frac{T_{dis}}{C} \frac{1}{R_{TR} + R_{WR} + (1 + TMR)R_L}} - e^{-\frac{T_{dis}}{C} \frac{1}{R_{TR} + R_{WR} + R_L}} \right) \tag{8}$$

The dependence of  $T_{dis}$  on  $\Delta V$  can then be expressed via the inverse function of Eq. (8) as follows.

$$T_{dis} = f(\Delta V) \tag{9}$$

As Eq. (9) is difficult to derive analytically from Eq. (8), we demonstrate the relationship between  $T_{dis}$  and  $\Delta V$  using a numerical solution. From the perspective of the SMCR read operation,  $T_{dis}$  is only meaningful before  $\Delta V$  reaches its maximum (solid curves in Fig. 7). Note that the maximum  $\Delta V$  is almost constant for the same VDDR even with different parasitic capacitances, but the  $T_{dis}$  needed to reach it depends heavily on the parasitic capacitance of the discharge channel. Furthermore, a larger  $V_0$  not only widens  $\Delta V$, but also shortens the  $T_{dis}$  at which  $\Delta V$  reaches its maximum value. However, increasing  $V_0$  raises the read current, which could lead to unexpected magnetization switching. Hence,  $V_0$  should be chosen carefully.
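One possible way to obtain the numerical solution of Eq. (9) is bisection on the rising branch of Eq. (8), sketched below in Python. The resistances come from the text and Table I; the bit line capacitance `C_ASSUMED` is a hypothetical value, and the idealized linear RC model ignores transistor nonlinearity, so the printed times are illustrative only.

```python
import math

# Resistances in ohms: R_TR, R_WR and R_L from the text, R_H from Table I.
R_TR, R_WR, R_L, R_H = 50.0, 720.0, 3980.0, 8750.0
TMR = (R_H - R_L) / R_L


def delta_v(t_dis, v0, c):
    """Eq. (8): voltage difference between R_HBL and RBL when reading '0'."""
    r_rh = R_TR + R_WR + (1.0 + TMR) * R_L
    r_r = R_TR + R_WR + R_L
    return v0 * (math.exp(-t_dis / (r_rh * c)) - math.exp(-t_dis / (r_r * c)))


def t_peak(c):
    """Discharge time at which Eq. (8) is maximal (zero of its time derivative)."""
    tau_rh = (R_TR + R_WR + (1.0 + TMR) * R_L) * c
    tau_r = (R_TR + R_WR + R_L) * c
    return math.log(tau_rh / tau_r) * tau_rh * tau_r / (tau_rh - tau_r)


def t_dis_for(target_dv, v0, c, iters=60):
    """Eq. (9) solved by bisection on [0, t_peak], where delta_v is increasing."""
    lo, hi = 0.0, t_peak(c)
    if target_dv > delta_v(hi, v0, c):
        raise ValueError("target exceeds the maximum achievable delta V")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if delta_v(mid, v0, c) < target_dv:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


C_ASSUMED = 100e-15  # farads; hypothetical bit line capacitance, not from the paper
t = t_dis_for(0.1, 0.8, C_ASSUMED)
print(f"T_dis for 100 mV: {t * 1e12:.0f} ps; peak at {t_peak(C_ASSUMED) * 1e12:.0f} ps")
```

In this linear model the peak value of $\Delta V$ does not depend on $C$, while the time needed to reach it scales with $C$, which agrees with the trend reported for Fig. 7.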

In the pre-charging phase, as the charging time  $(T_{charging})$  of bit line is determined by the parasitic capacitance, charging circuit and supply voltage, it can be regarded as a constant when these influencing factors are confirmed. In the proposed TST-MRAM structure, the simulation results show that  $T_{charging}$  is about 0.3 ns, which makes the bit lines have enough time to be charged to VDDR. In addition, at the SMCR-SA amplification phase, the voltage amplification principle behind the SMCR-SA is similar to voltage latch SA, thus the SMCR-SA amplification phase delay can be given by [33], [34]

$$T_{SA} = \frac{2C_L V_{THP}}{I_0} + \frac{C_L}{g_{m,eff}} \ln\left(\frac{1}{V_{THP}} \sqrt{\frac{I_0}{2\beta}} \frac{\Delta V_{OUT}}{\Delta V}\right) \tag{10}$$

where  $C_L$  is the load capacitance (i.e. 5 fF) of SMCR-SA output,  $g_{m,eff}$  is the effective transconductance of cross-coupled inverters,  $I_0$  is the current flowing through N6,  $\Delta V_{OUT}$  is the output of SMCR-SA and  $\beta$  relates to the transconductance parameter of N0, N1, N2 and N3. It is worth noting that there is a special case that N0, N1, N2 and N3 transistors are all turned off when  $V_{RH}$  voltage is reduced to  $V_{THN}$ , leading to malfunction for SMCR-SA. Hence, the bit line discharge time is limited to

$$T_{dis} \le -(R_{TR} + R_{WR} + (1 + TMR)R_L)\,C\ln\frac{V_{THN}}{V_0} \tag{11}$$
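To make the bound of Eq. (11) concrete, the short Python sketch below evaluates it for one operating point. The resistances follow the text and Table I, whereas the capacitance and the NMOS threshold voltage are hypothetical values assumed only for this example.

```python
import math

# Resistances in ohms: R_TR and R_WR from the text, R_H = (1 + TMR) * R_L from Table I.
R_TR, R_WR, R_H = 50.0, 720.0, 8750.0


def t_dis_limit(c, v_thn, v0):
    """Eq. (11): discharge time at which V_RH has decayed from V0 down to V_THN."""
    return -(R_TR + R_WR + R_H) * c * math.log(v_thn / v0)


# Hypothetical values (assumed, not from the paper): C = 100 fF, V_THN = 0.35 V.
limit = t_dis_limit(c=100e-15, v_thn=0.35, v0=0.8)
print(f"T_dis must stay below about {limit * 1e12:.0f} ps")
```

The bound is simply Eq. (3) solved for the time at which the slowest-discharging line, R<sub>H</sub>BL, reaches $V_{THN}$.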

In summary, the total read operation delay  $T_{delay}$  of SMCR sensing scheme can be synthesized as

$$\begin{aligned}
T_{delay} &= T_{charging} + T_{dis} + T_{SA} \\
&= 300\,\mathrm{ps} + f(\Delta V) + \frac{2C_L V_{THP}}{I_0} + \frac{C_L}{g_{m,eff}} \ln\left(\frac{1}{V_{THP}} \sqrt{\frac{I_0}{2\beta}} \frac{\Delta V_{OUT}}{\Delta V}\right), \\
&\quad T_{dis} \le -(R_{TR} + R_{WR} + (1 + TMR)R_L)\,C \ln\frac{V_{THN}}{V_0}
\end{aligned} \tag{12}$$

It can be seen from Eq. (12) that  $T_{delay}$  is mainly influenced by  $\Delta V$ . Fig. 8 demonstrates the relationship between  $T_{delay}$  and  $\Delta V$  with different VDDR, i.e., a larger  $\Delta V$  can be obtained by sacrificing the  $T_{delay}$ . Meanwhile, there are different delay times to get the same  $\Delta V$  at different VDDR.



Fig. 8. Dependence of  $T_{delay}$  on  $\Delta V$  with different VDDR.

For example, when  $\Delta V$  is set to 100 mV to overcome intrinsic offset voltage and the voltage swing of bit line, the  $T_{delay}$  is calculated to be 0.7 ns, 0.8 ns and 1.2 ns at the supply voltage of 1 V, 0.8 V and 0.6 V, respectively.
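The three contributions to Eq. (12) can be combined in a few lines of Python, as sketched below. The SA term follows the form printed in Eq. (12), with the ratio $\Delta V_{OUT}/\Delta V$ outside the square root. Only $C_L$ = 5 fF and $T_{charging}$ = 0.3 ns are taken from the text; the values of `v_thp`, `i0`, `gm_eff`, `beta` and `dv_out` are hypothetical placeholders, so the printed delays are not the paper's results.

```python
import math


def t_sa(c_l, v_thp, i0, gm_eff, beta, dv_out, dv_in):
    """SA delay in the form printed in Eq. (12): constant term plus logarithmic term."""
    first = 2.0 * c_l * v_thp / i0
    second = (c_l / gm_eff) * math.log(
        math.sqrt(i0 / (2.0 * beta)) * dv_out / (v_thp * dv_in)
    )
    return first + second


def t_delay(t_charging, t_dis, t_sa_value):
    """Eq. (12): total read delay as the sum of the three phases."""
    return t_charging + t_dis + t_sa_value


# C_L = 5 fF is from the text; all other SA parameters are hypothetical.
params = dict(c_l=5e-15, v_thp=0.3, i0=50e-6, gm_eff=200e-6, beta=500e-6, dv_out=0.8)
for dv in (0.05, 0.10, 0.15):
    sa = t_sa(dv_in=dv, **params)
    print(f"dV = {dv * 1e3:.0f} mV -> T_SA = {sa * 1e12:.1f} ps")
```

In this sketch $T_{SA}$ shrinks as $\Delta V$ grows, while $T_{dis}$ grows with $\Delta V$; according to the text, the net effect is that a larger $\Delta V$ costs a longer $T_{delay}$.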

#### B. Reliability Analysis

The capability of self-matching a complementary reference to the sensed voltage can effectively improve the sensing reliability. Here, we analyse the reliability of SMCR sensing through the SM, which is the maximum  $\Delta V$  obtained under a constrained discharge time. According to the aforementioned delay analysis, a longer  $T_{dis}$  produces a larger SM. There is thus a tradeoff between the sensing delay and the reliability. However, even if the SM is determined by specifying the discharge time of the bit line, it is not a fixed value. The SM fluctuates owing to the variation factors of the bit cell, the SMCR and the control circuit. These variation factors can be simulated by global Monte Carlo simulation, which can span different process corners [36], as illustrated in Fig. 9. It can be seen that the voltage swing  $(V_{diff})$  of the bit lines leads to a reduction of about 10 mV in SM in the SMCR sensing scheme when SM is set to 100 mV. On the other hand, the offset voltage  $(V_{offset})$  of the SA is also one of the major causes of SM degradation. In the SMCR sensing scheme, the offset of the SMCR-SA is restricted to less than 15 mV at a 0.8 V supply voltage. Although the SA offset can be further reduced by adding ancillary circuits and modifying the structure, it cannot be completely eliminated.

According to the above analyses, it is necessary to consider the  $V_{diff}$  and  $V_{offset}$  in the reliability discussion of SMCR sensing scheme. Hence, Eq. (2) is redefined as

$$SM = V_{RH} - V_{RL} - (V_{diff} + V_{offset}) \tag{13}$$

where  $V_{offset}$,  $V_{diff}$,  $V_{RH}$, and  $V_{RL}$  are random variables with mean values 0,  $\mu_{V_{diff}}$,  $\mu_{V_{RH}}$  and  $\mu_{V_{RL}}$, and standard deviations  $\sigma_{V_{offset}}$,  $\sigma_{V_{diff}}$,  $\sigma_{V_{RH}}$  and  $\sigma_{V_{RL}}$, respectively [30]. The mean and standard deviation of *SM* 



Fig. 9. Voltage statistical distributions of  $V_S$ ,  $V_{RH}$  and  $V_{RL}$  when the SM is set to 100 mV. (a) Read '0'. (b) Read '1'.

are then deduced by

$$\mu_{SM} = \mu_{V_{RH}} - \mu_{V_{RL}} - \mu_{V_{diff}} \tag{14}$$

$$\sigma_{SM} = \sqrt{\sigma_{V_{RH}}^2 + \sigma_{V_{RL}}^2 + \sigma_{V_{diff}}^2 + \sigma_{V_{offset}}^2} \tag{15}$$

where the variations contributed by  $V_{diff}$ ,  $V_{offset}$ ,  $V_{RH}$  and  $V_{RL}$  are assumed uncorrelated.
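The propagation of mean and spread from Eq. (13) to the statistics of SM can be checked with a small Python sketch that compares the closed-form expressions with a direct Gaussian sampling of Eq. (13). All numeric inputs are hypothetical (only the roughly 10 mV mean of $V_{diff}$ echoes the text), and independence of the four terms is assumed, as stated above.

```python
import math
import random
import statistics


def sm_stats(mu_rh, mu_rl, mu_diff, s_rh, s_rl, s_diff, s_offset):
    """Closed-form mean and standard deviation of SM for uncorrelated terms."""
    mu_sm = mu_rh - mu_rl - mu_diff  # V_offset has zero mean
    sigma_sm = math.sqrt(s_rh**2 + s_rl**2 + s_diff**2 + s_offset**2)
    return mu_sm, sigma_sm


def sm_monte_carlo(mu_rh, mu_rl, mu_diff, s_rh, s_rl, s_diff, s_offset,
                   n=100_000, seed=1):
    """Sample Eq. (13) directly with independent Gaussian terms."""
    rng = random.Random(seed)
    samples = [
        rng.gauss(mu_rh, s_rh) - rng.gauss(mu_rl, s_rl)
        - (rng.gauss(mu_diff, s_diff) + rng.gauss(0.0, s_offset))
        for _ in range(n)
    ]
    return statistics.fmean(samples), statistics.pstdev(samples)


# Hypothetical inputs in volts (assumed, not from the paper).
args = (0.50, 0.40, 0.010, 0.006, 0.006, 0.003, 0.005)
print("closed form :", sm_stats(*args))
print("Monte Carlo :", sm_monte_carlo(*args))
```

The two results should agree to within sampling noise; a visible mismatch would indicate correlated terms, which the closed-form expressions do not cover.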

Hence, the read BER is given by [25]

$$BER = \frac{1}{2} \left(1 + \mathrm{erf}\left(-\frac{1}{\sqrt{2}} \frac{1}{\sigma_{SM}/\mu_{SM}}\right)\right) \tag{16}$$

where erf(x) is the Gauss error function. The trend of *BER* for different configurations of *SM* is shown in Fig. 10. It can be observed that the *BER* improves markedly as the mean of *SM* increases.
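Eq. (16) is straightforward to evaluate numerically, as the Python sketch below shows. It uses the identity $1 + \mathrm{erf}(-x) = \mathrm{erfc}(x)$ so that very small error rates are not lost to floating-point cancellation. The $\mu_{SM}$ and $\sigma_{SM}$ values in the loop are hypothetical and serve only to show the trend.

```python
import math


def read_ber(mu_sm, sigma_sm):
    """Eq. (16), with 0.5 * (1 + erf(-x)) rewritten as 0.5 * erfc(x)."""
    x = mu_sm / (math.sqrt(2.0) * sigma_sm)
    return 0.5 * math.erfc(x)


# Hypothetical margins in volts with a fixed, assumed sigma of 12 mV.
for mu in (0.06, 0.075, 0.09):
    print(f"mu_SM = {mu * 1e3:.0f} mV -> BER = {read_ber(mu, 0.012):.2e}")
```

Because the BER depends only on the ratio $\mu_{SM}/\sigma_{SM}$, raising the mean margin and tightening the variation are interchangeable ways of lowering the error rate in this model.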

#### **IV. PERFORMANCE ANALYSES AND DISCUSSION**

To demonstrate the performance advantages of the proposed SMCR sensing scheme, including short latency and low *BER*, we compare it with conventional and recently reported sensing structures. To this end, we embed the conventional single-ended voltage (CV) sensing and dynamic reference (DR) [30] sensing circuits into the 8Kb TST-MRAM array. This allows all of the sensing schemes to operate with the same peripheral circuits, which guarantees a fair comparison. Hybrid CMOS/TST-MRAM simulations are conducted using a 28 nm CMOS process technology and a perpendicular-magnetic-anisotropy SOT-MTJ compact model [37]–[39].

Table I summarizes the fundamental parameters of TST-MRAM, which are dependent on physical models and experimental measurements [21], [37]. In Monte Carlo simulations for sensing circuits, the local and global variations are also taken into account. Meanwhile, the MTJ area and thickness are assumed to follow Gaussian distributions, which corresponds to a 5% MTJ resistance variability in the SOT-MTJ compact model and is consistent with experimental data [38].

Fig. 10. *BER* of the SMCR sensing scheme with different $\mu_{SM}$.

TABLE I Key Parameters of TST-MRAM

| Parameter                                                          | Value             |
|--------------------------------------------------------------------|-------------------|
| MTJ area                                                           | 25 nm × 25 nm × π |
| Oxide barrier height of MTJ                                        | 0.85 nm           |
| Free layer height of MTJ                                           | 0.7 nm            |
| Length of heavy metal                                              | 120 nm            |
| Width of heavy metal                                               | 100 nm            |
| Thickness of heavy metal                                           | 3 nm              |
| Resistivity of heavy metal                                         | 200 µΩ·cm         |
| Temperature                                                        | 300 K             |
| Nominal $R_{MTJ}$ at $R_L$ ($R_H$) of MTJ                          | 3.98 kΩ (8.75 kΩ) |
| Critical switching current $R_H \rightarrow R_L$ ($I_{c,H}$) of MTJ | 45 µA             |
| Critical switching current $R_L \rightarrow R_H$ ($I_{c,L}$) of MTJ | 72 µA             |
| Critical density of heavy metal                                    | 25 MA∙µm²         |
| TMR                                                                | 120%              |
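The nominal resistances in Table I can be cross-checked against the listed TMR, and the 5% resistance variability can be mimicked in a few lines. The sketch below is a simplification under stated assumptions: it uses the standard definition $TMR = (R_H - R_L)/R_L$, and it samples the resistance directly from a Gaussian with a relative standard deviation of 5%, whereas the compact model derives the resistance spread from area and thickness variations. The helper names `tmr` and `sample_resistances` are hypothetical.

```python
import random
import statistics

# Nominal MTJ resistances from Table I (ohms).
R_L = 3.98e3
R_H = 8.75e3


def tmr(r_h, r_l):
    """Tunnel magnetoresistance ratio, (R_H - R_L) / R_L."""
    return (r_h - r_l) / r_l


def sample_resistances(nominal, rel_sigma=0.05, n=10000, seed=0):
    """Draw n Gaussian resistance samples around a nominal value.

    Assumption: the 5% variability is read as a relative standard
    deviation applied directly to the resistance.
    """
    rng = random.Random(seed)
    return [rng.gauss(nominal, rel_sigma * nominal) for _ in range(n)]


print(f"Nominal TMR = {tmr(R_H, R_L) * 100:.1f}%")

for label, nominal in (("R_L", R_L), ("R_H", R_H)):
    samples = sample_resistances(nominal)
    rel_spread = statistics.stdev(samples) / statistics.mean(samples)
    print(f"{label}: relative spread of samples = {rel_spread * 100:.2f}%")
```

The nominal values give a TMR close to the 120% listed in Table I, which confirms that the two table rows are mutually consistent.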

Fig. 11 shows the read access time of the CV, DR and SMCR sensing schemes at different VDDR for the same SM (e.g. 100 mV). The read access time of the SMCR scheme is much shorter than that of the other two schemes. For example, with a supply voltage of 0.8 V, the read access times are 1 ns, 4.3 ns and 4.8 ns for the SMCR, CV and DR schemes, respectively. Firstly, as the reference voltage of the CV sensing scheme is typically generated by a single reference resistance (i.e. $(R_H + R_L)/2$), a longer bit-line discharge time is needed to form a proper voltage difference. Secondly, the reference voltage in the DR sensing scheme must be generated by a pseudo-PMOS inverter circuit, which further increases the read access delay. Moreover, the voltage difference between the sensed voltage and the reference voltage in the DR sensing scheme develops slowly owing to the dynamic regulation mechanism. By contrast, in the SMCR sensing scheme the reference voltages are fed directly into the SMCR-SA via bit lines $R_HBL$, $R_LBL$ and RBL, allowing a compact read access process. Furthermore, Fig. 11 illustrates the *BER* performance of the three sensing schemes at different VDDR. The impact of *SM* on the *BER* of the SMCR sensing scheme has also been analysed. The SMCR sensing scheme outperforms its counterparts in terms of *BER* when the mean value of *SM* is 98.63 mV and the supply voltage is 0.8 V. However, the advantage of the SMCR scheme disappears at 0.6 V VDDR due to both the performance degradation of the voltage SA and the reduction of *SM* at low voltage. When VDDR is increased to 0.8 V or 1.0 V, the *BER* of the SMCR sensing scheme drops significantly, and it decreases further if the mean value of *SM* is raised from 98.63 mV to 145.5 mV.

Fig. 11. BER and read access time per bit with different VDDR.

Fig. 12. BER and read access time per bit with different TMR.

In order to improve the *BER* of the SMCR sensing scheme at a low VDDR (e.g. 0.6 V), an effective method is to enhance the *TMR* of the MTJ. Fig. 12 shows the *BER* of the SMCR sensing scheme with different *TMR*. The *BER* is substantially reduced by increasing *TMR*. For example, the *BER* is reduced from $1.84 \times 10^{-3}$ to $9.26 \times 10^{-7}$ when *TMR* increases from 120% to 200% at 0.6 V VDDR. However, this improvement of *BER* weakens and saturates when *TMR* is larger than 200%. This trend is more pronounced at VDDR of 0.8 V and 1 V due to the intrinsic variability contribution of the bit cell [30]. The *BER* can reach $1.02 \times 10^{-13}$ with 300% *TMR* and 0.8 V VDDR. Fig. 12 also demonstrates the influence of *TMR* on the read access time. A larger *TMR* enables a faster formation of the voltage difference on the bit lines, so the read access time needed to achieve a fixed *SM* is shorter. For example, the read access time decreases from 1 ns to 0.7 ns when *TMR* is increased from 120% to 300% at 0.8 V VDDR.

Fig. 13. Read access energy per bit with different VDDR.

In addition to its advantages in read speed and reliability, the SMCR sensing scheme also reduces energy consumption. Fig. 13 compares the read access energy per bit of the SMCR, CV and DR sensing schemes under different VDDR. Because the SMCR sensing scheme uses dual-reference bit lines to read data, it may at first glance appear to consume twice the power of the CV and DR sensing schemes. Nevertheless, it still outperforms them in energy consumption, mainly because it realizes a faster read operation and requires a smaller read current. Compared with the CV and DR schemes, the SMCR sensing scheme reduces the read access energy per bit by 7.5% and 20.0% at 0.8 V VDDR, respectively.

With regard to the area, Fig. 14 depicts the layout of a block of 8Kb TST-MRAM based on the SMCR sensing scheme, including $128 \times 8$ bit cells and the write and read circuits. The overall area is $2873.78\ \mu m^2$. The layout area is minimized by sharing the source and drain between multiple transistors. For instance, the layout of four transistors adopts the common source and drain structure, as shown in Fig. 14(b). In this way, although the number of transistors is increased in the SMCR sensing scheme, its area overhead is less than 1% of the whole area. This common source and drain structure can decrease the parasitic capacitances and resistances, which is beneficial for improving the discharge speed of the bit line. Besides, it can also minimize the mismatch between the transistors. It is worth noting that this structure degrades the signal transmission performance of the transistors due to the deteriorated capacitive coupling effect. However, in the SMCR sensing scheme, the transistors applying the common source or drain structure are only used to transmit the supply voltage and balance these voltages. Therefore, the deterioration of capacitive coupling will not affect the SMCR performance significantly.

Fig. 14. Layout of a block of 8Kb TST-MRAM with SMCR sensing scheme. (a) Overall structure. (b) Write circuit part. (c) TST-MRAM bit cell. (d) SMCR-SA part.

Finally, to elucidate the overall performance advantages of the proposed SMCR sensing scheme in TST-MRAM, we compare it with various previously reported MRAM systems in Table II. The different MRAM technologies are all simulated in the same technology node and with the same array size. At the same power supply, TST-MRAM with the SMCR sensing scheme exhibits superior overall performance compared with other existing MRAM systems. Moreover, the performance of SRAM is also added in Table II. It is commonly known that non-volatility and cell area are two advantages of MRAM systems compared with SRAM. Meanwhile, due to the intrinsic limitations of present-day MRAMs, both write and read operations are still slower than in SRAM [41]. However, the SMCR sensing scheme helps to narrow the read latency gap between MRAM and SRAM. In addition, the BER of TST-MRAM with the SMCR scheme is lower than that of SRAM. This is because the bit cell of SRAM is composed entirely of transistors, which makes the read reliability of SRAM significantly influenced by PVT variations [42], [43].

TABLE II Performance Comparison of Different Memory Systems

|                 | TST-MRAM + SMCR       | Single-port SOT-MRAM [40] | STT-MRAM + SCOC [29]  | STT-MRAM + CREV [28]  | SRAM [41]             |
|-----------------|-----------------------|---------------------------|-----------------------|-----------------------|-----------------------|
| CMOS Technology | 28 nm                 | 28 nm                     | 28 nm                 | 28 nm                 | 28 nm                 |
| Array Size      | 8 Kb                  | 8 Kb                      | 8 Kb                  | 8 Kb                  | 8 Kb                  |
| Device Area     | 25 nm × 25 nm × π     | 60 nm × 120 nm            | 20 nm × 20 nm × π     | 20 nm × 20 nm × π     |                       |
| TMR             | 120%                  | 120%                      | 120%                  | 120%                  |                       |
| Cell Type       | 2T1TST                | 2T1SOT                    | 1T1MTJ                | 2T2MTJ                | 6T                    |
| Non-Volatility  | Yes                   | Yes                       | Yes                   | Yes                   | No                    |
| Power Supply    | 0.8 V                 | 0.8 V                     | 0.8 V                 | 0.8 V                 | 0.8 V                 |
| BER             | $6.38 \times 10^{-6}$ | $4.72 \times 10^{-5}$     | $3.41 \times 10^{-5}$ | $8.68 \times 10^{-5}$ | $4.96 \times 10^{-4}$ |
| Read Latency    | 1 ns                  | 1.4 ns                    | 1.85 ns               | 1.1 ns                | 0.5 ns                |
| Read Energy     | 0.172 pJ/bit          | 0.679 pJ/bit              | 0.332 pJ/bit          | 0.245 pJ/bit          | 0.068 pJ/bit          |
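To make the comparison in Table II easier to read, the short sketch below normalizes the BER, read latency and read energy of each memory system to those of TST-MRAM with the SMCR scheme. The numerical values are copied from Table II; the dictionary layout and the helper name `relative_to_baseline` are illustrative assumptions.

```python
# BER, read latency (ns) and read energy (pJ/bit) copied from Table II.
table_ii = {
    "TST-MRAM + SMCR": {"ber": 6.38e-6, "latency_ns": 1.0, "energy_pj": 0.172},
    "Single-port SOT-MRAM [40]": {"ber": 4.72e-5, "latency_ns": 1.4, "energy_pj": 0.679},
    "STT-MRAM + SCOC [29]": {"ber": 3.41e-5, "latency_ns": 1.85, "energy_pj": 0.332},
    "STT-MRAM + CREV [28]": {"ber": 8.68e-5, "latency_ns": 1.1, "energy_pj": 0.245},
    "SRAM [41]": {"ber": 4.96e-4, "latency_ns": 0.5, "energy_pj": 0.068},
}


def relative_to_baseline(table, baseline):
    """Divide every metric of every system by the baseline's value."""
    base = table[baseline]
    return {
        name: {metric: row[metric] / base[metric] for metric in base}
        for name, row in table.items()
    }


ratios = relative_to_baseline(table_ii, "TST-MRAM + SMCR")
for name, row in ratios.items():
    print(f"{name:28s} BER x{row['ber']:7.2f}  "
          f"latency x{row['latency_ns']:5.2f}  energy x{row['energy_pj']:5.2f}")
```

A ratio above 1 means the system is worse than TST-MRAM with SMCR on that metric, and a ratio below 1 means it is better. In this view SRAM remains ahead on latency and energy, while all of the listed alternatives show a higher BER.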

#### V. CONCLUSION

This paper presents an SMCR sensing scheme to provide high-speed and reliable read operation for TST-MRAM. This scheme directly matches one of two candidate reference voltages to the sensed voltage, which reduces the read access time and forms a pair of complementary voltages to obtain a maximum SM. Theoretical analyses of the delay and reliability of the proposed SMCR sensing scheme were first carried out by considering the parasitic effects and manufacturing process mismatch. To confirm the advantageous performance, an 8Kb TST-MRAM array incorporating the SMCR sensing scheme was designed and simulated. The mixed simulation results show that our proposal allows a read access time of 1 ns and a read *BER* of $1.02 \times 10^{-13}$. Moreover, compared with previously published sensing schemes, the SMCR sensing scheme reduces the read access energy without significantly increasing the area. In summary, this work offers a promising sensing solution for high-performance memories, and will be beneficial for building other emerging computing concepts.

#### REFERENCES

- [1] R. De Rose *et al.*, "A variation-aware timing modeling approach for write operation in hybrid CMOS/STT-MTJ circuits," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 65, no. 3, pp. 1086–1095, Mar. 2018.
- [2] G. Wang *et al.*, "Ultra-dense ring-shaped racetrack memory cache design," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 66, no. 1, pp. 215–225, Jan. 2019.
- [3] S. Wang, A. Pan, C. O. Chui, and P. Gupta, "Tunneling negative differential resistance-assisted STT-RAM for efficient read and write operations," *IEEE Trans. Electron Devices*, vol. 64, no. 1, pp. 121–129, Jan. 2017.
- [4] K. C. Chun, H. Zhao, J. D. Harms, T.-H. Kim, J.-P. Wang, and C. H. Kim, "A scaling roadmap and performance evaluation of in-plane and perpendicular MTJ based STT-MRAMs for high-density cache memory," *IEEE J. Solid-State Circuits*, vol. 48, no. 2, pp. 598–610, Feb. 2013.
- [5] X. Fong, R. Venkatesan, D. Lee, A. Raghunathan, and K. Roy, "Embedding read-only memory in spin-transfer torque MRAM-based on-chip caches," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 24, no. 3, pp. 992–1002, Mar. 2016.
- [6] E. Cheshmikhani, H. Farbeh, S. G. Miremadi, and H. Asadi, "TA-LRW: A replacement policy for error rate reduction in STT-MRAM caches," *IEEE Trans. Comput.*, vol. 68, no. 3, pp. 455–470, Mar. 2019.

- [7] M. Zabihi, Z. I. Chowdhury, Z. Zhao, U. R. Karpuzcu, J.-P. Wang, and S. S. Sapatnekar, "In-memory processing on the spintronic CRAM: From hardware design to application mapping," *IEEE Trans. Comput.*, vol. 68, no. 8, pp. 1159–1173, Aug. 2019.
- [8] S. Jain, A. Ranjan, K. Roy, and A. Raghunathan, "Computing in memory with spin-transfer torque magnetic RAM," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 26, no. 3, pp. 470–483, Mar. 2018.
- [9] G. Wang *et al.*, "Compact modeling of perpendicular-magnetic-anisotropy double-barrier magnetic tunnel junction with enhanced thermal stability recording structure," *IEEE Trans. Electron Devices*, vol. 66, no. 5, pp. 2431–2436, May 2019.
- [10] B. Rajendran and F. Alibart, "Neuromorphic computing based on emerging memory technologies," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 6, no. 2, pp. 198–211, Jun. 2016.
- [11] H. C. Yu *et al.*, "Cycling endurance optimization scheme for 1Mb STT-MRAM in 40nm technology," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2013, pp. 224–226.
- [12] J. J. Kan *et al.*, "A study on practically unlimited endurance of STT-MRAM," *IEEE Trans. Electron Devices*, vol. 64, no. 9, pp. 3639–3646, Sep. 2017.
- [13] Y.-D. Chih *et al.*, "A 22nm 32Mb embedded STT-MRAM with 10ns read speed, 1M cycle write endurance, 10 years retention at 150°C and high immunity to magnetic field interference," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2020, pp. 222–224.
- [14] G. Yu *et al.*, "Switching of perpendicular magnetization by spin-orbit torques in the absence of external magnetic fields," *Nature Nanotechnol.*, vol. 9, p. 548, May 2014.
- [15] Z. Zheng *et al.*, "Enhanced spin-orbit torque and multilevel current-induced switching in W/Co-Tb/Pt heterostructure," *Phys. Rev. Appl.*, vol. 12, no. 4, Oct. 2019, Art. no. 044032.
- [16] Z. Zheng *et al.*, "Perpendicular magnetization switching by large spin-orbit torques from sputtered Bi<sub>2</sub>Te<sub>3</sub>," *Chin. Phys. B*, vol. 29, no. 7, Jul. 2020, Art. no. 078505.
- [17] K. Garello *et al.*, "Manufacturable 300 mm platform solution for field-free switching SOT-MRAM," in *Proc. Symp. VLSI Circuits*, Kyoto, Japan, Jun. 2019, pp. 194–195.
- [18] Z.-Y. Luo, Y.-J. Tsou, Y.-C. Dong, C. Lu, and C. W. Liu, "Field-free spin-orbit torque switching of perpendicular magnetic tunnel junction utilizing voltage-controlled magnetic anisotropy pulse width optimization," in *Proc. Non-Volatile Memory Technol. Symp. (NVMTS)*, Sendai, Japan, Oct. 2018, pp. 1–5.
- [19] S. Z. Peng *et al.*, "Field-free switching of perpendicular magnetization through voltage-gated spin-orbit torque," in *IEDM Tech. Dig.*, San Francisco, CA, USA, Dec. 2019, pp. 661–664.
- [20] Z. Wang *et al.*, "Proposal of toggle spin torques magnetic RAM for ultrafast computing," *IEEE Electron Device Lett.*, vol. 40, no. 5, pp. 726–729, May 2019.
- [21] M. Wang *et al.*, "Field-free switching of a perpendicular magnetic tunnel junction through the interplay of spin–orbit and spin-transfer torques," *Nature Electron.*, vol. 1, no. 11, pp. 582–588, Nov. 2018.
- [22] W. S. Zhao *et al.*, "Failure and reliability analysis of STT-MRAM," *Microelectron. Rel.*, vol. 52, nos. 9–10, pp. 1848–1852, Sep. 2012.

- [23] M. Cubukcu *et al.*, "Spin-orbit torque magnetization switching of a three-terminal perpendicular magnetic tunnel junction," *Appl. Phys. Lett.*, vol. 104, no. 4, Jan. 2014, Art. no. 042406.
- [24] S. Yuasa, T. Nagahama, A. Fukushima, Y. Suzuki, and K. Ando, "Giant room-temperature magnetoresistance in single-crystal Fe/MgO/Fe magnetic tunnel junctions," *Nature Mater.*, vol. 3, no. 12, pp. 868–871, Oct. 2004.
- [25] Q. K. Trinh, S. Ruocco, and M. Alioto, "Novel boosted-voltage sensing scheme for variation-resilient STT-MRAM read," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 63, no. 10, pp. 1652–1660, Oct. 2016.
- [26] T. Na, B. Song, J. P. Kim, S. H. Kang, and S.-O. Jung, "Data-cell-variation-tolerant dual-mode sensing scheme for deep submicrometer STT-RAM," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 65, no. 1, pp. 163–174, Jan. 2018.
- [27] Q.-K. Trinh, S. Ruocco, and M. Alioto, "Time-based sensing for reference-less and robust read in STT-MRAM memories," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 65, no. 10, pp. 3338–3348, Oct. 2018.
- [28] T.-H. Yang, K.-X. Li, Y.-N. Chiang, W.-Y. Lin, H.-T. Lin, and M.-F. Chang, "A 28nm 32Kb embedded 2T2MTJ STT-MRAM macro with 1.3ns read-access time for fast and reliable read applications," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2018, pp. 482–483.
- [29] Q. Dong *et al.*, "A 1-Mb 28-nm 1T1MTJ STT-MRAM with single-cap offset-cancelled sense amplifier and *in situ* self-write-termination," *IEEE J. Solid-State Circuits*, vol. 54, no. 1, pp. 231–239, Jan. 2019.
- [30] Q.-K. Trinh, S. Ruocco, and M. Alioto, "Dynamic reference voltage sensing scheme for read margin improvement in STT-MRAMs," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 65, no. 4, pp. 1269–1278, Apr. 2018.
- [31] D. Gogl *et al.*, "A 16-Mb MRAM featuring bootstrapped write drivers," *IEEE J. Solid-State Circuits*, vol. 40, no. 4, pp. 902–908, Apr. 2005.
- [32] Y. Zhou *et al.*, "A self-timed voltage-mode sensing scheme with successive sensing and checking for STT-MRAM," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 67, no. 5, pp. 1602–1614, May 2020.
- [33] B. Wicht, T. Nirschl, and D. Schmitt-Landsiedel, "Yield and speed optimization of a latch-type voltage sense amplifier," *IEEE J. Solid-State Circuits*, vol. 39, no. 7, pp. 1148–1158, Jul. 2004.
- [34] S. Babayan-Mashhadi and R. Lotfi, "Analysis and design of a low-voltage low-power double-tail comparator," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 22, no. 2, pp. 343–352, Feb. 2014.
- [35] Y. Zhou, H. Cai, B. Liu, W. Zhao, and J. Yang, "MTJ-LRB: Proposal of MTJ-based loop replica bitline as MRAM device-circuit interaction for PVT-robust sensing," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, early access, Mar. 12, 2020, doi: 10.1109/TCSII.2020.2980331.
- [36] T. H. Choi, H. Jeong, Y. Yang, J. Park, and S.-O. Jung, "SRAM operational mismatch corner model for efficient circuit design and yield analysis," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 64, no. 8, pp. 2063–2072, Aug. 2017.
- [37] Z. Wang, W. Zhao, E. Deng, J.-O. Klein, and C. Chappert, "Perpendicular-anisotropy magnetic tunnel junction switched by spin-Hall-assisted spin-transfer torque," *J. Phys. D, Appl. Phys.*, vol. 48, no. 6, Jan. 2015, Art. no. 065001.
- [38] Y. Zhang *et al.*, "Compact modeling of perpendicular-anisotropy CoFeB/MgO magnetic tunnel junctions," *IEEE Trans. Electron Devices*, vol. 59, no. 3, pp. 819–826, Mar. 2012.
- [39] Y. Zhang *et al.*, "Compact model of subvolume MTJ and its design application at nanoscale technology nodes," *IEEE Trans. Electron Devices*, vol. 62, no. 6, pp. 2048–2055, Jun. 2015.
- [40] Y. Seo, K.-W. Kwon, X. Fong, and K. Roy, "High performance and energy-efficient on-chip cache using dual port (1R/1W) spin-orbit torque MRAM," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 6, no. 3, pp. 293–304, Sep. 2016.
- [41] Y. Yang, J. Park, S. C. Song, J. Wang, G. Yeap, and S.-O. Jung, "SRAM design for 22-nm ETSOI technology: Selective cell current boosting and asymmetric back-gate write-assist circuit," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 62, no. 6, pp. 1538–1545, Jun. 2015.
- [42] J. Zhang, J. Wang, C. Peng, X. Li, Z. Lin, and X. Wu, "Self-compared bit-line pairs for eliminating effects of leakage current," *Electron. Lett.*, vol. 53, no. 21, pp. 1396–1398, Oct. 2017.
- [43] H. Jeong, Y. Yang, J. Lee, J. Kim, and S.-O. Jung, "One-sided static noise margin and Gaussian-tail-fitting method for SRAM," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 22, no. 6, pp. 1262–1269, Jun. 2014.



Jinkai Wang (Student Member, IEEE) received the B.S. degree in physics and electronic engineering from Kaili University, Kaili, China, in 2015, and the M.S. degree in circuits and systems from Anhui University, China, in 2018. He is currently pursuing the Ph.D. degree in physical electronics with Beihang University, China. His current research interest includes high performance hybrid circuits.



**Chenyu Lian** received the B.S. degree in software engineering from Beijing Jiaotong University, Beijing, China, in 2018. He is currently pursuing the M.S. degree in integrated circuit with Beihang University. His current research interests include efficient deep learning methods on hardware and memory computing.



Yining Bai received the B.S. degree in communication engineering from Beijing Jiaotong University, Beijing, China. She is currently pursuing the M.S. degree with Beihang University. Her current research interest includes memory computing.



**Guanda Wang** (Graduate Student Member, IEEE) received the B.S. degree in communication engineering from the Beijing University of Post and Telecommunication, Beijing, China. He is currently pursuing the Ph.D. degree with Beihang University. His current research interests include simulation analysis of MTJ and all spin logic devices.



Zhizhong Zhang (Student Member, IEEE) received the B.S. degree from Beihang University, Beijing, China, where he is currently pursuing the Ph.D. degree in microelectronics. His current research interests include the theoretical magnetism and micromagnetic simulation.



**Zhenyi Zheng** (Graduate Student Member, IEEE) received the B.S. and master's degrees from Beihang University, Beijing, China, in 2015 and 2018, respectively, where he is currently pursuing the Ph.D. degree. His current research interests include spin-orbit torque effect and ferrimagnetic materials.



Xiulong Wu (Member, IEEE) received the B.S. degree in computer science from USTC in 2001, and the M.S. and Ph.D. degrees in electronic engineering from Anhui University in 2005 and 2008, respectively. He is currently a Professor with Anhui University. From 2013 to 2014, he was a Visiting Scholar with the Engineering Department, The University of Texas at Dallas, USA. His research interests include high performance SRAM and mixed-signal IC.

Youguang Zhang received the M.S. degree in mathematics from Peking University, Beijing, China, in 1987, and the Ph.D. degree in communication and electronic systems from Beihang University, Beijing, in 1990. He is currently a Professor with the School of Electronic and Information Engineering, Beihang University, Beijing. His research interests include circuit and system co-design for the emerging memory and computing systems.

Lei Chen received the B.S. degree in electronic and information engineering from Anhui University, Hefei, China, in 2018. He is currently pursuing the Ph.D. degree in microelectronics and solid-state electronics with Beihang University, Beijing, China. His research interests include lateral spin valves and emerging non-volatile memory technologies.

**Sorin Cotofana** (Fellow, IEEE) received the M.Sc. degree in computer science from the Politehnica University of Bucharest, Romania, in 1984, and the Ph.D. degree in electrical engineering from the Delft University of Technology, Delft, The Netherlands. He is currently with the Computer Engineering Laboratory, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, The Netherlands. His current research





Kun Zhang (Member, IEEE) received the B.S. and Ph.D. degrees in physics from Shandong University, Jinan, China, in 2012 and 2017, respectively. He currently holds a post-doctoral position with Beihang University, Beijing, China. His current research interests include emerging non-volatile memory device and in-memory computing application.



Yue Zhang (Senior Member, IEEE) received the B.S. degree in optoelectronics from the Huazhong University of Science and Technology, Wuhan, China, in 2009, and the M.S. and Ph.D. degrees in microelectronics from the University of Paris-Sud, France, in 2011 and 2014, respectively. He is currently an Associate Professor with Beihang University, China. His current research interests include emerging non-volatile memory technologies and hybrid low-power circuit designs.


Kelian Lin received the B.S. degree in software engineering from Beihang University, Beijing, China, in 2019, where he is currently pursuing the master's degree with the Microelectronics School. His research interests include lateral spin valves and emerging non-volatile memory technologies.

