# A Wideband Energy-Efficient Multi-Mode CMOS Digital Transmitter

Beikmirza, Mohammadreza; Shen, Yiyu; de Vreede, Leo C.N.; Alavi, Morteza S.

DOI: 10.1109/JSSC.2022.3222028

**Document Version:** Final published version

**Published in:** IEEE Journal of Solid-State Circuits

**Citation (APA):** Beikmirza, M., Shen, Y., de Vreede, L. C. N., & Alavi, M. S. (2023). A Wideband Energy-Efficient Multi-Mode CMOS Digital Transmitter. *IEEE Journal of Solid-State Circuits*, *58*(3), 677-690. https://doi.org/10.1109/JSSC.2022.3222028

Mohammadreza Beikmirza, *Graduate Student Member, IEEE*, Yiyu Shen, *Member, IEEE*, Leo C. N. de Vreede, *Senior Member, IEEE*, and Morteza S. Alavi, *Member, IEEE*

Abstract—This article presents a wideband, energy-efficient digital transmitter (DTX) suitable for multi-mode/multi-band wireless communication applications. It features various operation modes comprising Cartesian (Modes-1/-2) and multi-phase (Modes-3/-4) configurations utilizing LO clocks with different duty cycles in the interleaving/non-interleaving configurations. The multi-phase operation compromises polar and Cartesian features by mapping the I/Q signals into two non-orthogonal basis vectors with a 45° relative phase difference and a 3-bit phase selector scheme.
The different operation modes are extensively analyzed and compared. Fabricated in a 40-nm CMOS process with an off-chip matching network, the proposed DTX occupies a core area of 0.72 mm<sup>2</sup> and delivers 23.18-dBm RF peak power at 2.1 GHz from a 0.95-V supply voltage with drain/system efficiencies of 66.26%/52.59%, respectively. Utilizing a simple memory-less digital pre-distortion (DPD) for a 160-MHz four-channel 64-quadrature amplitude modulation (QAM) orthogonal frequency-division multiplexing (OFDM) signal, the DTX delivers an average $P_{\text{Out}}$ of 13.5/11.4/7.7/9.4 dBm, achieving an adjacent channel leakage (power) ratio (ACL(P)R) of better than -42/-40/-38 dBc and an average error vector magnitude (EVM) of -36/-34/-32 dB, operating in Modes-1/-2/-3/-4, respectively. While transmitting a 200-MHz single-channel 256 (1024)-QAM OFDM signal at 2.4 GHz in Modes-1/-4, the average delivered output power is 14.11/9.29 (12.23/7.32) dBm with average drain and system efficiencies of 33.17%/26.3% (23.82%/22.83%) and 24.81%/22.85% (19.34%/18.81%), while the ACLR and EVM are better than -42/-41 (-43/-43) dBc and -34.6/-33.1 (-33.5/-33.9) dB, respectively.

Index Terms—Balun, Cartesian, CMOS, digital power amplifier (DPA), digital pre-distortion (DPD), digital transmitter (DTX), efficient, load insensitive class-E, Marchand, multi-phase, radio frequency digital-to-analog converter (RF-DAC), reactance compensation, re-entrant, wideband.

Manuscript received 1 July 2022; revised 17 October 2022; accepted 9 November 2022. Date of publication 18 January 2023; date of current version 24 February 2023. This article was approved by Associate Editor Farhana Sheikh. This work was supported in part by Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO)/Ampleon Partnership Program under Project 16336 (DIPLOMAT). (Corresponding author: Mohammadreza Beikmirza.)

The authors are with the Department of Microelectronics, Delft University of Technology, 2628 CD Delft, The Netherlands (e-mail: m.r.beikmirza@tudelft.nl).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/JSSC.2022.3222028.

Digital Object Identifier 10.1109/JSSC.2022.3222028

# I. INTRODUCTION

IN THE past two decades, wireless communication networks, such as cellular and wireless local area network (WLAN), have been considerably intertwined with our daily lives. To conform with the insatiable appetite for higher data rates of new applications, the next-generation communication systems exploit spectrally efficient modulations and carrier aggregation, in which channels can be juxtaposed, leading to large modulation bandwidths [1]. Moreover, high RF bandwidth is also essential to realize ubiquitous RF transmitters/receivers (TXs/RXs). Hence, they must be frequency agile to cover multiple frequency bands. On the other hand, high energy efficiency is critical for increased battery life and improved user experience in multi-mode/multi-band wireless communication.

To address these demands, digital TXs (DTXs) have emerged as favorable alternative architectures as they supplant the functionality of the conventional analog-intensive TX circuit blocks, such as the baseband digital-to-analog converters (DACs), low-pass filter, mixer, and power amplifier (PA), with a single radio frequency DAC (RF-DAC) block [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15]. These DTXs consist of many unit cells comprising digital interpolation filters, bit-wise implicit mixers, and their digital PAs (DPAs). Consequently, contrary to their analog-intensive counterparts, DTXs can fully benefit from nanoscale CMOS integration scalability to provide compact die areas with higher power efficiency due to the high-speed switching nature of core power devices, even in the face of reduced supply voltages.
Moreover, these DTXs provide excellent reconfigurability and frequency agility, making them suitable for multi-mode/multi-band communication standards owing to their fully digital-intensive operation. Furthermore, it is also more straightforward to integrate these TXs as part of a system-on-a-chip (SoC) with non-RF fully digital circuits, such as modems and various application processors.

The DTXs are primarily classified as a polar or Cartesian (quadrature/IQ) architecture, as illustrated in Fig. 1. The former is based on the polar coordinate signal consisting of the amplitude $(\rho)$ and phase $(\phi)$ information. In contrast, the latter operates based on the Cartesian coordinate system comprising the in-phase (I) and quadrature-phase (Q) information. In a polar DTX [2], [3], [4], [5], [6], [7], [8], [9], the two baseband components $\rho[n]$ and $\phi[n]$ are generated from the baseband I/Q signals using a COordinate Rotation DIgital Computer (CORDIC). A digital phase-locked loop (DPLL) produces a phase- or frequency-modulated digital clock carrier employing the $\phi[n]$ information. The clock is fed to the RF-DAC unit cells, which produce an RF output whose envelope is substantially proportional to the amplitude control word (ACW) or $\rho[n]$ by the ON/OFF switching of the RF-DAC unit cells.

Fig. 1. (a) Polar and (b) Cartesian DTX architectures, and their operation concepts and corresponding efficiency contours' phase dependence comparison.

Generally, the efficiency of the entire DTX chain strongly depends on its DPA efficiency. Since the phase component has a constant envelope, high-efficiency DPAs, for example, class-E, can be used in polar DTXs. They can operate efficiently since, as is clear from the drain efficiency (DE) diagram [see Fig. 1(a)], the achievable DE is constant provided that the amplitude is constant. However, converting the I/Q data to polar form using a CORDIC is complicated, as described in [16], [17], and [18].
More importantly, the phase and amplitude paths must recombine at their final DPA cells without any delay mismatch to avoid spectral regrowth. In contrast, the Cartesian DTX variants [10], [11], [12], [13], [14], [15], [19] do not require a CORDIC, thus lowering the computing cost. The upconverted $I_{BB}/Q_{BB}$ digital samples drive their respective RF-DACs, which produce two RF signal components whose amplitudes are ideally proportional to the respective I/Q digital inputs. Subsequently, the two amplitude-modulated RF components are synchronously combined to produce the desired composite RF output. To facilitate a more digitally intensive realization, the I/Q combining can also be carried out in the digital domain while utilizing a common single power cell [10], [20]. Due to the linear summation of I/Q signal paths, Cartesian DTXs can manage large modulation bandwidth. Moreover, since the structure is symmetric, I/Q path synchronization is straightforward. Nevertheless, their DE [see Fig. 1(b)] is lower than that of their polar counterparts owing to the linear combination of orthogonal I/Q vectors, yielding a 3-dB worst case output power loss at the orthogonal axes.

An alternative approach, called multi-phase DTX, has recently emerged to benefit from the highly efficient operation of polar DTXs and the wideband operation of Cartesian DTXs [21], [22], [23], [24], [25], [26]. This concept was originally proposed in [21] and was re-iterated in [22] and [23]. The first practical implementation of multi-phase operation DTX in the switched capacitor structure was proposed in [24] and [25]. Recently, a multi-functional DTX based on current-mode DPA capable of operating in both multi-phase and Cartesian modes has been realized to target applications requiring high spectral purity, low error vector magnitude (EVM), and large modulation bandwidth in a highly energy-efficient operation [26].

This article elaborates on its DTX architecture analysis, system-/circuit-level design considerations, and extensive measurement results, organized as follows. First, Section II introduces the proposed multi-mode DTX. Next, Section III unveils the DTX modes of operation. The detailed DTX architecture and circuit design and implementation together with its off-chip class-E matching network are provided in Section IV. Subsequently, Section V exhibits extensive measurement results of the prototype. Finally, in Section VI, we conclude this article.

# II. MULTI-MODE DIGITAL TRANSMITTER

## A. Signed Versus Un-Signed DTX Operation

The proposed multi-mode DTX is based on signed I/Q data. First, let us explore the difference between signed and un-signed DTX operations. The digital baseband I/Q data can be represented in signed or un-signed formats. The signed I/Q data consist of two explicit sign bits (Sign<sub>I</sub> and Sign<sub>Q</sub>) to distinguish the positive and negative I/Q values. In contrast, their un-signed counterpart has no explicit sign bits and thus effectively covers only positive numbers. The N-bit signed data comprise one sign bit and N-1 bits representing the magnitude of the data, which is not the case in the N-bit un-signed scenario, as depicted in Fig. 2.

Fig. 2. Concept of (a) un-signed and (b) signed DTXs, and their I/Q plane coverage comparison.

Generally, a signed I/Q format is exploited in the digital baseband processing unit. However, the arrays of power cells in an RF-DAC can only process the magnitude part of its I/Q digital bitstreams. Therefore, the sign of the I/Q data must be engaged in the RF-DAC phase operation to adequately cover all four quadrants of the constellation diagram. To address this issue, the RF-DAC can adopt the following approaches.

In the first approach [see Fig. 2(a)], similar to the conventional I/Q baseband DAC operation, the I/Q RF-DAC exploits the un-signed I/Q bitstreams [27], [28]. To obtain them, the baseband data are converted to un-signed representations employing digital level shifters and logical right shifts for proper signal scaling. Eventually, the I/Q un-signed data drive a pair of I/Q DACs. As illustrated in Fig. 2(a), in this context, the whole constellation diagram is shifted to the first quadrant by adding a dc offset to the I and Q data. It is worth mentioning that an un-signed version is similar to the technique used in a conventional current-steering DAC. Circuit-wise, the RF-DAC is often implemented based on arrays of double-balanced mixer unary cells. In this fully differential architecture, the ZERO state is achieved at mid-code, wherein $D_I = D_Q = 2^{N-1}-1$. This code lies at the equilibrium point of the differential pair, at which RF<sub>OutP</sub> and RF<sub>OutN</sub> cancel each other, and it is therefore prone to mismatch. In this regard, as shown in Fig. 2(a), by dismissing three quadrants of the constellation diagram, the inherent swing of the system is halved. Consequently, the output current of each cell does not become zero as a function of the input data code, but it can only change phase (0°, 90°, 180°, and 270°). Therefore, the related efficiency and linearity performance are similar to a class-A PA in which 100% of the input signal is used (conduction angle = $360^{\circ}$), and the active element remains conducting all the time.

In the second approach [see Fig. 2(b)], the I/Q RF-DAC utilizes the signed I/Q bitstreams [29], [30], [31]. This approach contrasts sharply with the conventional DAC operation. In this context, by exploiting the phase information along with its magnitude, we can utilize the signed I/Q data and directly apply them to the RF-DAC unit cells. The signed version requires a 1-bit phase modulator for each LO phase.
In this respect, the sign information is retrieved by swapping the complementary quadrature clocks' phases using a multiplexer. Circuit-wise, the signed RF-DAC is often implemented based on arrays of single-balanced mixer unary cells. In this approach, to represent the ZERO state, all power cells can be turned off. Therefore, the key advantage of this configuration is that the ZERO output is well-defined and less prone to the mismatch between two differential paths, i.e., RF<sub>OutP</sub> and RF<sub>OutN</sub>. Accordingly, the swing is doubled in the signed I/Q approach, leading to higher output power. Moreover, the signed I/Q operation replicates an efficiency behavior similar to class-B, contrasting with the un-signed I/Q operation. Therefore, the signed I/Q operation is preferable to achieve higher output power and overall system efficiency.

## B. Signed Cartesian Operation Principle

The concept of the signed Cartesian phase selector (sign bit) operation is demonstrated in Fig. 3. The phase selector typically operates directly on the quadrature clock signals ($f_{\rm LO,0}$, $f_{\rm LO,90}$, $f_{\rm LO,180}$, and $f_{\rm LO,270}$). As illustrated, depending on the four states of the I/Q sign bits, the related complementary clock pairs can be swapped. These phase-modulated LO clocks are then directly mixed with the up-sampled baseband data ($I_{\rm BB}$, $Q_{\rm BB}$) to cover the targeted constellation quadrants.

Fig. 3. (a) Conventional Cartesian DTX phase selector (sign bit) operation. (b) Cartesian DTX four-quadrant and (c) its corresponding related swapped complementary clock pairs.

For example, in the case of transition from the first quadrant ($\text{Sign}_Q = 0$, $\text{Sign}_I = 0$) to the second one ($\text{Sign}_Q = 0$, $\text{Sign}_I = 1$), since the sign of I is changed, the corresponding complementary clocks, $CLK_{IP}$ and $CLK_{IN}$, are swapped [see table in Fig. 3(c)].
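The sign-bit clock swap described above can be illustrated with a short Python sketch. It is only a behavioral model under stated assumptions: the function names are hypothetical, the I path is assumed to use the 0°/180° clock pair and the Q path the 90°/270° pair, and the RF output is reduced to an ideal fundamental phasor with all scaling factors omitted.

```python
import cmath
import math


def cartesian_phase_select(sign_i, sign_q):
    """Return the LO phases (degrees) routed to (CLK_IP, CLK_IN, CLK_QP, CLK_QN).

    Assumption: a set sign bit swaps the corresponding complementary pair.
    """
    clk_ip, clk_in = (180, 0) if sign_i else (0, 180)
    clk_qp, clk_qn = (270, 90) if sign_q else (90, 270)
    return clk_ip, clk_in, clk_qp, clk_qn


def split_sign_magnitude(value):
    """Split a signed sample into (sign bit, un-signed magnitude)."""
    return (1 if value < 0 else 0), abs(value)


def fundamental_phasor(i, q):
    """Idealized fundamental phasor of the RF output (scaling omitted)."""
    sign_i, mag_i = split_sign_magnitude(i)
    sign_q, mag_q = split_sign_magnitude(q)
    clk_ip, _, clk_qp, _ = cartesian_phase_select(sign_i, sign_q)
    # Only magnitudes reach the power cells; the sign lives in the LO phase.
    return (mag_i * cmath.exp(1j * math.radians(clk_ip))
            + mag_q * cmath.exp(1j * math.radians(clk_qp)))


if __name__ == "__main__":
    for point in [(3, 4), (-3, 4), (-3, -4), (3, -4)]:
        print(point, fundamental_phasor(*point))
```

In this model, each of the four quadrants is reached with un-signed magnitudes only, which is the property the signed RF-DAC relies on.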
In this system, the rectangular-shaped RF LO clocks can be written using the Fourier series

$$\Pi(t, T_0, d, \theta) = \sum_{n = -\infty}^{\infty} \frac{\sin\left(\frac{n\pi d}{T_0}\right)}{n\pi} e^{\frac{-j2n\pi}{T_0}\left(t - \frac{\theta}{2\pi} + \frac{d}{2}\right)} \tag{1}$$

where $T_0$ is the upconverting clock period, $d$ is the duty cycle, and $\theta$ represents the relative phase. The differential RF waveform is synthesized using the following equation:

$$\begin{aligned} \text{RF}_{\text{Out}} &= |I(t)| \times \left[\Pi(t, T_0, d, \theta_I) - \Pi(t, T_0, d, \theta_I + \pi)\right] \\ &+ |Q(t)| \times \left[\Pi(t, T_0, d, \theta_Q) - \Pi(t, T_0, d, \theta_Q + \pi)\right] \end{aligned} \tag{2}$$

where $\theta_I - \theta_Q = \pm(\pi/2)$ depending on the quadrant. Therefore, the resulting waveform is created based on two orthogonal signals. The terms $|I(t)|$ and $|Q(t)|$ provide the amplitudes of the in-phase and quadrature signals, whereas the negative values of $I(t)$ and $Q(t)$ are implemented by a 180° phase shift in the RF carrier clocks. Consequently, the desired $I(t)$ and $Q(t)$ vectors are created by recombining the amplitude-modulated $|I(t)|$ and $|Q(t)|$ signals with the corresponding 1-bit resolution, two-state (bang-bang) phase-modulated I/Q LO clocks based on the sign of the I/Q baseband signals.

## C. Signed Multi-Phase Operation Principle

The signed Cartesian operation employs two orthogonal $I(t)$ and $Q(t)$ vectors. However, this technique can be extended to a more generalized multi-phase operation by mapping I/Q baseband signals into two non-orthogonal vectors [21], [22], [23], [25]. In light of this, the phase modulation resolution of the corresponding up-converting clock signals is increased to four or more phases (2-bit or more). For example, as presented in Fig. 4, an eight-phase operation can be utilized. This architecture compromises polar and Cartesian features by mapping the I/Q signals into two non-orthogonal basis vectors with 45° relative phase difference and the magnitudes based on the corresponding octant defined as

$$\begin{cases} I + jQ = I_{MP} + Q_{MP} \left(\frac{1+j}{\sqrt{2}}\right), & |I| \ge |Q| \\ I + jQ = jI_{MP} + Q_{MP} \left(\frac{1+j}{\sqrt{2}}\right), & |I| < |Q|. \end{cases} \tag{3}$$

Fig. 4. Multi-phase (eight-phase) operation baseband data mapping depending on the corresponding octant.

Fig. 5. (a) Multi-phase (eight-phase) DTX phase selector (sign bit) operation utilizing $\operatorname{Sign}_I$, $\operatorname{Sign}_Q$, and the sector bit (SB). (c) Associated swapped complementary clock pairs based on the operating octant.

Besides, as illustrated in Fig. 5, a phase selector is exploited to modulate the LO clocks properly. As mentioned previously, the phase selector's controlling signals in Cartesian modes are the I/Q sign bits. The multi-phase operation modes require 3-bit control signals to cover all eight octants. Two of these are the $\operatorname{Sign}_I/\operatorname{Sign}_Q$ bits, and the third bit is generated in the digital domain based on the multi-phase mapping operation. Based on these three controlling bits, the associated LO clock pairs can be swapped, and thus, the entire eight octants of the constellation diagram are covered.

Fig. 6. (a) Multi-phase DTX operation concept. (b) Its corresponding efficiency contours reduced phase dependence compared to the Cartesian counterpart.

The RF waveform again consists of the summation of two vectors

$$\begin{aligned} \text{RF}_{\text{Out}} &= |I_{MP}(t)| \times \left[ \Pi\left(t, T_0, d, \theta_{I_{MP}}\right) - \Pi\left(t, T_0, d, \theta_{I_{MP}} + \pi\right) \right] \\ &+ |Q_{MP}(t)| \times \left[ \Pi\left(t, T_0, d, \theta_{Q_{MP}}\right) - \Pi\left(t, T_0, d, \theta_{Q_{MP}} + \pi\right) \right] \end{aligned} \tag{4}$$

where $\theta_{I_{MP}} - \theta_{Q_{MP}} = \pm(\pi/4)$ depending on the octant. As illustrated in Fig. 6, this architecture inherits the advantages of the Cartesian DTX, such as wideband operation, and symmetrical and synchronized I/Q paths with a DE behavior that imitates the polar case. As presented in Fig. 6, the DTX's efficiency improves by employing two vectors that have a reduced phase difference.

The normalized efficiency per trajectory points of a 256-quadrature amplitude modulation (QAM) orthogonal frequency-division multiplexing (OFDM) signal applied to a polar, a signed Cartesian, and an eight-phase operation DTX and their corresponding normalized efficiency versus normalized output power are depicted in Fig. 7. In the polar DTX case, thanks to phase-independent efficiency for a given amplitude [see Fig. 7(a)], its normalized efficiency versus normalized output power is confined, resulting in relatively higher average DE. Repeating the same experiment for the signed Cartesian scenario, the efficiency plot [see Fig. 7(b)] is scattered due to its phase-dependent behavior. This feature results in a lower average DE. In comparison, due to the relatively phase-independent efficiency behavior of the eight-phase operation [see Fig. 7(c)], this plot is more confined than the Cartesian structure, slightly resembling the polar case, resulting in a somewhat higher average DE. Theoretically, this principle can be extended to more phases [24], [25]. Nevertheless, this would burden the multi-phase clock generator and phase selector, which increases the complexity while operating at a higher speed, deteriorating the achievable system efficiency of practical implementation.

The implications of the synchronous operation of $\rho$ and the phase $\phi$ in a polar DTX, as well as of the sign bits and un-signed data in Cartesian and multi-phase operation DTXs, are illustrated in Figs. 8–10, respectively. Simulations show the output spectrum (100 MHz), along with the EVM and ACPR, for different signal bandwidths (20, 50, 100, and 200 MHz) versus various timing mismatches.
Accordingly, the EVM and ACPR degrade as the timing mismatch increases. This synchronous recombination requirement is more stringent for a polar DTX than a Cartesian or multi-phase architecture, as its EVM and ACPR degrade more for the same timing mismatch and bandwidth. Moreover, this issue consistently worsens as the signal bandwidth increases.

Fig. 7. Normalized dynamic efficiency contour of a 256-QAM OFDM signal and its associated normalized efficiency versus normalized output power for (a) polar, (b) Cartesian, and (c) eight-phase operation DTXs.

Fig. 8. (a) Polar DTX: output spectrum of a 100-MHz signal, and (b) and (c) EVM and ACPR of a 20-/50-/100-/200-MHz OFDM 64-QAM signal versus amplitude and phase data timing mismatch.

Fig. 9. (a) Signed Cartesian DTX: output spectrum of a 100-MHz signal, and (b) and (c) EVM and ACPR of a 20-/50-/100-/200-MHz OFDM 64-QAM signal versus sign bits and un-signed data timing mismatch.

Fig. 10. (a) Signed eight-phase DTX: output spectrum of a 100-MHz signal, and (b) and (c) EVM and ACPR of a 20-/50-/100-/200-MHz OFDM 64-QAM signal versus sign bits and un-signed data timing mismatch.

# III. DIGITAL TRANSMITTER MODES OF OPERATION

The various operating modes of the proposed DTX are illustrated in Fig. 11. It is categorized into interleaving [11], [20], [27] and non-interleaving modes [29], [32], featuring Cartesian or multi-phase operation utilizing LO clocks with different duty cycles. The multi-phase mode exploits the I/Q to multi-phase mapping. The related LO clocks with the 45° phase difference are selected based on the octant where the I/Q baseband point is located (see Fig. 5). In a non-interleaving mode [see Fig. 11(d)], which is conventionally utilized in I/Q RF-DACs, the I and Q paths are combined in the RF output node, exhibiting an inferior image-rejection ratio. In the interleaving modes [see Fig. 11(a)], for each DPA, the upconverted baseband signals are digitally combined while sharing a single power cell.
Interleaving Cartesian with 25%-LO duty cycle [see Fig. 11(b)] and interleaving multi-phase operation with 12.5%-LO duty cycle [see Fig. 11(c)] are named Mode-1 and Mode-3, respectively. On the other hand, in the non-interleaving modes, one of the DPAs is allocated only to I ($I_{\rm MP}$), and the other DPA is assigned only to Q ($Q_{\rm MP}$); thus, their outputs are combined at the output drain nodes of the power cell. The non-interleaving Cartesian with 25%-LO duty cycle [see Fig. 11(e)] is labeled as Mode-2, while Mode-4 represents non-interleaving multi-phase operation with 25%-LO duty cycle [see Fig. 11(f)]. These different operation modes are mathematically analyzed and compared in the following.

Fig. 11. Overall operating modes of the proposed DTX. (a) Interleaving and (d) non-interleaving configurations. In (b) Mode-1 and (e) Mode-2 cases, the I/Q vectors are normalized to $D/(\pi\sqrt{2})$. In (c) Mode-3 and (f) Mode-4 scenarios, the $I_{MP}/Q_{MP}$ vectors are normalized to $D\sin(\pi/8)/\pi$.

Fig. 12. Relative phase shift of the rectangular-shaped RF LO associated with (a) Cartesian and (b) eight-phase operation phase selector.

For simplicity, the differential $\text{RF}_{\text{Out}}(t)$ in (2) and (4) can be redefined in a single-ended description as

$$\text{RF}_{\text{Out}}(t) = |A(t)| \times \Pi(t, T_0, d, \theta_A) + |B(t)| \times \Pi(t, T_0, d, \theta_B) \tag{5}$$

where $|A(t)|$ and $|B(t)|$ generally represent $|I(t)|$ and $|Q(t)|$ in Cartesian modes of operation or $|I_{MP}(t)|$ and $|Q_{MP}(t)|$ in multi-phase operation modes, respectively. Fig. 12 illustrates the different phases, which can be selected based on $\theta_A$ and $\theta_B$. Accordingly, depending on the corresponding quadrant in Cartesian [see Fig. 12(a)], $\theta_A$ ($\theta_B$) can be 0 or $\pi$ ($\pi/2$ or $3\pi/2$), requiring an LO clock signal or its complementary version.
Accordingly, shifting LO clocks by, e.g., half an RF cycle, corresponds to a phase shift of 180° in the fundamental component of (1). However, in the multi-phase operation scenarios, when the $\text{RF}_{\text{Out}}(t)$ shifts from one octant to another, $\theta_A$ and $\theta_B$ are multiplexed between 0, $\pi/2$, $\pi$, $3\pi/2$ and $\pi/4$, $3\pi/4$, $5\pi/4$, $7\pi/4$, respectively [see Fig. 12(b)]. Utilizing (1), the Fourier series of $\text{RF}_{\text{Out}}(t)$ can be expanded, and its fundamental component can be obtained by assuming $n = 1$ as

$$|\text{RF}_{\text{Out}}(t)|_{n=1} = \frac{\sin\left(\frac{\pi d}{T_0}\right)}{\pi} \left(|A(t)|e^{+j\theta_A} + |B(t)|e^{+j\theta_B}\right) e^{\frac{-j2\pi}{T_0}\left(t + \frac{d}{2}\right)}. \tag{6}$$

With the simplifying assumption of $|A(t)| = |B(t)| = D$, the magnitude of the fundamental component, $|\text{RF}_{\text{Out}}(t)|_{n=1}$, is calculated as follows:

$$|\text{RF}_{\text{Out}}(t)|_{n=1} = \frac{D\sin\left(\frac{\pi d}{T_0}\right)}{\pi} \left| e^{+j\theta_A} + e^{+j\theta_B} \right| \tag{7}$$

where, in Modes-1/-2 and Modes-3/-4, the maximum of $D$ is 1 and $1/\sqrt{2}$, respectively. In the remainder of this section, the DTX's different modes of operation are analyzed for their respective clock duty cycles ($d$), assuming operation in the first quadrant or first octant of the I/Q plane for the Cartesian or multi-phase case, respectively.

Fig. 13. (a) Detailed block diagram of the implemented multi-mode CMOS chip. (b) Chip microphotograph.

## A. (Non)-Interleaving Cartesian With 25%-LO Duty Cycle (Mode-1/Mode-2)

According to Fig. 12(a), in Cartesian cases, $\theta_A$ and $\theta_B$ are equal to 0 and $\pi/2$, respectively, while $d$ corresponds to $T_0/4$ for a 25%-LO duty cycle clock.
Therefore, for Mode-1, (7) yields

$$|\text{RF}_{\text{Out}}(t)|_{n=1}^{\text{Mode-1}} = \frac{D\sin\left(\frac{\pi}{4}\right)}{\pi} \left| 1 + e^{\frac{+j\pi}{2}} \right| = \frac{D}{\pi}. \tag{8}$$

In the Mode-2 scenario, since each DPA is allocated to only I or Q, the number of allocated power cells is halved compared to Mode-1, resulting in $|\text{RF}_{\text{Out}}(t)|_{n=1}^{\text{Mode-2}} = D/(2\pi)$.

## B. Interleaving Multi-Phase Operation With 12.5%-LO Duty Cycle (Mode-3)

In a multi-phase operation case with a 12.5%-LO duty cycle in the first octant, $d$, $\theta_A$, and $\theta_B$ correspond to $T_0/8$, 0, and $\pi/4$, respectively. Inserting these values in (7), after some algebraic manipulation, results in

$$|\text{RF}_{\text{Out}}(t)|_{n=1}^{\text{Mode-3}} = \frac{D\sin\left(\frac{\pi}{8}\right)}{\pi} \left| 1 + e^{\frac{+j\pi}{4}} \right| = \frac{D}{\pi \sqrt{2}}. \tag{9}$$

## C. Non-Interleaving Multi-Phase Operation With 25%-LO Duty Cycle (Mode-4)

In this case, $\theta_A$ and $\theta_B$ are the same as in Mode-3, while $d$ is equal to $T_0/4$ due to the 25%-LO duty cycle. Thus, (7) can be expressed by

$$|\text{RF}_{\text{Out}}(t)|_{n=1}^{\text{Mode-4}} = \frac{1}{2} \cdot \frac{D\sin\left(\frac{\pi}{4}\right)}{\pi} \left| 1 + e^{\frac{+j\pi}{4}} \right| = \frac{D}{2\pi} \sqrt{\frac{2+\sqrt{2}}{2}}. \tag{10}$$

It should be noted that, as mentioned in Section III-A, the $1/2$ factor arises because the non-interleaving operation uses a halved number of power cells.

# IV. IMPLEMENTATION DETAILS

The DTX architecture features two separate parts, namely, a multi-mode digital CMOS chip and an off-chip matching network. In the remainder of this section, various parts of our proposed DTX will be presented.

## A. Multi-Mode Digital TX CMOS Chip

The overall system block diagram of the implemented chip is depicted in Fig. 13(a). It is subcategorized into the digital baseband signal processing block, the LO and sampling clock generation block, the delay alignment and phase (sign bit) selector block, and the RF-DACs.
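As a numerical cross-check of the Section III results, the following Python sketch evaluates (7) for the four modes and can be compared against the closed forms in (8)-(10). The function names and the `cell_fraction` parameter are illustrative assumptions; the latter simply models the halved number of power cells per vector in the non-interleaving modes.

```python
import cmath
import math


def fundamental_magnitude(D, duty, theta_a, theta_b, cell_fraction=1.0):
    """Evaluate (7) for |A| = |B| = D.

    duty is d/T0. cell_fraction is a hypothetical parameter modelling the
    halved number of power cells in the non-interleaving modes.
    """
    pulse_term = math.sin(math.pi * duty) / math.pi
    vector_sum = abs(cmath.exp(1j * theta_a) + cmath.exp(1j * theta_b))
    return cell_fraction * D * pulse_term * vector_sum


def mode_amplitudes(D=1.0):
    """Fundamental magnitude per mode in the first quadrant/octant."""
    return {
        "Mode-1": fundamental_magnitude(D, 0.25, 0.0, math.pi / 2),
        "Mode-2": fundamental_magnitude(D, 0.25, 0.0, math.pi / 2, 0.5),
        "Mode-3": fundamental_magnitude(D, 0.125, 0.0, math.pi / 4),
        "Mode-4": fundamental_magnitude(D, 0.25, 0.0, math.pi / 4, 0.5),
    }


if __name__ == "__main__":
    for mode, value in mode_amplitudes().items():
        print(f"{mode}: {value:.4f} x D")
```

Note that the sketch uses the same D for every mode; the different maximum values of D stated after (7) would have to be applied separately when comparing peak output levels.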
*1) Clock Generation and Distribution:* At the DTX input, an on-chip transformer converts an off-chip unbalanced local oscillator clock running at $4 \times f_C$ to its balanced counterparts. The transformer's outer dimensions are 150 $\mu$m $\times$ 150 $\mu$m with a 1:1 turns ratio, while the center tap is located at its secondary winding and connected to a common-mode voltage of $V_{DD}/2$. Although a recursive design is performed to achieve a transformer with negligible amplitude and phase mismatch to prevent any misalignment, a phase aligner comprising a back-to-back inverter pair is employed at the transformer output. These phase-aligned differential high-speed clocks, i.e., $4 \times f_{C,0}$ and $4 \times f_{C,180}$, are applied to a divide-by-2 circuit to generate four complementary quadrature clocks at $2 \times f_C$ with 50% duty cycle and a relative phase difference in multiples of 90° [$2 \times f_{LO,0_{50\%}}$, $2 \times f_{LO,90_{50\%}}$, $2 \times f_{LO,180_{50\%}}$, and $2 \times f_{LO,270_{50\%}}$ in Fig. 13(a)]. The topology of this divider is shown in Fig. 14(a), which is implemented as a flip-flop-based frequency divider that consists of four $C^2$MOS latches arranged in a loop [32]. The back-to-back connection of $Q$ and $\bar{Q}$ in the latches aligns the four differential clock phases and improves the quadrature-phase accuracy. It is worth noting that employing wider transistors can enhance the speed of $C^2$MOS latches as the supply voltage is fixed and set to $V_{DD} = 1.1$ V. In addition, as depicted in Fig. 14(c), to boost the operating frequency of the divider, the data and clock inputs of the $C^2$MOS latches are also swapped compared with a conventional $C^2$MOS latch to decrease the input-to-output delay of the latch and, consequently, the overall loop delay of the divider [32].

Fig. 14. (a) Four-phase and (b) eight-phase divide-by-2 circuits with corresponding waveforms. (c) $C^2$MOS latch with swapped data/clock inputs.
These quadrature clocks are applied to a following divide-by-2 circuit to generate the desired carrier frequency at $f_C$ with 50%-LO clocks and 45° phase differences (i.e., $f_{LO,k_{50\%}}$, $k = 0, 45, \ldots, 315$). The architecture of the eight-phase-generator divide-by-2, depicted in Fig. 14(b), extends the concept of the previous four-phase-generator divide-by-2 circuit. Clock phases at multiples of 45° are generated by a ring divider comprising eight $C^2$MOS latches arranged in a loop. As depicted in Fig. 14(b), the four quadrature clocks generated by the previous divide-by-2 stage are employed as the clocks of the $C^2$MOS latches. The structure of the $C^2$MOS latches is as before, while their transistor sizing is adjusted based on the desired operating frequency. Depending on the intended operating mode, an arrangement whereby the 50%-LO clocks at $f_C$ have 12.5%-LO or 25%-LO overlaps is selected. As illustrated in Fig. 15, the duty cycle generation circuit produces the related eight-phase 12.5%-LO or 25%-LO clocks by a bit-wise AND operation of the corresponding LO pairs. The AND gates are implemented in a symmetrical configuration to equalize the delays of the clocks [33].

An independent master/baseband clock can be generated using another off-chip single-ended clock running at $2 \times f_C$. After an unbalanced-to-balanced conversion using an active unbalanced-to-balanced converter and a subsequent divide-by-2 circuit, the $F_S$ clock is generated. This master clock is then applied to a divide-by-4 circuit to generate the $F_{\rm BB}$ clock. To mitigate the crosstalk mainly caused by capacitive coupling, ground lines are placed between the quadrature LO lines. In addition, shielding is utilized to suppress the LO leakage and diminish the coupling from other routing lines, e.g., data routing, where multiple crossover lines occur.

Fig. 15. Related eight-phase (a) 12.5%-LO or (b) 25%-LO duty cycle clocks generation by bit-wise AND operation of the corresponding LO pairs.

Fig. 16. (a) Schematic of the 4-bit fine-resolution delay line and (b) its delay cells. (c) Delayed 3-GHz clock simulation.

2) Delay Alignment and Phase Selector: To compensate for the impact of design variations, such as process/voltage/temperature (PVT), frequency, and load variations, on the phase relations, fine-tune phase aligners are adopted [18] and implemented, as shown in Fig. 16. The controls for these phase aligners are static and come from a serial-to-parallel interface (SPI). The absolute delay of each delay cell is controlled with a single bit by enabling or disabling NMOS and PMOS transistors in series with the supply/ground paths. The RF clock passes through 15 cascaded delay cells to arrive at the output, resulting in a total relative delay of 75 ps with a resolution of roughly 5 ps [see Fig. 16(c)].

The delay alignment block is followed by a phase selector circuitry implemented by complementary NAND-gate-based multiplexers with input selection control signals to modulate the LO clock properly, as demonstrated in Fig. 17. In a Cartesian mode, these controlling signals are the I/Q sign bits [$I_{BB,UP}[11]$ ($Sign_I$) and $Q_{BB,UP}[11]$ ($Sign_Q$)]. The multi-phase operation modes demand 3-bit selection control signals to cover the eight octants, as discussed earlier. Another controlling signal is employed in the phase selector controlling circuit to select between the Cartesian and multi-phase modes.

Fig. 17. (a) Schematic of the clock phase selector. (b) and (c) NAND gate-based multiplexer implementation circuitry. (c)–(e) 4-/2-input symmetrical NAND logic gate.

To equalize the delays of the clocks, the NAND gates are implemented in a fully symmetrical configuration. Moreover, a back-to-back inverter pair is employed for further phase alignment of the complementary clock pairs.
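The bit-wise AND duty-cycle generation of Fig. 15 can be checked numerically with ideal clocks. The following sketch rests on an assumption: the text does not state which 50%-LO pairs are combined, so the pairing used here (a clock ANDed with the one lagging it by 180° minus the desired overlap) is only one choice that yields the stated duty cycles. All function names are hypothetical, and the clocks are sampled on a coarse grid of eight 45° steps per carrier period.

```python
STEPS = 8  # samples per carrier period, one per 45 degrees


def lo_50(phase_deg):
    """Ideal 50%-duty LO, delayed by phase_deg (a multiple of 45)."""
    shift = (phase_deg // 45) % STEPS
    return [1 if (n - shift) % STEPS < STEPS // 2 else 0
            for n in range(STEPS)]


def and_clocks(a, b):
    """Bit-wise AND of two sampled clocks."""
    return [x & y for x, y in zip(a, b)]


def duty_cycle(clk):
    return sum(clk) / len(clk)


def reduced_duty_clocks(overlap_deg):
    """Eight clocks with duty overlap_deg/360, each built from a 50% pair.

    Assumed pairing: phase k ANDed with phase k + (180 - overlap_deg).
    """
    lag = 180 - overlap_deg
    return {k: and_clocks(lo_50(k), lo_50(k + lag))
            for k in range(0, 360, 45)}


if __name__ == "__main__":
    for overlap in (45, 90):
        clocks = reduced_duty_clocks(overlap)
        print(f"overlap {overlap} deg -> duty {duty_cycle(clocks[0]):.3f}")
        for k, clk in clocks.items():
            print(f"  k={k:3d}: {clk}")
```

Under this pairing, a 45° overlap gives eight non-overlapping 12.5% pulses that tile the carrier period, and a 90° overlap gives eight 25% pulses in which every instant is covered by exactly two clocks.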
3) Un-Signed 11-Bit I/Q DPA Floor Plan: As a tradeoff between the in-band linearity, far-out noise, power consumption, and the overall complexity of the DTX, a resolution of 12 bits per I/Q (including the sign bit) is selected for the DPA to meet the requirements of wireless communication standards. The quantization-noise-limited dynamic range (DR<sub>QN</sub>) of the DPA, in terms of its resolution ($N$), the signal peak-to-average power ratio (PAPR), and the oversampling ratio (OR), defined as $F_S/(2BW)$, is given by

$$DR_{QN}(\text{dBc}) = 6.02N + 1.76 + 10 \log_{10} (OR) + 3.01 - PAPR$$ (11)

where the 3-dB factor arises from the I and Q operations. This formula predicts a dynamic range of 68.9 dBc, assuming a target BW and PAPR of 200 MHz and 8.2 dB, respectively, and $F_S$ of 600 MHz. This is sufficient to fulfill the TX requirements of the current and next-generation wireless mobile networks.

Moreover, the DTX comprises two identical 12-bit resolution I/Q banks. Fig. 18(a) depicts the implementation details and the floorplan of one of the I/Q banks. For each bank, the digital baseband data ($I/Q_{BB}[11:0]$) are stored in four parallel 1k SRAMs, clocked at $F_{BB}$, which are programmed through the low-speed SPI interface. The 12-bit digital I/Q baseband signals pass through multiplexers to be up-sampled by 4 ($F_S = 4 \times F_{BB}$). $I/Q_{BB,UP}[10:0]$ represents the up-sampled un-signed binary digital codes, segmented into 4-bit ($I/Q_{BB,UP}[3:0]$) binary-weighted LSB cells and 7-bit ($I/Q_{BB,UP}[10:4]$) thermometer-coded MSB cells. The 7-bit MSBs are further split into 3 bits ($I/Q_{BB,UP}[10:8]$) for the column encoder and 4 bits ($I/Q_{BB,UP}[7:4]$) for the row encoder. Hence, the 128 MSB units of each part are distributed over 16 rows ($I/Q_{BB,UP}^{R}[15:0]$) and eight columns ($I/Q_{BB,UP}^{C}[7:0]$). Furthermore, each part's LSB units comprise 16 small unit cells ($I/Q_{BB,UP}^{B}[15:0]$) that occupy a column.
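The segmentation of the un-signed 11-bit code into binary LSBs and row/column-decoded thermometer MSBs can be sketched as follows. This is an illustrative model only: the bit fields follow the text, but the fill order of the MSB array (full columns first, then a partial column) and the names `segment_code`, `msb_enable_matrix`, and `reconstruct` are assumptions, and the LSB column is modelled simply by its binary value.

```python
ROWS, COLS = 16, 8  # MSB unit-cell array size given in the text


def segment_code(code):
    """Split an un-signed 11-bit code into (lsb, row, col) fields."""
    if not 0 <= code < 2 ** 11:
        raise ValueError("expected an un-signed 11-bit code")
    lsb = code & 0xF          # bits [3:0], binary-weighted LSB cells
    row = (code >> 4) & 0xF   # bits [7:4], input of the row encoder
    col = (code >> 8) & 0x7   # bits [10:8], input of the column encoder
    return lsb, row, col


def msb_enable_matrix(row, col):
    """Thermometer enables: matrix[c][r] is True if that unit cell is on.

    Assumed fill order: columns below `col` are fully on, and the first
    `row` cells of column `col` are on.
    """
    return [[c < col or (c == col and r < row) for r in range(ROWS)]
            for c in range(COLS)]


def reconstruct(code):
    """Recombine the enabled cells into the original code value."""
    lsb, row, col = segment_code(code)
    units_on = sum(cell for column in msb_enable_matrix(row, col)
                   for cell in column)
    return 16 * units_on + lsb  # each MSB unit weighs 16 LSBs


if __name__ == "__main__":
    for code in (0, 15, 16, 1000, 2047):
        print(code, segment_code(code), reconstruct(code))
```

With this fill order, the number of enabled MSB units equals the upper seven bits of the code, so the weighted sum of enabled cells returns the original code for every input.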
Instead of having two separate push-pull banks, every other column of the I/Q RF-DAC matrix is dedicated to the in-phase arrays and their 180° out-of-phase counterparts [33]. This technique reduces the overall I/Q RF-DAC core size for the same achievable output power, yielding a highly compact area, smaller parasitics, minimal mismatch, and lower power consumption, and thus improved overall DTX efficiency. Swapped/cross-coupled power lines for the in-phase and out-of-phase drain lines are utilized to equalize the parasitics in the primary output traces.

Fig. 18. (a) Un-signed 11-bit I/Q DPA floor plan. (b) I/Q RF-DAC sub-cells.

The I/Q RF-DAC sub-cell comprises the pure digital logic section and the power-cell part [see Fig. 18(b)]. The logic part consists of a decoding logic and a time synchronizer flip-flop followed by an I/Q mixing circuit. Depending on the operating mode, a multiplexer enables the appropriate clock pairs to fulfill the up-conversion. The synchronized digital data are upconverted by the corresponding LO clocks using a bit-wise NAND operation. Eventually, the upconverted I/Q bitstreams are combined by the subsequent NAND gate to fulfill I/Q interleaving and feed the following power cell inverter buffers. All digital circuits are implemented based on symmetrical gates to equalize the delay from the input to the output and the fan-out for the succeeding circuitry.

This chip is combined with an off-chip class-E matching network. Since, in class-E operation, the peak drain voltage is 3.66 times the supply voltage, a cascode topology is adopted in this design to prevent reliability violations. Fig. 13(b) exhibits the chip micrograph realized in the 40-nm bulk CMOS process, while the corresponding block names are listed in the table. The chip occupies a die area of $2.23 \times 0.96 \text{ mm}^2$ with a core area of $0.72 \text{ mm}^2$.
Moreover, a dedicated SPI and the designated SRAMs are digitally synthesized and occupy an area of $2 \times 0.43 \times 0.43 \text{ mm}^2$, while decoupling capacitors and I/O pads occupy the remainder.

# B. Wideband Marchand Balun and Matching Network

The simplified parallel-circuit class-E structure [34], [35], [36] is plotted in Fig. 19(a). To have a wideband RF operation, the load angle seen by the intrinsic drain should remain constant over the designated RF bandwidth. This is accomplished through reactance compensation [35]. The reactances of the series ($L_0$, $C_0$) and shunt ($L_D$, $C_D$) resonant circuits vary with frequency, increasing in the case of the series circuit and decreasing in the case of the loaded parallel circuit near the resonant frequency. With a proper choice of these circuit elements, a constant load angle seen by the intrinsic drain is obtained over a large RF bandwidth.

Fig. 19. (a) General push-pull class-E schematic. (b) Compensated Marchand balun with re-entrant coupled lines and ac grounding at $\lambda/8$ for the second-harmonic control using a via from the floating metal layer to the ground plane. Electromagnetic simulation results of (c) and (d) fundamental and second-harmonic impedances for the load network and (e) and (f) amplitude/phase imbalance and transmission losses over the frequency. (g) Final realization of the re-entrant type Marchand balun and the side view of the wirebonding structure of the chip to the matching network.

The push-pull class-E DPA is connected to a Marchand coupled transmission-line transformer [37], [38], [39] to form the wideband balanced-to-unbalanced operation [see Fig. 19(b)]. Tight differential coupling with a high even-mode impedance is required to realize the wideband class-E load network with sufficiently low impedance.
This feature is realized by employing re-entrant type coupled lines with a proper dielectric constant and dielectric layer thickness between and underneath the conductors, yielding a low-loss wideband balun. A well-controlled wideband second-harmonic termination for class-E operation can be achieved by utilizing the orthogonality between the fundamental (differential) signal and the in-phase (common-mode) behavior of the second-harmonic signals. Consequently, the required open second harmonic for a digital class-E DPA can be realized by providing an even-mode short-circuited condition at a $\lambda/8$ distance from the DPAs, which can be practically achieved by placing a simple via to ground in the center of the floating center plate conductor. Electromagnetic simulation results of the fundamental and second-harmonic impedances for the load network are plotted in Fig. 19(c) and (d), showing a large second-harmonic impedance and balanced amplitude and phase for the differential to single-ended conversion. Fig. 19(e) and (f) shows the amplitude and phase imbalance and the transmission loss over frequency. Accordingly, the transmission loss from the balanced input port to the unbalanced output port is less than 1.4 dB in the 1.5-to-4-GHz band (less than 1 dB in the 1.6-to-3.8-GHz band).

Fig. 20. Measurement setup.

# V. EXPERIMENTAL RESULTS

The measurement setup is shown in Fig. 20. The DPAs, the SRAM, and digital circuitries operate on separate 1.1-V domains. The baseband data of each DPA are generated in MATLAB and independently applied to the DTX using four parallel on-chip 1k SRAMs running at $F_{\rm BB}=600$ MHz. The measured output powers include the matching loss, and the power consumption of all blocks (except SRAMs) is included in the reported system efficiency. The matching network is tuned according to the targeted multi-mode DTX operation mode.

# A. Static Measurements

1) Power/Efficiency Measurement Over the RF Bandwidth: The DTX is first characterized by static measurements. The output power is measured using a power meter. The measured peak output power ($P_{\text{Out}}$), DE,<sup>1</sup> and system efficiency (SE)<sup>2</sup> for different operation modes versus frequency are shown in Fig. 21. In Mode-1, the proposed DTX delivers 23.18-dBm peak $P_{\text{Out}}$ at 2.1 GHz with DE and SE of 66.26% and 52.59%, respectively. The difference between the drain and system efficiency is caused by the fixed power consumption of circuit blocks that do not scale with the output power. Therefore, if the output power increases while the other blocks are unchanged, the SE becomes closer to the DE [40], [41]. The peak $P_{\text{Out}}$ and DE vary in different operation modes. Namely, in Mode-1, the effective duty cycle of the combined vectors can be as high as 50% when using a 25%-LO clock, while, in Mode-4, the combined duty cycle reduces to 37.5%, yielding a significant efficiency improvement. Overall, the DTX achieves a 3-dB bandwidth of 1.35 GHz in a 1.65-to-3-GHz band while maintaining decent performance.

Fig. 21. Measured (a) peak output power, (b) drain, and (c) system efficiencies versus frequency for different operation modes.

2) Power/Efficiency Measurement Over the I/Q Plane: A 400-point static measurement is performed using 20 samples for each I/Q symbol (first I/Q quadrant) at 2.4 GHz. In Fig. 22, measured DE contours for different operation modes are extracted for the four-quadrant plane. Generally, the Cartesian configuration has four-petal-like efficiency contours along the diagonal lines of the I/Q plane. In contrast, the multi-phase operation has eight-petal-like efficiency contours located along lines at multiples of 22.5°, imitating polar contours and thus reducing the phase dependence of the efficiency.
The dashed gray circle on the I/Q plane represents the average power region for a 160-MHz 256-QAM OFDM signal. The enhanced efficiency performance is evident in the multi-phase operation modes, as they have higher efficiency on the gray circle than their Cartesian counterparts.

Fig. 22. Measured DE contours of the 400-point static points using 20 samples for each I/Q symbol in the first I/Q quadrant at 2.4 GHz extracted for the four-quadrant plane for (a) Mode-1, (b) Mode-2, (c) Mode-3, and (d) Mode-4 operation.

<sup>1</sup> $\text{DE}(\%) = 100 \times P_{\text{RF}_{\text{Out}}}/P_{\text{dc-Power Cells}}$.

<sup>2</sup> $\text{SE}(\%) = 100 \times P_{\text{RF}_{\text{Out}}}/(P_{\text{dc-Power Cells}} + P_{\text{dc-All Blocks (Except SRAMs)}})$.

# B. Complex Modulated Signal Measurements

The DTX dynamic performance is verified by employing single-/multi-channel higher-order modulation schemes, such as OFDM signals with different modulation bandwidths, utilizing a simple memory-less digital pre-distortion (DPD).

Fig. 23. Measured spectrum of four-channel × 40 MHz 64-QAM OFDM signal, constellation diagram of Ch-4 and ACLR, and average EVM performances versus average output power for (a) Mode-1, (b) Mode-2, (c) Mode-3, and (d) Mode-4 operation.

To assess the spectral purity, a "four-channel $\times$ 40-MHz 64-QAM OFDM" signal with an aggregated bandwidth of 160 MHz is applied to the DTX, and the performance is verified in different operation modes at $f_C = 2.4$ GHz. The measured spectrum of the signal and its Ch-4 constellation diagram for the Mode-1/-2/-3/-4 scenarios are depicted in Fig. 23(a)–(d), respectively. The DTX achieves an average output power of more than 13.5/11.43/7.73/9.41 dBm while maintaining average drain and system efficiencies of 30.34%/23.5%/23.96%/27.51% and 24.9%/18.05%/19.95%/22.57%, respectively. The adjacent channel leakage ratio (ACLR) is better than -42.18/-40.23/-40.71/-38.34 dBc, and the average EVM is -36.30/-34.35/-34.83/-32.46 dB, respectively. The ACLR and average EVM performances versus average output power are also exhibited in Fig. 23(a)–(d), reaching -39.7/-37.4/-38.3/-36.1-dB average EVMs at 10.5/8.43/4.73/6.41-dBm average output powers, while the ACLR is better than -44.6/-43.8/-43.6/-42.2 dBc, respectively.

Fig. 24. Average ACLR and EVM of the DTX in different modes of operation with various one-to-four-channel $\times$ 40-MHz 64-QAM OFDM signals.

Fig. 25. Measured (a) EVM and ACLR of a 20-MHz 64-QAM OFDM signal versus DPAs timing mismatch and (b) EVM and ACLR with 60-ps timing mismatch versus signal bandwidth for interleaving and non-interleaving modes.

Fig. 24 presents the average EVM and ACLR performance of a contiguous multi-channel "40-MHz 64-QAM OFDM" signal as a function of the number of carriers (aggregated bandwidth) in different modes of operation. Generally, the average EVM and ACLR of a one-channel 64-QAM OFDM signal with an aggregated bandwidth of 40 MHz are better than -35.5 dB and -41.1 dBc.

As shown in Fig. 16, fine-tuned phase aligners are adopted for each single DPA to compensate for the timing mismatch between them. Fig. 25(a) shows the EVM and ACLR of a 20-MHz 64-QAM OFDM signal versus DPAs timing mismatch for interleaving and non-interleaving modes. As illustrated, any timing mismatch degrades the ACLR and EVM. Increasing the input signal bandwidth makes it even more challenging to achieve good linearity since it directly increases the impact of time alignment errors, as shown in Fig. 25(b).

The proposed DTX is then evaluated with a 200-MHz "single-channel 256-QAM OFDM" signal in the Mode-1(-4) scenario, and the performance is measured at 2.4 GHz (see Fig. 26). In this case, the average delivered output power is more than 14.11 (9.29) dBm, while the ACLR and EVM are better than -42.05 (-41.05) dBc and -34.66 (-33.14) dB, respectively. The ACLR and average EVM performances versus average output power are also exhibited in Fig. 26, reaching -37.2 (-36.4)-dB average EVM at 11.1 (6.3)-dBm average output power, while the ACLR is better than -45.3 (-44.9) dBc.

Fig. 26. Measured spectrum of the single-channel $\times$ 200-MHz 256-QAM OFDM signal, its constellation diagram, and ACLR and average EVM performances versus average output power for (a) Mode-1 and (b) Mode-4 operation.

Fig. 27. Measured spectrum of single-channel $\times$ 200-MHz 1024-QAM OFDM signal, its constellation diagram, and ACLR and average EVM performances versus average output power for (a) Mode-1 and (b) Mode-4 operation.

Finally, to validate the capability of the proposed techniques, the multi-mode DTX is examined with a 200-MHz single-channel 1024-QAM OFDM signal at 2.4 GHz. In Mode-1 [see Fig. 27(a)], the average delivered output power is more than 12.23 dBm, while the ACLR and EVM are better than -43.01 dBc and -33.99 dB, respectively. In the Mode-4 scenario [see Fig. 27(b)], the average delivered output power is more than 7.32 dBm, while the ACLR and EVM are better than -43.41 dBc and -33.54 dB, respectively. The ACLR and average EVM performances versus average output power are also exhibited in Fig. 27(a) and (b) for Mode-1 and Mode-4, respectively. The DTX reaches -37/-36.1-dB average EVM at 9.23/4.32-dBm average output powers, while the ACLR is better than -45.5/-46.5 dBc, respectively.

The DTX performance is summarized and compared to that of the prior art in Table I. Compared to the DTXs without efficiency enhancement techniques, the realized multi-mode DTX exhibits the highest data rate, decent average efficiency, and high peak $P_{\rm Out}$, suitable for the prevailing and next-generation wireless communication systems.

TABLE I PERFORMANCE SUMMARY AND COMPARISON WITH STATE OF THE ART

| Specifications | This Work | | JSSC 2020 (A. Bassat) | | JSSC 2017 (M. Hashemi) | | JSSC 2017 (W. Yuan) | JSSC 2021 (M.R. Beikmirza) | | JSSC 2022 (Y. Li) | JSSC 2020 (S.W. Yoo) | | JSSC 2022 (B. Yang) | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Technology | CMOS 40nm | | CMOS 28nm | | CMOS 40nm | | CMOS 130nm | CMOS 40nm | | CMOS 55nm | CMOS 60nm | | CMOS 40nm | |
| Architecture | Multi-mode DTX * | | Polar | | Polar | | Multi-phase SCPA | Quadrature 4-way Doherty | | Quadrature | TI-Doherty / Class G | | Quadrature SCPA / Hybrid Doherty | |
| Die Area (mm²) | 2.1 (0.72 ‡) | | 4 ¥ | | 0.45 | | 3.7 | 3.55 (1.5 ‡) | | 1.83 (1.19 ‡) | 3.36 | | 2.2 | |
| Supply (V) | 0.95 | | 1.4 | | 0.5 | | 3 | 1 | | 1.2 / 2.4 | 2.5 | | 1.2 / 2.4 | |
| Frequency (GHz) | 2.4 | | 2.5 | 5 | 2.2 ** | | 1.8 | 5.4 | | 0.85 | 2.4 | | 2.4 | |
| 3dB Power BW | 1.35 GHz | | N/A | | 1.5 GHz * | | 750 MHz | 1.3 GHz | | 1 GHz * | N/A | | 1.1 GHz ** | |
| Peak P<sub>out</sub> (dBm) | (Mode-1) 23.18 | (Mode-4) 19.26 | 27 | 27 | 14.6 | | 26 | 27.4 | | 29.3 | 30 | | 30.3 | |
| Peak DE (%) | 66.2 | 66.4 | N/A | N/A | 43.8 | | N/A | 47.4 | | N/A | 40.2 | | 41.3 | |
| Peak SE (%) | 52.59 | 54.73 | 53 | 35 | 28.8 † | | 24.9 | 30.66 | | 43.1 | N/A | | 36.5 | |
| Modulation | 1024-QAM OFDM | 1024-QAM OFDM | MCS11 | MCS11 | 64-QAM @2GHz | 64-QAM @2GHz | 64-QAM LTE | 256-QAM OFDM | 256-QAM OFDM | 64-QAM LTE | 1024-QAM | 64-QAM OFDM | 256-QAM | 1024-QAM |
| Bandwidth (MHz) | 200 | 200 | 40 | 160 | 20 | 40 | 10 | 40 | 240 | 20 | 10 | 10 | 60 | 40 |
| PAPR (dB) | 10.1 | 10.1 | 6.9 | 7.8 | 8 | | 5.1 | 8.64 | 9.68 | 8.29 | 6.8 | 10.9 | 6.98 | 9.86 |
| Avg. P<sub>out</sub> (dBm) | 12.23 | 7.32 | 20.1 | 19.2 | 6.1 § | | 20.9 | 18.9 | 17.8 | 21.01 | 23.2 | 19.1 | 23.3 | 20.4 |
| Avg. DE (%) | 23.82 | 22.83 | N/A | N/A | 17.5 § | | N/A | 43.1 | 41.2 | N/A | 36.2 | 30.3 | 30.7 | 22.6 |
| Avg. SE (PAE) (%) | 19.34 | 18.81 | (28.9) | (21.2) | 14.3 §† | | 15.2 | 24.5 | 22.1 | 20.1 | N/A | N/A | N/A | N/A |
| EVM (dB) | -33.99 | -33.54 | -35 | -35 | N/A | -31 | -29.1 | -40 | -32.2 | -25.1 | -44.5 | -41.7 | -31.9 | -35.9 |
| ACLR (dBc) | -43.0/-43.6 | -43.4/-43.9 | N/A | N/A | -43/-49 | -40/-45 | -30.3/-31.7 | -47/-47 | -39/-39 | N/A | N/A | N/A | -32/-30 | -35/-34 |
| Linearization | DPD | | DPD | | NO | | DPD | DPD | | NO | NO | | DPD | |

\* Off-chip matching network. \*\* Estimated from reported figures and plots. ‡ Core area. ¥ Area including digital front end, DPLL, and LB/HB DTX. † Excluding LO generation. § For a 10-MHz QAM signal with PAPR = 8-9 dB.

# VI. CONCLUSION

A wideband energy-efficient versatile TX has been demonstrated.
It leverages the advantages of a digitally configured architecture to support multi-mode/multi-band operation. It features various modes comprising Cartesian and multi-phase configurations utilizing LO clocks with different duty cycles in the interleaving and non-interleaving configurations. The multi-phase architecture inherits the advantages of the Cartesian DTX, such as wideband operation and symmetrical and synchronized I/Q paths, with a DE behavior imitating the polar counterparts. Realized in 40-nm bulk CMOS with an off-chip matching network, the DTX generates more than 23.18-dBm peak $P_{\text{Out}}$ from a 0.95-V supply, with 66.26%/52.59% DE/SE in a 1-4-GHz band. For a 200-MHz single-channel 1024-QAM OFDM signal at 2.4 GHz in Modes-1/-4, the average delivered output power is 12.23/7.32 dBm, while the ACLR and EVM are better than -43/-43 dBc and -33.5/-33.9 dB, respectively. Compared to the DTXs without efficiency enhancement techniques, the proposed DTX achieves state-of-the-art performance, exhibiting the highest data rate, decent average efficiency, and high peak power capability, making it an interesting candidate for wireless communication systems.

# ACKNOWLEDGMENT

imec-Leuven is acknowledged for handling the tape-out. The authors thank Atef Akhnoukh and Zu-Yao Chang for their strong support during the design, fabrication, and measurement. They also thank Quinten Bruinsma for data representation help, and Rob Bootsman and Dieuwert Mul for helpful analytical discussions.

# REFERENCES

- [1] A. Yadav and O. A. Dobre, "All technologies work together for good: A glance at future mobile networks," *IEEE Wireless Commun.*, vol. 25, no. 4, pp. 10–16, Aug. 2018.
- [2] B. Khamaisi et al., "A 16 nm, +28 dBm dual-band all-digital polar transmitter based on 4-core digital PA for Wi-Fi6E applications," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. 65, Feb. 2022, pp. 324–326.
- [3] D. Chowdhury, L. Ye, E. Alon, and A. Niknejad, "An efficient mixed-signal 2.4-GHz polar power amplifier in 65-nm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 46, no. 8, pp. 1796–1809, Aug. 2011.
- [4] J. Lee, D. Jung, D. Munzer, and H. Wang, "A compact wideband joint bidirectional class-G digital Doherty switched-capacitor transmitter and N-path quadrature receiver through capacitor bank sharing," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Apr. 2022, pp. 1–2.
- [5] S.-W. Yoo, S.-C. Hung, and S.-M. Yoo, "A multimode multi-efficiency-peak digital power amplifier," *IEEE J. Solid-State Circuits*, vol. 55, no. 12, pp. 3322–3334, Dec. 2020.
- [6] A. Ben-Bassat et al., "A fully integrated 27-dBm dual-band all-digital polar transmitter supporting 160 MHz for Wi-Fi 6 applications," *IEEE J. Solid-State Circuits*, vol. 55, no. 12, pp. 3414–3425, Dec. 2020.
- [7] M. Hashemi, Y. Shen, M. Mehrpoo, M. S. Alavi, and L. C. N. de Vreede, "An intrinsically linear wideband polar digital power amplifier," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, pp. 3312–3328, Dec. 2017.
- [8] Y. Shen et al., "A fully-integrated digital-intensive polar Doherty transmitter," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Jun. 2017, pp. 196–199.
- [9] S.-M. Yoo, J. S. Walling, E. C. Woo, B. Jann, and D. J. Allstot, "A switched-capacitor RF power amplifier," *IEEE J. Solid-State Circuits*, vol. 46, no. 12, pp. 2977–2987, Dec. 2011.
- [10] M. Beikmirza et al., "6.2 A 4-way Doherty digital transmitter featuring 50%-LO signed IQ interleave upconversion with more than 27 dBm peak power and 40% drain efficiency at 10 dB power back-off operating in the 5 GHz band," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. 64, Feb. 2021, pp. 92–94.
- [11] Y. Shen, R. Bootsman, M. S. Alavi, and L. C. N. de Vreede, "A wideband IQ-mapping direct-digital RF modulator for 5G transmitters," *IEEE J. Solid-State Circuits*, vol. 57, no. 5, pp. 1446–1456, May 2022.
- [12] H. Jin, D. Kim, and B. Kim, "Efficient digital quadrature transmitter based on IQ cell sharing," *IEEE J. Solid-State Circuits*, vol. 52, no. 5, pp. 1345–1357, May 2017.
- [13] S.-W. Yoo, S.-C. Hung, and S.-M. Yoo, "A watt-level quadrature class-G switched-capacitor power amplifier with linearization techniques," *IEEE J. Solid-State Circuits*, vol. 54, no. 5, pp. 1274–1287, May 2019.
- [14] B. Yang, H. Jenny Qian, and X. Luo, "Quadrature switched/floated capacitor power amplifier with reconfigurable self-coupling canceling transformer for deep back-off efficiency enhancement," *IEEE J. Solid-State Circuits*, vol. 56, no. 12, pp. 3715–3727, Dec. 2021.
- [15] Y. Li et al., "A 15-bit quadrature digital power amplifier with transformer-based complex-domain efficiency enhancement," *IEEE J. Solid-State Circuits*, vol. 57, no. 6, pp. 1610–1622, Jun. 2022.
- [16] M. Alavi, J. Mehta, and R. Staszewski, *Radio-Frequency Digital-to-Analog Converters: Implementation in Nanoscale CMOS*. Amsterdam, The Netherlands: Elsevier, 2016.
- [17] R. B. Staszewski and M. S. Alavi, "Digital I/Q RF transmitter using time-division duplexing," in *Proc. IEEE Int. Symp. Radio-Frequency Integr. Technol.*, Nov. 2011, pp. 165–168.
- [18] M. Hashemi et al., "17.5 An intrinsically linear wideband digital polar PA featuring AM-AM and AM-PM corrections through nonlinear sizing, overdrive-voltage control, and multiphase RF clocking," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2017, pp. 300–301.
- [19] M. S. Alavi, R. B. Staszewski, L. C. N. de Vreede, and J. R. Long, "Orthogonal summing and power combining network in a 65-nm all-digital RF I/Q modulator," in *Proc. IEEE Int. Symp. Radio-Freq. Integr. Technol.*, Nov. 2011, pp. 21–24.
- [20] M. Mehrpoo et al., "A wideband linear I/Q-interleaving DDRM," *IEEE J. Solid-State Circuits*, vol. 53, no. 5, pp. 1361–1373, May 2018.
- [21] H. Wang et al., "High-efficiency all-digital transmitter," U.S. Patent 8 385 469, Feb. 26, 2013.
- [22] E. W. McCune, Jr., "Wideband phase modulation methods and apparatus," U.S. Patent 8 508 309, Aug. 13, 2013.
- [23] E. W. McCune, "Concurrent polar and quadrature modulator," in *Proc. WAMICON*, Jun. 2014, pp. 1–4.
- [24] W. Yuan and J. S. Walling, "A multiphase switched capacitor power amplifier in 130 nm CMOS," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, May 2016, pp. 210–213.
- [25] W. Yuan and J. S. Walling, "A multiphase switched capacitor power amplifier," *IEEE J. Solid-State Circuits*, vol. 52, no. 5, pp. 1320–1330, May 2017.
- [26] M. Beikmirza et al., "A 1-to-4 GHz multi-mode digital transmitter in 40 nm CMOS supporting 200 MHz 1024-QAM OFDM signals with more than 23 dBm/66% peak power/drain efficiency," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Apr. 2022, pp. 01–02.
- [27] Y. Shen, R. Bootsman, M. S. Alavi, and L. de Vreede, "A 0.5-3 GHz I/Q interleaved direct-digital RF modulator with up to 320 MHz modulation bandwidth in 40 nm CMOS," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Mar. 2020, pp. 1–4.
- [28] M. Mehrpoo, M. Hashemi, Y. Shen, R. van Leuken, M. S. Alavi, and L. C. N. de Vreede, "A wideband linear direct digital RF modulator using harmonic rejection and I/Q-interleaving RF DACs," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Jun. 2017, pp. 188–191.
- [29] Y. Shen et al., "A 1–3 GHz I/Q interleaved direct-digital RF modulator as a driver for a common-gate PA in 40 nm CMOS," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Aug. 2020, pp. 287–290.
- [30] W. M. Gaber, P. Wambacq, J. Craninckx, and M. Ingels, "A CMOS IQ direct digital RF modulator with embedded RF FIR-based quantization noise filter," in *Proc. ESSCIRC*, Sep. 2011, pp. 139–142.
- [31] M. S. Alavi, A. Visweswaran, R. B. Staszewski, L. C. N. de Vreede, J. R. Long, and A. Akhnoukh, "A 2-GHz digital I/Q modulator in 65-nm CMOS," in *Proc. IEEE Asian Solid-State Circuits Conf.*, Nov. 2011, pp. 277–280.
- [32] M. S. Alavi et al., "A wideband 2×13-bit all-digital I/Q RF-DAC," *IEEE Trans. Microw. Theory Techn.*, vol. 62, no. 4, pp. 732–752, Apr. 2014.
- [33] M. Beikmirza, Y. Shen, L. C. N. de Vreede, and M. S. Alavi, "A wideband four-way Doherty bits-in RF-out CMOS transmitter," *IEEE J. Solid-State Circuits*, vol. 56, no. 12, pp. 3768–3783, Dec. 2021.
- [34] A. Grebennikov, *RF and Microwave Power Amplifier Design*. New York, NY, USA: McGraw-Hill, 2015.
- [35] N. Kumar, C. Prakash, A. Grebennikov, and A. Mediano, "High-efficiency broadband parallel-circuit class E RF power amplifier with reactance-compensation technique," *IEEE Trans. Microw. Theory Techn.*, vol. 56, no. 3, pp. 604–612, Mar. 2008.
- [36] M. Acar, A. J. Annema, and B. Nauta, "Analytical design equations for class-E power amplifiers," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 54, no. 12, pp. 2706–2717, Dec. 2007.
- [37] M. Hashemi, L. Zhou, Y. Shen, and L. C. N. de Vreede, "A highly linear wideband polar class-E CMOS digital Doherty power amplifier," *IEEE Trans. Microw. Theory Techn.*, vol. 67, no. 10, pp. 4232–4245, Oct. 2019.
- [38] M. Hashemi et al., "Highly efficient and linear class-E CMOS digital power amplifier using a compensated Marchand balun and circuit-level linearization achieving 67% peak DE and -40dBc ACLR without DPD," in *IEEE MTT-S Int. Microw. Symp. Dig.*, Jun. 2017, pp. 2025–2028.
- [39] M. Beikmirza, Y. Shen, L. C. N. de Vreede, and M. S. Alavi, "A wideband two-way digital Doherty transmitter in 40 nm CMOS," in *IEEE MTT-S Int. Microw. Symp. Dig.*, Jun. 2022, pp. 975–978.
- [40] R. J. Bootsman et al., "High-power digital transmitters for wireless infrastructure applications (a feasibility study)," *IEEE Trans. Microw. Theory Techn.*, vol. 70, no. 5, pp. 2835–2850, May 2022.
- [41] R. Bootsman et al., "A 39 W fully digital wideband inverted Doherty transmitter," in *IEEE MTT-S Int. Microw. Symp. Dig.*, Jun. 2022, pp. 979–982.
**Mohammadreza Beikmirza** (Graduate Student Member, IEEE) received the B.Sc. and M.Sc. degrees in electrical engineering from the Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran, in 2014 and 2016, respectively. He is currently pursuing the Ph.D. degree with the Delft University of Technology, Delft, The Netherlands. His current research interests include digital-intensive transmitters and RF/monolithic microwave integrated circuits design.

Mr. Beikmirza was a recipient of the Amirkabir University of Technology Best Undergraduate Thesis Award 2015, IEEE Iran Section Best Undergraduate Thesis Award 2015, and the Platinum Award (first place) of the Huawei RFIC Contest 2021, and a co-recipient of the ISE President Best Paper Award 2021.

**Yiyu Shen** (Member, IEEE) received the M.S. degree in microelectronics from Tsinghua University, Beijing, China, and Katholieke Universiteit Leuven, Leuven, Belgium, in 2014, and the Ph.D. degree in electrical engineering from the Delft University of Technology, Delft, The Netherlands, in 2021. His current research interests include power amplifiers and digital-assisted RF integrated circuits and systems.

**Leo C. N. de Vreede** (Senior Member, IEEE) received the Ph.D. degree (*cum laude*) from the Delft University of Technology, Delft, The Netherlands, in 1996. In 1996, he was appointed as an Assistant Professor at the Delft University of Technology, working on the nonlinear distortion behavior of active devices. In 1999 and 2015, he was appointed, respectively, as an Associate Professor and a Full Professor at the Delft University of Technology, where he became responsible for the Electronic Research Laboratory (ERL/ELCA). During that period, he worked on solutions for improved linearity and RF performance at the device, circuit, and system levels. He is a Co-Founder/an Advisor of Anteverta-mw, Eindhoven, The Netherlands, a company that is specialized in RF device characterization. He has (co)authored more than 130 IEEE refereed conference papers and journal articles. He holds several patents. His current interests include RF measurement systems, RF technology optimization, and (digital-intensive) energy-efficient/wideband circuit/system concepts for wireless applications.

Dr. de Vreede was a co-recipient of the IEEE Microwave Prize in 2008 and a Mentor of the Else Kooi Prize Awarded Ph.D. Work in 2010 and the Dow Energy Dissertation Prize Awarded Ph.D. Work in 2011. He was a recipient of the TUD Entrepreneurial Scientist Award in 2015. He (co)guided several students who won (best) paper awards at the Bipolar/BiCMOS Circuits and Technology Meeting (BCTM), the Program for Research on Integrated Systems and Circuits (PRORISC), the European Solid-state Circuits and Devices Conference (ESSDERC), the International Microwave Symposium (IMS), the Radio Frequency Integration Technology (RFIT), and the Radio Frequency Integrated Circuits Symposium (RFIC).

**Morteza S. Alavi** (Member, IEEE) received the B.S.E.E degree from the Iran University of Science and Technology, Tehran, Iran, in 2003, the M.S.E.E degree from the University of Tehran, in 2006, and the Ph.D. degree in electrical engineering from the Delft University of Technology (TU-Delft), Delft, The Netherlands, in 2014. He was a Co-Founder and the CEO of DitIQ B.V., Delft, a local company developing energy-efficient, wideband wireless transmitters for the next generation of the cellular network. Since September 2016, he has been with the Electronic Circuits and Architectures (ELCA) Research Group, TU-Delft, where he is currently a tenured Assistant Professor. He has coauthored *Radio Frequency Digital-to-Analog Converter* (Elsevier, 2016). His main research interests include designing high-frequency and high-speed wireless/cellular communication and sensor systems, as well as wireline transceivers.

Dr. Alavi was a recipient of the Best Paper Award at the 2011 IEEE International Symposium on Radio Frequency Integrated Technology (RFIT). He received the Best Student Paper Award (Second Place) at the 2013 Radio Frequency Integrated Circuits (RFIC) Symposium. His Ph.D. student also won the Best Student Paper Award (First Place) at the 2017 RFIC Symposium in Honolulu, HI, USA. His research group recently received the 2021 Institute of Semiconductor Engineers (ISE) President Best Paper Award of the International SoC Design Conference (ISOCC). One of his Ph.D. students also received the First-Place Award of the 2021 Huawei Student Design Contest.