# A Versatile and Efficient 0.1-to-11 Gb/s CML Transmitter in 40-nm CMOS

Jun Feng<sup>1</sup>, Mohammadreza Beikmirza, Mohammadreza Mehrpoo<sup>2</sup>, Leo C.N. de Vreede, and Morteza S. Alavi

Delft University of Technology (ELCA Research Group), Delft, NL
<sup>1</sup>now with KU Leuven (MICAS Research Group), Leuven, BE; <sup>2</sup>now with Broadcom-Netherlands, Bunnik, NL
Email: jun.feng@esat.kuleuven.be

Accepted author manuscript. Published in *2021 18th International SoC Design Conference (ISOCC): Proceedings* (pp. 41-42), Article 9613887, IEEE. https://doi.org/10.1109/ISOCC53507.2021.9613887

Abstract: We present a wireline transmitter (TX) for reconfigurable chip-to-chip links. The proposed design features a frequency-adaptive clock chain, a fast 16:1 clocked-CMOS multiplexer (C<sup>2</sup>MOS MUX) tree, and a full-rate synchronous current-mode logic (CML) clock driver.
A prototype realized in 40-nm CMOS achieves a wide 0.1-to-11 Gb/s operating range (f<sub>max</sub>/f<sub>min</sub> = 110×). At 11 Gb/s, the prototype achieves 3.98 pJ/bit for a bit error rate (BER) $<10^{-12}$ with a 60.9-ps eye width.

## I. INTRODUCTION

An impressive surge of advanced radio frequency (RF) transmitters (TXs) has enabled reconfigurable (multi-channel) communication solutions with effective data rates from 100 Mb/s up to 10 Gb/s, catering to applications with various bandwidths such as LTE-cat M, 802.11 ah/ax, and sub-6 GHz 5G. However, the chip-to-chip *wireline* links that provide the baseband data must be able to operate with equal flexibility. Today's wireline standards, such as JESD204B for high-end data converters, do not yet support $f_{\rm max}/f_{\rm min} \geq 100\times$. To tackle this problem, we present a 40-nm CMOS wireline TX that supports a variable data rate of 0.1 to 11 Gb/s, targeting a bit error rate (BER) $\leq 10^{-12}$ and an energy efficiency < 5 pJ/bit.

## II. WIRELINE TRANSMITTER IMPLEMENTATION

Fig. 1 shows the proposed wireline TX architecture. A wideband on-chip transformer accepts a single-ended full-rate clock and feeds a 0.1-to-11 GHz frequency-adaptive clock chain, which generates five complementary clocks. A pseudo-random bit stream (PRBS) feeds the 16:1, full-rate binary tree multiplexer (MUX), for up to 2 × 8-b complex data. A clocked-CMOS (C<sup>2</sup>MOS) design enables a fast but low-power MUX. Two four-stage, 50-Ω current-mode logic (CML) output driver (OD) chains synchronously transmit the low-swing, high-speed data and its full-rate forwarded clock, respectively, to facilitate links that need no clock recovery. The nominal 300-mV<sub>pp</sub> swings are adjustable through a current reference.

Fig. 1. Proposed wireline TX architecture. The high-speed outputs are synchronized and ESD-protected (300 fF per pad).

### A. Frequency-Adaptive Clock Chain

Generating well-aligned, divided clocks for a tree MUX across a wide frequency range is challenging because the relative propagation delays across divider chains ($t_{\rm BUF}$) and retimers ($t_{\rm CQ}$) vary substantially. Fig. 2(a) illustrates the desired clock alignment, which should be satisfied up to 11 GHz. To address this, we propose a two-step approach for the clock chain design: coarse-fine self-retiming, combined with a simple binary delay.

Fig. 2(b) depicts the frequency-adaptive clock chain architecture. Dividers must employ buffers that incur unwanted delays, resulting in variable accumulated skew. The coarse self-retimer minimizes these delays by using each internally mini-buffered "parent" clock ck/i to retime its divided "child" clock ck/j (j = 2i). Consequently, short delay lines easily compensate for the remaining fixed parent-to-child skews, such that "fine" flip-flops (FFs) safely perform the final retiming. By virtue of this approach, the frequency-dependent timing constraints are isolated at the clock of the fine FFs ($CK_{\rm IN}$). Effectively, a single circuit can now close the clock chain timing across variable data rates. To this end, a controlled XOR gate is inserted in the $CK_{\rm IN}$ path to enable a 0/180-degree phase shift, i.e., a frequency-dependent delay.

### B. Inherently-Pipelined C<sup>2</sup>MOS MUX

Rather than resorting to a power-hungry CML MUX, we adopt a custom-digital C<sup>2</sup>MOS MUX structure, originally introduced as a "clocked inverter multiplexer" in [1]. The C<sup>2</sup>MOS MUX, shown in Fig. 3, inherently exhibits the FF function (pipelining) that decouples the MUX slice depth from its speed. In our proposed design, the clocked (cascode) gates are kept at a minimum size to prevent significant clock feedthrough.
The data-driven gates are sized to minimize the critical clock-to-Q delays, such that only intrinsic rise/fall times limit the MUX clock frequency ($f_{\rm MUX}$), enabling high speeds.

Fig. 2. Proposed frequency-adaptive clock chain. (a) Tree MUX clock alignment. (b) Implementation details (all clocks are complementary).

Fig. 3. Implemented C<sup>2</sup>MOS MUX tree based on [1] and the proposed CMOS-to-CML converter.

A proposed "CMOS-to-CML" circuit follows the MUX and converts the single-phase MUX data to its complementary, high-common-mode counterpart, re-using the C<sup>2</sup>MOS latch. This clocked arrangement achieves near-perfect skew cancellation (critical for CML circuitry) and enables coherency between the two driver chains. A 120-ps full-scale, 2-bit delay line ensures sufficient converter timing margin across frequency. The 16:1 C<sup>2</sup>MOS MUX achieves > 12.5 Gb/s for $< 300\ \mu\text{W}$ at 1.1 V in post-layout simulations.

## III. MEASUREMENT RESULTS

The prototype wireline TX is fabricated in 40-nm CMOS with a 0.1-mm<sup>2</sup> core area. The TX achieves 0.1-to-11 Gb/s operation, speed-limited by the input transformer bandwidth (lower than the $f_{\rm MUX}$ limit expected at $\approx 12.5$ Gb/s). Fig. 4 shows the measured eye diagram at 11 Gb/s with a 325-mV<sub>pp</sub> swing, resulting in a 60.9-ps wide eye opening for a BER $\leq 10^{-12}$. The ESD-protected channel (> 300 fF total per pad) consists of 1.5-mm bondwires and 2-cm FR4 traces. At 11 Gb/s, the efficiency is 3.98 pJ/bit, and it improves to an optimal 2.68 pJ/bit at 8 Gb/s. Analog power dominates at < 8 Gb/s, which can be compensated for by reducing the swing.

Fig. 4. Wireline TX prototype. (a) Die photo. (b) Measured eye diagram and bathtub curve (green overlay) at 11 Gb/s.

TABLE I. PERFORMANCE COMPARISON

| | This Work | [2]<sup>△</sup> | [3]<sup>△</sup> | [4] |
|---|---|---|---|---|
| Process | 40 nm | 28 nm | 40 nm | 65 nm |
| Driver Circuit | CML | LVDS / SST | SST | SST |
| Data Rate (Gb/s) | ≤11 | 12.5 | ≤12.5 | ≤8.5 |
| Swing\* (mV<sub>pp</sub>) | 325 | 135 / 703 | 648 | 1000 |
| Total Jitter\*† (ps) | 24.9 | 27 / 28 | - | 17.5 |
| Efficiency\* (pJ/bit) | 3.98 (2.66<sup>#</sup>) | 1.1 / 1.7 | 2.88 | 11.3 |
| f<sub>max</sub>/f<sub>min</sub> Range | 110× | - | 12.5× | 1.7× |

\*data rate = f<sub>max</sub>; †log(BER) = -12; <sup>#</sup>w/o clock driver; <sup>△</sup>targets JESD204B.

Table I compares this performance to the prior art, including some voltage-mode driver (SST) TXs. Our prototype CML TX demonstrates an at least 8× better $f_{\rm max}/f_{\rm min}$, for no more than 4× higher relative power, including the coherent clock driver.

## IV. CONCLUSION

This paper presented a wireline TX in 40-nm CMOS to accommodate the future demand for highly variable data rates of up to 10 Gb/s. The proposed frequency-adaptive clock chain and 16:1 high-speed, inherently pipelined C<sup>2</sup>MOS MUX achieve a 0.1-to-11 Gb/s operational data rate, better than the $42\times$ $f_{\rm max}/f_{\rm min}$ range of the full JESD204B standard. Despite the built-in versatility, the prototype TX achieves an efficiency of 3.98 pJ/bit for a BER $< 10^{-12}$ (60.9-ps eye width) at 11 Gb/s, including two coherent CML drivers for data and a forwarded clock.

## ACKNOWLEDGMENT

The authors thank Catena Microelectronics for its support.

## REFERENCES

- [1] K. Fukuda, *et al.*, "A 12.3-mW 12.5-Gb/s Complete Transceiver in 65-nm CMOS Process," in *IEEE JSSC*, 2010.
- [2] F. Celik, *et al.*, "JESD204B Compliant 12.5 Gb/s LVDS and SST Transmitters in 28 nm FD-SOI CMOS," in *IEEE PRIME*, 2019.
- [3] B. Chattopadhyay, *et al.*, "A 12.5Gbps Transmitter for Multi-standard SERDES in 40nm Low Leakage CMOS Process," in *IEEE VLSID*, 2018.
- [4] M. Kossel, *et al.*, "A T-Coil-Enhanced 8.5 Gb/s High-Swing SST Transmitter in 65 nm Bulk CMOS With -16 dB Return Loss Over 10 GHz Bandwidth," in *IEEE JSSC*, 2008.
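
As a supplementary cross-check on the figures reported in Section III and Table I, the following Python sketch derives a few quantities by simple arithmetic: the unit interval at 11 Gb/s, the measured eye width as a fraction of that interval, the total power implied by the quoted energy per bit, and the two ratios behind the comparison with prior art. The helper names (`unit_interval_ps`, `power_mw`, `derived_figures`) are illustrative and not from the paper, and the derived values are implied by the reported numbers rather than separately reported measurements.

```python
# Back-of-the-envelope checks on numbers quoted in the text and Table I.
# All function and variable names are illustrative assumptions.

def unit_interval_ps(rate_gbps: float) -> float:
    """Bit period in picoseconds for a data rate given in Gb/s."""
    return 1e3 / rate_gbps


def power_mw(efficiency_pj_per_bit: float, rate_gbps: float) -> float:
    """Power in mW: (pJ/bit) x (Gb/s) = 1e-12 J x 1e9 1/s = 1e-3 W."""
    return efficiency_pj_per_bit * rate_gbps


def derived_figures() -> dict:
    """Derive secondary quantities from values reported in the paper."""
    f_min_gbps, f_max_gbps = 0.1, 11.0   # reported operating range
    eye_width_ps = 60.9                  # measured eye width at 11 Gb/s
    eff_total = 3.98                     # pJ/bit, including clock driver
    eff_no_clk = 2.66                    # pJ/bit, w/o clock driver (Table I)
    range_ref3 = 12.5                    # f_max/f_min of [3] (Table I)
    eff_ref2_lvds = 1.1                  # pJ/bit of the LVDS TX in [2]

    ui_ps = unit_interval_ps(f_max_gbps)
    range_ratio = f_max_gbps / f_min_gbps
    return {
        "range_ratio": range_ratio,
        "ui_ps": ui_ps,
        "eye_fraction_of_ui": eye_width_ps / ui_ps,
        "total_power_mw": power_mw(eff_total, f_max_gbps),
        "power_no_clk_mw": power_mw(eff_no_clk, f_max_gbps),
        "range_gain_vs_ref3": range_ratio / range_ref3,
        "energy_ratio_vs_ref2": eff_total / eff_ref2_lvds,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.3f}")
```

Under these inputs, the unit interval at 11 Gb/s is about 90.9 ps, so the 60.9-ps eye corresponds to roughly 0.67 UI, and 3.98 pJ/bit at 11 Gb/s implies about 43.8 mW in total. The range ratio of 110× against the 12.5× of [3] gives about 8.8×, and 3.98 pJ/bit against the 1.1 pJ/bit of the LVDS TX in [2] gives about 3.6×, consistent with the "at least 8×" and "no more than 4×" statements in Section III.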