Edge-prediction DTC and Clock-gating TDC Design for Ultra Low Power All Digital PLL

Bindi Wang

MASTER OF SCIENCE THESIS

For the degree of Master of Science in Microelectronics at Delft University of Technology

Bindi Wang

December 20, 2013

Faculty of Electrical Engineering Mathematics and Computer Science (EWI) · Delft University of Technology
The work in this thesis was supported by imec-nl. Their cooperation is hereby gratefully acknowledged.

Copyright © Electrical Engineering Mathematics and Computer Science (EWI)  
All rights reserved.
The undersigned hereby certify that they have read and recommend to the Faculty of Electrical Engineering Mathematics and Computer Science (EWI) for acceptance a thesis entitled

**Edge-prediction DTC and Clock-gating TDC Design for Ultra Low Power All Digital PLL**

by

**Bindi Wang**

in partial fulfillment of the requirements for the degree of

**Master of Science Microelectronics**

Dated: December 20, 2013

Supervisor(s):  

Prof. dr. R.B. Staszewski. Supervisor

Reader(s):

dr. Spirito, M.

dr. ir. P.J.A. Harpe

dr. Maja Vidojkovic

dr. Xun Luo
Abstract

Wireless Personal Area Network (WPAN) radios provide the communication standard for short-range, low-cost applications in the medical and rehabilitation fields. In such wireless sensor nodes, battery lifetime is critical, which creates an essential demand for ultra-low power RF transceivers. The RF PLL is one of the most power-hungry blocks inside a transceiver. This design aims to develop an ultra-low power (sub-milliwatt) PLL to push the power limit of RF transceivers. All-digital PLLs (ADPLLs) are preferred over analog PLLs in nanoscale CMOS because the system is flexible, programmable, and scales with technology. The challenge of this work is to significantly reduce the power dissipation of the TDC, which is one of the key blocks in an ADPLL. The phase prediction algorithm is implemented by a digital-to-time converter (DTC) circuit. The DTC is carefully designed from transistor sizing through to physical layout. It proves to be a compact, low-power solution (post-layout simulated DC current of $10\mu A$ at $0.9V$) with fine resolution (measured on silicon: $22ps$ at $1V$). Additionally, just-in-time DTC gain calibration is implemented with a fully arithmetic approach, without any analog components. For the time-to-digital converter (TDC) core block, a clock gating technique is investigated to dramatically reduce the power dissipation and to eliminate the meta-stability issue. The DTC and TDC circuits, integrated with a digitally controlled oscillator (DCO) and the digital loop filter, are fabricated in 40nm CMOS technology. Silicon measurements demonstrate a 2.1 – 2.7 GHz, 860$\mu W$ fractional-N ADPLL for WPAN applications, which is the first-ever wireless ADPLL to break the 1mW power barrier. In-band phase noise of the ADPLL is $-90dBc/Hz$ at $30kHz$ and $-109dBc/Hz$ at $1MHz$, and the fractional spurs over the Bluetooth channels are below $-35dBc$.
According to the measurement results, the DC power consumed by the DTC and TDC is only $40\mu W$, less than 5% of the total power dissipation, which is a breakthrough.
# Table of Contents

Abstract  
Acknowledgments  

1 Introduction  
1-1 Background  
1-1-1 Wireless Personal Area Network and IEEE 802.15.4 Technology  
1-1-2 The state of Art Nanoscale CMOS 40nm Technology  
1-2 Motivation and Objectives  
1-3 Main Contributions of This Work  
1-4 Future Work  
1-5 Thesis Organization  

2 System analysis of ADPLL  
2-1 Frequency Synthesis Techniques – PLL  
2-1-1 PLL Fundamentals  
2-1-2 State of Art All-digital PLL (ADPLL)  
2-1-3 Challenge for ultra-low power ADPLL  
2-2 Phase prediction DTC assisted snapshot TDC based ADPLL  
2-2-1 Algorithm of phase prediction  
2-2-2 Principle of phase-prediction ADPLL  

3 Digital-to-Time Converter (DTC)  
3-1 State of Art DTC  
3-2 DTC Specification  
3-3 DTC Architecture  
3-3-1 Principle of DTC  
3-3-2 Building Blocks of DTC  
3-4 Just-In Time DTC Gain Calibration  
3-4-1 Least Squared Mean Algorithm  
3-4-2 Principle of DTC Gain Estimation  
3-4-3 Implementation Of DTC Gain Estimation  

4 Time-to-Digital Converter (TDC)  
4-1 TDC Circuit Design  
4-1-1 Specification of TDC  
4-2 Structure of TDC  
4-2-1 Building blocks in TDC  
4-3 Principle of Clock Gating  
4-3-1 Implementation of the clock gating technique  
4-3-2 Meta-stability Issue in Clock gating  
4-4 Reference Clock retiming  

5 Implementation and Experimental Verification  
5-1 Considerations of Layout in Nano CMOS Technology for RF Circuit  
5-2 Layout of Key Blocks in DTC and TDC  
5-2-1 Delay Element Layout  
5-2-2 Sense-Amplifier Based D-flip-flop  
5-3 Post-simulation Results  
5-3-1 DTC  
5-3-2 TDC  
5-3-3 Clock Buffer  
5-4 Measurement of DTC and TDC  
5-4-1 PCB Design and Test Bench for ADPLL  
5-4-2 Test Plan of Phase Predicted DTC  
5-4-3 Test Plan of the Core TDC  
5-4-4 DC Power  

Bibliography  
List of Figures

1-1 The conventional architecture of fractional-N charge pump PLL
1-2 Principle of digital PLL
1-3 State of Art digital PLLs

2-1 Basic PLL architecture, [1]
2-2 Output spectrum of practical oscillators
2-3 Illustration of reciprocal mixing due to the tail of LO spectrum
2-4 Phase noise spectrum of practical oscillators [2]
2-5 Linear model in s-domain of ADPLL, [3]
2-6 Diagram of digital loop filter
2-7 Loop transmission of Type-II PLL
2-8 Linear s-domain model with noise sources
2-9 Periodicity of phase error in fractional-N PLLs ($FCWF=0.7$)
2-10 Purpose of phase prediction
2-11 Architecture of phase prediction PLL

3-1 Controllable biasing current for DTC
3-2 The delay unit digitally controlled by loading MOSCAP
3-3 MUX based structure of DTC
3-4 Tri-state inverter based structure of DTC
3-5 Prototype of a tri-state buffer-self-loaded DTC
3-6 Time diagram of a DTC
3-7 Building blocks of a delay cell in DTC
3-8 Circuit of clock buffer
3-9 Tri-state PMOS and NMOS pair
3-10 Two modes of DTC unit cell
3-11 Circuit implementation of DTC selection set
3-12 Schematic of output-enable (OE) buffer
3-13 Custom sizing of OE buffer: weak transistors in blue color and strong ones in red color
3-14 Example: the race condition for the reset action of DTC: each reset transient of the selection set is forced to be delayed because of conflicts from the previous stages
3-15 Potential conflicts because of the thermometer coding of the selection set at the rising edge of clk
3-16 Conflicts at the falling edge of the reference signal clk
3-17 Short-circuit current: path from VDD to VSS during the falling edge of clk if some stage is enabled at the rising edge of clk respectively
3-18 Diagram of improved architecture of DTC
3-19 Diagram of phase detection for phase prediction ADPLL
3-20 Structure of the LMS algorithm for DTC gain error detection
3-21 The phase departure from the ideal CKV phase increment in the conditions of: (1) overestimation of DTC gain
3-22 The phase departure from the ideal CKV phase increment in the conditions of: (2) underestimation of DTC gain
3-23 Mathematics description of the correlation between the practical prediction and the ideal prediction: Overestimation
3-24 Mathematics description of the correlation between the practical prediction and the ideal prediction
3-25 System Diagram of DTC Gain Estimation
3-26 Convergence related to the step size
3-27 Configuration I
3-28 Configuration II
3-29 Configuration III
3-30 Configuration IV
3-31 Configuration III
3-32 $[a, b, \mu] = [4, 1, 10]$
3-33 $[a, b, \mu] = [4, 2, 8]$

4-1 Detection mechanism of TDC [4]
4-2 Timing principle of fractional phase error detection
4-3 Architecture of a clock gated TDC
4-4 The prototype of delay-line based TDC (a) one stage circuit schematic, (b) corresponding timing diagram
4-5 NOR logic gate based Latch
4-6 Sense-amplifier-based D-flip-flop
4-7 Timing diagram of 5-stages conversion
4-8 (a) Edge align circuit (b) crossed inverters
4-9 Definition of time resolution in differential delay chain
4-10 (a) Noise possibility on fast & slow signal edge (b) uncertainty on weak-constrained edge slew rates
4-11 Principle of a TDC clock gating technique
4-12 Time windowed TDC prototype: the first stage for coarse detection and power reduction
4-13 (a) Proposed timing diagram of the idea of time windowed TDC (b) failure picking up of DCO_out edges in the prototype of time-windowed TDC
4-14 Timing diagram of clock gating operation
4-15 Circuit diagram of the clock gating technique
4-16 Meta-stability condition in the clock gating circuit

5-1 The layout flow
5-2 Layout topology constraints (a) device matching (b) device symmetry (c) device proximity
5-3 (a) Layout of OE buffer in DTC
5-4 (b) DTC selection set layout in proximity constraint
5-5 The layout of Unit cell of DTC
5-6 The layout for the entire DTC
5-7 The proximity topology and diffusion sharing in unit cell of DTC
5-8 The extraction results of the parasitic capacitance in OE buffer
5-9 Layout of the TDC core unit cell
5-10 Symmetric layout of a sense-amplifier based D-flip-flop
5-11 Test plan of 8-stage delay element string transient simulation
5-12 Histogram of Monte Carlo simulation of 8-stage DTC delay chain Ck2D7_r: the propagation time from the rising edge of CLK to the falling edge of the output node: mean propagation time is 241.639 ps, standard deviation is 6.32 ps with 1500 samples
5-13 Histogram of Monte Carlo simulation of 8-stage DTC delay chain Ck2D7_f: the propagation time from the rising edge of CLK to the falling edge of the output node: mean propagation time is 150.582 ps, standard deviation is 10.997 ps with 1500 samples
5-14 Histogram of Monte Carlo simulation of 8-stage DTC delay chain D5D6_f: the propagation time from the falling edge of D5 output to the falling edge of D6 output: mean propagation time is 19.4115 ps, standard deviation is 53.074 fs with 1500 samples
5-15 Histogram of Monte Carlo simulation of 8-stage DTC delay chain D4D5_r: the propagation time from the rising edge of D4 output to the rising edge of D5 output: mean propagation time is 15.7081 ns, standard deviation is 15.6182 ns with 1500 samples
5-16 Transfer function of DTC with post-layout extraction
5-17 Linearity performance of DTC with post-layout extraction
5-18 Transfer function of TDC with post-layout extraction
5-19 Histogram of transition operation of TDC with post-layout extraction
5-20 Linearity performance of TDC with post-layout extraction
5-21 The floor plan of all-digital PLL
5-22 Dedicated Power supply pads assignments for different blocks of ADPLL
5-23 The decoupling circuit in the path of power supply line
5-24 Measurement floor plan: (a) DTC, (b) TDC
5-25 Flexible system setting for remote controlling of the testing chip
5-26 Output spectrum of ADPLL
5-27 Measured fractional Spurs over Bluetooth Smart Channel
5-28 Spectrum of the ADPLL output
5-29 Setup to facilitate the measure
5-30 Setup for DTC measurement
5-31 Ideal waveforms of the reference signal FREF and the delayed signal FREFDLY when manually setting the selection code
5-32 Transfer function of DTC: the output signal duty cycle vs the input bins
5-33 Linearity of DTC based on the measured variance of duty cycle
5-34 Transfer function of DTC: phase departure vs input bins
5-35 Linearity of DTC based on the measured variance of phase departure
5-36 The on-chip circuits for TDC measurement
5-37 Setup for TDC open loop measurement
5-38 Phase rotation relationship diagram of the two input signals $F_{REF\text{in}}$ and $EXT_{CKV}$
5-39 Measured waveform of TDC outputs over the whole range of phase rotation
5-40 Transfer function of TDC
5-41 Power distribution of ultra-low power ADPLL
# List of Tables

<table>
<thead>
<tr>
<th>Table</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>1-1</td>
<td>Modulation scheme at 2400 – 2483.5 MHz frequency band</td>
</tr>
<tr>
<td>1-2</td>
<td>CMOS 40nm process technology parameters</td>
</tr>
<tr>
<td>1-3</td>
<td>System specifications of ultra-low power ADPLL</td>
</tr>
<tr>
<td>3-1</td>
<td>Tri-state of PMOS-NMOS pair</td>
</tr>
<tr>
<td>3-2</td>
<td>Schedule of selection set</td>
</tr>
<tr>
<td>3-3</td>
<td>Configuration of DTC gain calibration parameters</td>
</tr>
<tr>
<td>4-1</td>
<td>Design goal of TDC</td>
</tr>
<tr>
<td>5-1</td>
<td>The annotation of measured variables in the transition operation</td>
</tr>
<tr>
<td>5-2</td>
<td>The phase noise at the TDC delay chain output node</td>
</tr>
<tr>
<td>5-3</td>
<td>Simulated current dissipation of the post-layout extraction circuit: DTC and TDC</td>
</tr>
<tr>
<td>5-4</td>
<td>Simulated current dissipation of the post-layout extraction circuit: clock buffer and clock gating</td>
</tr>
</tbody>
</table>
Acknowledgments

I would like to take this opportunity to express my gratitude and appreciation to everyone who has helped to make this thesis possible.

Sincere appreciation goes first to my supervisor, Prof. Dr. R.B. Staszewski, for his guidance and support. Prof. Staszewski is an expert in the field of digital RF. His constructive criticism was essential to the high quality of this two-student project. His work attitude will continue to teach and encourage me in my future work. I would like to thank Dr. Yao-Hong Liu, who was my daily coach at imec-holst center, for his inspiring and in-depth discussions on this project.

I would like to thank: dr.ir. P.J.A. (Pieter) Harpe, who responded to my questions whenever I got stuck, and whose patience and hands-on instruction guided me through the whole work; dr.ir. J.H.C. van den Heuvel, who spent a lot of time sitting with me to discuss problems and helped me dig into their intrinsic characteristics; my friends ir. Gao Hao and ir. Liu Bo, who shared their experience and knowledge as PhDs with me and helped me out of every trouble; ir. Zhao Duan, for the technical discussions and the happy lunch hours at the HTC strip; and Jiajia Liu, far away in Switzerland, for embracing every different ‘me’: frustrated, sometimes even crazy.

My internship at imec-holst center would not have been so amazing without my colleagues Huang Xiongchuan, Ding Ming and Ba Ao. I wish them all the best. I will never forget my classmates back in Delft: Yan Yuxin, Xu Yuanxing, He Jingchu, Wang Guofeng, Yao Qiang, Shi Xingyuan and Chen Hongzuo. Thank you for your company during my first year in the Netherlands. Last, my project would not have progressed without the cooperation of Vamshi Krishna.

I am deeply indebted to my parents for their constant love, support, and patience. I am really lucky to be their daughter. This work owes much to my grandmother, the most adorable lady I know, who raised me. Finally, I owe gratitude to all of the friends who are always there for me. Their friendship will last forever in my heart.

Delft, University of Technology
December 20, 2013

Bindi Wang
Chapter 1

Introduction

1-1 Background

1-1-1 Wireless Personal Area Network and IEEE 802.15.4 Technology

IEEE standard 802.15.4 specifies the lower network layers of a type of Wireless Personal Area Network (WPAN) [5]. Complete protocol stacks for this class of networks are widely known through wireless techniques such as Zigbee and Bluetooth Smart. A WPAN is designed to be small-scale, short-range, low-rate and low-cost, suitable for applications in the medical, automotive and rehabilitation fields. This places stringent requirements on the integrated circuit design. Additionally, for portable or implantable body sensor nodes, autonomy is severely limited by the power consumption of the radio, which can take up to 80% of the sensor node’s total power budget. Hence, ultra-low power solutions are attractive. System integration and future production also call for a low-cost solution.

A WPAN can operate in one of three unlicensed frequency bands:

- 868.0 – 868.6 MHz: Europe
- 902 – 928 MHz: North America
- 2400 – 2483.5 MHz: worldwide use

WPAN applications can use different modulation schemes and data rates in the different frequency bands. From the standpoint of DC power consumption, a lower RF frequency band is preferred. However, a high data rate modulation scheme enables low duty-cycle operation, which leads to better energy efficiency. There is thus a trade-off between DC power consumption and energy efficiency. This project targets the 2400 – 2483.5 MHz frequency band, which is suitable for supporting the transmission/reception rate when the PLL is extended to a modulator or made part of a transceiver. The data modulation scheme is summarized in Table 1-1. Detailed system specifications for the ultra-low power ADPLL are presented in Section 1-2.
Table 1-1: Modulation scheme at 2400 – 2483.5 MHz frequency band

<table>
<thead>
<tr>
<th>PHY (MHz)</th>
<th>Frequency band (MHz)</th>
<th>Chip rate (kchip/s)</th>
<th>Modulation</th>
<th>Bit rate (kb/s)</th>
<th>Symbol rate (ksymbol/s)</th>
<th>Symbols</th>
</tr>
</thead>
<tbody>
<tr>
<td>2450</td>
<td>2400 – 2483.5</td>
<td>2000</td>
<td>O-QPSK</td>
<td>250</td>
<td>62.5</td>
<td>16-ary Orthogonal</td>
</tr>
</tbody>
</table>
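
As a quick consistency check on Table 1-1, the three rates are tied together by the symbol mapping. The sketch below assumes the IEEE 802.15.4 2450 MHz PHY mapping of 4 bits per 16-ary symbol and 32 chips per symbol. The chip-sequence length is not stated in the table, and the variable names are illustrative only.

```python
import math

# Values taken from Table 1-1.
BIT_RATE_KBPS = 250.0
SYMBOL_ALPHABET = 16          # 16-ary orthogonal signalling

# Assumption (not listed in the table): each symbol is spread
# over a 32-chip sequence, as in the 2450 MHz O-QPSK PHY.
CHIPS_PER_SYMBOL = 32

bits_per_symbol = math.log2(SYMBOL_ALPHABET)             # 4 bits per symbol
symbol_rate_ksps = BIT_RATE_KBPS / bits_per_symbol       # ksymbol/s
chip_rate_kcps = symbol_rate_ksps * CHIPS_PER_SYMBOL     # kchip/s

print(f"symbol rate: {symbol_rate_ksps} ksymbol/s")
print(f"chip rate:   {chip_rate_kcps} kchip/s")
```

Under these assumptions the computed symbol and chip rates reproduce the 62.5 ksymbol/s and 2000 kchip/s entries of the table.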

1-1-2 The state of Art Nanoscale CMOS 40nm Technology

Since the first transistor was developed at Bell Laboratories in 1947 and the first CMOS logic gate in 1963, the evolution of CMOS technology has followed Moore’s law, which predicts that, as a result of continuous miniaturization, the transistor count doubles every 18 months. Today, silicon IC technologies can be classified into three main types: bipolar, MOS, and BiCMOS (bipolar-CMOS). Integrated circuits built in bipolar technology reached the market first, and MOS technology followed later.

From the analog viewpoint, MOS technology has disadvantages compared with BJT technology: a lower cutoff frequency, more flicker noise, and a smaller small-signal output resistance for short channels, which degrades gain. On the other hand, MOS technology is intrinsically well suited to implementing switches and capacitors. A similar comparison from the digital viewpoint comes out in favor of CMOS, which is prominent in lower-power, higher-volume circuit applications. At the same time, continuous scaling of CMOS technology has reached a state, in terms of both frequency and noise, where it becomes a serious contender for radio frequency (RF) applications in the GHz range: cutoff frequencies on the order of hundreds of GHz have been reported in 40nm CMOS technology. Meanwhile, the supply voltage keeps shrinking while the threshold voltage is not scaled down. The trend of transistor scaling in CMOS technology favors the growth of digital circuit applications and system integration, even though the analog figures of merit (FoM) of transistors, such as dynamic range (DR), signal-to-noise ratio (SNR), and linearity, deteriorate. To leverage the benefits of technology scaling while coping with the degradation of analog performance, novel circuit design and signal processing strategies have emerged in the era of radio system-on-chip (SoC) design.

The feature size of transistors keeps shrinking, which improves chip density: for instance, the density in 45nm technology is about twice that in 65nm and four times that in 90nm. This supports multi-standard convergence and multimedia in handheld mobile and wearable applications at a lower area cost than previous technology nodes.
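
The density figures quoted above follow from ideal area scaling, under the simplifying assumption that the area of a device shrinks with the square of the feature size. The helper name `density_gain` is hypothetical, and the result is a first-order estimate only. Real density also depends on design rules and the cell library.

```python
def density_gain(old_node_nm: float, new_node_nm: float) -> float:
    """Ideal density improvement when moving from old_node_nm to new_node_nm.

    Assumes device area scales with the square of the feature size.
    """
    if old_node_nm <= 0 or new_node_nm <= 0:
        raise ValueError("feature sizes must be positive")
    return (old_node_nm / new_node_nm) ** 2


for old_node in (65, 90):
    gain = density_gain(old_node, 45)
    print(f"{old_node} nm -> 45 nm: about {gain:.2f}x density")
```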

The ultra-low power ADPLL for the IEEE 802.15.4 standard is designed in TSMC 40nm low-power CMOS technology. This technology provides three classes of transistors, high-threshold-voltage (HVT), normal-threshold-voltage (NVT) and low-threshold-voltage (LVT), which can be placed on the same die to meet both low-leakage and high-performance requirements. Further process technology parameters are presented in Table 1-2. To achieve a low-cost, low-power solution, the design operates at a 0.9V supply voltage, below the nominal voltage of 1.2V.
1-2 Motivation and Objectives

The growing family of global wireless communication standards includes not only WPAN but also WLAN, WMAN, WWAN, etc. Furthermore, there is a plethora of standards, specifications and architectures, which drives the trend toward multi-standard convergence. For economic reasons, low power, low cost and small area form another active research track, inspiring the development of single reconfigurable hardware.

From the perspective of technology and engineering, deep sub-micron CMOS integration with digitally intensive approaches is the primary path. Especially for ultra-low power radio design, an RF processor with digitally intensive approaches makes this a reality. Thanks to powerful digital techniques, great flexibility is achievable in reconfigurable radios, while the complexity of the analog design is relieved.

The ultra-low power radio group at imec-holst centre is investigating techniques that will play a major role in the next generation of wireless sensor networks as an attractive alternative to commercial equivalents. The implementation of wireless sensors for Personal Area Networks (PAN) or Body Area Networks (BAN) involves several circuit parts, such as the power source and power management, radio blocks, a micro-controller, the sensor itself, and other analog/digital interface circuits. The literature reports that up to 20% of the DC power consumption in radio applications is taken by the PLL [6], which generally plays a significant role in frequency synthesis, signal modulation, etc. for radio and telecommunication applications; for example, modern transceiver architectures based on PLLs have been investigated [7, 4].

The design of analog PLLs, especially the fractional-N charge-pump PLL shown in Figure 1-1, has been a mature research topic for years, but it faces challenges as the process scales down: it calls for intensive redesign effort and leads to compromised analog performance, as will become clear in the following. At the same time, digital RF solutions, in which the all-digital PLL (ADPLL) has proven successful, have received increasing attention in academia due to their high flexibility, programmability and integration level [8, 9, 10, 11, 12]. The basic system structure of an ADPLL is shown in Figure 1-2, where the digital circuit blocks of the time-to-digital converter (TDC), digital loop filter and digitally controlled oscillator (DCO) replace the phase frequency detector (PFD), passive loop filter and voltage-controlled oscillator (VCO) of the analog PLL, respectively.
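
To make the roles of the three digital blocks concrete, the following is a minimal behavioral sketch of a phase-domain ADPLL loop. It is not the circuit designed in this thesis. It assumes ideal phase detection (no TDC quantization), a proportional-integral digital loop filter with illustrative gains `alpha` and `rho`, and a DCO whose gain is normalized to one. The function name and all parameter values except the 32 MHz reference are hypothetical.

```python
def simulate_adpll(fcw, free_running_ratio, alpha=2**-4, rho=2**-9,
                   n_cycles=2000):
    """Return the DCO-to-reference frequency ratio after each reference cycle.

    fcw                -- frequency command word (target f_DCO / f_REF)
    free_running_ratio -- f_DCO / f_REF with zero tuning input
    alpha, rho         -- proportional and integral loop gains (assumed)
    """
    ref_phase = 0.0    # reference phase accumulator, in DCO periods
    var_phase = 0.0    # variable (DCO) phase accumulator, in DCO periods
    integrator = 0.0   # integral path of the digital loop filter
    ratio = free_running_ratio
    history = []
    for _ in range(n_cycles):
        ref_phase += fcw
        var_phase += ratio                      # DCO cycles in this ref period
        phase_error = ref_phase - var_phase     # ideal phase detector output
        integrator += rho * phase_error         # integral path
        # Normalized DCO: the tuning word shifts the frequency ratio directly.
        ratio = free_running_ratio + alpha * phase_error + integrator
        history.append(ratio)
    return history


F_REF = 32e6  # reference frequency from the system specification
history = simulate_adpll(fcw=2.44e9 / F_REF, free_running_ratio=2.40e9 / F_REF)
print(f"final DCO frequency: {history[-1] * F_REF / 1e9:.6f} GHz")
```

With these assumed gains the loop is a stable Type-II system, so the DCO frequency settles at `fcw` times the reference frequency. In a real ADPLL the phase error is quantized by the TDC, which this sketch deliberately leaves out.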

Digital PLLs solve most of the issues of analog PLLs [13, 14, 15, 16, 17]:

1. They eliminate the in-band noise contributions of the phase-frequency detector and charge pump.

2. They adopt a digital loop filter as an alternative to the passive loop filter; the digital filter is compact and scales with technology.

3. They cancel the $\Sigma \Delta$ modulator quantization noise in the digital domain, in contrast to fractional-N charge-pump PLLs.

4. Built-in self-test reduces testing cost and improves accessibility on more densely packed SoCs.

Figure 1-1: The conventional architecture of fractional-N charge pump PLL

Figure 1-2: Principle of digital PLL

However, state-of-the-art ADPLLs suffer from a power-efficiency issue, as seen in the plot of the FoM, which is based on the power dissipation and the jitter variance of PLLs. The FoM of a PLL is quantified as

$$FoM = 10 \log_{10}\left[\left(\frac{\sigma_{jitter}}{1\,\text{s}}\right)^2 \times \frac{P}{1\,\text{mW}}\right]$$

where $\sigma_{jitter}$ is the rms timing error (jitter), so that $\sigma^2_{jitter}$ is the jitter variance, and $P$ is the power consumption. This heavy power consumption keeps digital PLLs from being applicable to ultra-low power communication standards such as Bluetooth 4.0 (Bluetooth Smart) and Zigbee. This thesis project aims at a sub-milliwatt PLL that is a good candidate for ultra-low power RF products. In particular, effort is devoted to a low-power TDC mechanism, which is a competitive advantage over state-of-the-art digital PLLs as well as fractional-N charge-pump PLLs. The system requirements of the ultra-low power ADPLL are stated in Table 1-3.
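
The FoM defined above can be evaluated with a few lines of code. In this sketch the jitter is normalized to 1 s and the power to 1 mW. The function name `pll_fom_db` and the 1 ps jitter value are purely illustrative assumptions, not results from this work; only the 860 µW power figure is taken from the abstract.

```python
import math


def pll_fom_db(jitter_rms_s: float, power_w: float) -> float:
    """PLL figure of merit in dB: 10*log10((sigma / 1 s)^2 * (P / 1 mW)).

    jitter_rms_s -- rms jitter in seconds
    power_w      -- power consumption in watts
    A lower (more negative) value indicates a better jitter-power trade-off.
    """
    if jitter_rms_s <= 0 or power_w <= 0:
        raise ValueError("jitter and power must be positive")
    return 10.0 * math.log10(jitter_rms_s ** 2 * (power_w / 1e-3))


# Hypothetical example: 1 ps rms jitter (assumed) at 860 uW.
print(f"FoM = {pll_fom_db(1e-12, 860e-6):.1f} dB")
```

Because the FoM is logarithmic, halving the power at constant jitter improves it by about 3 dB, while halving the rms jitter at constant power improves it by about 6 dB.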

The system parameters related to the TDC are the frequency range, switching time, in-band phase noise, reference frequency, and power consumption.
**Table 1-3:** System specifications of ultra-low power ADPLL

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Requirement</th>
</tr>
</thead>
<tbody>
<tr>
<td>Frequency range</td>
<td>$2.4 - 2.7 \text{ GHz}$</td>
</tr>
<tr>
<td>Frequency accuracy</td>
<td>$60 \text{ kHz}$</td>
</tr>
<tr>
<td>Switching time</td>
<td>$40\mu\text{s}$</td>
</tr>
<tr>
<td>Phase noise @ 1 MHz offset</td>
<td>$-110 \text{dBc/Hz}$</td>
</tr>
<tr>
<td>Integrated RMS phase error</td>
<td>20</td>
</tr>
<tr>
<td>Modulation data rate</td>
<td>$1\text{Mbps}$</td>
</tr>
<tr>
<td>Reference frequency</td>
<td>$32\text{MHz}$</td>
</tr>
<tr>
<td>Power consumption</td>
<td>$&lt; 1\text{mW}$</td>
</tr>
<tr>
<td>Technology</td>
<td>40nm CMOS</td>
</tr>
</tbody>
</table>

### 1-3 Main Contributions of This Work

This design is part of a thesis project shared by two master's students, which accomplishes the first-ever multi-GHz sub-milliwatt fractional-N ADPLL; the ADPLL accommodates the requirements of Zigbee HS-OQPSK and Bluetooth Smart GFSK modulation. Within that project, the work discussed in this thesis focuses on the circuit-level design of the low-frequency path within the digital PLL, which is crucial to the functionality of the entire system and highlights the low-power techniques of this ultra-low power ADPLL. The major contributions of the work include:

- A DTC based on the phase prediction algorithm of the ADPLL is implemented. Although using self-loaded delay cells as the basic component of time-domain circuits is an obvious idea, custom-designed cells aimed at power saving appear to have been largely overlooked. This work develops the DTC from transistor sizing through to the back-end layout. Not only the functionality of the DTC but also a fine time resolution and a linearity acceptable under the system requirements are achieved.

- A clock gating technique is adopted in the design of the TDC, which significantly reduces its power consumption. In addition, this metastability-free gating topology uses only digital standard cells, which lowers the design complexity and makes it easy to update after system modifications. Process-corner simulations verify the validity of the clock gating technique.

- Just-in-time DTC gain calibration is proposed. It is a fully arithmetic operation without any analog components. This programmable and flexible algorithm aims at compensating process, voltage and temperature (PVT) variation in real circuits. PVT variation relates not only to semiconductor manufacturing but also to the circuit design itself and cannot be totally avoided; moreover, it can cause significant changes in digital circuit performance and even lead to the failure of the entire system.

- Little attention has been given to digital circuit layout. This work makes an effort to clarify the importance of custom layout for guaranteeing high performance of digital circuits, especially in deep-submicron CMOS technology. A fully custom layout of the core blocks of the DTC and TDC is presented.

Good IC design relies not only on a novel design idea but also on a clever test setup that proves its functionality and performance. This work also explores different automatic measurement methods for testing mixed-signal circuits. Limited by the instruments at hand, some measurements are backed by measured data, while for others only the feasibility is discussed and verified.

The digital loop filter, the DCO and DCO interface circuits, and the system verification and analysis were done by the other MSc student, Vamshi Krishna Chillara. The project results in an 860\(\mu\)W ADPLL in 40 nm CMOS technology, a frequency synthesizer able to support the Bluetooth Smart and Zigbee communication standards in an ultra-low power transceiver.

### 1-4 Future Work

The proposed phase-prediction DTC-assisted clock-gating TDC could be improved at both system and circuit level in future work. The potential improvements are summarized below.

**Decoupling at the power line**  In deep-submicron CMOS technology, a self-loaded buffer/inverter is sufficient as the cell of the delay string for the in-band phase noise requirement. However, this type of circuit is sensitive to process, voltage, and temperature variation. Although the DC current consumed in this mixed-signal block is small compared with other blocks, such as the DCO, the transient current at the moment of a gate voltage transition is large. The resulting droop of the supply voltage makes the time resolution of the delay elements fluctuate, which contributes to fractional spurious tones at the PLL output and can even make the DTC transfer function non-monotonic. More attention should therefore be paid to smoothing the peak of the transient current and to making the delay cells less sensitive to the supply voltage.

**AC-coupled clock buffer**  Simple cascaded inverters are used as the clock buffer in this work, driven by another, very large CMOS buffer. This solution should be replaced by a custom-designed AC-coupled buffer that provides the DC bias voltage and optimizes the flicker noise of the signal \((FREF)\) path.

**Calibration algorithm of DTC and TDC**  The calibration algorithm investigated in this work is for the DTC alone. More effort could be put into correlating the two time resolutions more precisely when normalizing the fractional error for the PLL loop filter. Calibration that matches the loop filter gain to the DTC and TDC conversion gains is also necessary. Aside from DTC or TDC gain calibration, fractional spurious tones could be reduced by a cancellation algorithm based on dithering.

**Adaptive offset prepared for TDC**  This work fixes the offset prepared for the TDC by means of extensive corner simulations. A smarter approach would adjust the offset adaptively to obtain a safe detection time window for the TDC over the whole period of the variable phase signal \(CKVD2\).

**Built-in self-testing**  The difficulty of mixed-signal testing became apparent in this work. Unlike conventional mixed-signal blocks, the circuits here operate in the time domain, which places even higher quality requirements on the measurement instruments. Built-in self-testing is therefore vital to cut the cost of measurement and to collect more precise raw data that are immune to the timing limitations (phase noise or jitter performance) of external instruments.

### 1-5 Thesis Organization

This thesis is organized into six chapters. Chapter 1 presents the background of ultra-low power radio applications, from the requirements of the wireless communication standards to the influence of developments in physical implementation technology, states the contributions made by this work, and summarizes future work to improve the circuit metrics. Chapter 2 presents the background of the ADPLL, reviews the related prior art on PLLs and highlights the impact of the TDC on system performance. The non-idealities of the DTC and TDC are addressed together with an analysis of the respective system responses. A digital approach for compensating the mismatch of the DTC against PVT variation is introduced at the end of the chapter. Chapter 3 presents both the logical and the physical-level effort made to implement the critical DTC block. In Chapter 4, the design of the clock-gating TDC is presented in detail, and the related prior art on fine-resolution TDC techniques is discussed. Chapter 5 focuses on the physical implementation of both the DTC and the TDC. The detailed measurement setup is then described, and open-loop and closed-loop measurement results are shown.
Chapter 2

System analysis of ADPLL

2-1 Frequency Synthesis Techniques –PLL

Phase-locked loops (PLLs) are widely used in modern communication systems because of their remarkable versatility: most transceivers use a PLL as a local oscillator (LO) for up-conversion or down-conversion. PLLs may also be used to perform frequency modulation and demodulation, and to regenerate a suppressed carrier. Aside from radio applications, PLLs play a part in digital systems for skew compensation, clock recovery and clock generation.

PLLs generate an output signal whose frequency is a programmable, rational multiple of a fixed frequency, \( f_{\text{out}} = f_{\text{REF}} \times G \), where \( G \) is an integer or real parameter \([1]\); the clean fixed-frequency signal is referred to as the reference signal. In addition, the phase relationship is controllable in steady state. The architecture of a PLL is illustrated in Figure 2-1: the phase detector compares the phase of the input reference signal with that of the VCO, and the resulting error signal acts in the negative-feedback direction, steering the VCO to reduce the phase difference. In practice, PLL performance involves phase noise, spurious tones, locking time, frequency resolution, and cost.

![Basic PLL architecture](image)

**Figure 2-1:** Basic PLL architecture, [1]
2-1-1 PLL Fundamentals

PLLs primarily work as a local oscillator that provides a carrier with the required frequency and a locked phase. The output signal of a practical oscillator is written as

\[ v(t) = A \cos(\omega_c t + \phi(t)) \]

where \( A \) is the amplitude, \( \omega_c \) is the center frequency, and \( \phi(t) \) is a small time-varying phase representing random deviation in the period of the signal, called phase noise. Characterizing the phase noise in the frequency domain gives the output spectrum shown in Figure 2-2. Phase noise is the factor of greatest concern in radio products, because the tail of the LO spectrum acts as a continuum of frequency components that causes reciprocal mixing, as seen in Figure 2-3, and can even cause a transceiver to fail, given the limited linearity and gain achievable in the other RF blocks. Additionally, it is the noise relative to the carrier, rather than its absolute value, that matters. Normalizing the mean-square noise voltage density to the mean-square carrier voltage yields Eq. (2-1-1) for phase noise:

\[ L(\Delta \omega) = 10 \log_{10} \left( \frac{\text{noise power in } 1 \text{ Hz BW at } \omega_c + \Delta \omega}{\text{carrier power}} \right) \]

Phase noise is expressed in units of dBc/Hz and is specified at a frequency offset \( \Delta \omega \) from the carrier frequency \( \omega_c \). Based on the LC VCO model, the practical output spectrum of oscillators is depicted in Figure 2-4. Unlike the dashed line in the diagram, which describes the ideal case where only the white thermal noise of the tank conductance is considered, the real phase noise spectrum splits into three regions: \( \frac{1}{\Delta \omega^3} \), \( \frac{1}{\Delta \omega^2} \), and a flat floor. The \( \frac{1}{\Delta \omega^2} \) region matches the trend of the dashed line and reflects that the noise voltage frequency response rolls off as \( 1/f \) while the power density is proportional to the square of the noise voltage [18]. The \( \frac{1}{\Delta \omega^3} \) region at small offsets mainly depends on up-converted flicker noise, so the boundary between the \( \frac{1}{\Delta \omega^3} \) and \( \frac{1}{\Delta \omega^2} \) regions is exactly the \( 1/f \) corner of the device noise. The flat region comes from any physical connection between the tank itself and the outside world, for example buffers connected to the measurement instrument. Another obsession with frequency

Figure 2-2: Output spectrum of practical oscillators
Figure 2-3: Illustration of reciprocal mixing due to the tail of LO spectrum
synthesis is the systematic eradication of spurs. If the time-varying phase $\phi(t)$ of the output signal is periodic, it gives rise to stationary sidebands called spurious responses, or simply spurs: undesired signals offset from the carrier. Spurs are mainly associated with the reference path, involving the design of the PFD and charge-pump circuits in classical analog PLLs. In practice, the switches within charge-pump circuits intrinsically leak: when the leakage is low, the ripple on the control line is small, and so is the modulation of the VCO; when the leakage increases, a larger static phase error is needed to compensate for the loss. In short, any leakage increases the spur power (static phase error). The similar time-domain spur mechanism in digital PLLs is discussed in Section 2-1-2.

2-1-2 State of Art All-digital PLL (ADPLL)

A frequency synthesizer is a local oscillator with a tunable output frequency and high spectral purity for a transceiver. The PLL is one popular technique for frequency synthesis, and the traditional structure is implemented with an analog charge-pump approach. The charge-pump PLL architecture has proved valid and effective over decades. As discussed in Chapter 1, technology scaling pushes existing frequency synthesis techniques toward digital implementations. In many mobile applications, digital PLLs replace conventional charge-pump PLLs [8, 19, 20, 13, 10, 17].
**ADPLL Frequency Response**

Revisiting the principle of digital PLL shown in Figure 1-2, the output of ADPLL follows that

\[ f_{CKV} = f_{FREF} \times FCW \]

where \( CKV \) is the output of the variable oscillator, \( FREF \) is the reference input signal, and \( FCW \) is the frequency division ratio between \( CKV \) and \( FREF \). The ADPLL works in the time domain: the timestamps of the CKV and FREF clock transitions are used to calculate the phase error. The CKV phase is obtained by counting the number of CKV cycles at each transition of the variable oscillator, and the FREF phase is obtained by accumulating FCW at each transition of the reference clock. The CKV phase is re-sampled by the retiming clock CKR to synchronize it with the FREF phase, and the phase error, the difference between the FREF phase and the CKV phase, is fed into the digital loop to generate the oscillator control words that drive the oscillator frequency in the direction that reduces the phase error. Overall, the phase-domain model of the PLL is a negative feedback system.
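The phase bookkeeping described above can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: it assumes an ideal locked loop with both clocks aligned at \( t = 0 \), defines the error as reference phase minus variable phase, and uses a hypothetical function name and an example FCW of 75.3 (a 2.4096 GHz output from a 32 MHz reference).

```python
from fractions import Fraction
from math import floor


def phase_domain_samples(fcw, n_ref_edges):
    """Return (reference phase, variable phase, raw phase error) per FREF edge.

    Assumption: ideal lock with CKV and FREF aligned at t = 0, so the k-th
    FREF edge arrives after exactly k * FCW periods of CKV.
    """
    samples = []
    ref_phase = Fraction(0)
    for k in range(n_ref_edges):
        var_phase = floor(k * fcw)  # integer count of CKV edges seen so far
        samples.append((ref_phase, var_phase, ref_phase - var_phase))
        ref_phase += fcw            # accumulate FCW on every FREF edge
    return samples


# Hypothetical example: FCW = 75.3
samples = phase_domain_samples(Fraction(753, 10), 5)
for ref_phase, var_phase, err in samples:
    print(float(ref_phase), var_phase, float(err))
```

With an integer-only edge counter, the raw error is the fractional part of the accumulated FCW; this is the residue that a fractional phase detector has to resolve.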

Assume that the transfer function of the open loop is \( H_{ol}(s) \), the closed-loop transfer function of the output of oscillators is expressed as

\[ H_{cl} = \frac{FCW \times H_{ol}}{H_{ol} + 1} \]

The linearized model of the PLL is illustrated in Figure 2-5. Within the model, phase is taken as both the input and the output variable, and the DCO is modeled as an integrator\(^1\) with the gain constant \( K_{DCO} \), which merely describes the change in output frequency resulting from a specified change in the oscillator tuning word. The phase detector is simplified as a subtractor that generates a phase error output \( \phi_e \). The loop filter \( H(s) \) describes the gain factor scaling and any additional filtering requirements. This gives the open-loop phase transfer function

\[ H_{ol} = H(s) \times \frac{f_R}{s} \]

so that the closed-loop transfer function is derived as

\[ H_{cl} = \frac{FCW \times H_{ol}}{H_{ol} + 1} = \frac{FCW}{1 + s \frac{1}{f_R \times H(s)}} \]

\(^1\)phase is the integral of frequency
From the transfer function one finds that, if the loop filter is simply a gain scalar $\alpha$, the loop transmission has a single pole and the bandwidth of the whole loop is

$$f_{BW} = \alpha f_R$$

This type of loop is well known as a type-I PLL. The advantages of a type-I PLL are that a large phase margin is easy to obtain and that it is easy to implement in hardware. Its important shortcoming, however, is that the bandwidth and the phase error are coupled, as shown by

$$\frac{\phi_e}{\phi_{FREF}} = \frac{s \frac{1}{f_R \times H(s)}}{1 + s \frac{1}{f_R \times H(s)}} = \frac{s \frac{1}{f_R \times \alpha}}{1 + s \frac{1}{f_R \times \alpha}}$$

The steady-state error is therefore quantified as

$$\lim_{s \to 0} s \phi_e = \frac{f_{FREF}}{f_{BW}}$$

Evidently, the steady-state phase error decreases as the loop bandwidth increases. However, achieving zero static phase error requires infinite gain at DC in the DCO frequency tuning path, rather than at all frequencies. A straightforward way to meet this requirement is to employ an integrator in the loop filter. A PLL with a second pole at DC is thus defined as a type-II PLL. The loop filter is implemented digitally and its mathematical model is shown in Figure 2-6. Updating the transfer function of the loop filter, the open-loop transfer function is derived as

$$H_{ol} = H(s) \times \frac{f_R}{s} = (\alpha + \frac{\beta f_R}{s}) \times \frac{f_R}{s}$$

and the closed-loop transfer function is

$$H_{cl} = FCW \frac{\alpha f_R s + \beta f_R^2}{s^2 + \alpha f_R s + \beta f_R^2}$$

where the natural frequency is calculated as

$$\omega_n = \sqrt{\beta}\, f_R$$

and the time constant of zero in the forward path is

$$\tau_z = \frac{\alpha}{\beta f_R}$$
Figure 2-7: Loop transmission of Type-II PLL

Figure 2-8: Linear s-domain model with noise sources

and the crossover frequency $\omega_c$ approximately equals $\tau_z \times \omega_n^2$. The loop transmission of a type-II PLL is depicted in Figure 2-7. In terms of stability, the damping factor, $\zeta = \tau_z \omega_n / 2$, increases when the crossover frequency is increased at a fixed natural frequency, and vice versa. Thus the bandwidth and stability of a type-II PLL can be adjusted independently while preserving the zero-mean steady-state phase error [21].
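As a rough numerical illustration of these relations, the sketch below reads the closed-loop denominator $s^2 + \alpha f_R s + \beta f_R^2$ as the standard second-order form $s^2 + 2\zeta\omega_n s + \omega_n^2$. The function name and the example coefficients ($\alpha = 2^{-5}$, $\beta = 2^{-12}$, $f_R = 32$ MHz) are assumptions chosen for illustration, not values taken from the design, and the units follow the convention of the equations in the text.

```python
import math


def type2_loop_parameters(alpha, beta, f_ref):
    """Second-order parameters of the type-II loop.

    Assumption: matching s^2 + alpha*f_R*s + beta*f_R^2 against
    s^2 + 2*zeta*omega_n*s + omega_n^2.
    """
    omega_n = math.sqrt(beta) * f_ref        # from omega_n^2 = beta * f_R^2
    zeta = alpha / (2.0 * math.sqrt(beta))   # from 2*zeta*omega_n = alpha * f_R
    tau_z = alpha / (beta * f_ref)           # time constant of the zero
    omega_c = tau_z * omega_n ** 2           # approximate crossover, = alpha * f_R
    return {"omega_n": omega_n, "zeta": zeta, "tau_z": tau_z, "omega_c": omega_c}


# Hypothetical loop filter coefficients
params = type2_loop_parameters(alpha=2 ** -5, beta=2 ** -12, f_ref=32e6)
for name, value in params.items():
    print(f"{name} = {value:.6g}")
```

In this example the damping factor also equals $\tau_z \omega_n / 2$, so raising $\alpha$ at fixed $\beta$ moves the crossover frequency and the damping together while the natural frequency stays put.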

**ADPLL Noise Source and its Impact**

In general, there are three noise sources in the ADPLL: the reference phase noise from the external clock, the TDC-contributed noise, and the phase noise of the DCO. Inserting these phase noise sources into the linear s-domain model of the ADPLL gives the model shown in Figure 2-8 [17].

This work pays most attention to the reference signal path and the effect of a practical TDC. The phase noise of the oscillator itself is high-pass filtered by the loop, while the noise from the reference path is low-pass filtered. The reference phase noise and the TDC-contributed noise differ only by the factor FCW:

$$H_{cl,TDC} = \frac{f_R \times \alpha / s}{1 + H_{ol}} = \frac{\alpha f_R}{s (1 + H_{ol})}$$
The TDC converts the phase difference between FREF and CKV into digital words, and it works in the discrete-time domain. As in an analog-to-digital converter (ADC), the quantization noise of the TDC is dominant, and the phase noise spectrum due to the finite time resolution of the TDC is quantified as

\[ L = \frac{(2\pi)^2}{12} \left( \frac{\Delta t_{\text{res}}}{T_V} \right)^2 \frac{1}{f_R} \]

where \( T_V \) is the period of the variable output of the DCO, \( f_R \) is the frequency of the reference signal, and \( \Delta t_{\text{res}} \) is the quantization step of the TDC. The quantization noise of the TDC agrees with Eq. (2-1-2).
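To get a feel for the numbers, the sketch below evaluates this expression in dBc/Hz. It assumes the usual form in which the normalized resolution \( \Delta t_{\text{res}} / T_V \) enters squared; the function name and the example values (30 ps resolution, a 2.4 GHz carrier, a 32 MHz reference) are illustrative assumptions.

```python
import math


def tdc_quantization_noise_dbc_hz(dt_res, t_v, f_ref):
    """In-band phase noise floor set by time quantization, in dBc/Hz.

    Assumption: L = (2*pi)^2 / 12 * (dt_res / T_V)^2 / f_R.
    """
    linear = (2.0 * math.pi) ** 2 / 12.0 * (dt_res / t_v) ** 2 / f_ref
    return 10.0 * math.log10(linear)


# Hypothetical operating point
noise_30ps = tdc_quantization_noise_dbc_hz(dt_res=30e-12, t_v=1 / 2.4e9, f_ref=32e6)
noise_15ps = tdc_quantization_noise_dbc_hz(dt_res=15e-12, t_v=1 / 2.4e9, f_ref=32e6)
print(f"30 ps: {noise_30ps:.2f} dBc/Hz, 15 ps: {noise_15ps:.2f} dBc/Hz")
```

Because the resolution is squared, halving \( \Delta t_{\text{res}} \) lowers the floor by about 6 dB, while doubling the reference frequency lowers it by about 3 dB.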

**ADPLL Spurious Tones**

Fractional-N PLLs are advantageous compared with integer-N PLLs because they break the trade-off between loop bandwidth and frequency resolution. However, the fractional-N division control algorithm introduces another metric of interest: in-band spurious tones [22], which are determined by the periodicity of the control algorithm. In both counter-based and divider-based ADPLL architectures [3], the fractional spurs lie close to the carrier, at the fractional frequency of the reference signal, as seen in the ramp waveform of the ADPLL phase error in the time domain (Figure 2-9). In the diagram, the fractional part of FCW is 0.7, and in steady state the phase error seen by the TDC accumulates within the range of zero to one following modulo arithmetic. Although the phase error is statistically zero-mean, the edges of CKV are not aligned with those of FREF. The pattern of the phase error repeats at the frequency 0.7\( f_R \): the sampled phase error is a ramp that wraps with a periodicity set by \( f_{\text{Frac}} \) (\( f_{\text{Frac}} = FCW_F \times f_{\text{REF}} \), where \( FCW_F \) is the fractional part of the FCW to be synthesized). This fractional error is filtered by the loop filter and modulates the DCO, generating spurious tones at its output even though the loop is already locked. Because the quantization step of the TDC is finite, the TDC quantization error also contributes to the spurious tones [23]. In

![Figure 2-9: Periodicity of phase error in fractional-N PLLs (FCW_F = 0.7)](image)

detail, for a divider-based fractional-N ADPLL, the asynchronous multi-modulus divider in the feedback path is a significant source of spurs. The spurs due to TDC quantization are determined by the fractional frequency \( f_{\text{Frac}} \), the TDC time resolution and the loop dynamics [24]; in a counter-based ADPLL, spurs occur at multiples of the fractional frequency \( f_{\text{Frac}} \) of the selected channel due to TDC quantization noise. In terms of the phase noise transfer characteristics, the fractional spurs are not attenuated, owing to the low-pass nature of the loop filter, and so degrade the output spectrum, just like the TDC quantization noise.
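The wrapping ramp described above can be reproduced with exact rational arithmetic. The sketch below is only an illustration of the modulo accumulation, with a hypothetical function name; it uses the same fractional word, 0.7, as Figure 2-9.

```python
from fractions import Fraction


def fractional_phase_ramp(fcw_frac, n_ref_edges):
    """Accumulate the fractional FCW modulo 1 on each FREF edge.

    Returns the sampled fractional phase and the number of wrap-arounds.
    """
    phase = Fraction(0)
    ramp, wraps = [], 0
    for _ in range(n_ref_edges):
        ramp.append(phase)
        total = phase + fcw_frac
        if total >= 1:      # the ramp wraps around on this step
            wraps += 1
        phase = total % 1   # modulo-1 accumulation
    return ramp, wraps


ramp, wraps = fractional_phase_ramp(Fraction(7, 10), 20)
print([float(p) for p in ramp[:10]])
print("wraps in 20 reference cycles:", wraps)
```

For a fractional word of 7/10 the sampled sequence repeats every 10 reference cycles and wraps 7 times within that span, so the average wrap rate is 0.7 times the reference frequency, which is where the fractional spur appears.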

Aside from the mechanism of fractional-N division, reference spurs are another possible kind of spurious tone in the output spectrum; in real circuits they may be caused by radiation, power supply noise, mixing products and reference leakage. In general, the reference signals generated in the phase detection block modulate the DCO tuning words, leading to spurious signals at the reference frequency and its harmonics. One straightforward way to reduce the reference spurs is extra filtering, for example a higher-order loop filter in the ADPLL, or loop gain calibration [8]. The non-linearity of a TDC is conceptually similar to that of an analog-to-digital converter (ADC).

2-1-3 Challenge for ultra-low power ADPLL

Besides the metrics discussed above, power dissipation is a design consideration attracting more and more research effort. The TDC is one of the most power-hungry blocks in an ADPLL, mainly because the dynamic range it covers is as long as the period of the variable phase signal CKV and because, in the conventional architecture, some circuits in the TDC run at the CKV frequency, on the order of several gigahertz. When a fine TDC resolution is required, the power consumption increases further. A finer TDC resolution is achievable through $\Sigma\Delta$ modulation, and reported publications [25] present possible digital PLL architectures based on it; however, this introduces further problems. Although a fine TDC resolution helps relax the bandwidth limitation for wideband applications, a TDC based on $\Sigma\Delta$ modulation has a long conversion time, which means that extra delay is inserted or the reference frequency must be lowered. Moreover, the clock for the $\Sigma\Delta$ modulation runs at a multiple of the reference frequency, which does not obviously reduce the power consumption. Since the gate delay in modern deep-submicron CMOS technology is sufficient for WPAN applications, more effort is put into improving the ADPLL architecture. Addressing the two causes of high power consumption, a phase-prediction DTC-assisted snapshot TDC is proposed for the ultra-low power ADPLL.

2-2 Phase prediction DTC assisted snapshot TDC based ADPLL

2-2-1 Algorithm of phase prediction

The idea of phase prediction aims at shortening the range of the TDC. In the conventional structure, the phase difference between FREF and CKV is treated as arbitrary, and to detect all possibilities the dynamic range of the TDC is designed to cover a full period of CKV. Revisiting the phase error illustration in Figure 2-9, once the loop is locked in steady state, the fractional phase error is theoretically known in advance. Given the zero-mean nature of the phase error in steady state, one can pre-combine the theoretical phase error with the reference signal FREF so that it aligns with the next edge of CKV, giving zero fractional phase error (Figure 2-10), apart from the quantization effect of this delay.

The phase prediction algorithm can be justified quantitatively. The statistically zero-mean phase error is expressed as:

\[ E\left\{ \frac{t_R - t_V}{T_V} - PHR_F \right\} = 0 \]

where \( t_R \) and \( t_V \) are the timestamps of the FREF and CKV rising edges, \( PHR_F \) is the fractional part of the accumulated FCW, \( T_V \) is the period of the variable-phase output of the DCO, and \( E \) is the expectation operator. Rearranging the equation gives:

\[(t_R + T_V \times (1 - PHR_F)) = t_V + T_V\]

The left-hand term \((t_R + T_V \times (1 - PHR_F))\) is the delayed version of the reference edge, where the delay is a fraction of the CKV period, and the right-hand term is the next edge of CKV. The equation indicates that these two edges are aligned. In general, this action converts a numerical value (normally represented by a digital code in hardware) into the corresponding phase delay, and the circuit that does so is defined as a digital-to-time converter (DTC). The number of delay steps is:

\[ DTC_{ctrl} = \frac{1 - PHR_F}{K_{DTC}}, \qquad K_{DTC} = \frac{\Delta t_{DTC}}{T_V} \]

where \( PHR_F \) is the fractional part of the accumulated phase of FREF and \( K_{DTC} \) is the conversion gain of the DTC, equal to the ratio of the DTC time resolution to the period of CKV, \( \frac{\Delta t_{DTC}}{T_V} \). In practice, the residual phase error cannot be neglected because of the fixed-point arithmetic of digital hardware. A TDC is therefore employed to measure the residual phase error, so as to reduce the in-band phase noise from DTC quantization, as seen in Figure 2-10. Compared with the conventional TDC strategy, the combination of DTC and TDC dramatically reduces the power consumption, because the number of TDC stages is greatly reduced and its dynamic range is a small fraction of the period of the variable phase CKV. Although the DTC still needs to cover a large range, its function is simpler than that of a TDC because it needs no detection blocks, which, depending on the PLL architecture, operate at the same high frequency as CKV. Overall, the power consumption drops.
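The split between the DTC code and the residue left for the TDC can be illustrated as follows. This is a sketch under stated assumptions, not the hardware arithmetic: it takes the delay in steps as \( (1 - PHR_F) / K_{DTC} \) with \( K_{DTC} = \Delta t_{DTC} / T_V \), rounds to the nearest step (the text does not say whether the hardware rounds or truncates), and uses a hypothetical function name. The example values are the 22 ps measured DTC resolution and a 2.4 GHz CKV.

```python
def dtc_control_word(phr_frac, dt_dtc, t_v):
    """Return (DTC code, residue in units of T_V) for a predicted phase.

    Assumptions: target delay is (1 - PHR_F) * T_V and the code is
    rounded to the nearest DTC step.
    """
    k_dtc = dt_dtc / t_v               # DTC conversion gain
    target = 1.0 - phr_frac            # required delay, in units of T_V
    code = round(target / k_dtc)       # number of delay steps to enable
    residue = target - code * k_dtc    # what remains for the TDC to measure
    return code, residue


t_v = 1 / 2.4e9
code, residue = dtc_control_word(phr_frac=0.7, dt_dtc=22e-12, t_v=t_v)
print("DTC code:", code)
print(f"residue: {residue * t_v * 1e12:.2f} ps")
```

With rounding, the residue never exceeds half a DTC step in magnitude, which is why the TDC that follows only needs to span a small fraction of the CKV period.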

2-2-2 Principle of phase-prediction ADPLL

Owing to the phase prediction algorithm of Section 2-2-1, the entire ADPLL system is updated as shown in Figure 2-11. The reference signal FREF first drives the DTC block to generate the
Figure 2-11: Architecture of phase prediction PLL
delayed version, which is compared with the variable phase to compute the residual fractional error. This system architecture improves the power efficiency in the following respects:

- The DCO output CKV is further divided by 2 to reduce the power consumption in the PLL feedback path, which consists of the incrementer that counts the integer number of $CKVD_2$ rising edges and the TDC that measures the fractional phase error.

- The DTC is used to pre-delay the reference phase FREF so that it aligns with the next rising edge of $CKVD_2$.

- The TDC is preceded by a clock gating block, which snapshots the exact CKV rising edge of interest. In that way, both input signals of the TDC are on the order of tens of megahertz.

Aside from the system configuration that reduces power dissipation, further effort is put into the circuit design for high power efficiency. The following chapters present the implementation of the DTC and TDC, as well as the calibration algorithm that reduces the impact of the quantization and nonlinearity of the DTC.
Chapter 3

Digital-to-Time Converter (DTC)

A DTC is designed to convert a discrete value of the predicted phase shift into a specific FREF edge delay. The accuracy of the DTC operation has a significant effect on the performance of the PLL, especially on the in-band phase noise and on fractional spurious tones from the reference signal path. Moreover, the design keeps an eye on the DC current consumption.

3-1 State of Art DTC

Little literature treats the DTC as a topic in its own right; the most relevant works are [26, 10, 27]. The basic cell of a DTC is the delay unit. Thanks to deep-submicron CMOS technology, the achievable gate propagation delay is on the order of tens of $\text{ps}$, and finer delay resolution has been obtained by exploiting the delay difference between logic gates [28, 29, 30, 31], the RC model of the on-chip wiring [32], the frequency difference between two different oscillators [33, 34], time amplification [35], the difference between logic thresholds [36], pulse shrinking [37, 38], a passive on-chip voltage divider [39] and random variation of the timing of digital logic gates [40]. Moreover, a topology using multi-stage interpolation has been realized to provide $\text{ps}$-level resolution [41].

[42] uses a DTC circuit mainly for an FREF dithering algorithm within an FREF slicer buffer. This analog approach is based on switched PMOS transistors with a controlled biasing current, as seen in Figure 3-1, which is current-hungry and not technology scalable. At the same time, the dynamic range of the phase shift is quite small, constrained by the power consumption. The dynamic range is 150$\text{ps}$ under the worst process and temperature condition, while the time resolution of the DTC is about 4$\text{ps}$ and the number of tuning steps is up to 16 in a 0.35$\mu \text{m}$ CMOS process.

[26] realized a cyclic time-domain successive approximation method that detects the phase difference between two signals. This implementation is analogous to a successive approximation ADC; it is complicated, and the conversion cycle is longer than one clock period. The
Figure 3-1: Controllable biasing current for DTC

Figure 3-2: The delay unit digitally controlled by loading MOSCAP

delay unit in this structure is digitally controlled by loading MOSCAP capacitors, as shown in Figure 3-2: increasing the capacitive load slows down the DTC.

[10] mainly adopts a digital method to adjust the time delay. Two cascaded inverters with a switched MOSCAP at each output node are used as the delay unit, and an extra buffer is inserted between adjacent stages to keep the same loading condition for each delay cell. The advantage of the DTC in this reference is its fine time resolution, which benefits from the Vernier line structure. The latter two ideas discussed above are based on RC network theory and take advantage of controllable switched capacitors in nanometer CMOS technology. The normal size for a switched MOS capacitor is about 30pF. However, the performance of switched capacitors is degraded mainly by two error factors:

- Physical size of the switch
- Layout matching of the capacitor array

Not only a smaller capacitor but also a smaller switch is necessary, which is still a stringent design consideration in nanoscale CMOS technology. Furthermore, since the nonlinearity of the delay cells introduces fractional spurs at the PLL output, layout mismatch could be the main contributor to the nonlinearity when switched capacitor loading is used as the controllable delay. In general, the larger the capacitors, the smaller the impact of layout variation; conversely, a finer time resolution calls for smaller capacitors. The resolution is therefore traded against the linearity requirement. To achieve better linearity, besides careful layout, techniques such as dynamic element matching are necessary for each capacitor array. In the listed references more than one capacitor array is required, the number being proportional to the control bits of the DTC, so this approach is both area consuming and labor intensive. Moreover, leakage current is inevitable in switched capacitors and compromises the power efficiency.

Owing to the deep-submicron CMOS technology used in this design, the gate propagation delay is around 15ps at minimum size with a nominal supply voltage, which is sufficient for the system in-band phase noise of WPAN applications. The propagation delay of a logic gate is therefore proposed as the delay cell of the DTC, and the question to consider is how to lower its power dissipation.

Also, since mainstream CMOS technology scaling in modern silicon ICs favors digital circuits, the DTC is designed as a digital circuit, which is straightforward and easy to scale down. Another interesting reference [43] shows one possible prototype of a DTC. Inspired by PWM modulation, a structure based on an inverter chain is considered. Two possible DTC proposals are shown in Figure 3-3 and Figure 3-4. In the first proposal, the MUX-based structure, fewer circuit blocks are needed than in the second proposal thanks to its binary-coded control; unfortunately, glitches could introduce unwanted rising edges on the delayed reference signal FREFDLY. The second structure is more attractive because its circuit unit is simple and it avoids the large MUX required when delay arrays are arranged in a parallel topology; this tri-state-inverter-based structure uses thermometer-like coding. Without an analog biasing circuit or circuits such as a voltage-controlled buffer [31], an all-digital circuit cell favors background calibration. In general, the challenge of DTC design for an ADPLL in wireless radio frequency applications is to realize ultra-low power consumption with reasonable quantization noise and attractive area cost. Based on the tri-state-inverter prototype, this chapter demonstrates an ultra-low power DTC design that enables a large dynamic range and supports background calibration against process, voltage and temperature (PVT) variations.

### 3-2 DTC Specification

Besides robust, glitch-free operation, the dynamic range of the DTC must be considered, because it determines the accuracy of the fractional phase detection that is part of the phase prediction algorithm, and thus the successful phase locking of the ADPLL. As derived in Section 2-2-1, the required phase pre-shift is the fractional period of CKV when the PLL is locked. Consequently, the DTC needs to cover at least one period of the high frequency clock $CKVD_2$ (which is driven by the high speed divider following the DCO module), that is, $\frac{1}{2CLKF} \approx 833ps$. Meanwhile, the time resolution is related to the in-band phase noise. Assuming that the DTC has no distortion and that no residue cancellation algorithm is applied, the quantization of the DTC sets the in-band noise floor in the same way as in a TDC, see Eq. (2-1-2); a DTC time resolution $\Delta_{DTC}$ of 30ps is therefore targeted.
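
The two specification numbers above can be cross-checked with a short calculation. The sketch below is only illustrative: it assumes a 1.2 GHz $CKVD_2$ clock (inferred from the 833ps period), a 32 MHz reference, and the commonly used uniform-quantization phase noise expression $\frac{(2\pi)^2}{12}\left(\frac{\Delta}{T}\right)^2\frac{1}{f_R}$; whether this matches Eq. (2-1-2) exactly is an assumption, and the function names are hypothetical.

```python
import math

def dtc_stages_needed(range_ps: float, resolution_ps: float) -> int:
    """Minimum number of delay steps needed to cover the required range."""
    return math.ceil(range_ps / resolution_ps)

def quantization_phase_noise_dbc_hz(resolution_s: float,
                                    clock_period_s: float,
                                    f_ref_hz: float) -> float:
    """In-band noise floor of a uniform time quantizer (assumed textbook form)."""
    noise = (2 * math.pi) ** 2 / 12 * (resolution_s / clock_period_s) ** 2 / f_ref_hz
    return 10 * math.log10(noise)

# Assumed numbers: 833 ps range, 30 ps step, 1.2 GHz CKVD2, 32 MHz reference.
stages = dtc_stages_needed(833.0, 30.0)
noise_db = quantization_phase_noise_dbc_hz(30e-12, 1 / 1.2e9, 32e6)
print(stages, round(noise_db, 1))
```

Under these assumptions a 30ps step needs at least 28 delay steps to span one $CKVD_2$ period, so a 64-stage chain leaves margin for PVT variation.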

### 3-3 DTC Architecture

### 3-3-1 Principle of DTC

The basic idea of the DTC is a chain of self-loaded inverters or buffers that realizes the programmable phase delay, see Figure 3-5. Similar to a DAC, a traditional mixed-signal circuit, the DTC has two inputs: the reference clock (FREF), with a stable frequency and an accurate phase, and a digital data bus ($DTC_{ctrl}$) that provides the selection bits for programming. The DTC has a single output ($FREF_{dly}$), which delivers the delayed phase determined by the phase prediction algorithm. The 64 unit cells are merged into a single delay-chain path with multiple injection nodes for the reference clock FREF. The quantization level, or time resolution, of the DTC is the propagation delay of one buffer; buffers are preferred over inverters because they eliminate the even-odd mismatch between selection bins. Each delay buffer is driven either by the preceding delay cell or by the reference clock interface circuit, as determined by the switch ($EN_i$), whose setting depends on the result of the prediction block. If the sum of n stages of phase delay is required, the reference clock FREF goes through the delay chain starting at the (64 - n)th node. The mathematical description of the DTC is as follows:

\[ R_{FREF_{dly}} = R_{FREF} + \Delta_{DTC} \times (63 - n), n \in [0, 63] \]

where n is the index of the node at which the reference clock FREF is injected; the timing diagram of the DTC is shown in Figure 3-6. The mechanism that switches the delay cells on and off is of particular interest, and because the design is constrained by power consumption, each circuit cell is sized carefully.
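
The delay equation can be expressed as a small behavioral model. This is a minimal sketch, assuming ideal, identical stages with the 30ps target resolution; the function names and the clamping behavior are illustrative choices, not part of the design.

```python
N_STAGES = 64  # number of unit cells in the single-path delay chain

def dtc_delay_ps(n: int, resolution_ps: float = 30.0) -> float:
    """Delay added to FREF when it is injected at node n (0..63)."""
    if not 0 <= n <= N_STAGES - 1:
        raise ValueError("injection index out of range")
    return resolution_ps * (N_STAGES - 1 - n)

def injection_index_for_delay(delay_ps: float, resolution_ps: float = 30.0) -> int:
    """Injection node whose ideal delay is closest to the requested delay."""
    steps = round(delay_ps / resolution_ps)
    steps = max(0, min(N_STAGES - 1, steps))  # clamp to the available range
    return N_STAGES - 1 - steps

print(dtc_delay_ps(63), dtc_delay_ps(0), injection_index_for_delay(300.0))
```

Injecting at the last node gives zero added delay, while injecting at node 0 uses the full chain.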

According to the brief introduction of the DTC structure in Section 3-3-1, the delay unit of the DTC has two states: it either passes the reference clock signal $F_{REF}$ or switches it off. The unit is composed of two sub-circuits, a selection set and an output-enable (OE) buffer, as illustrated in Figure 3-7.

**Clock Buffer**

The clock buffer is an on-chip buffer that translates the sinusoidal crystal oscillator output into a square wave with a well-defined phase, which serves as the reference clock $F_{REF}$. As presented in the literature, the phase noise of the whole PLL has two main contributions: one from the reference signal path and the other from the high frequency path CKV. Especially in near-N PLL mode, the in-band phase noise of the PLL is dominated by the noise of the input reference buffer [10]. The phase noise contributed by the input reference buffer is mainly the flicker noise of its very first stage; thus, transistors with a large channel length should be used. Moreover, the driving ability of the clock buffer affects the jitter performance. The clock buffer is implemented as four cascaded inverter stages, as illustrated in Figure 3-8.

**Selection Set**

The selection set consists of a PMOS-NMOS transistor pair and a transmission-gate-based AND gate, used for power saving. The gate inputs of the PMOS-NMOS pair, shown in Figure 3-9, have four states, listed in Table 3-1. Only three of them may be used; the combination $V_G(\text{PMOS}) = L$ and $V_G(\text{NMOS}) = H$ must be avoided. If the gate voltages $V_G(\text{PMOS})$ and $V_G(\text{NMOS})$ are both high or both low, the selection set acts as an inverter of its gate input, and the output D goes low or high, respectively. This is adopted as the reference clock feeding path. If $V_G(\text{PMOS}) = H$ and $V_G(\text{NMOS}) = L$, both transistors are switched off, so the output port D (the drains of both transistors) is floating and determined by signals from other paths. In this case, the reference clock is switched out.
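
The four gate-input combinations can be captured in a small lookup, which makes the forbidden state explicit. This is a behavioral sketch only; the function names are hypothetical and the logic levels are represented as the strings "H" and "L".

```python
# Drain state of the PMOS-NMOS pair for each (V_G PMOS, V_G NMOS) combination.
# "Z" = floating (both off), "X" = forbidden (both on, direct VDD-VSS path).
_DRAIN_STATE = {
    ("H", "L"): "Z",
    ("H", "H"): "L",
    ("L", "H"): "X",
    ("L", "L"): "H",
}

def selection_set_drain(vg_pmos: str, vg_nmos: str) -> str:
    """Return the drain state D for the given gate levels."""
    return _DRAIN_STATE[(vg_pmos, vg_nmos)]

def is_allowed(vg_pmos: str, vg_nmos: str) -> bool:
    """Only the combination that turns both transistors on must be avoided."""
    return selection_set_drain(vg_pmos, vg_nmos) != "X"

for pair in _DRAIN_STATE:
    print(pair, selection_set_drain(*pair), is_allowed(*pair))
```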

<table>
<thead>
<tr>
<th>$V_G$ (PMOS)</th>
<th>$V_G$ (NMOS)</th>
<th>State of the drains</th>
</tr>
</thead>
<tbody>
<tr>
<td>$H$</td>
<td>$L$</td>
<td>$Z$</td>
</tr>
<tr>
<td>$H$</td>
<td>$H$</td>
<td>$L$</td>
</tr>
<tr>
<td>$L$</td>
<td>$H$</td>
<td>$X$</td>
</tr>
<tr>
<td>$L$</td>
<td>$L$</td>
<td>$H$</td>
</tr>
</tbody>
</table>

Table 3-1: States of the PMOS-NMOS pair in the selection set

The selection set consists of a complementary pair of transistors, which is designed to save static DC current and to keep the signal robust. The neat structure of the DTC benefits power and area. At the rising edge of the reference clock $F_{\text{REF}}$, every effective phase shift command makes all internal nodes of the single delay chain take effect (the enabled stages toggle from high to low or from low to high, while the disabled stages stay frozen); this is denoted as the active action. At the falling edge of FREF, the voltages of all internal nodes are reset successively. Two actions of the DTC are therefore defined in each period of the reference clock FREF, as illustrated in Figure 3-10.

As shown in Figure 3-10, the rising edge of FREF, with the selection data bus ready in advance, defines the active action of the DTC: one selection set is selected to inject the reference clock and is said to be in ‘enable’ mode, while the others are in ‘disable’ mode. Both the ‘enable’ and ‘disable’ modes are synchronized to the clock FREF, whereas the ‘active’ and ‘reset’ actions are carried out successively. All outputs D of the selection sets need to be ready for the next period of actions. According to these behavioral requirements, the schedule of the selection set is arranged as in Table 3-2. During the active action, the PMOS transistor is switched off while the NMOS transistor switches on to pass the desired signal applied to its gate; hence, the effective drain voltage is low. During reset, the PMOS switches on to pull up the drain voltage. Because the NMOS is switched on by a high gate voltage, a transmission-gate-based AND gate (see Figure 3-11) precedes the gate of the NMOS transistor.

### OE(output-enable) Buffer

The self-loaded delay unit of the DTC is an output-enable buffer, which can be powered down when it is not needed, as illustrated in Figure 3-12. The buffer is gated by an enable signal EB in order to cut off the preceding path of delay stages; this enable signal differs from the one used in the selection set. It results from decoding the predicted phase delay and is clocked at the frequency of the reference signal FREF. In general, the design metrics of the OE buffer are a function of the PMOS and NMOS transistor sizes. The dynamic power of digital circuits is proportional to the parasitic or loading capacitance at the output node, i.e. the drain capacitance at the output of the OE buffer. The propagation time, in turn, is inversely related to the transistor sizes that set these capacitances. The propagation time of the active action of the DTC is critical, whereas that of the reset action is less important. Therefore, to achieve the desired delay under the power constraint, an active-state-based custom design of the PMOS-NMOS width ratio $\frac{W_{PMOS}}{W_{NMOS}}$ is presented, in which the ratio in the OE inverters is not kept constant and does not stick to the usual value of 2.5 or 3. Figure 3-13 demonstrates the idea: strong transistors serve the active action and weak ones the reset action. Since the reference clock $F_{REF}$ is injected via an NMOS switch and the voltage level of I is low, the PMOSs $P_{2L}$ and $P_{1L}$ and the NMOSs $N_{1R}$ and $N_{2R}$ are critical for a fast delay, while the PMOSs $P_{2R}$ and $P_{1R}$ and the NMOSs $N_{1L}$ and $N_{2L}$ are critical for power saving. Additionally, even within the weak transistor group, the clocked transistors ($N_{2L}, P_{2R}$) should be at least twice the size of the transistors ($N_{1L}, P_{1R}$).

Figure 3-11: Circuit implementation of DTC selection set

Figure 3-12: Schematic of output-enable (OE) buffer

**Optimization of DTC Unit Cells**

The above discussion presents the principle of each sub-block of a DTC unit cell and its design hints with respect to timing and power dissipation. In practice, the DTC unit cells do not respond to the toggling clock $F_{REF}$ instantaneously. The single-path, multi-injection-point structure has the problem that every output transition of a unit can cause conflicts at the inputs and outputs of the succeeding stages for a short while. This degrades the time resolution of the delay unit and consumes more standby current. It can be observed in the simulation curves of Figure 3-14: at the negative edge of REF($/clk$), the stage inputs do not switch to the high voltage level instantaneously, but follow the preceding stage and turn high one by one, as seen from the flat region in the plot. Looking into the DTC circuit, one possible conflict occurs during the active action. According to the enable-signal topology of the pulse-width modulation reference [43], the selection code is a thermometer code; that is, all enable signals along the signal path from the very first stage up to the injected point $k$ are enabled, so $k$ signal paths drive the injected input node that undergoes a $1 \rightarrow 0$ transition, as illustrated in Figure 3-15.

Figure 3-14: Example: the race condition for the reset action of DTC: each reset transient of the selection set is forced to be delayed because of conflicts from the previous stages.

Figure 3-15: Potential conflicts because of the thermometer coding of the selection set at the rising edge of clk.

Another conflict is located at each possible input node of the delay unit, where the falling edge of $FREF$ needs to be fed via the PMOS transistors of the selection set during the reset action. All input nodes are supposed to go from 0 to 1 simultaneously. In practice, however, two separate signal flows reach the input of the delay cell: one is driven by the switched-on PMOS of the selection set, whose gate voltage turns low so that the drain voltage is pulled up; the other is driven by the preceding delay cells, which propagate the $0 \rightarrow 1$ transition as well. The inputs of the previous stages turn high at the falling edge of $clk$, but the gate inputs of the enable NMOSs stay unchanged and the transition takes some propagation time $\tau$; as a result, the input of the observed stage stays frozen for the duration $\tau$, as depicted in Figure 3-16. Not only is the timing resolution degraded, but an extra direct path from $VDD$ to $VSS$ also exists during the subsequent reset action, as illustrated in Figure 3-17. The term $P_{\text{short}}$ is the power consumed during the gate-voltage transient, which in CMOS technology is related only to the direct-path short-circuit current ($I_{sc}$) that flows when the NMOS and PMOS transistors conduct simultaneously, from the supply $Vdd$ to ground or $Vss$. Therefore, the conflict leads to a large instantaneous peak current at the transition of $clk$, which adds to the average dissipation current and requires larger decoupling capacitors against supply voltage ripple.

Cutting off the preceding access and speeding up the reset action solve the race conditions described above. Instead of binary or thermometer codes, a different decoding is employed. The solution to the first race condition is straightforward: only the injected point is selected. The solution to the second race condition is to isolate all stages during the reset action, as depicted in Figure 3-18. Taking the first stage as an example and considering the right branch of the OE buffer, the drain voltage (the output of each stage) is low during the active action, and the voltage of the internal node of the OE buffer is high. As a result, when $FREF$ turns low, the NMOS $N2_R$ switches off, the output node $D_{<1>}$ of this OE buffer is determined only by the switching of the PMOS $P_C$ in the control set of the next stage, and the transition of the internal node has no effect through the NMOS $N1_R$.
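
The difference between the thermometer coding that causes the first race condition and a decoding that selects only the injected point can be illustrated as follows. This is a sketch under the assumption that "selecting only the injected point" amounts to a one-hot enable vector; the thesis does not name the code, and the helper names are hypothetical.

```python
def thermometer_enable(k: int, n_stages: int = 64) -> list:
    """Thermometer code: stages 1..k are all enabled."""
    return [1 if i < k else 0 for i in range(n_stages)]

def one_hot_enable(k: int, n_stages: int = 64) -> list:
    """Assumed alternative: only the injected point k is enabled."""
    return [1 if i == k - 1 else 0 for i in range(n_stages)]

def active_drivers(enables: list) -> int:
    """Number of selection sets that drive the chain at the clock edge."""
    return sum(enables)

k = 5
print(active_drivers(thermometer_enable(k)), active_drivers(one_hot_enable(k)))
```

With thermometer coding, k selection sets drive the chain simultaneously, whereas the one-hot vector leaves a single driver and removes the conflict at the injected node.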

### 3-4 Just-In Time DTC Gain Calibration

### 3-4-1 Least Mean Squares Algorithm

The principle of the phase-prediction ADPLL is described in Section 2-2-2. Revisiting the model of the phase prediction algorithm, the predicted CKV phase when the loop is locked follows the equation:

$$DTC_{ctrl} = \frac{(1 - PHRF)}{K_{DTC}}$$

where $DTC_{ctrl}$ denotes the control word of the DTC phase delay, $PHRF$ is the fractional part of the accumulated $FCW$, and $K_{DTC}$ is the normalized DTC gain.
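
As a numerical illustration of this equation, the sketch below computes the control word from $PHRF$ using the reciprocal gain. It assumes that $K_{DTC}$ is the DTC step normalized to the $CKVD_2$ period, so that $\frac{1}{K_{DTC}} \approx 833ps / 30ps \approx 27.78$; that normalization, the rounding, and the clamping to 64 codes are assumptions made for the example.

```python
def dtc_control_word(phr_f: float, inv_k_dtc: float, n_codes: int = 64) -> int:
    """DTC_ctrl = (1 - PHRF) / K_DTC, computed with the reciprocal gain."""
    if not 0.0 <= phr_f < 1.0:
        raise ValueError("PHRF must lie in [0, 1)")
    code = round((1.0 - phr_f) * inv_k_dtc)
    return max(0, min(n_codes - 1, code))  # clamp to the available codes

INV_K_DTC = 27.78  # assumed: 833 ps period / 30 ps step

for phr_f in (0.0, 0.25, 0.5):
    print(phr_f, dtc_control_word(phr_f, INV_K_DTC))
```

Multiplying by the reciprocal gain rather than dividing by the gain mirrors the hardware motivation of avoiding a divider.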

Assume that the calibration operates in the steady state of the PLL, that is, the prediction cooperates well with the phase error detection. In general, a type-II PLL exhibits a statistically zero-mean phase (or time) error $PHE$ between the variable oscillator signal and the reference signal. Any deviation caused by an inaccurate DTC gain can therefore be observed in the output $PHE$ of the phase detection and, in turn, be used to estimate the DTC gain for calibration.

The mathematical model of the phase detection is

$$\phi_E[k] = \theta_R[k] - \theta_V[k]$$

where $\theta_R[k]$ and $\theta_V[k]$ are the reference signal phase and the variable signal phase, respectively. $\theta_R[k]$ results from accumulating $FCW$, while $\theta_V[k]$ is incremented by the variable phase signal. Because the digital circuits compute in fixed point, the phase detection contains quantization noise. The phase error detection is therefore defined as [3]:

$$\phi_E[k] = (R_{R,I}[k] - R_V[k]) + (R_{R,F}[k] + \varepsilon[k])$$

The equation presents the estimated quantities: the integer part of the estimated reference phase $R_{R,I}$, the fractional part of the estimated reference phase $R_{R,F}$, and the estimated variable phase $R_V$. The principle of phase detection together with phase prediction is shown in Figure 3-19. To simplify the problem, consider the ideal case in which the quantization noise of the TDC is neglected and the residue of the DTC and TDC conversion is perfectly canceled. One can then relate the inaccuracy of the DTC gain to the error $KE$; this error, caused by the inaccurate DTC gain, is in turn related to the phase error $PHE$.

$$KE = \Delta_{DTC} \times (D - D')$$
$$D = (1 - PHRF) \times \frac{1}{K_{DTC}}$$
$$D' = (1 - PHRF) \times (\frac{1}{K_{DTC}})'$$

where $\Delta_{DTC}$ denotes the time resolution of the DTC, $D$ is the DTC control code, $\frac{1}{K_{DTC}}$ is the reciprocal of the actual DTC gain, and $(\frac{1}{K_{DTC}})'$ is the reciprocal of the inaccurate DTC gain estimate. To avoid division in the synthesized digital circuits, the reciprocal of the DTC gain is used both in the following discussion and in the hardware implementation. From the viewpoint of DTC gain estimation, the error of the reciprocal of the DTC gain is described by the equation:

$$e = \frac{KE}{\Delta_{DTC}} \times (\frac{1}{1 - PHRF})$$
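
As a consistency check, substituting the definitions of $KE$, $D$ and $D'$ given above into this expression shows that $e$ is the deviation of the reciprocal gain itself, independent of $PHRF$ (a worked step added for illustration, using only the symbols already defined):

$$e = \frac{\Delta_{DTC} \times (D - D')}{\Delta_{DTC}} \times \frac{1}{1 - PHRF} = \frac{(1 - PHRF) \times \left[\frac{1}{K_{DTC}} - (\frac{1}{K_{DTC}})'\right]}{1 - PHRF} = \frac{1}{K_{DTC}} - (\frac{1}{K_{DTC}})'$$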

The phase noise requirement of the ADPLL can be met if the mean-squared value of the error $e$ is minimized using the least mean squares (LMS) algorithm.

The mean-squared error (MSE), $E(e^2)$, is the average value of $e^2$; its minimum occurs where the derivative of the MSE with respect to the parameter, i.e. the reciprocal of the DTC gain, is zero. This condition is expressed by the equation:

$$\frac{\partial E(e^2)}{\partial(\frac{1}{K_{DTC}})} = E(2e \frac{\partial e}{\partial(\frac{1}{K_{DTC}})}) = E(2e \frac{\partial D}{\partial(\frac{1}{K_{DTC}})}) = 2E(e \times (1 - PHRF)) = 0$$
The equation shows that the MSE is minimized when the correlation between the error caused by the inaccurate DTC gain and the variable \( PHRF \), which is related to the frequency control word (FCW), vanishes. The block diagram of the error estimation is depicted in Figure 3-20.

As the diagram shows, the sign of the error e is used instead of the error itself in order to simplify the hardware implementation. The negative feedback through the accumulator drives the average error e to zero, eliminating the correlation between the parameter \((1 - PHRF)\) and the error e. In practice, the phase error \( PHE \) is the raw output of the phase detection, and the part caused by the inaccurate DTC gain is hard to separate from it. The algorithm in this design is therefore implemented such that the part of PHE correlated with the parameter \((1 - PHRF)\) is driven to zero. This agrees with the Widrow-Hoff LMS algorithm, in which a system is in general described by a linear transfer function:

\[
y(k) = \sum_{i=1}^{M} W_i x(k - i)
\]

where \( W_i \) is the ith system coefficient, \( x \) is the input signal or vector, and \( y \) is the output of the filter. The Widrow-Hoff LMS algorithm iteratively updates the system coefficients of the adaptive filter with a step size \( \mu \); for one coefficient, with \( \epsilon(k) \) the error at step k, the update equation is:

\[
W(k + 1) = W(k) + 2\mu \epsilon(k)x(k)
\]
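
The single-coefficient update can be tried out on a toy identification problem. The sketch below is a floating-point illustration only: the "unknown" coefficient 27.78, the input sequence derived from an accumulated fractional word of 0.3125, and the step size are all assumed values, and the full error (not only its sign) is used.

```python
def lms_identify(true_w: float, inputs: list, mu: float, w0: float = 0.0):
    """Widrow-Hoff update W(k+1) = W(k) + 2*mu*eps(k)*x(k) for one coefficient."""
    w = w0
    history = []
    for x in inputs:
        desired = true_w * x        # output of the unknown system
        error = desired - w * x     # eps(k): mismatch of the current estimate
        w += 2.0 * mu * error * x   # gradient step towards smaller squared error
        history.append(w)
    return w, history

# Assumed input: x = 1 - PHRF, with PHRF accumulating a fractional word of 0.3125.
inputs = [1.0 - (k * 0.3125) % 1.0 for k in range(1, 401)]
w_final, trace = lms_identify(27.78, inputs, mu=0.25)
print(w_final)
```

Because the input magnitude never exceeds 1, the factor $1 - 2\mu x^2$ stays between 0.5 and 1 for this step size, so the estimate approaches the true coefficient monotonically.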

### 3-4-2 Principle of DTC Gain Estimation

Coming back to the characteristics of the all-digital PLL, both the variable phase increment and the reference phase accumulation approximately exhibit a sawtooth waveform, especially when the fractional part of FCW approaches zero or one. Similarly, determined by the fractional part \( FCW_F \) of the frequency control word FCW, the error caused by an inaccurate DTC gain is a sawtooth waveform at a frequency of

\[
f_{PE} = f_R \times \min(FCW_F, 1 - FCW_F)
\]

while the CKV phase is updated at the frequency \( f_R \) of the reference clock. The relationship between the predicted CKV phase (or the delayed FREF phase) and the ideal variable phase CKV is depicted in the figures for the example of a fractional part \( FCW_F \) of 0.5.
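
A quick numerical reading of the sawtooth frequency, assuming the 32 MHz reference clock used elsewhere in this chapter (the function name is illustrative):

```python
def phase_error_frequency(f_ref_hz: float, fcw_frac: float) -> float:
    """f_PE = f_R * min(FCW_F, 1 - FCW_F)."""
    if not 0.0 <= fcw_frac < 1.0:
        raise ValueError("fractional FCW must lie in [0, 1)")
    return f_ref_hz * min(fcw_frac, 1.0 - fcw_frac)

for frac in (0.5, 0.25, 0.75, 0.01):
    print(frac, phase_error_frequency(32e6, frac))
```

The error frequency is highest at a fractional part of 0.5 and is symmetric around it, so fractional parts of 0.25 and 0.75 give the same sawtooth frequency.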

In Figure 3-21 and Figure 3-22, \( T_R \) and \( T_{PE} \) denote the periods of the reference signal and of the actual predicted phase error, respectively. One can observe that the correlation between the actual predicted phase curve and the ideal curve is periodic with a duration of $T_R / FCW_F$, which corresponds to the period over which the accumulated value $PHR_F$ of $FCW_F$ runs between 0 and 1. The correlation between the phase error and the parameter $PHR_F$ is further developed and depicted in Figure 3-23.

Figure 3-21: The phase departure from the ideal CKV phase increment in the case of (1) overestimation of the DTC gain

Figure 3-22: The phase departure from the ideal CKV phase increment in the case of (2) underestimation of the DTC gain

### 3-4-3 Implementation Of DTC Gain Estimation

With the structure of the LMS algorithm and the error with respect to the DTC gain given above, the system diagram of the DTC gain estimation follows directly, see Figure 3-25. Any deviation caused by an inaccurate DTC gain can thus be estimated, and the system coefficient, the DTC gain $K_{DTC}$, is updated iteratively. Unlike general mathematical computation, the hardware implementation uses fixed-point arithmetic with a finite number of digits and is by nature constrained by power, area and speed. Hence, the coefficients in the algorithm are powers of 2, so that right or left bit-shifts replace division or multiplication.

The product of the system parameter \( PHR_F \) and the phase detector output \( PHE \), which is correlated with the inaccurate DTC gain, is generated. Only the sign bit of \( PHE \) is used in order to reduce the hardware cost; the literature shows that this does not affect the steady-state parameter value [Honig, Adaptive filter, 1984]. This product drives an optional IIR filter, whose transfer function is:

\[
y[k] = x[k] \times (-2^{-b}) + y[k - 1] \times (1 - 2^{-a})
\]
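
The behavior of this first-order filter can be checked with a short floating-point model. The sketch assumes a feedback coefficient of $1 - 2^{-a}$, which keeps the pole inside the unit circle for $a > 0$ and reduces the filter to a pure gain for $a = 0$; the exponent sign is an assumption of this example, and in hardware both power-of-2 factors would be bit-shifts.

```python
def iir_step(x: float, y_prev: float, a: int, b: int) -> float:
    """One update: input scaled by -2^-b, feedback scaled by (1 - 2^-a)."""
    return x * -(2.0 ** -b) + y_prev * (1.0 - 2.0 ** -a)

def iir_run(samples: list, a: int, b: int) -> list:
    """Run the filter from a zero initial state and return all outputs."""
    y = 0.0
    out = []
    for x in samples:
        y = iir_step(x, y, a, b)
        out.append(y)
    return out

settled = iir_run([1.0] * 500, a=4, b=1)[-1]   # DC gain is -2^(a-b) = -8
bypassed = iir_run([1.0, 1.0], a=0, b=1)       # a = 0 removes the feedback term
print(settled, bypassed)
```

With $a = 4$ and $b = 1$ the pole sits at 0.9375 and a constant input settles to $-2^{a-b}$ times its value.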

This IIR filter works similarly to the loop filter of the ADPLL, which is normally constructed as a combination of FIR and IIR filters. To limit the hardware complexity, the design adopts a first-order IIR filter, which is unconditionally stable and has strong filtering capability. The challenge of using a combination of FIR and IIR filters here to form two paths, a proportional and an integral path, is that the ADPLL is sensitive to the phase delay of the system coefficient \( K_{DTC} \).

The time required for the calibration of the DTC gain \( K_{DTC} \) to converge depends on the step size \( \mu \). A larger step size speeds up the convergence but increases the noise of the phase error, which in turn degrades the phase noise performance of the ADPLL. The error surface of the adaptive filter is shaped like a bowl, and the relationship between convergence speed and step size is illustrated in Figure 3-26. Converging too fast gives rise to noise on the DTC gain \( K_{DTC} \) and on the detected phase error. The whole block is clocked by the retimed system clock \( CKR \) at 32 MHz. The filter model is first built in Matlab Simulink, where various parameter settings are tried to examine the convergence speed.
<table>
<thead>
<tr>
<th>Parameter</th>
<th>I</th>
<th>II</th>
<th>III</th>
<th>IV</th>
<th>V</th>
</tr>
</thead>
<tbody>
<tr>
<td>$a$</td>
<td>0</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
<tr>
<td>$b$</td>
<td>1</td>
<td>2</td>
<td>2</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>$\mu$</td>
<td>$2^{-7}$</td>
<td>$2^{-10}$</td>
<td>$2^{-8}$</td>
<td>$2^{-8}$</td>
<td>$2^{-7}$</td>
</tr>
</tbody>
</table>

Table 3-3: Configuration of DTC gain calibration parameters

As the system diagram shows, only three parameters need to be adjusted. Table 3-3 summarizes the configurations. In setting $I$, the parameter $a$ equals 0, which removes the feedback term and thus disables the IIR filter. The Matlab Simulink results for convergence under the different configurations are shown in Figure 3-27 to Figure 3-31. From the transient response of the filter, when the IIR filter is disabled, convergence takes a long time but the convergence curve is smooth. Once the IIR filter is enabled, convergence is about twice as fast in this simulation setting. At the same time, the coefficient $b$ acts as a scaling factor equivalent to the step size $\mu$. Fluctuation is observed for a large step size $\mu$.
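
The dependence of convergence time on the step size can be reproduced qualitatively with a sign-error update and no IIR filter. This is a simplified stand-in for the Simulink model, not a replica of it: the target reciprocal gain of 27.78, the fractional word of 0.3125, the noise-free error, and the convergence tolerance are all assumptions.

```python
def steps_to_converge(mu: float, true_inv_gain: float = 27.78,
                      fcw_frac: float = 0.3125, tol: float = 0.05,
                      max_steps: int = 200_000) -> int:
    """Count reference cycles until the sign-error LMS estimate is within tol."""
    estimate, phr_f = 0.0, 0.0
    for k in range(max_steps):
        if abs(true_inv_gain - estimate) < tol:
            return k
        phr_f = (phr_f + fcw_frac) % 1.0       # accumulate the fractional FCW
        x = 1.0 - phr_f                        # correlation input (1 - PHRF)
        err = (true_inv_gain - estimate) * x   # gain-related part of the error
        sign = (err > 0) - (err < 0)           # only the sign bit is used
        estimate += mu * sign * x
    return max_steps

fast = steps_to_converge(2 ** -7)
slow = steps_to_converge(2 ** -10)
print(fast, slow, slow / fast)
```

In this simplified model the number of cycles scales inversely with $\mu$, while the residual toggling of the estimate around its final value is bounded by $\mu$, which reflects the speed versus noise trade-off described above.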

A Verilog model is built and simulated to verify the functionality and performance of the DTC gain calibration, see Figure 3-32 and Figure 3-33. The trade-off between convergence speed and the effect on phase error is observed here as well, and the functionality of the DTC gain calibration algorithm is confirmed.
Figure 3-28: Configuration II

Figure 3-29: Configuration III

Figure 3-30: Configuration IV
**Figure 3-31:** Configuration V

**Figure 3-32:** \([a, b, \mu] = [4, 1, 10]\)

**Figure 3-33:** \([a, b, \mu] = [4, 2, 8]\)
Chapter 4

Time-to-Digital Converter (TDC)

### 4-1 TDC Circuit Design

A PLL aims to generate an output clock signal of a given frequency and to align the phase of that output with the phase of a reference clock, so that the output phase deviation stays under control; in other words, it suppresses the phase noise of the output clock. In a digital PLL, the TDC replaces the phase/frequency detector (PFD) of the conventional analog PLL and detects the accumulated phase difference between the high frequency feedback clock CKV and the reference clock FREF; Figure 4-1 presents the entire mechanism. Detecting the phase error of the two clock signals involves an integer part \((PHI - PHR)\) and a fractional part \((PHF)\): the integer value results from subtracting the integer count \(PHI\) of the high frequency feedback signal \(CKVD2\) from the accumulated sum \(PHR\) of \(FCW\), and the fractional part \((PHF)\) is the sub-period portion of \(CKVD2\), computed by a dedicated circuit block, the core of the TDC. This chapter focuses on the implementation of this core circuit.

![Figure 4-1: Detection mechanism of TDC [4]](image-url)

As depicted in Figure 4-1, the TDC core circuit (simply called TDC in the following), which serves fractional phase error detection, translates the real-time phase difference $\phi_e$ between two adjacent rising edges of clock signals at different frequencies, one being the delayed reference clock, denoted as FREF, and the other the high frequency signal $CKVD2$ driven by the divide-by-2 block, into a digital multi-bit value $D_0...D_L$. The quantization level is determined by the delay of a conventional delay element, a buffer or an inverter, $\Delta_{BUF}$ or $\Delta_{INV}$, analogous to an ADC in traditional mixed-signal design. The difference is that the TDC works in the phase domain instead of the voltage domain, and its performance is evaluated in terms of phase noise or clock jitter.
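
The quantization described above can be mimicked by a minimal behavioral model of a delay-line TDC. The sketch assumes ideal, identical stage delays and a thermometer-coded output $D_0...D_L$; the 20ps stage delay and the stage count of 48 are hypothetical values chosen for the example.

```python
def flash_tdc(delta_t_ps: int, stage_delay_ps: int, n_stages: int) -> list:
    """Bit i is 1 if the edge has passed stage i within the measured interval."""
    return [1 if (i + 1) * stage_delay_ps <= delta_t_ps else 0
            for i in range(n_stages)]

def tdc_code(bits: list) -> int:
    """Convert the thermometer output into a binary count of full stage delays."""
    return sum(bits)

bits = flash_tdc(95, 20, 48)
print(bits[:6], tdc_code(bits))
```

A 95ps interval measured with 20ps stages yields a code of 4, leaving a 15ps quantization residue that appears as in-band phase noise.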

### 4-1-1 Specification of TDC

Related Work

The TDC is one of the key blocks of an ADPLL, primarily because it is the dominant contributor to in-band phase noise; accurate phase detection by the TDC ensures that the PLL locks at the required frequency with acceptable phase noise. Moreover, distortion of the TDC transfer function is responsible for undesired fractional spurs at the PLL output. To meet the stringent integrated-noise requirements of wireless communication in both narrow-band and wide-band applications, most academic research so far has pursued TDCs with fine time resolution at the cost of power dissipation, leading to a poor jitter-power compromise. The main types of TDC design are listed below:

**Flash structure self-loaded inverter based TDC [3]** The quantization level of this structure relies on the characteristic gate delay in modern nano-scale CMOS and is expected to improve with CMOS technology scaling. This structure has the fastest conversion speed of all TDC solutions. However, the time resolution of an inverter at the nominal supply voltage is limited to around \((15 - 20)ps\) in 90\(nm\), 65\(nm\) and 40\(nm\) CMOS technology, which makes it hard to meet the stringent requirement of \(-100dBc/Hz\) in-band phase noise of some wideband communication standards.

**Vernier delay lines [31]** In this structure, the quantization level is the time difference between two tapped buffer (or inverter) chains whose time resolutions differ slightly, that is, \(\Delta_{TDC} = \Delta_1 - \Delta_2\). This technique relieves the requirement for a finer resolution of a single buffer (or inverter), which is limited by the semiconductor process node. However, a delay-locked loop (DLL) is needed to stabilize the resolution against process variation. Sub-gate resolutions below 10\(ps\) are reported in similar publications. On the other hand, an obvious problem of this approach is its poor cost-efficiency in both area and power: a large number of delay cells is needed to cover a large dynamic range with sufficient phase detection margin, and the long delay chain degrades the linearity.

**Passive interpolation TDC [39]** Based on the flash self-loaded-inverter TDC, the digitization is implemented by a coarse-resolution delay line, i.e. a chain of self-loaded inverters, that is interpolated by a set of local passive voltage dividers using resistors or switched diodes. Provided that a single-stage CMOS gate, i.e. an inverter, has a signal transition time on the order of the gate delay, the local passive interpolation generates an intermediate signal between two transition regions, and a comparator detects its crossing of the mid-level. A resolution of a quarter of an inverter delay is achieved in 90\(nm\) CMOS technology at a 1.2\(V\) nominal supply voltage. Although the resolution of a local passive interpolation TDC can in principle be made arbitrarily fine, local variation limits it. In addition, the sampling elements need to be designed carefully, because local variation can disorder the sampling sequence of the interpolation or of the comparators while the sequence of the interpolated signals remains unchanged. Poly-silicon resistors are used in the design, and layout-induced mismatch is taken into consideration.

**Noise-shaping gated ring oscillator (GRO) based TDC (for oversampling) [25]** The design uses ring-oscillator inverters to count the transitions that occur within the measurement window defined by the two signals whose phase error is measured. Since the time residue, or quantization error, of the previous measurement is carried over to the next one, the result is equivalent to a first-order difference, which shapes the quantization noise of the TDC. The effective time resolution is therefore far finer than the delay of a single CMOS inverter. The design demonstrates a TDC resolution of up to 6\(ps\) in 0.13\(\mu m\) CMOS technology. The GRO technique relies on the low-pass nature of the loop filter to suppress the TDC quantization noise, which noise shaping pushes towards high frequencies. Yet the raw resolution of GRO TDCs is still limited by the gate delay, and the GRO consumes up to 2.6\(mA\) from a 1.5\(V\) supply, which demonstrates poor power efficiency.
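
The residue carry-over that produces first-order noise shaping can be sketched with integer arithmetic. This is an idealized model that assumes the oscillator state is held perfectly between measurements; the 15ps stage delay and the 37ps intervals (expressed in femtoseconds) are hypothetical values.

```python
def gro_tdc(intervals_fs: list, stage_delay_fs: int) -> list:
    """Count stage transitions per window, carrying the residue to the next one."""
    residue = 0
    counts = []
    for t in intervals_fs:
        total = residue + t                  # leftover phase plus the new window
        count = total // stage_delay_fs      # whole transitions seen in this window
        residue = total - count * stage_delay_fs
        counts.append(count)
    return counts

intervals = [37_000] * 10                    # ten 37 ps windows, in femtoseconds
counts = gro_tdc(intervals, 15_000)
print(counts, sum(counts) * 15_000, sum(intervals))
```

Individual outputs toggle between 2 and 3, but the accumulated count tracks the accumulated input to within one stage delay, which is the time-domain signature of first-order shaping.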

**Multi-level interpolation [30]** The idea is to use a counter to count the cycles of a reference signal and to apply a delay line as an interpolator that measures the time interval between the two signals START and STOP. Multi-level interpolation is optimized to reduce the number of delay elements and registers, and thereby the area and power cost. In addition, the design scales the load capacitors at the outputs of the delay elements to adjust their resolution. However, a third reference clock signal is required for the interpolation, and variation among the parallel delay structures, caused by process parameters and chip layout, introduces nonlinearity.

**Time amplification** [35]  Inspired by coarse-fine ADCs, a time-amplifier (TA) based coarse-fine TDC is implemented: an array of TAs stores and amplifies the possible time residues between the signals at the inputs of the stages of the coarse delay chain. The selected time residue is then converted by an identical delay chain. Inserted inverters acting as offset delays and switched capacitors loading the outputs of the NAND gates are proposed to break the trade-off between large time gain and wide linear range of the TAs. The main problems of this approach are that the group of TAs occupies a large chip area and dissipates considerable power; moreover, the latency of a TA is close to 1ns in this implementation, so the dead time of the TDC cannot be neglected in wideband PLL systems that require fast settling.

**Stochastic** [44]  Unlike the conventional approach based on the gate delay of a given technology, the stochastic TDC takes advantage of the mismatch-induced time offsets of nominally identical arbiters. Sub-ps resolution is achieved at the cost of a reduced TDC dynamic range. Because the time offsets at the arbiter inputs follow a Gaussian distribution, the average TDC output follows the error function: it corresponds to the number of arbiters multiplied by the probability that an arbiter output is logic '1'. The parallel arbiters are carefully sized, and Monte-Carlo simulation is applied to ensure a consistent time-offset distribution and a linear detection region at high TDC accuracy. The theoretical time resolution of the TDC is thus a function of the arbiter time offset, which is on the order of several picoseconds, and of the number of arbiters used. In practice, up to 1024 arbiters are used, which consumes a large area and much power. In addition, considerable layout effort is needed to guard against uncontrolled mismatch.
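The error-function behavior of the average output can be sketched as follows. This is a minimal model under the Gaussian-offset assumption stated above; the function name, the zero-mean offset and the 5 ps standard deviation are illustrative assumptions, not values from [44].

```python
import math

def expected_ones(delta_t, sigma, n_arbiters):
    """Expected number of arbiters that output '1' for an input time difference.

    Assumption: arbiter offsets are Gaussian with zero mean and standard
    deviation sigma, and an arbiter outputs '1' when the input time
    difference exceeds its own offset.
    """
    probability = 0.5 * (1.0 + math.erf(delta_t / (sigma * math.sqrt(2.0))))
    return n_arbiters * probability

# Illustrative values: 1024 arbiters, 5 ps offset spread.
mid_scale = expected_ones(0.0, 5e-12, 1024)       # half of the arbiters
saturated = expected_ones(100e-12, 5e-12, 1024)   # far outside the linear region
```

The output is approximately linear only for input differences within roughly one sigma of zero, which is why the fine resolution comes with a small dynamic range.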

Besides the above primary approaches to sub-gate-delay resolution, combinations of these structures have also been investigated in academia and demonstrate feasible and attractive performance: the two-dimensional Vernier-line TDC [45], which economizes on both area and power dissipation; the GRO-based Vernier-line TDC [46], which further reduces the quantization noise integrated over the bandwidth of interest compared to the native resolution of a Vernier-line TDC; and the extension of a time-amplifier-based PI and PFD acting as a TDC [47]. However, the energy efficiency of these fine-resolution designs is poor, whereas ultra-low power is the priority for the TDC in the proposed ADPLL. Therefore, a prototype of a flash self-loaded inverter-based TDC is investigated. The detailed circuit design is presented in the following sections, and the overall design specification of the TDC is given in Table 4-1.

### 4-2 Structure of TDC

The resolution required of a TDC in the proposed ADPLL for the Wireless Personal Area Network communication standard is easy to achieve thanks to modern nano-scale CMOS technology; power dissipation, however, is the bottleneck in the design of a TDC. The proposed TDC here aims at

<table>
<thead>
<tr>
<th>Item</th>
<th>Requirement</th>
</tr>
</thead>
<tbody>
<tr>
<td>Technology</td>
<td>40nm CMOS</td>
</tr>
<tr>
<td>Power supply VDD</td>
<td>0.9V</td>
</tr>
<tr>
<td>Signal Range $V_{IN_{pp}}$</td>
<td>0.9V</td>
</tr>
<tr>
<td>Load</td>
<td>5fF</td>
</tr>
<tr>
<td>Power consumption</td>
<td>&lt; 0.05mW</td>
</tr>
<tr>
<td>Reference Frequency</td>
<td>32MHz</td>
</tr>
<tr>
<td>Time resolution</td>
<td>20ps</td>
</tr>
<tr>
<td>Dynamic range</td>
<td>320ps</td>
</tr>
</tbody>
</table>

Table 4-1: Design goal of TDC

Figure 4-3: Architecture of a clock gated TDC

dramatically reducing the power consumption by exploring a novel clock gating approach that allows a significant reduction in the operation frequency of most circuit blocks.

4-2-1 Building blocks in TDC

The entire architecture of the clock-gated TDC is depicted in Figure 4-3. The architecture has two basic circuit elements, CMOS delay elements and data detection elements, implemented by a differential inverter chain and sense-amplifier-based D-flip-flops, respectively. One stage, comprising a pair of inverters and a sense-amplifier-based D-flip-flop, is required to resolve one LSB. Additional weak cross-coupled inverters align the edges of the differential inverter outputs; they restore the signals and compensate for the jitter at the inverter outputs for the sake of detection in the next stage. The prototype of the core TDC is illustrated in Figure 4-4. The START signal propagates along a line of delay elements, and the output of every delay element is connected to the data node of a flip-flop. The state of the propagating START signal is thus sampled by the STOP signal. The flip-flop outputs form a pseudo-thermometer code, and the position of the transition, e.g. $1-0$, is a measure of the phase difference between START and STOP.

Normally, a D-flip-flop captures a logic value, either high or low, at the transition of its clock signal, so there is a window around the clock edge in which the input of the flip-flop can cause meta-stability, a common issue in digital circuits. To address this difficulty, the differential delay lines are connected to a fully symmetric differential flip-flop. Here a sense-amplifier-based D-flip-flop is adopted, with the differential START signal as the input data and the STOP signal as the clock; the output Dx goes to 1 if START leads STOP and to 0 otherwise. The circuit diagram of the sense-amplifier-based D-flip-flop is depicted in Figure 4-6. The data readout circuit uses a latch for synchronization. The latch is clocked by the falling edge of STOP, opposite to the polarity used in the data detection elements, which leaves enough time margin for one accurate TDC conversion.
A timing diagram of a 5-stage conversion is depicted in Figure 4-7. The first START edge leads the STOP signal by a phase difference $\phi_e$. The delay between adjacent elements is the time resolution $\Delta_{TDC}$. This 5-stage conversion therefore translates the original phase difference into a 5-bit code, e.g. '00111'. The detection range of the 5-stage conversion is $5 \times \Delta_{TDC}$; if the phase relationship between START and STOP is arbitrary, the safe range for detecting the location of START must be as long as the period of START. Likewise, the dynamic range of a TDC denotes the maximum phase interval that can be detected without saturation. The delay element is implemented by a custom-sized inverter instead of a buffer, because an inverter gives a time step of almost half that of a buffer and dissipates less power for the same number of stages, although the alternating signal polarity along the delay line, combined with the different driving strengths of PMOS and NMOS, causes an even-odd mismatch. The advantage of such circuits is that the resolution and the area scale with technology. Tunable or voltage-controlled CMOS inverters or buffers are avoided here, because such a solution is intrinsically analog and its performance degrades with technology scaling.
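The 5-stage conversion can be sketched behaviorally as follows. This is a minimal model assuming ideal, equal stage delays and ignoring the alternating inverter polarity that makes the real output a pseudo-thermometer code; the function names and the 65 ps / 20 ps values are illustrative.

```python
def flash_tdc(phi_e, delta_tdc, n_stages):
    """Idealized flash TDC: bit k is 1 if START, delayed by k + 1 stages, still leads STOP.

    Assumption: tap k carries START delayed by (k + 1) * delta_tdc,
    with ideal flip-flops and no mismatch.
    """
    return [1 if (k + 1) * delta_tdc <= phi_e else 0 for k in range(n_stages)]

def decode(bits, delta_tdc):
    """Convert the thermometer code to a time by locating the 1-0 transition."""
    count = 0
    for bit in bits:
        if bit == 0:
            break
        count += 1
    return count * delta_tdc

# Illustrative values in picoseconds: START leads STOP by 65 ps, 20 ps per stage.
bits = flash_tdc(65.0, 20.0, 5)
word = "".join(str(b) for b in reversed(bits))  # last stage written first
measured = decode(bits, 20.0)
```

With these values three taps have seen the START edge, so the word reads '00111' and the measured phase difference is 60 ps, within one $\Delta_{TDC}$ of the true 65 ps.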

Additionally, an edge-align circuit is used to split the reference signal $FREF_{DLY}$ and provide well-aligned complementary signals at the input of the delay chain, as depicted in Figure 4-8.

In short, an input signal edge rate that is either too fast or too slow can corrupt the operation of the design, see Figure 4-10. If the slew rate of a signal is slow, more noise energy is picked up from the device parasitics, which can lead to functional failure such as false triggering of flip-flops. In the core TDC circuit, a slow slew rate of the delayed signal $FREF_{DLY2}$ can destroy the effective resolution of the delay elements. For the complementary signals, a set of cross-coupled inverters is inserted to ensure phase alignment under local variation. Based on the circuit diagram of the differential delay line assisted by cross-coupled inverters, the time resolution of a single stage is defined as depicted in Figure 4-9.
Figure 4-5: NOR logic gate based Latch

Figure 4-6: Sense-amplifier-based D-flip-flop
Figure 4-7: Timing diagram of 5-stages conversion

Figure 4-8: (a) Edge align circuit (b) Crossed inverters
Figure 4-9: Definition of time resolution in differential delay chain

Figure 4-10: (a) Noise possibility on fast & slow signal edge (b) uncertainty on weak-constrained edge slew rates
4-3 Principle of Clock Gating

The flash self-loaded inverter-based TDC architecture is easy to implement, but it is limited by heavy power dissipation and a technology-dependent time resolution. As far as power dissipation is concerned, the TDC is one of the most power-hungry blocks inside an ADPLL system, mainly because part of its circuitry toggles at the high frequency of the signal from the DCO path, i.e. $CKVD2$ in this design, which is on the order of GHz. In particular, dynamic power is dominant in digital circuits and is evaluated as

$$P_{\text{dyn}} = C_l \times V_{\text{DD}} \times V_{\text{swing}} \times f$$

where $f$ is the frequency of the signal transitions between logic 0 and logic 1. In the conventional structure, the signal swing is rail-to-rail, and the maximum frequency of the TDC input signals is determined by the variable-phase signal, which is on the order of GHz and is driven by the high-frequency blocks, i.e. the DCO or the high-frequency divider after the DCO. In this design, two factors directly affect the power dissipation of the TDC: the dynamic range of the TDC, that is, the delay-line length needed to cover one period of the variable-phase signal $CKVD2$, and the frequency of $CKVD2$.
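To give a feeling for the frequency dependence, the dynamic power formula can be evaluated for a single node toggling at the $CKVD2$ rate and for the same node toggling at the reference rate. This is a rough sketch: the 5fF load, 0.9V supply and 32MHz reference come from Table 4-1, while the 1.2 GHz value for $CKVD2$ is an assumption (half of a 2.4 GHz DCO frequency).

```python
def dynamic_power(c_load, vdd, v_swing, f):
    """Dynamic power of one node: P_dyn = C_l * V_DD * V_swing * f."""
    return c_load * vdd * v_swing * f

C_LOAD = 5e-15     # F, load from Table 4-1
VDD = 0.9          # V, supply from Table 4-1
V_SWING = 0.9      # V, rail-to-rail swing
F_CKVD2 = 1.2e9    # Hz, assumed: 2.4 GHz DCO divided by 2
F_REF = 32e6       # Hz, reference frequency from Table 4-1

p_fast = dynamic_power(C_LOAD, VDD, V_SWING, F_CKVD2)  # node toggling at CKVD2 rate
p_slow = dynamic_power(C_LOAD, VDD, V_SWING, F_REF)    # node toggling at FREF rate
ratio = p_fast / p_slow
```

Under these assumptions the per-node power scales by the frequency ratio, about 37.5, which is the saving available for every node that can be moved from the $CKVD2$ rate to the reference rate.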

In a conventional TDC design, correct detection of the fractional phase error requires a dynamic range slightly larger than the period of $CKVD2$, which determines the number of components running at high frequency. The lower the frequency of $CKVD2$, the more components are required. In the ultra-low-power ADPLL, a DTC is implemented as an auxiliary path to dramatically reduce the required dynamic range of the TDC. Furthermore, the overall power dissipation is reduced with a dedicated technique, clock gating, which slows down the variable-phase signal frequency by picking up only the one edge of interest of $CKVD2$, exactly around the observing edge of FREF. The idea is briefly depicted in Figure 4-11. When the signal $CKG$ is fed to the TDC instead of $CKVD2$, the dynamic power dissipation decreases to a large extent.
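The relation between dynamic range and the number of delay stages can be checked with a short calculation. The sketch uses the 20ps resolution and 320ps dynamic range of Table 4-1; the 1.2 GHz $CKVD2$ frequency is an assumed value used only for the comparison.

```python
import math

def stages_needed(dynamic_range_ps, resolution_ps):
    """Number of delay stages required to cover a given dynamic range."""
    return math.ceil(dynamic_range_ps / resolution_ps)

RESOLUTION_PS = 20.0               # Table 4-1
CKVD2_PERIOD_PS = 1e12 / 1.2e9     # assumed 1.2 GHz CKVD2, about 833 ps

conventional = stages_needed(CKVD2_PERIOD_PS, RESOLUTION_PS)  # cover one full CKVD2 period
with_dtc = stages_needed(320.0, RESOLUTION_PS)                # Table 4-1 dynamic range
```

Under these assumptions a conventional TDC would need 42 stages, whereas the DTC-assisted TDC needs 16, before any margin is added.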

Different techniques have been reported recently. Reference [48] uses a time-windowed technique in the first step of the TDC, as shown in Figure 4-12. The output of a DCO and the reference signal REF are fed into an AND-based single-shot pulse generator. The enable window (called the time window in the paper) is determined by the length of the inverter chain in an inverter-based time quantizer. For correct fractional phase error detection without the assistance of a residue-type structure, the dynamic range of the TDC must cover at least one period of $DCO_{out}$, and in practice some delay margin is reserved. Consequently, detection fails when REF comes close to $DCO_{out}$, see Figure 4-13. In practice, the propagation time of the AND gate is non-zero, and if the transition of the signal EN lags $DCO_{out}$, the first rising edge of $DA_{out}$ corresponds to the transition of EN instead of the expected...
Figure 4-12: Time windowed TDC prototype: the first stage for coarse detection and power reduction

Figure 4-13: (a) Purposed timing diagram of the idea of time windowed TDC (b) failure picking up of \textit{DCO\textsubscript{out}} edges in the prototype of time-windowed TDC
DCO_out; at the same time, a glitch or a second rising edge of DA_out occurs, since the dynamic range of the delay-time quantizer covers the whole period of the signal DCO_out. The TDC mechanism thereby becomes complex. Even in the case of a phase-prediction ADPLL, where the dynamic range of the TDC is narrow enough to avoid the glitch, the false transition of DA_out loses the phase information of DCO_out. The confusion in detecting the rising edge of DCO_out is mainly due to the AND logic gate. The literature shows that such asynchronous logic gates always introduce glitch problems when used for clock gating. A safer and more robust clock gating circuit is therefore investigated here.

4-3-1 Implementation of the clock gating technique

The circuit diagram of the clock gating technique is shown in Figure 4-14. It has two input signals and two output signals. FREF_DLY denotes the delayed version of FREF after the pre-delay applied by the DTC, and CKVD2 is driven by the divide-by-2 circuit following the DCO. The two outputs, CKG and CKR, stand for the gated variable-phase signal and the retimed reference clock signal. CKG carries the phase information of the expected edge of the variable-phase signal CKVD2 and is fed into the TDC for fractional phase error detection. CKR runs at the same frequency as FREF and is used as the system clock, because the original clock signal FREF is not synchronized with CKVD2 and would naturally introduce meta-stability issues in the digital circuits. As reported in the literature, the main idea of retiming is that the reference clock signal FREF is sampled by the high-frequency variable-phase signal CKVD2 using D-flip-flops. A special circuit topology is needed to deal with the meta-stability issue; in [3], for example, the outputs of two D-flip-flop chains pass through a MUX whose selection bit is determined by the polarity transition point of the fractional phase error obtained from the core TDC circuit. Groups of D-flip-flops toggle at the high frequency, so this is not a power-efficient solution. The advantage of the technique used here is that the phase relationship between FREF and CKG is deterministic, which simplifies the design of the retimed clock signal CKR. The circuit uses standard-cell D-flip-flops with asynchronous reset and 3-input OR gates, so it is technology scalable and easy to design. In operation, as illustrated in the timing diagram of Figure 4-15, the DFFs M1 and M2 are kept in the reset state during the negative period of the clock signal

**Figure 4-15:** Circuit diagram of the clock gating technique

$FREF_{DLY}$: once $FREF_{DLY}$ turns high, it triggers the transition of the clock gating enable window, labeled $CKENB$. After that, one OR gate detects the expected rising edge of the variable-phase signal $CKVD2$ and generates the internal variable-phase signal $CK1$. Despite the propagation time between $CKVD2$ and $CK1$, the information about the phase lag relative to $FREF_{DLY}$ is not lost, since the delay of a logic gate is theoretically constant and uncorrelated with the input signals. The low-to-high transition of $CK1$ triggers DFF $M2$ and generates the wanted gated variable-phase signal $CKG$, provided that $CKR$ holds logic 0. The data input of $M2$ is tied high, so $CKG$ makes only one low-to-high transition once the clock signal $CK1$ is enabled.

The approach is straightforward and easy to implement. However, if $M1$ and $M2$ kept their reset signals high for the whole half period of $FREF_{DLY}$, $M2$ would toggle at the same frequency as $CKVD2$, which costs unnecessary power. This problem is overcome with the help of the retimed clock signal $CKR$. The transition of $CKR$ acts as a mask on the intermediate signal $CK1$, so that $CK1$ stays at the high level afterwards. For this reason, retiming of the reference clock is investigated together with gating of the variable-phase signal, as presented in Section 4-4.
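The edge selection performed by the gating circuit can be summarized in a small behavioral sketch. It ignores the flip-flop and OR-gate propagation delays and assumes $CKVD2$ rising edges on a regular grid; the function name `gated_edge` and the numeric values (in picoseconds) are illustrative.

```python
import math

def gated_edge(t_fref_dly, ckvd2_period, ckvd2_offset=0.0):
    """Time of the first CKVD2 rising edge at or after the FREF_DLY rising edge.

    Assumptions: CKVD2 rising edges occur at ckvd2_offset + n * ckvd2_period,
    and all gate and flip-flop delays are neglected.
    """
    n = math.ceil((t_fref_dly - ckvd2_offset) / ckvd2_period)
    return ckvd2_offset + n * ckvd2_period

# Illustrative values: FREF_DLY rises at 1000 ps, CKVD2 period is 800 ps.
t_ckg = gated_edge(1000.0, 800.0)   # single CKG edge passed to the TDC
phase_error = t_ckg - 1000.0        # interval the TDC has to measure
```

Only this one edge per reference cycle reaches the TDC, so the delay chain toggles at the reference rate instead of the $CKVD2$ rate.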

4-3-2 Meta-stability Issue in Clock gating

In practice, the propagation times of DFF $M1$ and OR gate $I1$ cannot be neglected. When the rising edge of the variable-phase signal $CKVD2$ is close to that of the reference signal $FREF_{DLY}$, a meta-stability problem occurs at the input of DFF $M2$. To ensure the correct function and timing of the clock gating technique, the meta-stability is analyzed here. Assume that the phase difference between $CKVD2$ and $FREF_{DLY}$ is much smaller than the propagation time of the D-flip-flops. The timing diagram of this situation is shown in the figure: except for an extra offset added to the phase error $\phi_e$, the gated clock signal $CKG$ keeps the phase information of $CKVD2$, and the time offset is compensated in the $FREF_{DLY}$ path.
4-4 Reference Clock retiming

The reference signal and the variable-phase signal are asynchronous to each other, which leads to meta-stability issues in the phase error calculation in the digital domain. To make the all-digital PLL work in a clock-synchronous manner, this section proposes retiming the reference clock by the variable-phase signal \( CKVD2 \) to obtain \( CKR \). In Reference [3], instead of feeding the low-frequency signal directly into a series of D-flip-flops clocked at the high frequency, two parallel clocked memory elements are clocked on opposite polarities of the variable-phase signal \( CKV \) derived from the digitally controlled oscillator. The reference signal \( FREF \) is thus oversampled by both the rising edge and the falling edge of \( CKV \). After shift register stages are inserted for delay purposes, a multiplexer selects one of the clocked signal paths as the transition of \( CKR \); the selected path is safe from meta-stability, as determined by the mid-edge detection derived from the raw output data of the core TDC circuit. Many successful projects have demonstrated the robustness and effectiveness of this scheme; however, it is not a proper choice for a DTC-assisted TDC. Thanks to the DTC, the dynamic range of the TDC core circuit is theoretically as narrow as the quantization error of the DTC, so it is hard to derive the mid-edge detection from the raw TDC data. Moreover, quite a few clocked memory elements are needed, which are clocked at a high frequency on the order of \( GHz \) and cost much power. Therefore, the retiming circuit is simplified to suit the system architecture.

Returning to the primary idea that the reference signal \( FREF \) is clocked by the variable-phase signal \( CKVD2 \) via a clocked memory element, the main concern is meta-stability, since the phase relationship between \( FREF \) and \( CKVD2 \) is uncertain. On the other hand, the gated edge of \( CKVD2 \) theoretically lags the delayed reference signal \( FREF_{DLY} \): the D-flip-flop-based clock gating circuit generates an intermediate clock signal \( CK1 \), a delayed version of the first rising edge of \( CKVD2 \) following \( FREF_{DLY} \). It follows that the retimed clock \( CKR \) can be generated by counting \( CK1 \) edges. In the circuit diagram, the rising edge of \( CK1 \) clocks the cascaded D-flip-flops, so that three cycles of the internal variable-phase signal \( CK1 \) are counted, and the transition edge of \( CKR \) then masks \( CK1 \) to hold it at logic high. In terms of the timing diagram, \( CK1 \) is counted for several \( CKVD2 \) periods in order to leave enough time margin for the fractional phase error detection, and ciphering. In terms of the system bandwidth, a delay on the order of the $CKVD2$ period is negligible.
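The retiming latency can be estimated with a short sketch. It assumes that $CKR$ is asserted on the third counted rising edge of $CK1$ (one possible reading of the three-cycle count above) and that gate delays are negligible; the function name and the 1.2 GHz $CKVD2$ frequency are illustrative assumptions.

```python
def retimed_edge(t_ck1_first, ckvd2_period, n_count=3):
    """Time at which CKR rises if it is asserted on the n_count-th CK1 rising edge.

    Assumption: CK1 follows CKVD2 until CKR masks it, so successive CK1
    edges are one CKVD2 period apart.
    """
    return t_ck1_first + (n_count - 1) * ckvd2_period

CKVD2_PERIOD_PS = 1e12 / 1.2e9   # assumed 1.2 GHz CKVD2
FREF_PERIOD_PS = 1e12 / 32e6     # 32 MHz reference, 31250 ps

t_ckr = retimed_edge(0.0, CKVD2_PERIOD_PS)   # latency after the first CK1 edge
fraction = t_ckr / FREF_PERIOD_PS            # share of one reference period
```

Under these assumptions the added latency is below 2 ns, roughly 5 % of one reference period, which supports treating it as negligible with respect to the loop bandwidth.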
Chapter 5

Implementation and Experimental Verification

5-1 Considerations of Layout in Nano CMOS Technology for RF Circuit

In the past, layout work was separate from circuit design. Thanks to accurate SPICE transistor models and extraction tools such as Calibre, post-layout simulation results match the preceding schematic simulations within an acceptable variation. On the other hand, as technology scales down, model parameter variation has a larger effect on the real circuit performance than before. The era of custom layout has therefore arrived not only for analog integrated circuit design but also for digital circuit design, and the real circuit performance benefits considerably from custom layout. The primary goals of layout are to ensure the speed of the circuits, to control the power dissipation associated with the physical connections between nodes, and to keep the chip area as compact as possible. The challenge in the deep-submicron regime is that increasing leakage currents preclude further constant-field scaling. As a consequence, the speed gain of new technologies is moderate, especially in low-power applications [39]. In short, reducing the parasitic capacitances and resistances helps to increase transistor speed. Besides the transistor layout parasitics, the signal path parasitics must also be taken into consideration. The layout flow used to ensure the correct function and good performance of the circuits in this design is depicted in the flow chart of Figure 5-1.

5-2 Layout of Key Blocks in DTC and TDC

5-2-1 Delay Element Layout

The basic delay element in both the DTC and the TDC is an inverter or a buffer. To achieve high speed at a supply voltage lower than the nominal voltage (1.2V), large PMOS and NMOS transistors are a straightforward choice. Transistor folding is a well-known technique, since it yields a smaller gate resistance and a more compact layout. However, folding a transistor into multiple fingers implies a larger variation in circuit performance, since a variation applied to several fingers is several times larger than the same variation applied to a single transistor. Optimizing such a layout for better yield can be challenging for the circuit designer. The signal path length also increases in a folded transistor layout, so more metal resistance is involved. As a result, the trade-offs of simple inverter layouts are taken into consideration.

Figure 5-1: The layout flow

Besides the performance of a single cell, the global placement of a DTC or TDC is also very important, as it influences the linearity. The most popular layout placement constraints are device matching, device symmetry and device proximity, indicated in Figure 5-2, which minimize the impact of parasitics, process variations and varying operating conditions.

![Diagram of device matching, device symmetry, and device proximity.]

Figure 5-2: Layout topology constraints (a) device matching (b) device symmetry (c) device proximity

The proximity topology gives up geometric symmetry to some extent; PMOS and NMOS devices are placed in separate groups so that they can share a common substrate/well region, be surrounded by a common guard ring, or be placed close to matched devices. In principle, this decreases the effect of substrate coupling and also avoids large mismatch and deviations during fabrication. This layout constraint is adopted in CMOS digital circuit design.

Supported by the theory presented above, the unit cell layout of a DTC and TDC is shown in the following content.

The layout of the DTC cells in Figure 5-5 is mainly improved by shortening the signal path from the input $D_i$ to the output $D_{i+1}$, as well as the physical path length from $FREF$ to $FREF_{DLY}$. Figure 5-6 shows that the 64-stage DTC is placed in 2 rows, with dummy stages at the beginning and end of the chain to guarantee an identical environment for every DTC unit. Large single-finger transistors are adopted for the output-enable (OE) buffers that form the delay elements of the DTC, see Figure 5-4. A compact, folded transistor placement is also tried, as shown in Figure 5-7.

Considering the drain and source parasitic capacitances, the circuit model after layout extraction is shown in Figure 5-8. The parasitic capacitance between the input and the output of an inverter acts as a Miller capacitor, and the equivalent input
Figure 5-3: (a) Layout of OE buffer in DTC

Figure 5-4: (b) DTC selection set layout in Proximity constrain
Figure 5-5: The layout of Unit cell of DTC

Figure 5-6: The layout for the entire DTC

Figure 5-7: The proximity topology and diffusion sharing in unit cell of DTC

Figure 5-8: The extraction results of the parasitic capacitance in OE buffer

capacitance is the cross capacitance multiplied by the gain factor of the inverter. As a result, increasing the transistor size boosts the driving ability but enlarges the parasitic capacitance at the same time. A proper size of the delay elements is chosen here to meet both the speed and the power requirements. A similar methodology is used to develop the layout of the delay chain of the core TDC circuit, as illustrated in Figure 5-9.
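For reference, the Miller relation invoked above can be written out explicitly. This is the standard textbook approximation, not a value extracted from this design; the symbols $C_{cross}$ (capacitance bridging input and output) and $A_v$ (magnitude of the inverter voltage gain during the transition) are introduced here only for illustration.

$$C_{in,eq} = C_{cross} \times (1 + A_v)$$

For $A_v \gg 1$ this reduces to the cross capacitance multiplied by the gain, which is why a larger bridging capacitance in the layout slows the preceding stage disproportionately.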

5-2-2 Sense-Amplifier Based D-flip-flop

The sense-amplifier-based D-flip-flop is symmetric at the schematic level, and so must be its layout, in order to eliminate the mismatch between the two polarities of the differential input signals and the time offset at the inputs, see Figure 5-10.

5-3 Post-simulation Results

5-3-1 DTC

**Monte-Carlo Simulation** Device mismatches, or local mismatches, are small parameter variations in the characteristics of identically designed devices that occur during the manufacturing of integrated circuits. Local mismatches affect the functionality or the performance of both analog and digital integrated circuits and can even render a circuit non-functional. It is normally difficult to predict analytically how a circuit behaves under mismatch. The impact of random variations is therefore investigated with Monte Carlo simulation, which models the random mismatch between components due to process variation by analyzing a large number of circuit instances and initial states.

Monte-Carlo simulation gives a better sense of the distribution of the DTC circuit performance after the real layout is done. The voltages at the output nodes are probed in transient simulation, the timing metrics are calculated in each Monte Carlo run, and the results are plotted as histograms with 1500 samples. The simulation is based on the extracted netlist of the DTC layout and runs at a supply voltage of 1V and a temperature of 27 °C. The results reflect the tolerance of the
Figure 5-9: Layout of the TDC core unit cell
Figure 5-10: Symmetric layout of a sense-amplifier based D-flip-flop
DTC circuit to device mismatch errors. A monotonic characteristic of the DTC is the primary requirement for the correct functionality of the ADPLL system. The Monte Carlo simulation verifies the random variation of delay strings that use the same cells as the DTC, as illustrated in the test plan diagram of Figure 5-11.

Table 5-1 explains the variables measured in the Monte Carlo simulation, whose results are shown in the following figures. The Monte Carlo results show that the propagation time between the rising edges of the input and the output of the 8-stage delay element string is normally distributed; the variation of the propagation time of each stage is on the order of femtoseconds. To save power, unbalanced PMOS and NMOS driving abilities are applied through custom sizing of the delay cells in the DTC (Section 3-3-2); as a result, the propagation time of the falling edges of the clock signals is larger than that of the rising edges, but it depends on the selection codes. The bottom two figures show that the average delay of each delay unit cell is 19ps. The result $D52D6_r$ indicates that, for the reset action of the clock signal, the logic reset of the internal inputs of all stages occurs at the same time regardless of jitter or parameter variations, in good agreement with the theory.

<table>
<thead>
<tr>
<th>Abbreviation</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>$Ck2D_{x_r}$</td>
<td>The phase delay of the falling edge of $D_x$ and the rising edge of input clock</td>
</tr>
<tr>
<td>$Ck2D_{x_f}$</td>
<td>The phase delay of the rising edge of $D_x$ and the falling edge of input clock</td>
</tr>
<tr>
<td>$Dx2D_{y_r}$</td>
<td>The phase delay of the rising edge of $D_x$ and the rising edge of $D_y$</td>
</tr>
<tr>
<td>$Dx2D_{y_f}$</td>
<td>The phase delay of the falling edge of $D_x$ and the falling edge of $D_y$</td>
</tr>
</tbody>
</table>

Table 5-1: The annotation of measured variables in the transition operation

**Transient Operation of DTC**

Furthermore, a transient simulation is run to verify the function of the DTC and its linearity. From Figure 5-16 and Figure 5-17, the time resolution of the DTC at a supply voltage of 900mV is 27ps, with a total delay length of 2.8ns. The linearity of the DTC is characterized by a maximum DNL of 0.3 and a maximum INL of 0.18.
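As an illustration of how DNL and INL figures of this kind are typically derived from a delay-versus-code transfer function, a minimal sketch follows. It assumes an endpoint-fit LSB, which is one common convention and not necessarily the one used for Figure 5-17; the function name and the sample delays are hypothetical.

```python
def dnl_inl(delays):
    """Compute the average LSB, DNL and INL (both in LSB) from delay per code.

    Assumption: the LSB is the endpoint-fit average step; DNL[k] is the
    deviation of step k from that LSB, and INL is the running sum of DNL.
    """
    n_steps = len(delays) - 1
    lsb = (delays[-1] - delays[0]) / n_steps
    dnl = [(delays[k + 1] - delays[k]) / lsb - 1.0 for k in range(n_steps)]
    inl = []
    accumulated = 0.0
    for d in dnl:
        accumulated += d
        inl.append(accumulated)
    return lsb, dnl, inl

# Hypothetical delays in ps for codes 0..4, with one long and one short step.
lsb, dnl, inl = dnl_inl([0.0, 27.0, 54.0, 84.0, 108.0])
```

With an endpoint fit the INL returns to zero at the last code by construction, so the reported maximum INL reflects the bow of the curve between the endpoints.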
Figure 5-12: Histogram of Monte Carlo simulation of 8-stage DTC delay chain $Ck2D7_r$: the propagation time from the rising edge of CLK to the falling edge of the output; mean propagation time is 241.659 ps, standard deviation is 6.32356 ps with 1500 samples.

Figure 5-13: Histogram of Monte Carlo simulation of 8-stage DTC delay chain $Ck2D7_f$: the propagation time from the rising edge of CLK to the falling edge of the output node; mean propagation time is 150.582 ps, standard deviation is 10.9973 ps with 1500 samples.

Figure 5-14: Histogram of Monte Carlo simulation of 8-stage DTC delay chain $D52D6_f$: the propagation time from the falling edge of $D_5$ output to the falling edge of $D_6$ output; mean propagation time is 19.4115 ps, standard deviation is 530.574 fs with 1500 samples.

Figure 5-15: Histogram of Monte Carlo simulation of 8-stage DTC delay chain $D42D5_r$: the propagation time from the rising edge of $D_4$ output to the rising edge of $D_5$ output; mean propagation time is 15.7041 ns, standard deviation is 15.6182 ns with 1500 samples.
Figure 5-16: Transfer function of DTC with post-layout extraction

Figure 5-17: Linearity performance of DTC with post-layout extraction
5-3-2 TDC

The straightforward way to evaluate the functionality of the TDC is to measure the phase delay between two input signals of the same frequency. Since the delay between the two signals is constant, each transient simulation yields one quantization level. Sweeping the phase delay over the whole detection range of the TDC then gives the transfer function of the TDC. The finer the step size of the phase delay sweep, the more sampling points fall into the same quantization level, and the large set of samples benefits the linearity analysis. Figure 5-18, Figure 5-19 and Figure 5-20 below show the effectiveness of the TDC: the transfer function curve, the histogram of the TDC outputs, and the DNL/INL curve.
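The sweep-and-histogram procedure can be sketched as follows. This is a minimal behavioral model, assuming each TDC stage is represented by a single switching threshold; the function names and the threshold values (in picoseconds) are hypothetical and only illustrate how the histogram translates into DNL.

```python
from collections import Counter

def tdc_code(delay, thresholds):
    """Output code: number of stage thresholds the input delay has reached."""
    return sum(1 for t in thresholds if delay >= t)

def sweep(thresholds, step, stop):
    """Sweep the input delay from 0 up to (but excluding) stop in equal steps."""
    n_points = int(round(stop / step))
    return [tdc_code(i * step, thresholds) for i in range(n_points)]

def histogram_dnl(codes):
    """DNL per code in LSB: hits of each code relative to the average hit count."""
    counts = Counter(codes)
    n_codes = max(codes) + 1
    ideal = len(codes) / n_codes
    return [counts[c] / ideal - 1.0 for c in range(n_codes)]

# Hypothetical 4-stage TDC with one late threshold (62 ps instead of 60 ps).
codes = sweep([20, 40, 62, 80], step=1, stop=100)
dnl = histogram_dnl(codes)
```

A wide code collects more hits than average and shows positive DNL, while the following narrow code shows negative DNL; a finer sweep step reduces the granularity of this estimate.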

![TDC Transfer Function](image1.png)

**Figure 5-18:** Transfer function of TDC with post-layout extraction

![Raw Histogram of TDC Output](image2.png)

**Figure 5-19:** Histogram of transition operation of TDC with post-layout extraction

![Graph](image)

**Figure 5-20:** Linearity performance of TDC with post-layout extraction

<table>
<thead>
<tr>
<th>Output port</th>
<th>Phase noise (dBc/kHz)</th>
</tr>
</thead>
<tbody>
<tr>
<td>D25/DB25</td>
<td>138.6/100</td>
</tr>
</tbody>
</table>

**Table 5-2:** The phase noise at the TDC delay chain output node

### 5-3-3 Clock Buffer

Table 5-2 summarizes the phase noise performance at the final stage of the TDC. According to the noise contribution analysis, the main noise contributor is the clock buffer for the input reference clock signal. Moreover, the dominant noise type is the flicker noise of the PMOS in the first-stage inverter.

### 5-4 Measurement of DTC and TDC

The DTC and TDC circuits, integrated on one chip with the digitally synthesized blocks, the DCO circuit and the digital PA for the ultra-low-power ADPLL, have been implemented and fabricated in a 40 nm low-power CMOS technology. The floor plan of the entire system is shown in Figure 5-21.

The ADPLL chip is compact, especially the TDC and DTC part, thanks to the nano-scale CMOS technology and the custom layout effort. Dedicated power pads are used for the DTC/TDC combination, the digitally synthesized blocks, the DCO core, and the power amplifier, in order to limit supply-noise coupling (see Figure 5-22). The TDC and DTC are custom digital circuits whose transient peak current causes the supply voltage to fluctuate. A decoupling circuit is therefore required, with its parameters calculated approximately so that neither the driving ability nor the circuit response speed is degraded. The decoupling circuit topology is illustrated in Figure 5-23.

Figure 5-21: The floor plan of all-digital PLL

Figure 5-22: Dedicated Power supply pads assignments for different blocks of ADPLL

Figure 5-23: The decoupling circuit in the path of power supply line
PMOS supply bypass capacitors are employed, each formed by a PMOS with its source, drain, and substrate shorted. The size of the decoupling capacitors C is roughly calculated by integrating the transient current of the DTC over one period of $F_{REF}$: in practice, the capacitors are sized to store about five times the charge drawn per period, with a tolerated supply-voltage deviation of $45mV$.
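
One possible reading of this sizing rule is $C = 5\,Q/\Delta V$, where $Q$ is the charge obtained by integrating the transient current over one reference period and $\Delta V$ is the tolerated supply deviation. The sketch below follows that reading. The interpretation of the factor of five, the function names, and the triangular current pulse are all assumptions made for illustration, not values from this design.

```python
def charge_per_period(current_samples, dt):
    """Trapezoidal integral of a sampled current waveform (A) over time (s)."""
    total = 0.0
    for a, b in zip(current_samples, current_samples[1:]):
        total += 0.5 * (a + b) * dt
    return total


def decoupling_capacitance(charge, delta_v, margin=5.0):
    """C = margin * Q / dV; the margin of 5 is this sketch's reading of the text."""
    return margin * charge / delta_v


# Hypothetical triangular current pulse: 0 -> 1 mA -> 0 within 200 ps.
dt = 10e-12
pulse = [1e-3 * (1 - abs(n - 10) / 10) for n in range(21)]
q = charge_per_period(pulse, dt)          # about 100 fC
c = decoupling_capacitance(q, 45e-3)      # about 11 pF
print(q, c)
```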

### 5-4-1 PCB Design and Test Bench for ADPLL

The chip is packaged in a QFN32 package. A four-metal-layer PCB is designed for the peripheral circuits, including level shifters, a low-dropout regulator (LDO), and an RS232 communication module, to support remote control using an FPGA and a PC together and to provide a stable 1 V supply voltage. When measuring the DTC and TDC separately, one must be able to set up the input signals properly and capture the output nodes of both modules. Therefore, a floor plan for measurement purposes is proposed, as illustrated in Figure 5-24. Given the limited number of IO pads, only the input and output signals labeled in the figure are reserved for testing: the reference signal $F_{REF}$, the external clock signal $CKV_{EXT}$ (an alternative to the internal gated clock signal $CKG$), the delayed reference signal $FREF_{DLY}$, and the four digital bits of the TDC output for the open-loop measurement.

For the DTC measurement, the manual selection codes of the DTC are set via SPI registers, and a MUX selects the codes from either the manual setting path or the phase prediction outputs. Once the measurement starts, the output buffer driving the measurement instruments is enabled, so that the raw data of both the input reference signal and the output delayed reference signal are accessible and ready for post-processing.

As far as the TDC measurement is concerned, it is straightforward to inspect the raw TDC output codes, which are synchronized with the input reference signal. It is not easy, however, to set the phase delay between the two input signals precisely with acceptable jitter performance. Several methods for doing so are investigated in the following sections; here the general measurement floor plan is proposed. As in the DTC measurement, two input clock signals $FREF$ and $CKV_{EXT}$ are needed to produce the two phase-shifted signals. As presented in the previous chapter, the clock signal from the high-frequency synthesized path is gated and labeled CKG. A MUX is therefore assigned to select the external clock signal $CKV_{EXT}$, which is easily controlled to model a phase lead or lag relative to the reference signal $FREF$. In addition, output buffers for driving the IO pads are necessary.

Figure 5-24: Measurement floor plan: (a) DTC, (b) TDC

Based on the measurement floor plan, remote control of both the measurement instruments and the PCB carrying the test chip is required, as depicted in Figure 5-25. In practice, SPI registers are adopted for flexible system configuration, and the interaction between the SPI blocks and the test chip is described in the figure. In addition, a user-friendly Matlab GUI is designed to communicate with the test chip via the RS232 protocol.

**Closed-loop ADPLL Measurement**

The spectrum of the ADPLL output signal is observed with a spectrum analyzer (see Figure 5-26). The top diagram shows that the reference spurs are at the $-70\text{dBc}$ level, while the fractional spurs are poor, as illustrated in Figure 5-27, and only just meet the system requirement of $<-30\text{dBc}$. The largest fractional spur tone is located at $0.5 \times f_{REF}$ and is below $-65\text{dBc}$. The bottom diagram in Figure 5-26 indicates that the energy of the close-in fractional spur tones does not change when the bandwidth of the ADPLL is adjusted. The high-pass filtering of the DCO phase noise is confirmed, but the low-pass filtering of the TDC-path phase noise could not be identified in the near-integer mode of the ADPLL. During measurement, slightly raising the supply voltage by $0.05\text{V}$ or lowering the frequency of $FREF$ noticeably improves the fractional spurious tones. Unfortunately, because the interface for the DTC gain calibration could not be set up, the effect of calibration on the DTC linearity could not be verified. The in-band phase noise is shown in Figure 5-28: $-90\text{dBc}/\text{Hz}$ at $30\text{kHz}$ and $-109\text{dBc}/\text{Hz}$ at $1\text{MHz}$, corresponding to an RMS jitter of $1.71\text{ps}$. This confirms that the DTC and TDC pair works as expected.

### 5-4-2 Test Plan of Phase Predicted DTC

As suggested in the previous section on the floor plan for DTC testing, the DTC setup is illustrated briefly in Figure 5-30.

Figure 5-26: Output spectrum of ADPLL

Figure 5-27: Measured fractional Spurs over Bluetooth Smart Channel

Figure 5-28: Spectrum of the ADPLL output

The most challenging part is to provide a clean input clock signal with good jitter performance and to synchronize the input and output signals so that they align well and the meta-stability condition of the measurement instruments is eliminated. An illustration of how to facilitate the DTC measurement is presented in Figure 5-29. A signal generator produces the RF output signal as a sine wave covering 9 kHz up to several GHz. In addition, a DC block is used to bias the DC level of the signal at half the supply voltage and obtain a rail-to-rail signal, so that the logic gates of the DTC and TDC are triggered reliably. The alternative to the dashed box is to use a pulse generator. Test automation is implemented as well. The synchronized input and output signals are closely monitored and post-processed. The DTC time resolution is targeted at 20 ps at a 1 V supply voltage. For each step of the selection code, capturing the phase difference produced by the DTC precisely requires the sampling rate of the digital oscilloscope to be larger than twice $1/20\text{ps} = 50\text{GHz}$. Since the DTC works by applying a controllable phase delay to the rising edge of the reference clock signal, one can apply a 32 MHz pulse signal as the reference clock and sweep the entire range of selection codes automatically. The observed quantity can be either the phase difference obtained by subtracting two time stamps, or the duty cycle of the DTC output signal.

Since the falling edge of the output signal $FREF_{DLY}$ is theoretically independent of the selection code and lags the reference signal FREF by a fixed phase, the duty cycle of $FREF_{DLY}$ is proportional to the phase departure at the rising edge, and the duty cycle as a function of the selection code is linear with a certain offset, as explained by Eq. (5-4-2) and Figure 5-31.

Once the selection code is set, the duty cycle $D$ is calculated as:

$$D_k = \frac{T_0/2 - k\,\Delta t}{\left(T_0/2 - k\,\Delta t\right) + \left(T_0/2 + m\,\Delta t\right)} \tag{1}$$

$$D_k = \frac{T_0/2 - k\,\Delta t}{T_0 + (m-k)\,\Delta t} \tag{2}$$

In equation (1), $T_0$ denotes the period of the reference signal $FREF$, $\Delta t$ is the time resolution of the DTC, and $k$ and $m$ are the selection codes for two adjacent cycles of $FREF$. For a static sweep, $k = m$, and ignoring the period deviation of $FREF$, the duty cycle reduces to $D_k = 1/2 - k\,\Delta t / T_0$, which is linear in the selection code $k$. Once the duty cycle of the output signal $FREF_{DLY}$ is observed, the time resolution of the DTC can be calculated, as well as its linearity; see Figure 5-32 and Figure 5-33.
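
Under a static sweep ($k = m$) the duty cycle falls linearly with the selection code, with slope $-\Delta t / T_0$, so the DTC resolution can be recovered from a straight-line fit of duty cycle against code. The sketch below illustrates this with synthetic data. The 64-code range, the helper names, and the noiseless data are assumptions for illustration and are not the measured results.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def dtc_resolution_from_duty(codes, duties, t_ref):
    """With k = m, D_k = 1/2 - k*dt/T0, so dt = -slope * T0."""
    offset, slope = fit_line(codes, duties)
    return -slope * t_ref, offset


# Synthetic data, not measurements: 32 MHz reference, 20 ps step, 64 codes.
T0 = 1 / 32e6
codes = list(range(64))
duties = [0.5 - k * 20e-12 / T0 for k in codes]
dt_est, offset = dtc_resolution_from_duty(codes, duties, T0)
print(dt_est, offset)
```

With real data, the residuals of the fit, scaled by $T_0/\Delta t$, would give the per-code nonlinearity in LSB.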

An alternative way to measure the transfer function is to estimate the phase departure directly between the input reference signal $FREF$ and the output delayed signal $FREF_{DLY}$. In this case, the time stamps of both $FREF$ and $FREF_{DLY}$ need to be stored, and the phase departure is calculated at a certain threshold voltage level. Figure 5-34 and Figure 5-35 plot the measured DTC time resolution and functionality at a supply voltage of $1V$. The achieved DTC resolution is $20ps$, which agrees well with the simulated value ($26ps$ at $0.9V$ supply voltage and $19ps$ at $1V$). The time resolution of the DTC is determined by the delay of the self-loaded digital buffer, which is sensitive to the supply voltage. Moreover, although the DC current is in the tens of $\mu A$, the transient current is heavy during the transitions of the input signal FREF. For this reason, on-chip local decoupling capacitors are placed on the DTC power supply to help reduce the reference spurs.

### 5-4-3 Test Plan of the Core TDC

In the all-digital PLL, the TDC is one of the critical parts: its functionality determines the correct tuning of the DCO, its time resolution sets the limit of the in-band phase noise, and its mismatch directly introduces unwanted fractional spur tones at the ADPLL output unless special calibration techniques are applied.

An open-loop test of the TDC is carried out in this thesis. The floor plan of the TDC measurement is illustrated in Figure 5-24(b). In the literature on TDC measurement, two different methods are reported: one feeds in two signals with a fixed phase departure that is set manually each time, and the other generates two signals with slightly different frequencies so that the phase departure rotates automatically between 0 and 2π. The first method is implemented at the on-chip circuit level (see Figure 5-36).

A series of D flip-flops is cascaded for synchronization. Two separate signal paths are extracted at the third D flip-flop as the input signals of the TDC. The input data signal D of the third D flip-flop is followed by a large buffer that provides a certain delay τ, and the output signal Q is the result of sampling the signal D at the frequency of EXTCLK. Theoretically, the rising edge of EXTCKG lags FREFin by the sum of one period of EXTCLK and the data propagation time of the D flip-flop, $T_{EXTCLK} + \Delta t_{dff}$. If the delay of the FREFin path is fine-tuned to match the delay of the EXTCKG path, the rising edges of both TDC input signals are well aligned. Thus, once the frequency of the input signal EXTCLK changes by a tiny amount $\Delta f$, the rising edge of FREFin departs from that of EXTCKG by the accurate phase difference

$$\Delta t = \frac{1}{f_0} - \frac{1}{f_0 + \Delta f} = \frac{\Delta f}{f_0 (f_0 + \Delta f)} \approx \frac{\Delta f}{f_0^2}.$$

The advantage of this method is that a fine time resolution is controlled precisely. For example, if $f_0 = 50\,\text{MHz}$ and $\Delta f = 10\,\text{kHz}$, then $\Delta t \approx 4.0\,\text{ps}$, which indicates that a smaller frequency step $\Delta f$ or a larger base frequency $f_0$ leads to a finer time resolution.

Figure 5-33: Linearity of DTC based on the measured variance of duty cycle

Figure 5-34: Transfer function of DTC: phase departure vs input bins

Figure 5-35: Linearity of DTC based on the measured variance of phase departure

Figure 5-36: The on-chip circuits for TDC measurement
To characterize the TDC functionality, the frequency of $EXT_{CLK}$ is swept from $f_0$ to $f_1 = f_0 + D.R \times f_0^2$, where D.R denotes the dynamic range of the TDC. In this design, the time resolution of the TDC is 16 ps at a 1 V supply voltage. For a time-difference step as fine as 0.01 LSB, the frequency step is as small as 400 Hz, which is easily achieved by the signal generator. The density and the transitions of every TDC output bin are recorded while sweeping the frequency of $EXT_{CKV}$.
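
The relation between a frequency offset and the resulting time step can be checked numerically. The short sketch below evaluates the exact period difference for the 50 MHz / 10 kHz example and the small-step approximation $\Delta f \approx \Delta t \, f_0^2$ for a 0.01 LSB step. Using $f_0 = 50$ MHz for the 0.01 LSB case is an assumption carried over from the earlier example, and the function names are illustrative.

```python
def time_step(f0, delta_f):
    """Exact period difference between f0 and f0 + delta_f, in seconds."""
    return 1.0 / f0 - 1.0 / (f0 + delta_f)


def freq_step_for(dt, f0):
    """Small-step approximation delta_f ~= dt * f0**2, in hertz."""
    return dt * f0 ** 2


f0 = 50e6                            # 50 MHz, as in the example in the text
dt = time_step(f0, 10e3)             # close to 4 ps for a 10 kHz offset
lsb = 16e-12                         # TDC resolution quoted at 1 V
df = freq_step_for(0.01 * lsb, f0)   # close to 400 Hz for a 0.01 LSB step
print(dt, df)
```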

Another open-loop measurement rotates the phase departure automatically. The two input signals of the TDC are synchronized, so that once the rising edges of the two signals coincide, the phase difference accumulates over the following cycles and increases cyclically. As in the case described before, the step size of the phase difference depends on the frequency deviation between the two signals. The benefit of this method is that no extra on-chip circuitry is needed. The setup is illustrated in Figure 5-37. In the first method, which tunes the phase difference statically, the number of samples in each quantization level of the TDC is determined by the frequency step size. In the second, phase-rotation method, the number of samples depends not only on the frequency difference between the two TDC input signals but also on the sampling rate of the digital oscilloscope. The sampling bins observed through the digital oscilloscope follow the equations below:

$$
\begin{aligned}
\Delta f &= f_1 - f_0 \\
\Delta \phi &= \frac{1}{f_0} - \frac{1}{f_0 + \Delta f} \\
\text{win\_TDC\_len} &= \frac{D.R}{\Delta \phi} \times \frac{1}{f_0} \\
\text{win\_rotation\_len} &= \frac{1}{f_0 \Delta \phi} \times \frac{1}{f_0} = \frac{1}{f_0^2 \Delta \phi} \\
\text{win\_TDC\_Q} &= \text{win\_TDC\_len} \times Sa \\
\text{win\_TDC\_bins} &= \text{floor}(\text{win\_TDC\_Q})
\end{aligned}
$$

where $f_0$ and $f_1$ are the frequencies of the TDC input signals $FREF_{in}$ and $EXT_{CKV}$, so $\Delta f$ is their frequency difference. The time difference per cycle between $FREF_{in}$ and $EXT_{CKV}$ is denoted $\Delta \phi$, and the detection range (dynamic range) of the TDC is labeled D.R. Consequently, the duration covering all cycles of the input signal $FREF_{in}$ until the rising edges of the two signals align again is $\text{win\_rotation\_len}$, which repeats periodically. Within the time window $\text{win\_rotation\_len}$, the effective duration during which the TDC detection is not saturated is $\text{win\_TDC\_len}$. The TDC output is further sampled at a rate of $Sa$ bit/s, which gives $\text{win\_TDC\_Q}$ samples within the window $\text{win\_TDC\_len}$.
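
To get a feel for the magnitudes involved, the window expressions above can be evaluated for a concrete setting. The sketch below does this under assumed values: a 50 MHz reference, a 10 kHz offset, a 256 ps detection range, and a 1 GS/s oscilloscope. None of these settings is stated in the text for this experiment, and the function name is hypothetical.

```python
import math


def rotation_windows(f0, f1, dyn_range, sample_rate):
    """Evaluate the window expressions for input frequencies f0 and f1 (Hz).

    dyn_range is the TDC detection range in seconds and sample_rate is the
    oscilloscope sampling rate Sa in samples per second.
    """
    dphi = abs(1.0 / f0 - 1.0 / f1)            # time slip per reference cycle
    win_tdc_len = dyn_range / dphi / f0        # unsaturated window, seconds
    win_rotation_len = 1.0 / (f0 ** 2 * dphi)  # full rotation period, seconds
    win_tdc_bins = math.floor(win_tdc_len * sample_rate)
    return dphi, win_tdc_len, win_rotation_len, win_tdc_bins


# Hypothetical settings: 50 MHz reference, 10 kHz offset, 256 ps range, 1 GS/s.
dphi, t_tdc, t_rot, bins = rotation_windows(50e6, 50.01e6, 256e-12, 1e9)
print(dphi, t_tdc, t_rot, bins)
```

The ratio of the two windows equals $D.R \times f_0$, so for these assumed values the TDC is unsaturated for only about 1.3% of each rotation.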

Figure 5-40: Transfer function of TDC

<table>
<thead>
<tr>
<th>Blocks</th>
<th>Supply voltage</th>
<th>Average Current</th>
<th>Time resolution</th>
<th>Dynamic Range</th>
</tr>
</thead>
<tbody>
<tr>
<td>DTC</td>
<td>0.9V</td>
<td>10.5µA</td>
<td>26ps</td>
<td>1.7ns</td>
</tr>
<tr>
<td>TDC</td>
<td>0.9V</td>
<td>22.48µA</td>
<td>17ps</td>
<td>268ns</td>
</tr>
</tbody>
</table>

Table 5-3: Simulated current dissipation of the post-layout extraction circuit: DTC and TDC

<table>
<thead>
<tr>
<th>Blocks</th>
<th>Supply voltage</th>
<th>Average Current</th>
</tr>
</thead>
<tbody>
<tr>
<td>Clock Gating</td>
<td>0.9V</td>
<td>6.6µA</td>
</tr>
<tr>
<td>Clock buffer</td>
<td>0.9V</td>
<td>7.6µA</td>
</tr>
</tbody>
</table>

Table 5-4: Simulated current dissipation of the post-layout extraction circuit: clock buffer and clock gating

When the TDC is actively detecting the phase difference between its input signals, the corresponding time window is defined as the active time window. Outside the active time window, the TDC output is saturated in either the negative or the positive direction, that is, at either the minimum or the maximum value. The 4-bit binary code waveform appears as a square wave with approximately 50% duty cycle (see Figure 5-38). The TDC codes are in two’s complement, and the saturated results are verified as ‘0111’ in binary (7 in decimal) and ‘1000’ in binary (−8 in decimal), as captured in Figure 5-39. The zoomed-out capture of the TDC outputs confirms the periodic waveform, and the effective detection of the TDC occurs at the transition from ‘1000’ to ‘0111’. To further evaluate the functionality of the TDC, the 4-bit raw data is post-processed in Matlab to convert the analog voltage levels to digital logic and the binary codes to signed decimal values. The result is illustrated in Figure 5-40. The measured curve confirms the linear transfer function of the TDC, and its slope corresponds to the frequency difference \( \Delta f \) of the two input signals. In theory, if the digital oscilloscope is triggered by an external clock source at the frequency \( \Delta f \), the initial triggered sample of the effective TDC detection is the same in every rotation cycle. In practice, the jitter of the instruments affects the synchronization, and the active time window shifts occasionally and unpredictably. This makes it challenging to average the measured TDC output bins and to estimate the linearity of the TDC.
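
The post-processing step described in this section, slicing the captured voltages into logic levels and reading the 4-bit word as two's complement, can be sketched as follows. The thesis performs this step in Matlab. The Python version below is only an illustration, and the function name, the 0.5 V threshold (half of an assumed 1 V swing), and the sample voltages are assumptions.

```python
def decode_tdc_code(bit_voltages, threshold=0.5):
    """Convert four analog bit levels (MSB first) into a signed TDC code.

    Each voltage is sliced against a threshold to recover the logic value,
    then the 4-bit word is interpreted as two's complement.
    """
    bits = [1 if v > threshold else 0 for v in bit_voltages]
    value = 0
    for b in bits:
        value = (value << 1) | b
    if bits[0]:                 # MSB set means a negative number
        value -= 1 << len(bits)
    return value


# Hypothetical oscilloscope samples in volts for a 1 V supply.
print(decode_tdc_code([0.02, 0.97, 0.98, 0.99]))  # word '0111'
print(decode_tdc_code([0.98, 0.03, 0.01, 0.02]))  # word '1000'
```

The two example words are the saturated codes mentioned in the text, which decode to 7 and −8 respectively.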

### 5-4-4 DC Power

The primary design goal for the TDC and DTC pair is to achieve ultra-low power dissipation under the typical system requirements. The DC power dissipation is evaluated and turns out to break the limit of previously reported results in the literature, which report up to several micro-amperes. Note that the average current of the DTC module is proportional to the selection code; 10.5µA is the average value when sweeping the full code range. Similarly, the average current of the TDC is proportional to the phase error between \( CKV \) and \( REF_{DLY} \), that is, to the digital value of the TDC output; 22.48\( \mu \)A is the value when the TDC output is at its maximum positive value.

The DC power dissipation of the chip is 43\( \mu \)W at a supply voltage of 1V, which is reasonable and matches the design goals. As the pie diagram of the entire ADPLL system in Figure 5-41 shows, the DTC and TDC blocks account for a very small portion compared with the other power-hungry blocks.
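
As a rough cross-check, the simulated currents in Table 5-3 and Table 5-4 can be summed and multiplied by the 0.9 V simulation supply. The sketch below does this arithmetic. The result, about 42.5 µW, is close to the quoted 43 µW, but whether the quoted figure was obtained this way is an assumption, and the text states a 1 V supply for it.

```python
# Simulated average currents in microamperes from Tables 5-3 and 5-4.
currents_ua = {
    "DTC": 10.5,
    "TDC": 22.48,
    "Clock gating": 6.6,
    "Clock buffer": 7.6,
}
supply_v = 0.9  # simulation supply voltage listed in both tables

total_ua = sum(currents_ua.values())
total_uw = total_ua * supply_v       # microamperes times volts gives microwatts
for name, i_ua in currents_ua.items():
    print(f"{name}: {i_ua * supply_v:.2f} uW")
print(f"total: {total_ua:.2f} uA, {total_uw:.2f} uW")
```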