# Design of a pipelined time-to-digital Converter (TDC) suitable for transducer array channel multiplexing

#### Thesis

submitted in partial fulfillment of the requirements for the degree of

Master of Science

in

**Electrical Engineering** 

by

Lichao Wu born in Baoji, China

**Bioelectronics Group** 

**Department of Microelectronics** 

*Faculty of Electrical Engineering, Mathematics and Computer Science*

*Delft University of Technology*

## Delft University of Technology Department of Microelectronics

The undersigned hereby certify that they have read and recommend to the Faculty of Electrical Engineering, Mathematics and Computer Science for acceptance a thesis entitled "Design of a pipelined time-to-digital Converter (TDC) suitable for transducer array channel multiplexing" by Lichao Wu in partial fulfillment of the requirements for the degree of Master of Science.

| Dated: September 2017 |                                 |
|-----------------------|---------------------------------|
| Chairman:             | Prof. dr. ir. Wouter A. Serdijn |
| Committee:            | Ir. E. Emil Totev               |
|                       | Ir. E. Sotir Ouzounov           |
|                       | Dr.ir. M.A.P. Pertijs           |

### **Abstract**

Many fields need high-performance time measurements, including particle detection, time-resolved imaging, transducer array channel multiplexing and other time-of-flight systems. These measurements are often performed by time-to-digital converters (TDCs), which for these applications require high resolution, accuracy, and throughput. They are often implemented with conventional custom circuitry, which entails low performance and low flexibility. High-performance, low-cost TDC architectures have therefore been studied to meet the demands of time measurement.

This thesis presents a new pipelined TDC with 6.4 ps resolution, 1.5 LSB integral non-linearity, and a throughput of 200 MS/s. The TDC is characterized, and several influences on its performance are described, including transistor distortion, temperature effects and process variation. Some directions for future work that could improve pipeline TDCs further are presented. The results show that the pipeline TDC achieves an outstanding figure of merit (FOM) of 0.01 and can be used in a wide range of applications requiring high throughput and accurate time measurement.

## Acknowledgments

Research is a unique exploration. We always try to find solutions, which may not exist, to a question. For the topic of my thesis, it would have been much harder to carry out the research alone, without the selfless help of others. Many teachers and friends have helped with this piece of work.

First and foremost, I would like to express my sincere gratitude to my academic advisor at the Delft University of Technology, Prof. dr. ir. Wouter A. Serdijn. It has been an honor to be his MSc student and to work on an ultrasound project. His experience and knowledge in circuit design and ultrasound systems added a great deal to my learning in the past nine months. His strong vision always inspired me to achieve more, and his continuous motivation boosted my confidence significantly. Under his guidance, I have learned to be meticulous and to think rationally in my work. I will always remember his patience and understanding.

I would like to express my deep appreciation to the people at Philips. First, my supervisor, Ir. E. Emil Totev, a strict person who always pushed me forward to get the best result. He taught me ultrasound and TDC design from scratch and was always willing to support me, not only in technical matters but also in planning my life. He spent his precious time reading my final thesis carefully and offered several pieces of advice, which I really appreciate. I would also like to express my thanks to Ir. E. Sotir Ouzounov, a knowledgeable and kind person. He inspired me about the future of ultrasound systems and gave me precise and bold ideas at every stage of my project. Mr. Emil Totev and Mr. Sotir Ouzounov were patient with me and my work; without their invaluable guidance, kindness, and support throughout my project, this work could not have been finished.

My thanks also extend to my office colleagues, Joost Fijn, Mattia Bergaglio, Yuchen Ni, and others. I gained a new understanding of technology and life from the discussions with them. Furthermore, they offered me great help with the software and with the overview of the ultrasound system. They finished their theses earlier than I did. I hope everything goes well for them in the future.

One person whom I cannot thank enough is my girlfriend, Fengqiao Zhang, for her kindness and patience with me during my project as well as in life. There were several tough times during these nine months of work; her great help encouraged me to overcome all the difficulties and finish the project successfully.

Finally, I sincerely thank my parents for their unquestioned support, for all the sacrifices they made, their words of inspiration and their belief that I could excel. I dedicate this work to them, without whom I would not be where I am today.

Special thanks to communication software like Skype and WeChat, which allowed me to remain in touch with all the people mentioned above and reduced distances to a great extent, even though they stopped working from time to time.

Lichao Wu Eindhoven, July 2017

## Contents

- 1 Introduction
  - 1.1 Moving From Voltage Domain to Time Domain
  - 1.2 The TDC Applications
    - 1.2.1 Particle Detector
    - 1.2.2 Time-Resolved Imaging
    - 1.2.3 Channel Multiplexing in the Ultrasound System
  - 1.3 Motivation and Objectives
  - 1.4 Main Contributions
  - 1.5 Outline
  - References
- 2 TDC Fundamental
  - 2.1 Specifications
    - 2.1.1 Resolution
    - 2.1.2 Non-linearity: *DNL*: Differential, *INL*: Integral
    - 2.1.3 Conversion Time
    - 2.1.4 *FSR*: Full Scale Range
    - 2.1.5 *ENOB*: Effective Number of Bit
    - 2.1.6 *FOM*: Figure of Merit
  - 2.2 TDC Architectures Overview
    - 2.2.1 Analog TDC
    - 2.2.2 Digital TDCs
    - 2.2.3 Further Development: Resolution
    - 2.2.4 Further Development: Dynamic Range
  - 2.3 Summary
  - References
- 3 Pipeline TDC Architecture
  - 3.1 From Pipeline ADC to Pipeline TDC
  - 3.2 Residue Time Generation
    - 3.2.1 Operation Principle
    - 3.2.2 Example
  - 3.3 Pipeline TDC Algorithm
  - 3.4 1.5-bit Pipeline Stage and Digital Correction
  - 3.5 Offset in Each Block
    - 3.5.1 TDC Offset
    - 3.5.2 DTC Offset
    - 3.5.3 Subtractor Offset
  - 3.6 Summary
  - References
- 4 Pipeline TDC Implementation
  - 4.1 First Stage
    - 4.1.2 TDC: Ring Oscillator
    - 4.1.3 DTC: Selector
    - 4.1.4 Time Amplifier
    - 4.1.5 Thermometer-to-Binary Encoder
  - 4.2 Subsequent Stages
  - 4.3 Summary
  - References
- 5 TDC's Simulation Result
  - 5.1 Simulation Result
    - 5.1.1 Gated Delay Element Mismatch
    - 5.1.2 Time Amplifier Mismatch
  - 5.2 TDC Performance
    - 5.2.1 Resolution
    - 5.2.2 Linearity
    - 5.2.3 Accuracy
    - 5.2.4 Power Consumption
    - 5.2.5 Robustness
- 6 Conclusion
  - 6.1 Conclusion
  - 6.2 Comparison with state-of-the-art TDCs
  - 6.3 Future Work
  - References

## 1

## Introduction

A thousand miles begins with a single step.

Laozi

In this introductory chapter, a brief overview of TDC applications is given, followed by the goal and motivation of the thesis. Finally, the organization of the thesis is presented.


#### **1.1.** Moving From Voltage Domain to Time Domain

The analog signal, which has a continuous-time and continuous-amplitude value, is the most common signal in the real world. To process the analog signal by digital computers, the digital signal, which has a discrete-time and discrete-amplitude value, has been proposed and adopted by scientists and engineers.

The two main components connecting the analog and digital domains are the analog-to-digital converter (ADC) and the digital-to-analog converter (DAC). An ADC converts an analog value to a digital value in two steps. The first step is sampling, which can be seen as discretization in the time domain and is realized by a sample-and-hold circuit. The second step is quantization, which is commonly accomplished by comparators and discretizes the sampled information in the amplitude domain. Once quantization is finished, the sampled-data signal has been converted to the digital domain. A digital signal processor (DSP) then processes the digital data further to obtain the desired results. Finally, a DAC converts the output of the DSP back to the analog domain.

As a main contributor in many applications, a high-performance ADC is required to enhance the overall performance of the system. ADCs in older technologies take advantage of a large supply voltage. However, for applications in a deep-submicron CMOS process with a low supply voltage, the available voltage headroom is quite small. Therefore, a signal representation in the time domain becomes more attractive. Furthermore, in a deep-submicron CMOS process, the time-domain resolution of a digital signal edge transition is superior to the voltage resolution of an analog signal [1].

Similar to the ADC in many aspects, the time-to-digital converter (TDC) is one of the fundamental elements for measuring a time interval and converting it to a digital signal. The TDC output consists of a stream of binary code, which can only take on two abstract values, 'zero' and 'one', and is therefore easy to process by microcontrollers and processors. The TDC, now widely used, was first used in nuclear science in the 1970s [2] [3], and then extended to digital storage oscillators [4] [5], laser range finders [6] and digital frequency synthesizers [7]. A brief introduction to some applications is given in the next section.

#### 1.2. The TDC Applications

#### **1.2.1.** Particle Detector

A particle detector, also known as a radiation detector, is a device used to detect, track, and identify ionizing particles in experimental and applied particle physics, nuclear physics, and nuclear engineering [8]. A TDC-based particle detector typically consists of a photodiode, front-end electronics including a preamplifier, a shaper, and a discriminator, followed by a charge-to-time converter and a TDC [9].

Figure 1.1 shows the readout data processed by a flash ADC (a) and a TDC (b) with the same analog input, indicating the working principle of the TDC-based particle detector. After the input is processed by the front-end electronics, instead of collecting information on many data samples as a flash ADC does, the TDC-based readout electronics needs to record only two values: the arrival time of the signal and the output pulse duration of the charge-to-time converter [10]. The duration of a pulse on the digital output of the charge-to-time converter represents, with some accuracy, the charge of the signal on the analog input.

Figure 1.1: Readout data (a) with flash ADC, (b) with TDC-based electronics [10].

#### **1.2.2.** Time-Resolved Imaging

Time-resolved imaging has been a rapidly growing field of investigation in recent years as it offers several advantages over traditional intensity imaging. In machine vision, time-resolved imaging allows constructing the depth map of a scene [11]. In life sciences, time-resolved imaging has enabled the emergence of fluorescence lifetime imaging, a quantitative imaging method to locally probe the chemical environment of a fluorophore in living cells [12].

Among the different approaches to time-resolved imaging, the use of single-photon avalanche diodes (SPADs) emerges as a feasible alternative for applications in low-light conditions [13]. Figure 1.2 shows a typical arrangement for depth map estimation using SPADs by sensing pulse-modulated light. This approach is based on measuring the time required by a photon to travel from the transmitter to a target and back to the detector, which is realized by a TDC measuring the time difference between  $T_{start}$  and  $T_{stop}$.
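The round-trip principle described above maps directly onto a distance estimate. The following Python sketch is illustrative only and is not part of the original design: the function names are hypothetical, the light is assumed to travel at its vacuum speed, and the 6.4 ps value is simply the resolution quoted in the abstract.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s in vacuum (assumed propagation medium)


def tof_distance(t_start: float, t_stop: float) -> float:
    """Target distance in metres from the start/stop time stamps in seconds.

    The photon covers the distance twice (out and back), hence the factor 2.
    """
    if t_stop < t_start:
        raise ValueError("t_stop must not precede t_start")
    return SPEED_OF_LIGHT * (t_stop - t_start) / 2.0


def depth_resolution(t_lsb: float) -> float:
    """Smallest distance step resolvable by a TDC with time resolution t_lsb."""
    return SPEED_OF_LIGHT * t_lsb / 2.0


if __name__ == "__main__":
    print(f"10 ns round trip: {tof_distance(0.0, 10e-9):.3f} m")
    print(f"6.4 ps LSB: {depth_resolution(6.4e-12) * 1e3:.2f} mm per code")
```

Under these assumptions, one 6.4 ps time step corresponds to slightly less than 1 mm of depth.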

#### 1.2.3. Channel Multiplexing in the Ultrasound System

Ultrasound refers to sound waves with frequencies above 20 kHz, the upper limit of human hearing. Medical ultrasound systems typically operate in the 2 MHz to 20 MHz frequency range. By sending a sound wave to the target area and then receiving and processing the echo signal, the ultrasound system can image the inside of an object.

In real-time catheter-based 3-D ultrasound imaging applications, which place strict restrictions on the catheter diameter as well as the cable count, gathering data from the transducer array by direct connection is difficult, especially in the case of small-sized catheters for intracardiac echography (ICE). Channel multiplexing is one of the solutions to decrease the cable count and maintain the signal integrity; the digital output is less prone to noise than transmitting the observed echo responses directly from each channel [14]. Channel multiplexing has been realized by several approaches, such as delay-sum beamforming (μ-beamforming) [15–17], which applies an analog delay chain in the transducer to reduce the number of cables; frequency division multiplexing (FDM), which uses analog modulation (AM) to multiplex multiple signals onto each cable, allowing all of the raw data to be transferred by making better use of the channel bandwidth [18]; and time division multiplexing (TDM) [14].

Figure 1.2: Principle of the ToF Measurement Based on Pulsed Modulation [10].



Figure 1.3: Analog TDM Scheme. [14]

A typical structure of a system using TDM is shown in Figure 1.3. The analog front end (AFE) of the receiver consists primarily of a low-noise amplifier (LNA); an anti-aliasing filter (AAF); and a time-gain compensation (TGC) amplifier, which compensates for attenuation of the return signal by body tissues as a function of time (as a proxy for depth), followed by sample-and-hold (S/H) logic and a multiplexer (MUX). After the acquired signal passes through the TGC amplifier, the S/H logic samples the signal in each channel sequentially, and the MUX multiplexes the samples into one channel. The output of the MUX is further amplified, digitized, and then reconstructed by the FPGA. Finally, the image is formed.

The multiplexing is controlled by digital circuitry: counter (TDC) and subsequent sequencing logic, which generates the sample clocks for each channel and control signals for the multiplexer. Instead of generating binary code, the counter produces gray code to prevent spurious output from electromechanical switches. The gray code is sent to sequencing logic to select the input channel and enable the corresponding S/H logic. Once the signal has been sampled, it is transmitted to the multiplexer, then goes through a coax cable for further signal processing.
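The benefit of a Gray-coded counter for channel sequencing can be illustrated with a short Python sketch. It is a generic reflected-binary Gray code and not the circuit of [14]; the helper names and the eight-channel count are hypothetical and chosen only for illustration.

```python
def binary_to_gray(n: int) -> int:
    """Convert a non-negative integer to its reflected-binary Gray code."""
    return n ^ (n >> 1)


def gray_to_binary(g: int) -> int:
    """Invert binary_to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    NUM_CHANNELS = 8  # hypothetical channel count, for illustration only
    for ch in range(NUM_CHANNELS):
        nxt = (ch + 1) % NUM_CHANNELS
        flips = bits_changed(binary_to_gray(ch), binary_to_gray(nxt))
        print(f"channel {ch}: binary {ch:03b}, gray {binary_to_gray(ch):03b}, "
              f"bits flipped to next channel: {flips}")
```

Because exactly one bit changes between consecutive Gray codes (including the wrap-around for a power-of-two channel count), the sequencing logic never sees an intermediate code that selects the wrong channel, whereas a plain binary step such as 011 to 100 flips three bits at once.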

With the TDM architecture, the signal in each channel is discretized by the counter (TDC), and the power consumption is decreased dramatically by using a single ADC; at the same time, the size of the transducer is reduced because fewer components are implemented in each channel. However, the conversion rate per stage is limited by the external clock frequency (250 MHz in [14]).

#### **1.3.** Motivation and Objectives

The discussion on the different applications indicates the widespread use of TDC. The motivation of this work is the need for a high conversion rate, high precision, and low power TDC as one of the major contributors to the specific system.

In this thesis, a 14-bit high-performance time-to-digital converter with a pipeline architecture is designed to meet the requirements of a particular application. The following design steps have been followed:

- An extensive literature survey of the existing work on TDC;
- Understand the requirements and find out the suitable type of TDC;
- System and circuit Level Implementation;
- Design verification.

#### 1.4. Main Contributions

This thesis has the following contributions:

- Introduce a new type of the pipeline TDC;
- Build a low cost and low power TDC system to achieve the required effective number of bits (ENOB) larger than 10bits with a resolution lower than 10ps;
- Present the detailed analysis of TDC as one of the major contributors to the specific system.


#### 1.5. Outline

The thesis will be organized as follows.

- Chapter 1 introduces the background knowledge and presents the motivation behind this thesis work, related work, main contributions, and outline.
- Chapter 2 evaluates the specifications of the TDC and gives an overview of TDC architectures.
- Chapter 3 discusses the architecture and algorithm of the developed pipeline TDC and explains how various error sources affect the TDC's static and dynamic performance.
- Chapter 4 introduces the building blocks of the developed TDC, which include the ring oscillator, selector, time amplifier and encoder.
- Chapter 5 shows simulation results and TDC performance.
- Chapter 6 concludes the thesis and identifies future work.


#### References

- [1] R. Staszewski and P. Balsara, *All-digital frequency synthesizer in deep-submicron CMOS.*
- [2] T. Yoshiaki and A. Takeshi, *Simple voltage-to-time converter with high linearity,* IEEE Transactions on Instrumentation and Measurement **20**, 120 (1971).
- [3] D. Porat, *Review of sub-nanosecond time interval measurement,* IEEE Transactions on Nuclear Science **NS-20**, 36 (1973).
- [4] K. Park and J. Park, *20ps resolution time-to-digital converter for digital storage oscillator,* Proceedings of IEEE Nuclear Science Symposium **2**, 876 (1998).
- [5] P. Chen, C. Chen, and Y. Shen, *A low-cost low-power CMOS time-to-digital converter based on pulse stretching,* IEEE Transactions on Nuclear Science **4**, 2215 (2006).
- [6] C. Chen, P. Chen, C. Hwang, and W. Chang, *A precise cyclic CMOS time-to-digital converter with low thermal sensitivity,* IEEE Transactions on Nuclear Science **4**, 834 (2005).
- [7] P. Chen, C. Chen, C. Tsai, and W. Lu, *A time-to-digital-converter-based CMOS smart temperature sensor,* IEEE Journal of Solid-State Circuits **8**, 1642 (2005).
- [8] F. Borghetti, M. Gobbi, and A. Fornasari, *A particle detector fully-programmable interface circuit for satellite applications,* Circuits and Systems (2003).
- [9] Y. Fujita, *Test of charge-to-time conversion and multi-hit TDC technique for the Belle CDC readout,* Nucl. Instrum. Meth. A **405**, 105 (1998).
- [10] A. Kaukher, *A study of readout electronics based on TDC for the International Linear Collider TPC detector,* IEEE Transactions on Nuclear Science **53**, 749 (2006).
- [11] M. G. et al., *A parallel 32x32 time-to-digital converter array fabricated in a 130 nm imaging CMOS technology,* ESSCIRC 2009 34th European Solid-State Circuits Conference (2011).
- [12] P. Kumar, *Time-of-flight 3D imaging based on a SPAD-TDC pixel array in standard 65 nm CMOS technology,* Delft University of Technology.
- [13] C. Niclass, A. Rochas, P.-A. Besse, and E. Charbon, *Design and characterization of a CMOS 3-D image sensor based on single photon avalanche diodes,* IEEE J. Solid-State Circuits **40**, 1847 (2005).
- [14] T. M. Carpenter, M. W. Rashid, and M. Ghovanloo, *Direct digital demultiplexing of analog TDM signals for cable reduction in ultrasound imaging catheters,* IEEE Trans. Ultrason., Ferroelectr., Freq. Control **63** (2016).
- [15] T. K. Song and J. F. Greenleaf, *Ultrasonic dynamic focusing using an analog FIFO and asynchronous sampling,* IEEE Trans. Ultrason., Ferroelectr., Freq. Control **41**, 326 (1994).
- [16] B. Stefanelli, I. O'Connor, L. Quiquerez, A. Kaiser, and D. Billet, *An analog beam-forming circuit for ultrasound imaging using switched-current delay lines,* IEEE J. Solid-State Circuits **35**, 202 (2000).
- [17] Z. Yu, M. A. P. Pertijs, and G. C. M. Meijer, *Ultrasound beamformer using pipeline-operated S/H delay stages and charge-mode summation,* Electron. Lett. **47**, 1011 (2011).
- [18] M. W. Rashid, C. Tekes, M. Ghovanloo, and F. L. Degertekin, *Design of frequency-division multiplexing front-end receiver electronics for CMUT-on-CMOS based intracardiac echocardiography,* Proc. IEEE Int. Ultrason. Symp. (IUS), 1540 (2014).

## 2

## TDC Fundamental

Learning without thought is labor lost; Thought without learning is perilous.

Confucius

In this chapter, the static and dynamic parameters of TDC are given, followed by TDC architecture overview.

Before analysing the TDC architectures, some critical parameters are introduced for evaluation purposes. Similar to ADCs, TDCs are specified by resolution, dynamic range, non-linearity, and conversion rate. For the overall performance, power dissipation, effective number of bits (ENOB) and figure of merit (FOM) should also be taken into account.

#### 2.1. Specifications

#### **2.1.1.** Resolution

The resolution of a TDC is equal to the minimum time interval a TDC can represent. In a real circuit design, the resolution is dependent on the circuit characteristics and noise performance. Assuming the measured time range is  $T_R$ , and the number of bits is N, the resolution is given as:

$$T_{LSB} = \frac{T_R}{2^N}. \tag{2.1}$$

#### 2.1.2. Non-linearity: DNL: Differential, INL: Integral

In an ideal world, all TDC output codes would have equal width, but in practice the widths vary, as shown in Figure 2.1. Differential non-linearity DNL(i) is a vector that quantifies the deviation of the width of each code i from the average width (step size). It is a measure of uniformity and does not depend on gain and offset errors: scaling and shifting a transfer characteristic does not alter its uniformity and hence DNL(i).



Figure 2.1: Transfer function of TDC with DNL. [1]

DNL(i) is given by:

$$DNL(i) = \frac{T_{out}(i+1) - T_{out}(i)}{T_{LSB}} - 1, \tag{2.2}$$

where  $T_{out}(i)$  and  $T_{out}(i+1)$  are the input time intervals at which the  $i^{th}$  and  $(i+1)^{th}$  code transitions occur in the real transfer curve, so that their difference is the width of step  $i$.

DNL has some special characteristics which are as follows:

- Positive/negative DNL implies wide/narrow code, respectively;
- DNL = -1 LSB implies missing code;
- It is impossible to have DNL < -1 LSB for a TDC, but possible to have DNL > +1 LSB;
- If DNL > 1 LSB, the transfer curve may be non-monotonic.

*INL* can be defined as the deviation of the entire transfer function from the ideal function, as shown in Figure 2.2. It can also be expressed as the cumulative sum of the DNL values, as given by:

$$INL(i) = \sum_{k=0}^{i-1} DNL(k). \tag{2.3}$$



Figure 2.2: Transfer function of TDC with INL. [1]
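Equations (2.2) and (2.3) can be evaluated numerically once the code transition points are known. The Python sketch below is a minimal illustration: the function name `dnl_inl` and the example transition times are hypothetical, and the input list is assumed to hold the input time intervals at which consecutive output codes switch.

```python
from typing import List, Tuple


def dnl_inl(transitions: List[float], t_lsb: float) -> Tuple[List[float], List[float]]:
    """Return (DNL, INL) in LSB from code transition times.

    transitions[i] is the input interval at which the i-th code transition
    occurs, in the same time unit as t_lsb (assumed measurement data).
    """
    # Equation (2.2): width of each step relative to the ideal LSB, minus 1.
    dnl = [(transitions[i + 1] - transitions[i]) / t_lsb - 1.0
           for i in range(len(transitions) - 1)]

    # Equation (2.3): INL(i) is the sum of DNL(0) .. DNL(i-1), so INL(0) = 0.
    inl = []
    acc = 0.0
    for d in dnl:
        inl.append(acc)
        acc += d
    inl.append(acc)
    return dnl, inl


if __name__ == "__main__":
    # Hypothetical transition times in ps for an ideal step of 10 ps.
    dnl, inl = dnl_inl([0.0, 10.0, 25.0, 30.0, 40.0], t_lsb=10.0)
    print("DNL [LSB]:", dnl)
    print("INL [LSB]:", inl)
```

In this made-up data set the second step is 15 ps wide (DNL = +0.5 LSB) and the third is 5 ps wide (DNL = -0.5 LSB), so the accumulated error returns to zero after the third step.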

#### **2.1.3.** Conversion Time

For high-speed applications, the conversion time is an essential parameter for evaluating the performance of a TDC; it equals the maximum time the TDC needs to convert a single time interval to a digital value. If a TDC is driven by an external clock with frequency  $f_s$, the conversion time can be calculated by:

$$ConversionTime = \frac{1}{f_s}. \tag{2.4}$$

#### 2.1.4. FSR: Full Scale Range

The full scale range (FSR) is another parameter for evaluating the performance of a TDC; it determines the maximum time interval a TDC can measure. The full scale range of the TDC is equal to:

$$FSR = T_{LSB} \cdot 2^N, \tag{2.5}$$

where  $T_{LSB}$  is the resolution of the TDC and N is the number of bits.
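As a quick numerical check of Equations (2.1), (2.4) and (2.5), the Python sketch below plugs in the 6.4 ps resolution and 200 MS/s throughput quoted in the abstract together with the 14-bit word length mentioned in Section 1.3. Combining these three numbers in one example is an assumption made purely for illustration, and the function names are hypothetical.

```python
def resolution(t_range: float, n_bits: int) -> float:
    """Equation (2.1): T_LSB = T_R / 2^N."""
    return t_range / 2 ** n_bits


def full_scale_range(t_lsb: float, n_bits: int) -> float:
    """Equation (2.5): FSR = T_LSB * 2^N."""
    return t_lsb * 2 ** n_bits


def conversion_time(f_s: float) -> float:
    """Equation (2.4): conversion time of a TDC clocked at f_s."""
    return 1.0 / f_s


if __name__ == "__main__":
    T_LSB = 6.4e-12   # s, resolution quoted in the abstract
    N_BITS = 14       # word length from Section 1.3 (assumed to pair with T_LSB)
    F_S = 200e6       # Hz, throughput quoted in the abstract

    fsr = full_scale_range(T_LSB, N_BITS)
    print(f"FSR             = {fsr * 1e9:.4f} ns")
    print(f"T_LSB from FSR  = {resolution(fsr, N_BITS) * 1e12:.2f} ps")
    print(f"conversion time = {conversion_time(F_S) * 1e9:.2f} ns")
```

With these inputs the full scale range comes out at roughly 105 ns and the conversion time at 5 ns.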

#### **2.1.5.** *ENOB*: Effective Number of Bit

In the ideal case, the ENOB is equal to the number of output bits of the TDC. However, all real TDC implementations introduce noise and distortion, which degrade the TDC resolution. The ENOB represents the effective resolution of the TDC [2], and is defined by:

$$ENOB = N - \log_2(INL + 1), \tag{2.6}$$

where N is the number of bits and INL can be calculated by Equation (2.3). Evidently, the ENOB equals N only if the circuit has no noise, a perfect clock, and a transfer function with zero INL and DNL; any non-linearity degrades the ENOB and hence the performance of the TDC.

#### **2.1.6.** *FOM*: Figure of Merit

The figure of merit is a parameter used to measure the power efficiency of a TDC [3]. Publications and data sheets use different definitions of the FOM. The basis of all these definitions is Equation (2.7), referred to as the Walden FOM, where a smaller FOM means higher performance.

$$FOM = \frac{PowerDissipation}{f_s \cdot 2^{ENOB}} [pJ/step]. \tag{2.7}$$
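Equations (2.6) and (2.7) can be combined into a small calculator. The Python sketch below is only an illustration: the function names are hypothetical, the INL is treated as a single worst-case value in LSB, and the 10 mW power figure is a placeholder that does not come from the thesis.

```python
import math


def enob(n_bits: int, inl_lsb: float) -> float:
    """Equation (2.6): ENOB = N - log2(INL + 1), with INL in LSB."""
    return n_bits - math.log2(inl_lsb + 1.0)


def walden_fom(power_w: float, f_s: float, enob_bits: float) -> float:
    """Equation (2.7): energy per conversion step in joules."""
    return power_w / (f_s * 2 ** enob_bits)


if __name__ == "__main__":
    N_BITS = 14        # word length from Section 1.3
    INL_LSB = 1.5      # integral non-linearity quoted in the abstract
    F_S = 200e6        # Hz, throughput quoted in the abstract
    POWER_W = 10e-3    # W, placeholder value, NOT a result from this work

    bits = enob(N_BITS, INL_LSB)
    fom_pj = walden_fom(POWER_W, F_S, bits) * 1e12
    print(f"ENOB = {bits:.2f} bits")
    print(f"FOM  = {fom_pj:.4f} pJ/step (for the placeholder power)")
```

The sketch shows how an INL of 1.5 LSB costs about 1.3 bits of effective resolution, and how the FOM scales linearly with the power figure that is inserted.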

#### 2.2. TDC Architectures Overview

There are several approaches to measure the time, from analog to digital, from single stage to multiple stages. Different architectures have their advantages and disadvantages. In this section, some modern TDC architectures are described and analyzed.

#### **2.2.1.** Analog TDC

The earliest time-to-digital converters were closely related to analog-to-digital converters (ADCs) and were introduced by Tanaka et al. (1991); Bigongiari et al. (1999); Napolitano et al. (2010). The TDC function is realized by first using a time-to-amplitude converter (TAC) to convert the time into a voltage, and then digitizing the voltage with an ADC. Figure 2.3 shows the basic block diagram of an analog TDC.



Figure 2.3: Block and signal diagram of basic analog time-to-digital converter. [4]

Assuming the input time interval to be measured is  $T_{in}$, the current supplied by the charge pump is  $I_{cp}$, and the corresponding capacitor is  $C_c$, the voltage ( $V_{TAC}$ ) generated by the TAC is:

$$V_{TAC} = \frac{I_{cp}}{C_c} \cdot T_{in}. \tag{2.8}$$

Subsequently,  $V_{TAC}$  is digitized by an ADC. Assuming the resolution of ADC is  $V_{LSB}$ , then  $V_{TAC}$  can be expressed by:

$$V_{TAC} = V_{LSB} \sum_{k=0}^{n-1} D_k \cdot 2^k, \tag{2.9}$$

where  $D_k$  is the k-th output bit of the ADC. Combining Equation (2.8) and Equation (2.9), we get:

$$T_{in} = \frac{C_c \cdot V_{LSB}}{I_{cp}} \sum_{k=0}^{n-1} D_k \cdot 2^k. \tag{2.10}$$

According to Equation (2.10), the accuracy of the TDC depends mainly on the resolution of the ADC. Generally, with a high-resolution ADC, a high-precision TDC can be obtained. However, the TAC and the ADC are mainly implemented with analog circuits, which are not suitable for technology scaling [4]; the design of high-performance analog and mixed-signal circuits is relatively complicated; moreover, analog circuits dissipate considerable static power.
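The chain of Equations (2.8) to (2.10) can be mimicked with an idealized behavioral model. The Python sketch below assumes a perfectly linear TAC and an ideal truncating ADC; the function names and all component values (10 µA, 1 pF, 1 mV, 10 bits) are hypothetical and chosen only so that one ADC step corresponds to 100 ps.

```python
from typing import Tuple


def tac_voltage(t_in: float, i_cp: float, c_c: float) -> float:
    """Equation (2.8): voltage on the capacitor after charging for t_in."""
    return i_cp / c_c * t_in


def adc_code(v_in: float, v_lsb: float, n_bits: int) -> int:
    """Ideal truncating ADC with saturation at both ends of the range."""
    code = int(v_in / v_lsb)
    return max(0, min(code, 2 ** n_bits - 1))


def analog_tdc(t_in: float, i_cp: float, c_c: float,
               v_lsb: float, n_bits: int) -> Tuple[int, float]:
    """Return the ADC code and the time reconstructed via Equation (2.10)."""
    code = adc_code(tac_voltage(t_in, i_cp, c_c), v_lsb, n_bits)
    t_est = c_c * v_lsb / i_cp * code
    return code, t_est


if __name__ == "__main__":
    # Hypothetical component values: one ADC step equals 100 ps.
    code, t_est = analog_tdc(t_in=1.234e-9, i_cp=10e-6, c_c=1e-12,
                             v_lsb=1e-3, n_bits=10)
    print(f"code = {code}, reconstructed interval = {t_est * 1e9:.3f} ns")
```

The effective time resolution of this model is $C_c \cdot V_{LSB} / I_{cp}$, which makes explicit why the ADC resolution dominates the accuracy of the analog TDC.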

#### 2.2.2. Digital TDCs

#### Counter-Based TDCs

A counter-based TDC is the simplest digital TDC architecture. By measuring the number of clock pulses, the digital output could be generated synchronously. The basic architecture is shown in Figure 2.4.



Figure 2.4: Counter-Based TDC Architecture. [5]

An SR latch is used to generate the pulse which could enable the counter. When the output of the SR latch is high, the counter starts counting the clock pulse number. The clock frequency determines the resolution of the counter-based TDC. Assuming the counter has N bits, and the clock frequency is  $f_c$ , the full scale range (FSR) is given by:

$$FSR = \frac{1}{f_c} \cdot 2^N. \tag{2.11}$$

From Equation (2.11), the resolution is  $\frac{1}{f_c}$: a higher clock frequency means a finer resolution. A high conversion rate can also be expected because of the simple logic. Furthermore, a wide FSR can be achieved by adding more bits to the counter. However, the counter-based TDC cannot meet the requirement for high resolution: if picosecond resolution is required, the clock frequency would have to be hundreds of gigahertz, which is hard to realize.
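The trade-off described above can be made concrete with a small behavioral model. The Python sketch below is an idealization: it assumes that the start edge is aligned with a clock edge, so the count is simply the number of full clock periods in the interval, and the function names and example values are hypothetical.

```python
def counter_tdc(t_in: float, f_c: float, n_bits: int) -> int:
    """Number of full clock periods counted during t_in, saturating at 2^N - 1.

    Assumes the start edge coincides with a clock edge (idealization).
    """
    count = int(t_in * f_c)
    return min(count, 2 ** n_bits - 1)


def counter_fsr(f_c: float, n_bits: int) -> float:
    """Equation (2.11): full scale range of an N-bit counter clocked at f_c."""
    return 2 ** n_bits / f_c


def required_clock_frequency(t_lsb: float) -> float:
    """Clock frequency needed for a resolution of t_lsb (resolution = 1/f_c)."""
    return 1.0 / t_lsb


if __name__ == "__main__":
    print("code for 10.3 ns at 1 GHz:", counter_tdc(10.3e-9, 1e9, 8))
    print(f"FSR of 8-bit counter at 1 GHz: {counter_fsr(1e9, 8) * 1e9:.0f} ns")
    for t_lsb in (10e-12, 1e-12):
        f_ghz = required_clock_frequency(t_lsb) / 1e9
        print(f"{t_lsb * 1e12:.0f} ps resolution needs a {f_ghz:.0f} GHz clock")
```

Even a 10 ps resolution already calls for a clock of about 100 GHz in this model, which illustrates why a plain counter is not used for picosecond-level measurements.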

#### Delay-Line Based TDCs

The delay-line based TDC is also known as the flash TDC. Since it consists of digital gates, which are easy to implement, this type of TDC has become one of the most commonly used digital TDC architectures. The TDC architecture is shown in Figure 2.5. The start signal propagates through a delay line and changes the input states of the flip-flops. Once the stop signal arrives, the input states are stored by the flip-flops and sent to their outputs.

Although no external clock is needed in the flash TDC, the resolution is restricted by the delay time of a single delay cell. Moreover, the *FSR*, which equals the total delay time of the delay line, is considerably smaller than that of the counter-based TDC.



Figure 2.5: Delay-line based TDC.
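The sampling behaviour of the flash TDC can be sketched with a purely behavioural model. The function name `flash_tdc` and the 30ps cell delay are illustrative assumptions; metastability and cell mismatch are ignored.

```python
def flash_tdc(t_in_ps: float, t_d_ps: float, n_cells: int) -> str:
    """Thermometer code stored by the flip-flops when the stop edge arrives."""
    # Cell k (1-based) has switched if the start edge had time to cross k cells.
    return "".join(
        "1" if t_in_ps >= k * t_d_ps else "0" for k in range(1, n_cells + 1)
    )


code = flash_tdc(t_in_ps=75.0, t_d_ps=30.0, n_cells=6)
print(code)                      # 110000
print(code.count("1") * 30.0)    # 60.0 ps: the measured value, rounded down
print(flash_tdc(500.0, 30.0, 6)) # 111111: the input exceeds the FSR of 180 ps
```

The last call shows the limited FSR: any interval longer than the total line delay produces the same saturated code.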

#### **2.2.3.** Further Development: Resolution

There are several architectures for increasing the resolution. The vernier delay line TDC, pulse shrinking TDC and time amplifier (TA) are introduced in this section.

#### Vernier Delay Line TDC

The idea of the vernier delay line TDC comes from the vernier caliper, shown in Figure 2.6 (a). By introducing a small mismatch between two scales, the measured result becomes more precise. The vernier delay line TDC is shown in Figure 2.6 (b); it consists of two mismatched delay lines with delay times of  $T_1$  and  $T_2$  per delay cell. The start signal propagates through the delay cells with the long delay time  $T_1$ , while the stop signal goes through the cells with the short delay time  $T_2$ . Every time the start and stop signals propagate through one delay cell, the time difference between them decreases by  $T_1 - T_2$ . Once the stop signal catches up with the start signal, the input state of the flip-flops is held and the digital output is generated.

Given the working principle of the vernier delay line TDC, the resolution ( $T_{LSB}$ ) equals the delay mismatch between two gates and is given by:

$$T_{LSB} = T_1 - T_2. \tag{2.12}$$

If each delay line has N gates, the full scale range (FSR) is:

$$FSR = T_{LSB} \cdot N. \tag{2.13}$$

From Equation (2.12) and Equation (2.13), sub-gate resolution (a resolution smaller than the gate delay time) can be realized by the vernier delay line TDC (5.7ps in [6]). However, for a given number of gates the full scale range shrinks as the resolution becomes finer; furthermore, since the conversion time in this architecture grows with the number of stages the signals must traverse, a finer resolution over the same range means a lower conversion rate.
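The catch-up process behind Equations (2.12) and (2.13) can be illustrated with a short behavioural sketch. The delays of 35ps and 30ps and the function name `vernier_tdc` are assumptions chosen for illustration only.

```python
def vernier_tdc(t_in_ps: float, t1_ps: float, t2_ps: float, n_stages: int) -> int:
    """Stage index at which the stop edge has caught up with the start edge."""
    lead = t_in_ps                     # start edge initially leads by T_in
    for stage in range(1, n_stages + 1):
        lead -= t1_ps - t2_ps          # each stage closes the gap by T1 - T2
        if lead <= 0:
            return stage
    return n_stages                    # saturated: T_in exceeds the FSR


T1, T2, N = 35.0, 30.0, 16             # hypothetical cell delays and line length
print(T1 - T2)                         # 5.0 ps resolution, Equation (2.12)
print((T1 - T2) * N)                   # 80.0 ps FSR, Equation (2.13)
print(vernier_tdc(12.0, T1, T2, N))    # 3: the 12 ps input is resolved in 5 ps steps
```

With these numbers the resolution is six times finer than a single 30ps gate delay, but the 16-stage line only covers 80ps.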

#### Pulse Shrinking TDC

The pulse shrinking TDC is similar to the vernier delay line TDC in some aspects. A simplified architecture of the pulse shrinking TDC is shown in Figure 2.7. By carefully designing the aspect ratio W/L of each CMOS delay element, the rise and fall times can be altered to differ from all others in the chain, and offer an incremental reduction in pulse width [7]. The width of the input pulse  $T_{in}$  decreases by  $\Delta T$  whenever it propagates through one gate, and the SR latches record the state of each gate. When the pulse width is 0 or not detectable by the latch, the conversion process is finished.

Figure 2.6: Concept of Vernier Delay Line TDC. [5]



Figure 2.7: Concept of Pulse Shrinking TDC. [8]

The resolution is determined by the shrinking step  $\Delta T$ . Similar to the vernier delay line TDC, the pulse shrinking TDC can achieve sub-gate delay resolution, but dynamic range and conversion rate are sacrificed.

#### Time Amplification

Similar to a voltage amplifier, a time amplifier amplifies the input time interval. A simplified pulse-train time amplifier is shown in Figure 2.8. The input time interval is delayed in each branch by a different delay time, and the delayed pulses are combined by an OR gate to form a pulse train. The gain of the time amplifier is defined as the output time divided by the input time. For the pulse-train time amplifier shown, the input has one pulse and the output has four pulses, so the gain of the time amplifier is 4.



Figure 2.8: Pulse-Train Time Amplifier.

Assume an input time interval with a width of  $0.9T_{LSB}$ , which is smaller than  $T_{LSB}$  and thus cannot be measured by the TDC. With a time amplifier with a gain of 4, the input time interval is multiplied by 4, which gives  $3.6T_{LSB}$  at the output and makes it measurable. The function of the time amplifier can be expressed by Equation (2.14).

$$T_{out} = T_{in} \cdot A,\tag{2.14}$$

where  $T_{out}$  and  $T_{in}$  are the output and input time respectively, and A represents the gain of the time amplifier.
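The numerical example above can be reproduced with an ideal quantizer. This is a minimal sketch; the helper `quantize` is a hypothetical stand-in for the TDC that follows the amplifier.

```python
import math


def quantize(t_in_lsb: float) -> int:
    """Ideal TDC: number of whole LSBs contained in the interval."""
    return math.floor(t_in_lsb)


GAIN = 4                           # gain A of the time amplifier, Equation (2.14)
t_in = 0.9                         # input interval in units of T_LSB

direct = quantize(t_in)            # 0: the interval is shorter than one LSB
amplified = quantize(t_in * GAIN)  # 3: the amplifier output is 3.6 LSB
print(direct, amplified, amplified / GAIN)
```

Referred back to the input, the amplified measurement corresponds to 0.75 LSB, so the effective resolution improves by the gain of the amplifier.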

Time amplifiers are used in applications such as the precision measurement of narrow pulses whose width is too small to be quantized accurately [9], and they are widely implemented in multistage TDCs to increase the resolution. For high-speed digitization, the bandwidth of the time amplifier must also be taken into consideration, which is discussed in Chapter 4.

#### **2.2.4.** Further Development: Dynamic Range

#### Multistage TDC

A multistage TDC is the sequential combination of two or more TDCs and could, in theory, be implemented with an arbitrary number of stages. A multistage TDC consists of two parts: a coarse TDC, which determines the dynamic range and measures the large time interval, and a fine TDC, which determines the resolution and measures the small time residue left by the coarse TDC.

For instance, as mentioned in Section 2.2.2, the counter-based TDC has a wide dynamic range, which is determined by the number of counter bits. To increase the dynamic range, a counter-based TDC could be used as the first stage to measure the rough time, followed by a fine TDC with higher resolution to measure the time residue. Figure 2.9 shows a simple TDC signal for  $T_{coarse}$  and  $T_{fine}$ .

The coarse TDC counts the rising edges of the clock, which gives  $T_{coarse}$ ; the fine TDC then measures  $T_{fine}$ , which equals the time from the start signal to the next rising edge of the clock plus the duration of one clock cycle [10].  $T_{fine}$  is relatively short, ranging from one to two clock cycles. Therefore, the high-resolution, low-conversion-rate TDCs described in the previous section are a good choice for the fine TDC.



Figure 2.9: TDC signals for time interval measurement [10]

Another way to implement a multistage TDC is shown in Figure 2.10 [11]. The delay units with different capacitors produce delay times of different ranges. By comparing the initial input time interval with the delays generated by 16C, 8C, 4C, 2C and C sequentially, a fine-resolution digital output can be produced.



Figure 2.10: Two-Step TDC with Fine TDC Controlled by Capacitors. [11]

Assume that the TDC measures an unknown time interval that is smaller than the fixed delay times generated by capacitors 16C and 8C. The time interval is then fed into the 4C unit, where the 4C delay time is subtracted from it. The resulting residue time is sent to the 2C unit, which again generates a residue time that is finally fed to the C unit. The last residue time from the C unit is sent to the flash TDC, which has the same time range as the delay time of C and produces the digital output. The flash TDC determines the resolution of the whole TDC; the maximum time range is defined by the delay time of C.

#### Ring-Oscillator Based TDC

Instead of using an external clock to measure the time, a TDC can generate its own clock by using a ring oscillator, shown in Figure 2.11. Because the ring contains an odd number of inverters, the oscillator continuously changes its output state; the oscillation frequency is determined by the delay time of the ring.



Figure 2.11: Ring Oscillator Principle.

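Assuming the delay time of a single inverter is  $T_d$ , a transition has to travel around the ring twice to complete one period, so the oscillation frequency  $f_{osc}$  of a three-stage ring oscillator is:

$$f_{osc} = \frac{1}{2 \cdot 3T_d} = \frac{1}{6T_d}. \tag{2.15}$$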

The gated ring oscillator (GRO), a variant of the ring oscillator, was developed for higher resolution as well as lower quantization noise. Unlike a free-running ring oscillator, a GRO only operates when the 'enable signal' (the 'measurement time interval' in Figure 2.12) is high and stops when it is low. The residue occurring at the end of a given measurement interval,  $T_{stop}[k-1]$ , is stored in the GRO and transferred to the next measurement interval as  $T_{start}[k]$  [12], as expressed by Equation (2.16).



Figure 2.12: Concept of the gated ring oscillator TDC [12].

$$T_{stop}[k-1] = T_{start}[k]. \tag{2.16}$$

This feature can be utilized for continuous time interval measurements: each measurement starts where the previous one stopped. The overall quantization error of the time interval measurement is given by:

$$T_{error} = T_{stop}[k] - T_{start}[k] = T_{stop}[k] - T_{stop}[k-1], \tag{2.17}$$

where  $T_{start}$  and  $T_{stop}$  represent the residues at the start and at the stop of a measurement, respectively, and k is the measurement index. From Equation (2.17), the quantization error of the previous measurement is transferred to the next measurement; this characteristic is referred to as first-order noise shaping [5].
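The effect of Equation (2.17) can be seen in an idealised behavioural model, in which the oscillator state is simply the total time it has been enabled. This is a sketch under that assumption; the function name `gro_counts` and the 10ps stage delay are hypothetical, and leakage or skew between measurements is ignored.

```python
def gro_counts(intervals_ps, t_d_ps):
    """Counts reported by an idealised GRO TDC for consecutive intervals."""
    elapsed = 0          # total time the oscillator has been enabled
    edges_seen = 0       # total delay-cell transitions counted so far
    counts = []
    for t_in in intervals_ps:
        elapsed += t_in                   # the ring only advances while enabled
        edges_total = elapsed // t_d_ps   # residue of earlier runs is preserved
        counts.append(edges_total - edges_seen)
        edges_seen = edges_total
    return counts


T_D = 10                                  # hypothetical delay per stage in ps
intervals = [23, 23, 23, 23]              # the same interval measured four times
counts = gro_counts(intervals, T_D)
errors = [c * T_D - t for c, t in zip(counts, intervals)]
print(counts)        # [2, 2, 2, 3]
print(errors)        # [-3, -3, -3, 7]
print(sum(errors))   # -2
```

Each individual error is the difference of two consecutive residues, so the errors telescope: their sum stays within one stage delay no matter how many measurements are averaged.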

The resolution of the GRO can be finer than 1ps at relatively low power dissipation, and the first-order noise shaping characteristic makes further residue measurement possible. Therefore, GRO-based TDCs are widely used in high-precision TDC architectures. However, metastability and jitter are the main issues of ring-oscillator based structures.

#### 2.3. Summary

In this chapter, the static specifications (resolution, INL, DNL, conversion time) and dynamic specifications (FSR, ENOB, FOM) have been introduced, and an overview of TDC architectures has been presented.

The performance of the TDC architectures discussed in this chapter has been compared by means of a qualitative analysis. The results are listed in Table 2.13. Resolution, full-scale range (FSR), conversion rate, power dissipation and nonlinearity are considered, together with the design complexity.

| Architecture       | Resolution | FSR   | Conversion Rate | Power Dissipation | Nonlinearity | Complexity |
|--------------------|------------|-------|-----------------|-------------------|--------------|------------|
| TAC+ADC            | ~50ps      | ++++  | +++             | +++++             | ++           | ++++       |
| Counter-based      | ~ns        | +++++ | +++             | +                 | ++++         | +          |
| Delay Line         | ~200ps     | ++    | ++              | ++                | +++          | ++         |
| Vernier Delay Line | ~10ps      | +     | +               | +++               | +++          | +++        |
| Pulse Shrinking    | ~10ps      | ++    | ++              | ++++              | ++++         | ++++       |
| Multistage         | ~1ps       | +++++ | +++             | ++++              | +++          | +++++      |
| GRO                | ~100fs     | +++   | +               | ++                | +++++        | +++        |
| Time Amplifier     | ~1ps       | +     | +++             | ++                | +            | ++++       |

Figure 2.13: Overall performances comparison of the TDC architectures. [5]

As the table shows, every architecture has its advantages and disadvantages. For example, the counter-based TDC has the largest FSR but the coarsest resolution; the GRO has the best resolution, but its non-linearity is a major issue. To design a high-performance TDC, it is a good option to combine different types of TDC. The pipeline TDC, which consists of a counter, a ring oscillator and a gated delay line, is discussed in the next chapter.

#### References

- [1] S. Wang, *A 10-bit 25MSPS pipeline ADC for companding baseband processing in wireless application,* Delft University of Technology (2009).
- [2] Y. J. Park and F. Yuan, *0.25-4 ns 185 MS/s 4-bit pulse-shrinking time-to-digital converter in 130 nm CMOS using a 2-step conversion scheme.*
- [3] R. Walden, *Analog-to-digital converter survey and analysis,* IEEE Journal on Selected Areas in Communications **17**, 539 (1999).
- [4] J. Kalisz, *Review of methods for time interval measurements with picosecond resolution,* Institute of Physics Publishing, Metrologia **41**, 17 (2004).
- [5] W. Gao, D. Gao, C. Hu-Guo, and Y. Hu, *Integrated high-resolution multi-channel time-to-digital converters (TDCs) for PET imaging,* Northwestern Polytechnical University (Institut Pluridisciplinaire Hubert Curien (UDS, CNRS/IN2P3)).
- [6] N. U. Andersson and M. Vesterbacka, *A vernier time-to-digital converter with delay latch chain architecture,* IEEE Transactions on Circuits and Systems **61**, 773 (2014).
- [7] P. Chen, S.-I. Liu, and J. Wu, *A CMOS pulse-shrinking delay element for time interval measurement,* IEEE Transactions on Circuits and Systems, 954 (2000).
- [8] Y. J. Park and F. Yuan, *0.25-4 ns 185 MS/s 4-bit pulse-shrinking time-to-digital converter in 130 nm CMOS using a 2-step conversion scheme,* Ryerson University, Toronto, ON, Canada (Institut Pluridisciplinaire Hubert Curien (UDS, CNRS/IN2P3)).
- [9] F. Yuan, *CMOS time-mode circuits and systems: Fundamentals and applications,* CRC Press (2015).
- [10] B. K. Swann and B. J. Blalock, *A 100-ps time-resolution CMOS time-to-digital converter for positron emission tomography imaging applications,* IEEE Journal of Solid-State Circuits **39**, 1839 (2004).
- [11] P. Chen and R. B. Staszewski, *Exponential extended flash time-to-digital converter,* University College Dublin, Dublin, Ireland.
- [12] M. Z. Straayer and M. H. Perrott, *A multi-path gated ring oscillator TDC with first-order noise shaping,* IEEE Journal of Solid-State Circuits **44**, 1089 (2009).

## Pipeline TDC Architecture

Without compasses and angle squares, no square or circle can ever be drawn.

Menci

In this chapter, the pipeline TDC algorithm and architecture are given, followed by the analysis of the error source in each TDC block.

#### **3.1.** From Pipeline ADC to Pipeline TDC

Resolution and conversion rate are two conflicting parameters, yet both are priorities in TDC design. In the pipeline architecture, a high resolution can be achieved by increasing the number of stages, while a high conversion rate can be realized by resolving only a small number of bits in each stage. Therefore, the pipeline architecture can reach a high resolution without reducing the conversion rate, which is a significant advantage over other structures.

Pipeline ADCs are widely used because of their high resolution and high conversion rate. As shown in Figure 3.1, each pipeline stage typically consists of three blocks: a sub-ADC, a sub-digital-to-analog converter (DAC) and a subtractor, followed by a voltage amplifier.



Figure 3.1: Pipeline ADC Block Diagram.

When the input  $V_{in}$  is applied to the ADC, it is digitized by the sub-ADC, which provides the digital output of this stage. The sub-DAC converts the digital code from the sub-ADC back to an analog signal, which a subtractor compares with the original analog input to generate a residue voltage  $V_{res}$ . This residue is amplified to the full range of the sub-ADC and sent to the next stage. Assuming the input voltage is  $V_{in}$ , the resolution of the DAC is  $V_{LSB}$  and each stage has N bits, the residue voltage of stage  $i$  is given by:

$$V_{res}[i] = V_{in} - V_{LSB} \sum_{k=0}^{N-1} D_k \cdot 2^k, \tag{3.1}$$

where  $D_k$  is bit k of the digital output of the sub-ADC.

To benefit from the pipeline ADC structure and apply it to a TDC, the first thing to consider is how to generate the residue time. Unlike a voltage, which can be stored temporarily on a capacitor, time in the voltage domain corresponds to a pulse with a specific width, which cannot be held directly; at the same time, generating the residue time directly with conventional circuits is difficult. A new method is required to obtain the residue time, which is introduced in Section 3.2.

#### **3.2.** Residue Time Generation

#### **3.2.1.** Operation Principle

Before introducing the method, it is essential to know where the residue time comes from. The time interval in the voltage domain corresponds to a pulse with a specific width; its start and stop times are represented by the rising and falling edges. For many TDC architectures, the start time (rising edge) is the trigger signal for the TDC to start working; it is therefore error free and does not introduce any residue. The falling edge, however, is unpredictable: it can occur anywhere within one clock cycle (counter-based TDC) or within the delay time of a single delay element (delay-line based TDC). The time from the stop time to the end of the clock cycle/delay time is called the residue time ( $T_{res}$ ). By construction, it is smaller than  $T_{LSB}$ .



Figure 3.2: Basic Delay Line.

To make this clearer, a delay line with six delay elements is shown in Figure 3.2. Assume the delay time of each delay element is  $T_d$  and the input time  $T_{in}$  is larger than  $2T_d$  but smaller than  $3T_d$ . When the rising edge of the input time starts propagating through the delay line, it propagates through the first two delay elements and part of the third before the falling edge is detected at the beginning of the delay line, as shown in Figure 3.3. The activated blocks are marked in green; the rest remain white.



Figure 3.3: State of Basic Delay Line After the Signal Propagation.

Since the rising edge of the  $T_{in}$  has not fully propagated through the third delay cell, the state of each delay cell in the delay line is 110000, indicating that the input time interval is  $2T_d$ . Therefore, the residue time  $T_{res}$  is given by:

$$T_{res} = T_{in} - T_d \cdot N, \tag{3.2}$$

where N is the number of the delay cell the rising edge has propagated through.

Based on Equation (3.2), the relation between  $T_{in}$  and  $T_{res}$  is shown in Figure 3.4. It should be noted that  $T_{res}$  in the delay line cannot be acquired directly.



Figure 3.4: Transfer Curve:  $T_{in}$  versus  $T_{res}$ .

A gated delay line (GDL), whose core block is the gated delay element discussed in Section 2.2.4, is used to acquire the residue time. The simplified architecture is shown in Figure 3.5. Instead of propagating through the delay line directly,  $T_{in}$  controls the propagation of the signal in the GDL. The trigger signal, which is the signal inside the gated delay line, changes to the high level when the rising edge of  $T_{in}$  is detected, so no residue is generated when the measurement begins.



Figure 3.5: Gated Delay Line Block Diagram.

Assume a GDL has six gated delay elements, each with a delay time of  $T_d$ , so that the dynamic range is  $6T_d$ , and the input time  $T_{in}$  is between  $2T_d$  and  $3T_d$ . After  $T_{in}$  is applied to the GDL, the trigger signal propagates through the first two delay elements and part of the third delay element. The state of each gated delay element is shown in Figure 3.6; the activated blocks are marked in green and the rest remain white. As for the delay line in Figure 3.3, the digital output of the GDL is 110000 and the corresponding residue time is given by Equation (3.2). However, since  $T_{in}$  changes to low when the measurement is finished, the gated delay line holds its state and thus stores the residue time. Thanks to the first-order noise shaping characteristic, the residue time can be transferred through the GDL by sending another control pulse, which makes it possible to measure it.



Figure 3.6: State of GDL:  $T_{in}$  Propagation

To acquire the residue time, an additional enable time interval  $T_{en}$  which ends at the moment GDL digital output becomes 111111, is applied to the GDL. As is shown in Figure 3.7, the gated delay elements activated by  $T_{en}$  are marked in green.



Figure 3.7: State of GDL:  $T_{en}$  Propagation.

The length of the time interval  $T_{en}$  is:

$$T_{en} = FSR - T_{in} = 6T_d - T_{in}. \tag{3.3}$$

Applying Equation (3.2), we get:

$$T_{en} = (6 - N)T_d - T_{res}. \tag{3.4}$$

Since the output of the GDL is 110000, *N* equals 2, thus:

$$T_{en} = 4T_d - T_{res}. \tag{3.5}$$

In Equation (3.5),  $T_{en}$ , which complements  $T_{in}$  within the full scale range  $6T_d$ , is expressed in terms of  $T_{res}$ .



Figure 3.8: State 3 of GDL.

If we subtract the redundant value  $3T_d$  from  $T_{en}$ , as shown in Figure 3.8,  $T_{en}$  can be expressed as:

$$T_{en} = T_d - T_{res}. \tag{3.6}$$

Based on Equation (3.6), since  $T_d$  is a constant, if the length of  $T_{en}$  is known,  $T_{res}$  can be acquired once  $T_{en}$  is generated. In this case, the initial parameter of the pipeline architecture, residue time, is generated, and the pipeline TDC can be implemented.
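The chain from Equation (3.2) to Equation (3.6) can be followed with concrete numbers. The sketch below assumes a cell delay of 30ps and an input of 70ps purely for illustration; the function name `gdl_residue` is hypothetical.

```python
import math

T_D = 30.0       # delay of one gated delay element in ps (illustrative value)
N_CELLS = 6      # GDL length used in the text, so FSR = 6 * T_D


def gdl_residue(t_in_ps: float):
    """Evaluate Equations (3.2) to (3.6) for one input interval."""
    n = math.floor(t_in_ps / T_D)                  # cells fully crossed during T_in
    t_res = t_in_ps - n * T_D                      # Equation (3.2)
    t_en = N_CELLS * T_D - t_in_ps                 # Equation (3.3)
    # Equation (3.4): the same T_en expressed through N and T_res.
    assert math.isclose(t_en, (N_CELLS - n) * T_D - t_res)
    # Remove the whole-cell delays (3 * T_D when N = 2) to reach Equation (3.6).
    t_en_reduced = t_en - (N_CELLS - n - 1) * T_D
    return n, t_res, t_en, t_en_reduced


n, t_res, t_en, t_en_reduced = gdl_residue(70.0)
print(n, t_res, t_en, t_en_reduced)   # 2 10.0 110.0 20.0
```

For this input the reduced enable time is 20ps, which equals $T_d - T_{res}$ with $T_{res}$ = 10ps, as Equation (3.6) states.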

#### **3.2.2.** Example

To fully explain the residue time generation method, Figure 3.9 shows a demonstration of a 2-bit residue time generator.



Figure 3.9: Error Time Generator in Principle.

The trigger signal  $T_{tri}$  changes from low to high and propagates through the delay line when the input signal  $T_{in}$  enables the GDL, and pauses when the falling edge of  $T_{in}$  arrives. When the measurement ends, the clk signal triggers the D flip-flops (DFFs), which store the state of each delay element; their outputs are D1, D2, D3 and D4, respectively.

Assuming  $T_{in}$  is in the range between  $2T_d$  and  $3T_d$ , the state of the GDL at the end of the measurement is indicated in Figure 3.10. The activated cells are marked in green, the digital outputs of four DFFs are 1100.



Figure 3.10: Error Time Generator:  $T_{in}$  propagation.

The digital outputs of the DFFs control the opening/closing of the switches S1, S2, S3 and S4. Since the digital output is 1100, S2 is closed and the remaining switches stay open, as shown in Figure 3.11. Next, the GDL is enabled by  $T_{en}$ ; the trigger signal starts to propagate again from the position where it stopped in the previous measurement, passes through switch S2 and finally becomes the output of the OR gate,  $T_{sel}$ . The residue time  $T_{res}$  equals the time difference between the rising edges of  $T_{en}$  and  $T_{sel}$ . All the components activated during  $T_{en}$  are marked in green.



Figure 3.11: Error Time Generator:  $T_{en}$  propagation.

The example of the 2-bit residue time generator gives an overview of the residue time measurement.  $T_{in}$  is first digitized when it is applied to the GDL;  $T_{res}$  is then produced as the time difference between  $T_{sel}$  and  $T_{en}$ . With this two-step measurement, the residue time can be generated successfully.

#### **3.3.** Pipeline TDC Algorithm

Based on the discussion in the last section, we can summarize the operation process of residue time generator into two steps:

- During  $T_{in}$ : digitize the input signal and generate the digital output;
- During  $T_{en}$ : generate  $T_{sel}$  based on the digital output of step 1, then combine it with  $T_{en}$  to produce the residue time.

The time digitization is achieved by step 1, while step 2 realizes the digital-to-time converter (DTC) and subtractor functions. To describe the workflow more precisely, the system-level stage architecture of the pipeline TDC is shown in Figure 3.12. The TDC first digitizes the input time interval, then sends the output to a DTC that converts the digital code D back to time. This time is subsequently subtracted from the input time interval stored in a time register (the gated delay line) to form the residue time. Finally, the residue time is amplified by the time amplifier and sent to the next stage.

Assuming the quantization noise is introduced to the TDC while other components are ideal, the stage architecture of the pipeline TDC can be modeled as shown in Figure 3.13. The relation between  $T_{out}$  and  $N_q$  is given in Equation (3.7).



Figure 3.12: Stage Architecture.



Figure 3.13: Pipeline TDC Stage Model with Ideal DTC.

$$T_{out1} = T_{in} + N_{q1}, \tag{3.7}$$

where  $N_q$  represents the quantization noise of TDC,  $T_{out}$  is the output of ideal DTC. The residue of the pipeline TDC  $T_{res1}$  is given by Equation (3.8).

$$T_{res1} = -G \cdot N_{q1},\tag{3.8}$$

where *G* is the gain of time amplifier.

To model the overall pipeline TDC algorithm, as shown in Figure 3.14, a back-end TDC and DTC are added to constitute a two-stage TDC. The residue time  $T_{res1}$  from the previous stage is sent to the following TDC with the quantization noise  $N_{q2}$ , whose digital output passes through an ideal DTC with the bit weight  $G_r$  and is converted back to time. Equation (3.9) gives the output of the two-stage pipeline TDC.

$$T_{out} = T_{out1} + \frac{T_{res1} + N_{q2}}{G_r}. \tag{3.9}$$



Figure 3.14: Two-Stage TDC model with a single stage and a single backend TDC.

From Equations (3.7) and (3.8),  $T_{out}$  can also be expressed as:

$$T_{out} = T_{in} + \left(1 - \frac{G}{G_r}\right) \cdot N_{q1} + \frac{N_{q2}}{G_r}. \tag{3.10}$$

From Equation (3.10), if the gain of the time amplifier G equals the bit weight of the DTC  $G_r$ , which means that the back-end TDC works over its full scale range, the quantization noise of the first stage cancels and the overall TDC quantization noise equals that of the back-end TDC divided by the DTC bit weight, as shown in Equation (3.11).

$$T_{out} = T_{in} + \frac{N_{q2}}{G_r}. \tag{3.11}$$
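The cancellation of the first-stage noise can be verified numerically. The sketch below uses ideal floor quantizers with the same step in both stages and combines the two stage results so that the outcome matches Equation (3.10); the function names and the values 30ps and 70ps are illustrative assumptions.

```python
import math


def quantize(t_ps: float, lsb_ps: float) -> float:
    """Ideal quantizer: round down to a multiple of the LSB."""
    return math.floor(t_ps / lsb_ps) * lsb_ps


def two_stage(t_in_ps: float, t_d_ps: float, gain: float, dtc_weight: float):
    t_out1 = quantize(t_in_ps, t_d_ps)
    n_q1 = t_out1 - t_in_ps                  # Equation (3.7)
    t_res1 = -gain * n_q1                    # Equation (3.8)
    q2 = quantize(t_res1, t_d_ps)            # back-end TDC output as a time
    n_q2 = q2 - t_res1                       # back-end quantization noise
    t_out = t_out1 + q2 / dtc_weight         # recombine with DTC bit weight G_r
    return t_out, n_q1, n_q2


# Matched case, G = G_r = 4: only N_q2 / G_r remains, Equation (3.11).
t_out, n_q1, n_q2 = two_stage(70.0, 30.0, gain=4.0, dtc_weight=4.0)
print(t_out - 70.0, n_q2 / 4.0)              # -2.5 -2.5

# Mismatched case, G = 4.4: part of N_q1 leaks through, Equation (3.10).
t_out_m, n_q1_m, n_q2_m = two_stage(70.0, 30.0, gain=4.4, dtc_weight=4.0)
print(math.isclose(t_out_m - 70.0, (1 - 4.4 / 4.0) * n_q1_m + n_q2_m / 4.0))
```

In the matched case the total error shrinks from the first-stage error of 10ps to 2.5ps, which is the back-end error divided by the bit weight.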

Figure 3.15 shows the pipeline TDC architecture with an arbitrary number of stages. Using the same equation mentioned above, the final output value can be calculated as follows:



Figure 3.15: Unlimited-Stage TDC model.

$$\begin{aligned}
T_{out} = T_{in} &+ \left(1 - \frac{G}{G_r}\right) \cdot N_{q1} + \left(1 - \frac{G_2}{G_{r2}}\right) \cdot \frac{N_{q2}}{G_r} + \dots \\
&+ \left(1 - \frac{G_{n-1}}{G_{r(n-1)}}\right) \cdot \frac{N_{q(n-1)}}{\prod_{i=1}^{n-2} G_{ri}} + \frac{N_{q(n)}}{\prod_{i=1}^{n-1} G_{ri}}.
\end{aligned} \tag{3.12}$$

From Equation (3.12), the quantization noise of the first stage influences the final output more than that of the last stage, because the noise of later stages is divided by the accumulated DTC bit weights. Stringent precision is therefore required in the first stages, and the front-end stage has the highest sensitivity to noise. Since the algorithm requires each stage to operate at the same full-scale range, the stage gain  $G_i$  equals  $G_{ri}$ , and Equation (3.12) simplifies to:

$$T_{out} = T_{in} + \frac{N_{q(n)}}{\prod_{i=1}^{n-1} G_{ri}}. \tag{3.13}$$

Assuming all the stages have the same stage gain G and the same quantization error  $N_q$ , we get:

$$T_{out} = T_{in} + \frac{N_q}{G^{n-1}}. \tag{3.14}$$

Based on Equation (3.14), an  $n$-stage pipeline TDC digitizes the input time with a quantization error of  $\frac{N_q}{G^{n-1}}$ . The number of output bits  $N_{TDC}$  is given by Equation (3.15).

$$N_{TDC} = \sum_{i=1}^{n} \log_2 G. \tag{3.15}$$

For example, for a 3-stage pipeline TDC with 2 bits per stage and a gain of 4 in each stage, the final output width is  $N_{TDC} = \sum_{i=1}^{3} \log_2 4 = 6$  bits. Assuming the delay time of the delay cell is 30ps and the gain of the time amplifier is 4, the amplified residue time ranges from 0ps to 120ps, four delay elements are used to match the full-scale range, and the binary output of each stage ranges from 00 to 11. The operation process of the pipeline TDC is shown in Table 3.16.

| Input time $T$ (ps) | Stage 1: $T_{res1}$ (ps) | Stage 1: digital code | Time amplifier 1: $A_{res1}$ (ps) | Stage 2: $T_{res2}$ (ps) | Stage 2: digital code | Time amplifier 2: $A_{res2}$ (ps) | Stage 3: $T_{res3}$ (ps) | Stage 3: digital code | Time amplifier 3: $A_{res3}$ (ps) |
|---|---|---|---|---|---|---|---|---|---|
| $30 n_0 < T < 30(n_0+1)$ | $30(n_0+1) - T$ | $n_0$ | $4 \cdot T_{res1}$ | $30(n_1+1) - A_{res1}$ | $n_1$ | $4 \cdot T_{res2}$ | $30(n_2+1) - A_{res2}$ | $n_2$ | $4 \cdot T_{res3}$ |
| 61 | 29 | 10 | 116 | 4  | 11 | 16  | 14 | 00 | 56  |
| 62 | 28 | 10 | 112 | 8  | 11 | 32  | 28 | 01 | 112 |
| 63 | 27 | 10 | 108 | 12 | 11 | 48  | 12 | 01 | 48  |
| 64 | 26 | 10 | 104 | 16 | 11 | 64  | 26 | 10 | 104 |
| 65 | 25 | 10 | 100 | 20 | 11 | 80  | 10 | 10 | 40  |
| 66 | 24 | 10 | 96  | 24 | 11 | 96  | 24 | 11 | 96  |
| 70 | 20 | 10 | 80  | 10 | 10 | 40  | 20 | 01 | 80  |
| 80 | 10 | 10 | 40  | 20 | 01 | 80  | 10 | 10 | 40  |
| 89 | 1  | 10 | 4   | 26 | 00 | 104 | 16 | 11 | 64  |
| 91 | 29 | 11 | 116 | 4  | 11 | 16  | 14 | 00 | 56  |

Figure 3.16: Operation Process for a 3-Stage Pipeline TDC.

In the table above, the residue time and the corresponding digital code of each stage are presented. Since the 3-stage TDC produces a 6-bit digital output, the resolution can be calculated from Equation (3.14):

$$T_{LSB} = \frac{N_q}{G^{n-1}} = \frac{30}{4^{3-1}} = 1.875\,\mathrm{ps}. \tag{3.16}$$

The final digital outputs derived from Table 3.16, listed in Table 3.17, confirm this result. The digital output changes only when the input time varies by about one LSB (roughly 2ps); otherwise, it stays the same.

| Input<br>Time     | 61ps   | 62ps   | 63ps   | 64ps   | 65ps   | 66ps   | 70ps   | 80ps   | 89ps   | 91ps   |
|-------------------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| Digital<br>Output | 100000 | 100001 | 100001 | 100010 | 100010 | 100011 | 100101 | 101010 | 101111 | 110000 |

Figure 3.17: Digital Output for a 3-Stage Pipeline TDC.
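As a quick cross-check of Table 3.17, the sketch below models the whole 3-stage converter as a single ideal floor quantizer with the LSB of Equation (3.16). This is an assumption made for illustration: the function name `quantize` is hypothetical, and the stage-by-stage residue arithmetic of Table 3.16 is not modelled. Under this assumption, the ten input times reproduce the listed 6-bit codes.

```python
import math

T_LSB_PS = 30 / 4 ** (3 - 1)  # Equation (3.16): 1.875 ps


def quantize(t_in_ps, n_bits=6):
    """Ideal floor quantizer (assumed model, not the actual pipeline)."""
    code = math.floor(t_in_ps / T_LSB_PS)
    return format(code, f"0{n_bits}b")


inputs_ps = [61, 62, 63, 64, 65, 66, 70, 80, 89, 91]
codes = [quantize(t) for t in inputs_ps]
for t, c in zip(inputs_ps, codes):
    print(f"{t} ps -> {c}")
```

Inputs such as 62ps and 63ps, or 64ps and 65ps, fall into the same 1.875ps bin, which is why their codes coincide.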

# 3.4. 1.5-bit Pipeline Stage and Digital Correction

In Section 3.2, an example of a 2-bit sub-TDC is given. By cascading such low-resolution stages, a TDC with improved speed and accuracy can be realized. However, the 2-bit stage is sensitive to the time offset of the delay elements, which is usually caused by parameter variation. Assuming the delay time of the delay element varies from $1T_d$ to $1.1T_d$, the transfer curve between $T_{in}$ and $T_{res}$ is shown in Figure 3.18.



Figure 3.18: Transfer Curve with TDC offset: $T_{in}$ versus $T_{res}$.

The transfer curve has been divided into four parts: 00, 01, 10, 11, based on the digital output of TDC. Since the delay time of the cells becomes larger, the range of  $T_{in}$  and  $T_{res}$  is also increased.

In Figure 3.18, the residue time, which ranges from $1.1T_d$ to 0, is sent to the following sub-TDC. If the gain of the time amplifier is 4 and the following sub-TDC is ideal, the transfer curve of the second stage is as shown in Figure 3.19. When $T_{in}$ ranges from 0 to $4T_d$, the transfer curves in the ideal and non-ideal situations are the same. However, a residue time between $4T_d$ and $4.4T_d$ exceeds the dynamic range of the following sub-TDC, so the residue output is 0 and the corresponding digital output is 11. The resulting linearity problem lengthens the time interval mapped to digital code 00 in the third stage, as indicated in Figure 3.20.

It is evident that the non-ideality of the delay element in the previous stage significantly influences the linearity of the following stage. In other words, a small error in the decision level (determined by the delay element) overloads the back-end TDC and ultimately degrades the linearity of the transfer function of the 2-bit TDC stage. To compensate for the offset in Figure 3.18, a 1.5-bit/stage sub-TDC is introduced and acts as the main block of the pipeline TDC.



Figure 3.19: Transfer Curve in the Third Stage: $T_{in}$ versus $T_{res}$.

A TDC stage with 1.5-bit output consists of 1 bit of real code and 0.5 bit of redundancy. Unlike the 2-bit stage, the 1.5-bit stage has three digital outputs: 00, 01 and 10. As shown in Figure 3.21, two decision levels, located at $T_{in} = 3T_d$ and $T_{in} = 5T_d$, divide the full dynamic range into three parts of unequal width. Denoting the two time references by $T_{ref1}$ and $T_{ref2}$ and the full dynamic range by $T_{ful}$, the transfer function of the 1.5-bit stage is given by Equation (3.17).

$$T_{res} = \begin{cases} (T_{ref1} + 1) - T_{in}, & 0 \leq T_{in} < T_{ref1} \\ (T_{ref2} + 1) - T_{in}, & T_{ref1} \leq T_{in} < T_{ref2} \\ T_{ful} - T_{in}, & T_{ref2} \leq T_{in} < T_{ful} \end{cases} \tag{3.17}$$

Figure 3.20: Transfer Step in the Second Stage with DC offset.

Figure 3.21: The Relation Between $T_{in}$ and $T_{res}$.

With the help of the wider input time range, an offset of up to $1T_d$ can be tolerated without exceeding the dynamic range of the sub-TDC. In Figure 3.22, the residue varies from $4T_d$ to 0 in the non-ideal situation. After amplification by a factor of 2 in the time amplifier, the residue ranges from $8T_d$ to 0, which is still within the range of the sub-TDC. If the residue time corresponding to digital code '10' is sent to the following stage, the transfer curve of that stage is as indicated in Figure 3.23. Compared with the 2-bit stage, the 1.5-bit sub-TDC has better offset tolerance, which relaxes the quantization accuracy specifications of the sub-TDCs.

Figure 3.22: 1.5 bit stage Transfer Curve with offset: $T_{in}$ versus $T_{res}$.

Figure 3.23: Second stage Transfer Curve with offset: $T_{in}$ versus $T_{res}$.

Based on the analysis of the 1.5-bit/stage and the 2-bit/stage architectures, the 1.5-bit/stage has the following advantages:

- Higher offset tolerance due to the wide input range,
- Higher conversion rate because of the stage gain reduction from 4 to 2,
- Lower accuracy requirement because of the existence of redundancy bit.

In Figure 3.23, the redundant code '10' is generated in the non-ideal case. This can be resolved by digital error correction [1] [2], which is realized by adding up the properly delayed digital outputs of each stage with one-bit overlap: the MSB of stage i is added to the LSB of the previous stage i-1, as indicated in Figure 3.24.

Figure 3.24: Digital Error Correction Method.
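The overlap-add rule can be sketched in a few lines of Python. This is a minimal illustration, assuming every stage is a 1.5-bit stage whose code is 00, 01 or 10 and that the codes are listed from the first stage to the last. The function name `combine_stage_codes` is hypothetical, and the mapping between codes and time intervals in the actual TDC is not modelled.

```python
def combine_stage_codes(stage_codes):
    """Add 1.5-bit stage codes with one-bit overlap, first stage first."""
    total = 0
    for code in stage_codes:
        if code not in (0b00, 0b01, 0b10):
            raise ValueError("a 1.5-bit stage only produces 00, 01 or 10")
        # Shifting the running sum left by one bit aligns its LSB with the
        # MSB of the next stage code before the two are added.
        total = (total << 1) + code
    return total


print(format(combine_stage_codes([0b10, 0b01, 0b00]), "04b"))
print(combine_stage_codes([0b01, 0b10]) == combine_stage_codes([0b10, 0b00]))
```

The second line illustrates the redundancy: the code pairs (01, 10) and (10, 00) add up to the same word, so a decision that falls one level low in one stage can be made up by a '10' in the next stage.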

# **3.5.** Offset in Each Block

The pipeline TDC mainly consists of three blocks: the TDC, the DTC and the time amplifier. The most critical block of the TDC and the DTC is the gated delay element. In this section, the offset of each block is introduced.

#### **3.5.1.** TDC Offset

The performance of a low-resolution TDC is primarily limited by the accuracy of the delay cell, and secondarily by the gating skew error, which comes from the undesired phase shift when the delay cells switch back and forth between propagation mode and hold mode. Since the output nodes of the gated delay cell are floating in the disabled state, the noise shaping that originates from the continuity of the phase is rather vulnerable to leakage of the pn junctions at the output nodes and to charge redistribution during the switching of the gating transistors [3]. Furthermore, the gated delay cell cannot be started or stopped instantaneously by the gating signal, which is another source of gating skew error.

Since the gated delay element has simple logic, its delay time, which serves as the time reference for the input time, is easily influenced by non-idealities such as process, voltage and temperature (PVT) variation, which affects the linearity of the TDC. Figure 3.18 and Figure 3.22 show the transfer curves in both the ideal and the non-ideal situation due to delay time variation. Furthermore, as discussed in the previous section, the maximum tolerable offset for a 1.5-bit/stage TDC is $1T_d$. If the offset exceeds this limit, the linearity of the overall TDC degrades in the stage where overloading takes place.

#### **3.5.2.** DTC Offset

The DTC in the pipeline TDC is mainly realized by D flip-flops (DFFs), which sample the state of the gated delay line (GDL) and convert it to a digital code. Since the DFFs have their own delay and the states of the GDL cannot be sampled instantaneously when the sampling signal arrives, a time mismatch occurs.

#### 3.5.3. Subtractor Offset

The subtractor computes the time difference between the original input time and the DTC output. Signal distortion within the component increases the residue time and causes a gain error, which may exceed the maximum time range and introduce a missing-code event, as shown in Figure 3.25.

Furthermore, due to the non-ideal behavior of the component, the residue time generated by the subtractor has a nonzero minimum value. A fixed offset is therefore added to the residue time to keep the subtractor working properly.



Figure 3.25: 1.5bit/stage TDC Transfer Curve with Subtractor Offset.

#### 3.5.4. Time Amplifier Offset

The time amplifier serves as the interconnection stage of the pipeline TDC, which receives the residue time from the previous stage, multiplies it by a gain value and sends it to the next stage. Similar to a voltage amplifier, the time amplifier also suffers from the gain errors due to the component nonideality.

The time amplifier is bandwidth-limited, where the bandwidth is set by the input time range. Assuming an ideal time amplifier with a gain of A and a maximum input time residue $T_{max}$, the bandwidth BW is:

$$BW = \frac{1}{A \cdot T_{max} + T_{overlap}},\tag{3.18}$$

where  $T_{overlap}$  is the overlap time between two time residues.
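Equation (3.18) can be evaluated directly. The sketch below is illustrative only: the function name and the numerical values (a gain of 4, a 1ns maximum residue, no overlap) are assumptions chosen for the example, not design values from this work.

```python
def time_amplifier_bandwidth_hz(gain, t_max_s, t_overlap_s=0.0):
    """Bandwidth of an ideal time amplifier according to Equation (3.18)."""
    return 1.0 / (gain * t_max_s + t_overlap_s)


# Hypothetical numbers: gain A = 4, T_max = 1 ns, no overlap.
bw = time_amplifier_bandwidth_hz(4, 1e-9)
print(f"{bw / 1e6:.0f} MHz")
```

With these assumed values the amplified residue occupies 4ns, so residues can follow each other at about 250MHz; any overlap time added to the denominator lowers this figure.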





Figure 3.26: Time residue with overlapping.

As shown in Figure 3.26, if the bandwidth limit of the time amplifier is exceeded, two residues overlap with each other, causing non-linearity in the next stage.

# 3.6. Summary

In this chapter, the operation principle of residue time generation has been discussed. With the help of the gated delay line, the residue of the previous measurement can be transferred to the beginning of the next measurement, finally producing the residue time. Furthermore, an algorithm has been introduced to analyze the influence of noise in each stage, which degrades the linearity of the following stage. The delay time variation of the gated delay element has also been discussed. To enhance the offset tolerance, the 1.5-bit stage is implemented in the pipeline TDC.


# References

[1] B. Razavi, *Principles of data conversion system design,* McGraw-Hill Publishers (1995).

[2] D. A. Johns and K. Martin, *Analog integrated circuit design,* John Wiley and Sons Publishers (1997).

[3] F. Yuan, *CMOS time-mode circuits and systems: Fundamentals and applications,* CRC Press (2015).


# Pipeline TDC Implementation

I have nothing to offer but blood, toil, tears and sweat.

Winston Churchill

A 1.8V, 14-bit, 200Msps pipelined TDC prototype is designed in a 0.18um CMOS technology. This chapter discusses the implementation details of the ring oscillator, first stage DTC, time amplifier, gated delay line and corresponding DTC in the rest of the stages.

The overall pipeline consists of two kinds of stage: the first stage, with a multiplexer, a ring oscillator, a selector and a coarse-counter-based TDC to increase the dynamic range, and the subsequent stages, which use gated delay lines as TDCs to measure the residue time.

# 4.1. First Stage

#### 4.1.1. Multiplexer

To realize the channel multiplexing function, the multiplexer combines the signals from multiple channels into a single channel. Figure 4.1 shows an example of asynchronous signal waveforms from three different channels, whose pulses occur randomly and overlap with each other. Since the input signal is a digital signal with two levels, high and low, three values are considered when multiplexing it: the time of the rising edge, the time of the falling edge and the pulse width.



Figure 4.1: Waveform for Each Channel.

Because the pulse width can be determined from the times of the rising and falling edges, the multiplexer encodes the signal by detecting these edges in the different channels and generating corresponding pulses. These pulses are added together to form a pulse train, which becomes the trigger signal for the pipeline TDC, and the corresponding time points are digitized. During reconstruction, the times of the rising and falling edges can be recovered from the digital signal.

The multiplexer architecture is shown in Figure 4.2. The delay cells introduce a mismatch between the two inputs of each XOR gate; when a rising or falling edge of the input signal arrives, the XOR gate generates a pulse whose width equals the delay time of the delay element. These pulses are sent to a four-input OR gate, whose output is the final output of the multiplexer. In this way, the signals of all channels are multiplexed, and the generated pulse train is connected to the TDC, which measures the time at which each pulse arrives.

Figure 4.2: Architecture of the Multiplexer.

Since the multiplexer generates pulses from the input mismatch of the XOR gates, the output pulse width is constant. If the time difference between two generated pulses is smaller than the pulse width, they overlap when added together by the OR gate. Therefore, the output pulse density, i.e. the number of output pulses per second, is limited by the fixed pulse width; a smaller pulse width allows a higher output pulse density, which can also be regarded as the bandwidth of the multiplexer. Assuming the delay cell has a delay time of $T_{del}$ and all components are ideal, the bandwidth BW is given by:

$$BW = \frac{1}{T_{del}}. \tag{4.1}$$

The output pulses overlap once the bandwidth is exceeded. Therefore, the time distance between pulses in different channels should be larger than $T_{del}$, i.e. within the bandwidth. If this requirement is met, the multiplexer works properly, as shown by the result in Figure 4.3.



Figure 4.3: Multiplexer Simulation Result.
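The edge-to-pulse encoding of this subsection can be sketched as a discrete-time model. The sketch assumes ideal gates, waveforms sampled on a uniform grid, and a delay cell that is an integer number of samples long; all function names and the two example waveforms are hypothetical.

```python
def delay(samples, d):
    """Delay a sampled waveform by d samples, holding its first value."""
    return [samples[0]] * d + samples[:len(samples) - d]


def edge_pulses(samples, d):
    """XOR a waveform with its delayed copy: one d-sample pulse per edge."""
    return [a ^ b for a, b in zip(samples, delay(samples, d))]


def multiplex(channels, d):
    """OR the edge pulses of all channels into one pulse train."""
    pulses = [edge_pulses(ch, d) for ch in channels]
    return [int(any(column)) for column in zip(*pulses)]


def count_pulses(train):
    """Count the rising edges in a pulse train."""
    return sum(1 for prev, cur in zip([0] + train, train) if cur and not prev)


T_DEL = 2  # delay-cell length in samples (assumed)
ch_a = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # edges at samples 2 and 8
ch_b = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]  # edges at samples 5 and 12

train = multiplex([ch_a, ch_b], T_DEL)
print(train)
print(count_pulses(train))
```

All four edges in this example are at least three samples apart, so four separate pulses appear. If two edges were `T_DEL` samples apart or less, their pulses would merge at the OR gate and `count_pulses` would report fewer pulses than edges, which corresponds to the bandwidth limit of Equation (4.1).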

#### **4.1.2.** TDC: Ring Oscillator

The pulses generated by the multiplexer act as the trigger signal of the TDC. To measure the input time and produce the residue, two tasks must be performed when the trigger signal is detected:

- 1) Store the state of the ring oscillator when the trigger pulse arrives and convert it to a digital code,
- 2) Select the ring-oscillator output that follows the trigger pulse, combine it with the initial trigger pulse and produce the residue.

The ring oscillator acts as the TDC in the first stage and, coupled with a coarse counter, increases the dynamic range. A conventional ring oscillator uses both rising and falling edges as its output, which complicates signal processing. To simplify the circuit, a ring oscillator with a different output combination is used in this design.



Figure 4.4: Demonstration of Ring Oscillator.

Figure 4.4 shows the schematic of the ring oscillator, which consists of seven inverters. Compared with the conventional structure, whose output nodes follow the signal path, the rising edges are distributed uniformly with this output combination. The resolution is determined by the delay time of two inverters.

In Figure 4.5, the waveform of each output shows the characteristic of the ring oscillator. Assuming the delay time of the inverter is $T_{inv}$, the frequency of the ring oscillator $f_{ro}$ is given by:

$$f_{ro} = \frac{1}{N \cdot 2T_{inv}},\tag{4.2}$$

where N is the number of outputs.
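For a feel of the numbers in Equation (4.2), the following sketch evaluates the oscillation frequency and the two-inverter resolution. The inverter delay of 100ps is a hypothetical value chosen for illustration, not a measured or simulated figure from this design.

```python
def ring_oscillator_frequency_hz(n_outputs, t_inv_s):
    """Oscillation frequency according to Equation (4.2)."""
    return 1.0 / (n_outputs * 2.0 * t_inv_s)


N_OUTPUTS = 7      # seven inverters, one output each (Figure 4.4)
T_INV = 100e-12    # assumed inverter delay in seconds

f_ro = ring_oscillator_frequency_hz(N_OUTPUTS, T_INV)
resolution = 2.0 * T_INV  # spacing between adjacent output edges
print(f"f_ro = {f_ro / 1e6:.1f} MHz, resolution = {resolution * 1e12:.0f} ps")
```

One oscillation period then contains N resolution steps, so any change in the inverter delay shifts both the frequency and the residue range of the first stage.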

From Equation (4.2), $T_{inv}$ determines the frequency of the ring oscillator as well as the output residue range of the first stage, so any variation influences the performance of the whole system. To deal with this problem, the 1.5-bit/stage TDC is used as the following sub-TDC to enhance the offset tolerance. In addition, the gated delay cell, which forms the gated delay line in the sub-TDCs, is also used in the ring oscillator. By using the same type of delay cell in every pipeline stage, the offset of the delay cell can be compensated by the following stage, reducing the influence of delay time variation on the whole system. The schematic is shown in Figure 4.6, where the gated delay cell is always enabled.






Figure 4.5: Waveform of Ring Oscillator.



Figure 4.6: Schematic of the Ring Oscillator.

Assume that an offset $T_{err}$ is introduced in each gated delay cell of the ring oscillator by PVT variation and that the remaining components are ideal. The maximum time residue $T_{res1}$ of the first stage is then:

$$T_{res1} = 2(T_{inv} + T_{err}). \tag{4.3}$$

Since the gated delay line in the following TDC stages uses the same gated delay cell, the same offset is introduced, and the residue time for the second stage is given by:

$$T_{res2} = 2(T_{inv} + T_{err}) - T_{res1} = 0. \tag{4.4}$$

According to Equation (4.4), the offset is canceled in the second stage and therefore has no influence on the linearity of the whole system.

#### 4.1.3. DTC: Selector

Once a pulse from the multiplexer arrives, a selector is needed to select the ring-oscillator output that follows the trigger signal; the subtractor then produces the residue. Figure 4.7 shows the core component of the selector. The output of the ring oscillator and the trigger signal from the multiplexer, named In and Hit, are connected to the clk and CLR inputs of a DFF, respectively; D is always at the high level.



Figure 4.7: Core Component of the Selector.

The DFF is triggered by In when Hit is high (DFF enabled): its output changes from low to high and returns to low when Hit goes low. Figure 4.8 shows the waveform of each signal. The rising edge of the Out signal aligns with the rising edge of In1, while its falling edge comes from the falling edge of the Hit signal.



Figure 4.8: Waveform of Each Signal.

The function of the DFF is similar to that of an XNOR gate, whose output is high only if its inputs are the same. However, the second pulse In2 of the signal In is "filtered" by the Hit signal because the DFF is disabled; only the rising edge of In1 becomes the rising edge of the output signal. If the output of the ring oscillator in Figure 4.5, which has endless pulses, is connected to the clk input of the DFF, its rising edges are transferred to the output only while the Hit signal is high. In this way, the output that follows the Hit signal is selected, and the time difference between Hit and Out in Figure 4.7 is the residue time. Figure 4.9 shows the array of selectors; the corresponding signal waveform is shown in Figure 4.5. The selector array has multiple outputs, and the output pulse with the earliest rising edge is selected.




Figure 4.9: Selector Array.

The functionality and waveforms of the DFF discussed above assume the ideal situation. In practice, the DFF has its own delay time, which adds a fixed offset to the residue. Furthermore, the time difference between the Hit signal and the In signal has a nonzero minimum value: In can only trigger the DFF after Hit has been applied. Figure 4.10 shows the relationship between $T_{In} - T_{Hit}$ and the DFF delay time.



Figure 4.10: DFF Performance.

The DFF is triggered when $T_{In}-T_{Hit}$ is larger than 140ps; the delay time becomes stable once it reaches 300ps. To keep the DFF operating with a constant delay time, a fixed offset must be introduced between $T_{In}$ and $T_{Hit}$. Therefore, instead of selecting the pulse with the earliest rising edge from the selector array in Figure 4.9, the time offset is introduced by choosing the second earliest rising edge from the selector output. Hence, the residue has an offset of $2T_d$. The logic for the second-earliest edge selection is shown in Figure 4.11. The array of AND gates combines each input with its neighbour to remove the earliest pulse; the array of OR gates removes all other pulses except the second earliest one. Finally, only the second earliest pulse remains.



Figure 4.11: Schematic for Second Earliest Edge Selection.

As discussed above, all rising edges of the ring oscillator that occur while the Hit signal is high are transferred to the output of the selector array. The second earliest output pulse is selected and combined with the properly delayed trigger signal from the multiplexer to form the residue signal, which is amplified by the time amplifier discussed in Section 4.1.4. Meanwhile, the output pulses of the selector array are also sent to the encoder to obtain the digital output code; its architecture is covered in Section 4.1.5.

### **4.1.4.** Time Amplifier

The time amplifier is one of the main contributors to the functionality of the pipeline TDC; noise and offset from this block affect the overall performance dramatically. Therefore, the accuracy of the time amplifier is the primary consideration in the circuit implementation. In this design, the pulse-train architecture is used. With careful design, the targets of good accuracy and broad bandwidth are achieved.

The conventional pulse-train time amplifier is shown in Figure 4.13. The input pulse is delayed by a different amount in each branch, and the delayed pulses are combined into a pulse train by an OR gate; the width of each pulse equals the input pulse width. Thanks to the pulse-train output, this time amplifier is linear and accurate and has a programmable gain for a flexible input time range [1].

Figure 4.12: Selector Waveform.

Figure 4.13: Conventional Pulse-Train Time Amplifier.

It should be noted that the input time range of the pulse-train time amplifier is determined by the delay time of the delay cells. Assuming the delay time is $T_{del}$ and branch (K) has N more delay cells than branch (K-1), the input time range TR is:

$$TR = T_{del} \cdot N, \tag{4.5}$$

The TR in Equation (4.5) is the maximum input time length the amplifier can accept while keeping the gain stable. For instance, if the gain of the time amplifier is G, the gain remains G as long as the input time is smaller than TR; however, once TR is exceeded, the pulses within the pulse train start overlapping and the gain becomes smaller than G. Denoting the overlap time by $T_{ov}$, Equation (4.6) gives the relation between the gain G and the input time $T_{in}$.

$$G = \begin{cases} \frac{G \cdot T_{in}}{T_{in}}, & T_{in} \leq TR \\ \frac{G \cdot TR + T_{ov}}{TR + T_{ov}}, & T_{in} > TR \end{cases} \tag{4.6}$$

where

$$T_{in} = TR + T_{ov}, \quad T_{in} > TR \tag{4.7}$$

The gain of the time amplifier stays at G when $T_{in}$ is smaller than TR. If $T_{in}$ exceeds TR and tends to infinity, $T_{ov}$ also tends to infinity and the gain approaches 1. The relationship between G and TR is similar to that between the voltage gain and the input frequency of a voltage amplifier. In this sense, TR can be seen as the bandwidth of the time amplifier.

Assuming the gain of the time amplifier is 4 and the bandwidth is 5ns, and expressing the gain in dB as $G_{dB} = 10 \cdot log_{10}G$, the relationship between the gain and the input time is shown in Figure 4.14. As the input time length tends to infinity, $G_{dB}$ approaches 0dB.
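Equations (4.6) and (4.7) can be combined into a short numerical sketch of the gain roll-off. It uses the example values from the text (G = 4, TR = 5ns) and the dB definition given above; the function names are hypothetical, and the amplifier is assumed ideal apart from pulse overlap.

```python
import math


def effective_gain(t_in_ns, gain=4.0, tr_ns=5.0):
    """Gain of the pulse-train time amplifier per Equations (4.6) and (4.7)."""
    if t_in_ns <= tr_ns:
        return gain
    t_ov = t_in_ns - tr_ns  # overlap time, Equation (4.7)
    return (gain * tr_ns + t_ov) / (tr_ns + t_ov)


def gain_db(g):
    """Gain in dB using the definition of the text, 10*log10(G)."""
    return 10.0 * math.log10(g)


for t_in in (1.0, 5.0, 10.0, 100.0, 1e6):
    g = effective_gain(t_in)
    print(f"T_in = {t_in:g} ns: G = {g:.3f} ({gain_db(g):.2f} dB)")
```

At an input time of 10ns the gain has already dropped from 4 to 2.5, and for very long inputs it tends to 1 (0dB), in line with the limit described in the text.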

The minimum input time of the time amplifier is 0 in the ideal situation. In practice, however, the delay cells distort the pulse because of their finite rise and fall times: a time offset is subtracted when an ideal pulse passes through a delay cell, similar to the pulse-shrinking TDC introduced in Chapter 2. Assuming the time offset is $T_{off}$ and all delay cells are identical, the widths of the four pulses at the input of the OR gate in Figure 4.13 are:

$$\begin{cases} T_{in} = T_{in}; \\ T_{in1} = T_{in} - T_{off}; \\ T_{in2} = T_{in} - 2 \cdot T_{off}; \\ T_{in3} = T_{in} - 3 \cdot T_{off}. \end{cases} \tag{4.8}$$

Figure 4.14: Bandwidth of the Pulse-Train Time Amplifier.

The input time should be larger than $3 \cdot T_{off}$ to overcome the delay cell offset; otherwise, no pulse appears at the output. According to Equation (4.5), the bandwidth of the time amplifier can be extended by increasing the number of delay cells. However, according to Equation (4.8), the minimum input time then also increases, because the delay cell offset is proportional to the number of delay units. Meanwhile, a pulse train with unequal pulse widths increases the offset in the next stage. To solve these problems, a DFF-based pulse-train time amplifier is used, whose schematic is shown in Figure 4.15.

Figure 4.15: DFF Based Pulse-Train Time Amplifier.

The sequence of operation is:

- 1) Detect the rising edges of the *Hit* and *In* signals and produce rising edges at the outputs of *DFF1* and *DFF3*;
- 2) Generate a time-residue pulse with the inverter (*inv1*) and the NAND gate (*NAND1*);
- 3) The outputs of *DFF1* and *DFF3* from step 1 propagate through the delay cells *del1* and *del2* and are sent to the inputs of *DFF2* and *DFF4*;
- 4) Generate a further time-residue pulse with the inverter (*inv2*) and the NAND gate (*NAND2*);
- 5) The time-residue pulses from steps 2 and 4 are sent to the AND gate to form a pulse train;
- 6) The output of step 5 resets the DFFs through inverter *inv3*.

The bandwidth is determined by the delay cells as well as the DFFs; the gain equals the number of pulses. Unlike the conventional architecture shown in Figure 4.13, which first generates the residue pulse and then produces the pulse train, here the Hit and In signals first propagate separately and then generate the residue pulses. In this way, the offsets of the delay cells and the DFFs cancel because the two propagation paths are balanced, and the minimum time difference between the two input signals is reduced. Additionally, since the rising edges generated by the four DFFs have the same rise time, signal deterioration caused by block non-idealities has less influence on the residue pulse. Therefore, compared with the conventional architecture, the offset of the residue pulse is reduced significantly.

#### **4.1.5.** Thermometer-to-Binary Encoder

The thermometer-to-binary encoder in the first stage converts the state of the selector to binary code. In principle, the encoder goes through three steps:

- Store the state of the selector;
- Convert the stored code to 1-hot code;
- Convert 1-hot code to binary code.

In digital circuits, a one-hot code is a group of bits in which the only legal combinations of values are those with a single high (1) bit and all the others low (0) [2]. A comparison of the register output, the 1-hot code and the binary code is shown in Figure 4.16. The selector outputs in Figure 4.12 are first stored in a register, which is triggered by the Hit signal from the selector.

The schematic in Figure 4.17 converts the register output to the 1-hot code. It focuses on the boundary between 0 and 1 in the register output; the code combination '01' is converted to 1 while all other positions remain 0.

The last step is to encode the 1-hot code to binary code; the encoder schematic is shown in Figure 4.18. By using 8 OR gates, the output of the selector is finally transferred to binary code.

It should be noted that, since the ring oscillator has only seven outputs, the corresponding binary code is from 001 to 111, and the code 000 is missing. Therefore, the binary output of the ring oscillator is 2.5bit instead of 3bit.

| Register Output | 1-Hot Code | Binary Code |
|-----------------|------------|-------------|
| 1110000         | 1000000    | 001         |
| 0111000         | 0100000    | 010         |
| 0011100         | 0010000    | 011         |
| 0001110         | 0001000    | 100         |
| 0000111         | 0000100    | 101         |
| 1000011         | 0000010    | 110         |
| 1100001         | 0000001    | 111         |

Figure 4.16: Comparison of Three Types of Code.
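The three encoder steps can be checked against Figure 4.16 with a small behavioural model. It is a sketch under assumptions: codes are handled as bit strings with the first selector output on the left, the 1-hot rule looks for a '01' pair cyclically (the first bit is compared with the last one), and the binary encoder ORs together the position numbers of the hot bits. The function names are hypothetical, and the gate-level schematics of Figures 4.17 and 4.18 are not reproduced.

```python
def to_one_hot(register):
    """Mark every position where a 0 is followed by a 1 (cyclically)."""
    return "".join(
        "1" if register[i - 1] == "0" and register[i] == "1" else "0"
        for i in range(len(register))
    )


def to_binary(one_hot, n_bits=3):
    """OR together the 1-based positions of the hot bits."""
    value = 0
    for position, bit in enumerate(one_hot, start=1):
        if bit == "1":
            value |= position
    return format(value, f"0{n_bits}b")


registers = ["1110000", "0111000", "0011100", "0001110",
             "0000111", "1000011", "1100001"]
for reg in registers:
    hot = to_one_hot(reg)
    print(reg, hot, to_binary(hot))
```

Because positions are counted from 1, the seven register states map to the codes 001 to 111 and the code 000 never occurs, as noted above.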



Figure 4.17: 1-Hot Converter.

# 4.2. Subsequent Stages

The first stage of the pipeline TDC mainly focuses on processing the signal from the ring oscillator and generating the residue time. Both a TDC and a DTC are used to realize this function. For the remaining stages, the 1.5-bit/stage TDC introduced in Section 3.4 is used, followed by a DTC per stage. The time amplifier is the same as the one employed in the first stage, which is introduced in Section 4.1.4.

The gated delay line can receive the pulse train coming from the first stage by enabling and disabling the delay cells. Furthermore, as discussed in Section 3.4, the 1.5-bit/stage TDC has better offset tolerance due to its large input time range. For these reasons, the 1.5-bit gated delay line is chosen as the TDC for the subsequent stages.

The schematic of the TDC is shown in Figure 4.19. The gated delay line serves as the TDC; DFFs store its state. The digital outputs D1, D2 and D3 control the switches S1, S2 and S3 respectively.



Figure 4.18: Binary Encoder.



Figure 4.19: Schematic of Gated Delay Line.

$T_{in}$ first passes through an OR gate and controls the propagation of the gated delay line. The Trigger signal goes high when $T_{in}$ from the previous stage arrives. After $T_{in}$ ends, the Trigger signal is held at the position where $T_{in}$ stopped. The components activated during this operation are marked in blue in Figure 4.20.

The Hit signal triggers the DFFs, which store the state of the gated delay line. Since D1, D2 and D3 have only three combinations, 100, 110 and 111, they can be converted directly to 1-hot code by the logic introduced in Section 4.1.5, giving 100, 010 and 001 respectively. The last two bits of these three-bit codes, 00, 01 and 10, form the binary output of the current stage, so a thermometer-to-binary encoder block is not necessary.

Furthermore, the 1-hot digital output could control the opening and closing of the switches. In Figure 4.21, the DFF1 and DFF2 are activated, the corresponding 1-hot digital output is 010, therefore, switch S2 is closed.

Figure 4.20: Schematic of Gated Delay Line: Signal Propagating Through the GDL.

After that, $T_{en}$ enables the gated delay line again after $T_{in}$. The trigger signal is sent to the OR gate once it has propagated through switch S2. It should be noted that the residue is generated from the rising edge of the trigger signal, so any distortion of the trigger pulse width by the OR gates has no influence on the time accuracy.



Figure 4.21: Schematic of Gated Delay Line and Selector: Select the Signal.

The Hit and $T_{sel}$ signals are sent to the time amplifier of Figure 4.15; the time difference between these two signals is the time residue of this stage. As discussed before, the DFFs and switches are not equally spaced along the gated delay line, which enhances the offset tolerance of the TDC. Furthermore, a delay cell is added at the beginning of the gated delay line to cancel the offset from the previous stage.

# 4.3. Summary

The architectures of the first and remaining stages have been discussed in Section 4.1 and Section 4.2. The system overview is shown in Figure 4.22 and Figure 4.23.

Figure 4.22: System Level: First Stage.

The multiplexer converts the rising and falling edges in each channel to pulses, which become the trigger signal of the selector. The selector stores the current state of each ring-oscillator output and sends it to the encoder to generate the binary output. One signal from the selector is chosen as the selected signal *Sel*, combined with the properly delayed *Hit* and sent to the time amplifier, which generates the pulse train of the residue signal for the next stage.

In the first stage, a ring oscillator acts as the TDC to extend the full-scale range. However, the characteristics of continuous running and complicated output signal demand several additional blocks to track and process the output signal. Therefore, the first stage is more complicated than the following stage.



Figure 4.23: System Level: Stage N.

In the second stage, a simple architecture can realize all the needed functions in the first stage such as generating the residue signal and the binary output. The Hit signal from the previous stage will store the state of the gated delay line after the residue time transfer ends. The enable signal enables the gated delay line again; the sel signal is also generated while the enable signal is '1'. Finally, similar to the first stage, the Sel and Hit signals are sent to the time amplifier which generates the residue time for next stage.

The schematic of the pipeline TDC is shown in Figure 4.24. The whole architecture consists of six stages: the first stage (master stage) contributes the majority of the bits and is followed by five stages (slave stages) of 1.5 bit each, which increase the resolution of the master stage. The combined digital output is processed by the digital error correction logic to generate the correct binary code.



Figure 4.24: System Level: Pipeline TDC.

As all of the circuit blocks have been introduced, the dynamic performance of the entire TDC is presented in the next chapter.


# References

[1] K. Kim, Y.-H. Kim, W. Yu, and S. Cho, *A 7 bit, 3.75 ps resolution two-step time-to-digital converter in 65 nm CMOS using pulse-train time amplifier,* IEEE Journal of Solid-State Circuits **48** (April 2013).

[2] D. Harris and S. Harris, *Digital design and computer architecture (2nd ed.),* San Francisco, Calif.: Morgan Kaufmann, **p. 129.**


# 5

# TDC's Simulation Result

Faith will move mountains.

Anonym

In this chapter, the performance and robustness of the TDC implementation are examined. The simulated performance is related to the specification and compared to state-of-the-art TDCs. An estimate of the power distribution is also given.

#### **5.1.** Simulation Result

### **5.1.1.** Gated Delay Element Mismatch

The size of the gated delay element determines its driving capability and is tied to the input capacitance it presents, so the trade-off between propagation delay and power consumption needs to be considered. The delay element mismatch is obtained by running a Monte Carlo simulation in the Cadence environment.

To evaluate the mismatch of the gated delay element caused by process variation and interconnect mismatch, dummy gated delay elements are added before and after the measured delay element; the supply voltage $V_{dd}$ is 1.8 V. As depicted in Figure 5.1, a Monte Carlo run of 500 samples gives a mean propagation delay of 191.59 ps with a standard deviation of about 1.53 ps. The delay mismatch can be reduced by increasing the transistor size, but the power consumption of the gated delay element limits how far this can be taken.



Figure 5.1: Monte Carlo Simulation for Delay Element Mismatch.
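To put the Monte Carlo figures for the delay element in perspective, the short sketch below expresses the standard deviation relative to the mean delay and computes a ±3σ window. The function names are illustrative only, and the ±3σ window assumes that the delay distribution is approximately Gaussian, which the text does not state explicitly.

```python
# Figures quoted in the text for the 500-sample Monte Carlo run.
MEAN_DELAY_PS = 191.59
SIGMA_DELAY_PS = 1.53


def relative_mismatch_percent(sigma_ps: float, mean_ps: float) -> float:
    """Standard deviation expressed as a percentage of the mean delay."""
    return 100.0 * sigma_ps / mean_ps


def three_sigma_window_ps(mean_ps: float, sigma_ps: float) -> tuple[float, float]:
    """Delay range covering +/-3 sigma (assumes a Gaussian distribution)."""
    return mean_ps - 3.0 * sigma_ps, mean_ps + 3.0 * sigma_ps


if __name__ == "__main__":
    rel = relative_mismatch_percent(SIGMA_DELAY_PS, MEAN_DELAY_PS)
    low, high = three_sigma_window_ps(MEAN_DELAY_PS, SIGMA_DELAY_PS)
    print(f"relative mismatch: {rel:.2f} %")
    print(f"3-sigma window: {low:.2f} ps to {high:.2f} ps")
```

With the quoted values, the relative mismatch is roughly 0.8% of the nominal delay.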

### **5.1.2.** Time Amplifier Mismatch

Since the time amplifier is non-ideal, its gain varies with operating conditions because of internal mismatch. This is caused by two non-idealities. The first is the imperfect replicas generated by the delay lines; this duplication error is caused by rise-time or fall-time mismatch of the inverters in the delay cells. The second is the AND gate that merges the replicas and produces the pulse train. In the AND gate, mismatch among the NMOS and PMOS transistors shifts the transition point of the pulses, resulting in pulse-width error. For example, an NMOS with a low $V_{th}$ turns on early and turns off late, which results in a wider pulse than when $V_{th}$ is nominal. Therefore, even if the AND gate receives N perfect replica pulses, an error is created and the gain is not exactly N.

A detailed view of the accuracy and linearity of the proposed time amplifier is shown in Figure 5.2. The gain mismatch is obtained by keeping the input time interval fixed and measuring the output time interval. A Monte Carlo simulation with 218 samples shows a mean time-amplifier gain of 1.99956 with a standard deviation of about 0.077.



Figure 5.2: Monte Carlo Simulation for Gain Mismatch of Time Amplifier.

The relation between the gain offset and the input time interval is shown in Figure 5.3. The gain offset reaches a maximum of 2.2% at an input time interval of 400 ps, which is still within the maximum offset tolerance (191.59 ps) of the sub-TDC.
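The tolerance argument can be checked with a rough calculation. The sketch below assumes that the gain offset acts as a relative error on the nominal gain of 2, so that the error in the amplified residue is the input interval times the gain times the relative offset. This error model and the function names are assumptions made for illustration and are not taken from the simulation setup.

```python
NOMINAL_GAIN = 2.0            # nominal time-amplifier gain (simulated mean: 1.99956)
MAX_GAIN_OFFSET = 0.022       # 2.2 % maximum gain offset, from Figure 5.3
INPUT_INTERVAL_PS = 400.0     # input interval at which the maximum occurs
OFFSET_TOLERANCE_PS = 191.59  # offset tolerance of the sub-TDC quoted in the text


def residue_error_ps(t_in_ps: float, gain: float, rel_offset: float) -> float:
    """Output-time error caused by a relative gain offset (assumed model)."""
    return t_in_ps * gain * rel_offset


def within_tolerance(error_ps: float, tolerance_ps: float) -> bool:
    """True when the residue error stays inside the sub-TDC offset tolerance."""
    return abs(error_ps) <= tolerance_ps


if __name__ == "__main__":
    err = residue_error_ps(INPUT_INTERVAL_PS, NOMINAL_GAIN, MAX_GAIN_OFFSET)
    print(f"residue error: {err:.1f} ps, "
          f"within tolerance: {within_tolerance(err, OFFSET_TOLERANCE_PS)}")
```

Under this model the worst-case residue error is about 17.6 ps, well below the quoted tolerance.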

### **5.2.** TDC Performance

#### **5.2.1.** Resolution

Figure 5.4 shows the resolution of the TDC at a supply voltage of 1.8 V for different process corners and temperatures. The TDC is designed to operate from $-40^{\circ}$C to $+80^{\circ}$C. At the typical corner and room temperature, the TDC resolution is around 6.44 ps, while under the worst-case condition it is around 8.28 ps. Furthermore, the resolution value decreases, that is, the resolution becomes finer, as the supply voltage increases. The simulation results are shown in Figure 5.5.



Figure 5.3: Gain Offset vs. Input Time Interval.



Figure 5.4: Average TDC delay resolution vs. operating temperature.

#### **5.2.2.** Linearity

Figure 5.5: The TDC Resolution vs. Supply Voltage.

The linearity of the TDC has been evaluated in terms of differential non-linearity (DNL) and integral non-linearity (INL). The INL and DNL were obtained from circuit simulation in Cadence. The transient analysis takes a long time because a small mismatch is introduced between the ring oscillator frequency and the input pulse frequency, so that the input time interval sweeps slowly across the codes. Figure 5.6 shows the results of both DNL and INL, with values no greater than 1.5 least significant bits (LSB). The calculated linearity performance at $f_{in} = 185$ MHz is $INL_{max} = 1.5$ LSB, equivalent to ENOB = 12.68 based on Equation 2.6.
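The text does not list the post-processing used to extract DNL and INL from the transient run. One common approach when the input slides slowly against the reference is a code-density test, in which each code should be hit equally often. The sketch below assumes that method; the function name and the example hit counts are hypothetical and are not simulation results.

```python
def dnl_inl_from_histogram(counts: list[int]) -> tuple[list[float], list[float]]:
    """Compute DNL and INL (in LSB) from a code-density histogram.

    counts[k] is the number of hits recorded for output code k when the
    input time interval is distributed uniformly over the measured range.
    """
    total = sum(counts)
    if not counts or total == 0:
        raise ValueError("histogram must contain at least one hit")
    ideal = total / len(counts)                 # expected hits per code
    dnl = [c / ideal - 1.0 for c in counts]     # relative bin-width error
    inl = []
    running = 0.0
    for d in dnl:                               # INL is the running sum of DNL
        running += d
        inl.append(running)
    return dnl, inl


if __name__ == "__main__":
    example_counts = [100, 120, 80, 100]        # hypothetical hit counts
    dnl, inl = dnl_inl_from_histogram(example_counts)
    print("DNL:", [round(x, 2) for x in dnl])
    print("INL:", [round(x, 2) for x in inl])
```

A code that is hit more often than average corresponds to a wider time bin and therefore to a positive DNL.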

### **5.2.3.** Accuracy

The accuracy of the TDC has been evaluated by keeping the input time fixed while introducing mismatch into each sub-TDC block. The circuit is simulated in repeated Monte Carlo runs, and the spread of the digital output indicates the accuracy of the TDC.

Figure 5.7 shows the distribution of the digital output for an input time of 300 ps. The output is concentrated at code 46; the occurrences of codes 45, 47, and 48 are attributed primarily to the mismatch of the time amplifier and, to a lesser extent, to variation of the gated delay element.
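The dominant code can be cross-checked against the resolution quoted in Section 5.2.1. The sketch below assumes ideal floor quantization with the typical-corner LSB of 6.44 ps; both the quantization model and the helper names are illustrative assumptions.

```python
import math

LSB_PS = 6.44  # typical-corner resolution quoted in Section 5.2.1


def ideal_code(t_in_ps: float, lsb_ps: float = LSB_PS) -> int:
    """Ideal output code for an input interval, assuming floor quantization."""
    return math.floor(t_in_ps / lsb_ps)


def deviation_ps(code: int, reference_code: int, lsb_ps: float = LSB_PS) -> float:
    """Time deviation represented by a code relative to the reference code."""
    return (code - reference_code) * lsb_ps


if __name__ == "__main__":
    ref = ideal_code(300.0)
    print("ideal code for 300 ps:", ref)
    for observed in (45, 46, 47, 48):
        print(f"code {observed}: {deviation_ps(observed, ref):+.2f} ps")
```

Under these assumptions a 300 ps input maps to code 46, in agreement with the peak of the distribution, and the neighbouring codes correspond to deviations between one LSB below and two LSBs above that value.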

### **5.2.4.** Power Consumption

The power consumption of the design was estimated in normal operation with a 200 MHz input frequency. The power consumed in the master stage (first stage), the slave stages (following stages), and the remaining logic is listed in Table 5.1.

Since slave stage 5 (the last sub-TDC stage) is only a flash TDC without a time amplifier, it consumes less power than the other slave stages.



Figure 5.6: Simulated INL and DNL at 185 MHz.



Figure 5.7: Digital Output Distribution for an Input Time of 300 ps.

The relation between the power consumption and the input pulse frequency (the number of pulses sent to the TDC per second) is presented in Table 5.2. The power consumption increases when the input pulse density rises.

|            | Master | Slave 1-4 (each) | Slave 5 | Other Logic | Total |
|------------|--------|------------------|---------|-------------|-------|
| Power [mW] | 5.415  | 1.375            | 1.185   | 1.80        | 13.90 |

Table 5.1: Power Distribution in Each Block
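The entries of Table 5.1 can be checked against the reported total. The sketch below assumes that the "Slave 1-4" entry is the power of each of the four slave stages rather than their sum, which is the reading under which the columns add up to 13.90 mW. The dictionary keys are illustrative names.

```python
# Power figures from Table 5.1, in mW.
STAGE_POWER_MW = {
    "master": 5.415,
    "slave_1_to_4_each": 1.375,  # assumed to be per stage, four stages in total
    "slave_5": 1.185,
    "other_logic": 1.80,
}


def total_power_mw(power: dict[str, float]) -> float:
    """Sum the stage powers, counting slave stages 1 to 4 four times."""
    return (power["master"]
            + 4 * power["slave_1_to_4_each"]
            + power["slave_5"]
            + power["other_logic"])


def share_percent(part_mw: float, total_mw: float) -> float:
    """Share of the total power taken by one block, in percent."""
    return 100.0 * part_mw / total_mw


if __name__ == "__main__":
    total = total_power_mw(STAGE_POWER_MW)
    print(f"total: {total:.2f} mW")
    print(f"master share: {share_percent(STAGE_POWER_MW['master'], total):.1f} %")
```

Under this reading the master stage accounts for a little under 40% of the total power.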

| Frequency[MHz] | 20    | 50    | 100   | 150   | 200   |
|----------------|-------|-------|-------|-------|-------|
| Power[mW]      | 1.932 | 2.987 | 4.228 | 9.374 | 13.90 |

Table 5.2: The Relation Between Power Dissipation and Input Frequency

#### **5.2.5.** Robustness

In the previous sections, the performance has been verified at the operating point the TDC was designed for. Real data converters, however, need to be robust, meaning that they can withstand some variation in the operating conditions. This section describes the additional testing of the design under such variations.

#### Temperature

The effect of extreme temperatures on the pipeline TDC performance was investigated. The simulated DNL and INL at $-40^{\circ}C$, room temperature ($27^{\circ}C$), and $80^{\circ}C$ are listed in Table 5.3.

| Temperature[°C] | DNL[LSB] | INL[LSB] |
|-----------------|----------|----------|
| -40             | 0.72     | 1.2      |
| 27              | 0.85     | 1.5      |
| 80              | 0.73     | 2.4      |

Table 5.3: The Effects of Temperature Variations on Performance.

The INL degrades as the temperature increases, partly because of the increasing offset in the gated delay line. The resolution is also expected to become coarser as the temperature increases.

#### Supply Voltage

The sensitivity to variations in supply voltage was also investigated; see Table 5.4. Because the conductance of the transistors depends on the voltage driving them, the INL degrades as $V_{dd}$ is reduced and improves slightly when the supply voltage is increased. The DNL shows the opposite trend in Table 5.4.

| Supply Voltage[V] | DNL[LSB] | INL[LSB] |
|-------------------|----------|----------|
| 1.7               | 0.58     | 2.2      |
| 1.8               | 0.85     | 1.5      |
| 1.9               | 0.92     | 1.4      |

Table 5.4: Effects of Supply Voltage Variations on Performance.

#### **Process Corners**

Circuits implemented with MOS transistors may run faster or slower depending on variations in the fabrication process. The process corners represent the extremes of these variations. A commonly used naming convention denotes the carrier mobility of the NMOS and PMOS transistors, in that order: each can be higher (fast corner) or lower (slow corner) than in the nominal case (typical corner). The results of simulating five corners, covering the cases where the NMOS and PMOS are fast or slow, are given in Table 5.5.

| Process Corners | DNL[LSB] | INL[LSB] |
|-----------------|----------|----------|
| fast-fast       | 0.61     | 2.2      |
| fast-slow       | 0.81     | 2.1      |
| typical         | 0.85     | 1.5      |
| slow-fast       | 1.3      | 2.6      |
| slow-slow       | 0.61     | 2.2      |

Table 5.5: Effects of Process Corner on Performance.

# 6

# Conclusion

All things come to those who wait.

Anonym

In this chapter, the conclusions on the TDC architecture are given, and directions for future work are outlined.


#### **6.1.** Conclusion

In this work, a 14-bit pipeline TDC for transducer array channel multiplexing in ultrasound imaging systems has been designed and verified in simulation. The performance satisfies the given requirements, and the concept of the proposed time register technique has been demonstrated.

The new pipeline TDC architecture introduced in this thesis has two main functions, time subtraction and time amplification, which are realized by dedicated circuits. The resolution of 6.44 ps is achieved with five 1.5-bit sub-TDCs, without sacrificing the conversion rate (200 MS/s).

As in any pipeline TDC, the first TDC stage provides most of the bits and is more complex than the other sub-TDCs. Since the offset from the first stage could overload the following stages, the 1.5-bit sub-TDC, which has 0.5 bit of redundancy, has been introduced to enhance the offset tolerance.

Between successive TDC stages, the time amplifier amplifies the time residue from the previous TDC stage and sends it to the next stage. The conventional pulse-train time amplifier suffers from low accuracy because the signal is distorted as it propagates through the delay elements. In this design, DFFs are used together with delay elements as the delay line to regenerate the input signal and preserve the accuracy of the residue output.

The power efficiency of the TDC is comparable with the state of the art. Moreover, at the given sampling frequency the design achieves a standout FOM of 0.01 pJ/step, at least 40 times better than the counterparts listed in Table 6.1.

#### **6.2.** Comparison with state-of-the-art TDCs

To compare the performance of the design with state-of-the-art TDCs, this work was simulated with an input signal at a frequency of 200 MHz. The resulting INL equals 1.5 LSB and the total power consumption is 13.90 mW. The FOM was calculated to be 0.01 pJ/step.

|                 | VLSI '07 [1] | CICC '09 [2] | JSSC '12 [3] | JSSC '13 [4] | This Work |
|-----------------|--------------|--------------|--------------|--------------|-----------|
| Technology      | 90nm         | 65nm         | 130nm        | 65nm         | 180nm     |
| Resolution      | 1.25ps       | 4.8ps        | 1.25ps       | 3.75ps       | 6.44ps    |
| Bits            | 9            | 7            | 8            | 7            | 14        |
| Conversion Rate | 10MS/s       | 50MS/s       | 50MS/s       | 200MS/s      | 200MS/s   |
| DNL             | 0.8LSB       | 1LSB         | 0.7LSB       | 0.9LSB       | 0.85LSB   |
| INL             | 3LSB         | 3.3LSB       | 3LSB         | 2.3LSB       | 1.5LSB    |
| ENOB            | 7.00         | 4.9          | 6.00         | 5.28         | 12.68     |
| Power           | 3mW          | 1.7mW        | 4.3mW        | 3.6mW        | 13.90mW   |
| FOM [pJ/step]   | 2.344        | 1.139        | 1.344        | 0.463        | 0.01      |

Table 6.1: Comparison with state-of-the-art TDCs.
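The ENOB and FOM rows of Table 6.1 can be reproduced from the other rows. Equation 2.6 is not repeated in this chapter, so the sketch below assumes the forms ENOB = N − log2(INL + 1) and FOM = P / (2^ENOB · f_s). These forms match the tabulated values to within rounding, but they should be read as an inferred reconstruction rather than as the exact definitions used earlier in the thesis.

```python
import math


def enob(bits: int, inl_lsb: float) -> float:
    """Effective number of bits, assuming ENOB = N - log2(INL + 1)."""
    return bits - math.log2(inl_lsb + 1.0)


def fom_pj_per_step(power_mw: float, enob_bits: float, rate_msps: float) -> float:
    """Figure of merit in pJ per conversion step: P / (2**ENOB * f_s)."""
    joules_per_step = (power_mw * 1e-3) / (2.0 ** enob_bits * rate_msps * 1e6)
    return joules_per_step * 1e12


# (label, bits, INL [LSB], power [mW], conversion rate [MS/s]) from Table 6.1.
DESIGNS = [
    ("VLSI '07 [1]", 9, 3.0, 3.0, 10.0),
    ("CICC '09 [2]", 7, 3.3, 1.7, 50.0),
    ("JSSC '12 [3]", 8, 3.0, 4.3, 50.0),
    ("JSSC '13 [4]", 7, 2.3, 3.6, 200.0),
    ("This work", 14, 1.5, 13.90, 200.0),
]

if __name__ == "__main__":
    for label, bits, inl, power, rate in DESIGNS:
        e = enob(bits, inl)
        print(f"{label:13s} ENOB = {e:5.2f}  "
              f"FOM = {fom_pj_per_step(power, e, rate):.4f} pJ/step")
```

Small differences from the tabulated FOM values can arise because the table appears to use the rounded ENOB.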

The simulated performance of the designed TDC is summarized and compared to other similar TDCs in Table 6.1. It should be noted that the converters used for comparison report measured data, while this work reports simulated data. This design shows a standout FOM in this category at the cost of relatively high power consumption; the high conversion rate and high resolution were both realized with the help of the pipeline architecture.

#### **6.3.** Future Work

As products become faster and more portable, high-speed and low-power TDCs are indispensable; low supply voltage and low power are the directions of design. Three main aspects of this work can be further improved.

The ring oscillator in the first stage can be improved by adding a control block such as a DLL to keep the output frequency and time interval stable, which can improve the linearity of the TDC significantly. Meanwhile, the conversion rate is mainly limited by the first stage, while the 1.5-bit sub-TDC can reach a conversion rate of approximately 500 MS/s, so this design leaves considerable room for improving its conversion rate.

The power dissipation of this work can also be reduced. Since the sub-TDC has a large offset tolerance, the supply voltage can be lowered to save power. A more advanced process technology would also lead to better performance.


#### References

[1] M. Lee and A. A. Abidi, *A 9b, 1.25 ps resolution coarse-fine time-to-digital converter in 90 nm CMOS that amplifies a time residue,* IEEE Int. Symp. VLSI Circuits Dig., 168.

[2] A. Liscidini, L. Vercesi, and R. Castello, *Time to digital converter based on a 2-dimensions vernier architecture,* Proc. IEEE CICC, 45.

[3] Y. H. Seo, J. S. Kim, H. J. Park, and J. Y. Sim, *A 0.63 ps resolution, 11b pipeline TDC in 0.13 µm CMOS,* IEEE Int. Symp. VLSI Circuits Dig., 152 (2011).

[4] K. Kim, Y.-H. Kim, W. Yu, and S. Cho, *A 7 bit, 3.75 ps resolution two-step time-to-digital converter in 65 nm CMOS using pulse-train time amplifier,* IEEE Journal of Solid-State Circuits **48**, 1009 (2013).
