M.Sc. Thesis

Design of a fully digital analog SiPM with sub-50ps time conversion

Andrada Alexandra Muntean

Abstract

This thesis presents the design of an on-chip digitizer for the fast output of an analog SiPM, intended to reduce system complexity and increase photon detection granularity. The design comprises a time-to-digital converter (TDC) in 0.35µm CMOS technology. The TDC is a multi-path gated ring oscillator with a 6-bit counter for the coarse bits and 9 phase detectors for the fine bits. Schematic and post-layout simulations indicated a 65ps LSB in the typical corner, with a DNL of ±0.55LSB and an INL of ±1LSB. The TDC does not include any additional calibration circuitry.
Design of a fully digital analog SiPM with sub-50ps time conversion

THESIS

submitted in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

in

ELECTRICAL ENGINEERING

by

Andrada Alexandra Muntean
born in Timișoara, Romania

This work was performed in:

Advanced Quantum Architecture Laboratory Group
Department of Microelectronics
Faculty of Electrical Engineering, Mathematics and Computer Science
Delft University of Technology
The undersigned hereby certify that they have read and recommend to the Faculty of Electrical Engineering, Mathematics and Computer Science for acceptance a thesis entitled “Design of a fully digital analog SiPM with sub-50ps time conversion” by Andrada Alexandra Muntean in partial fulfillment of the requirements for the degree of Master of Science.

Dated: 21.11.2017

Chairman: prof.dr.ir. Edoardo Charbon

Advisor: prof.dr.ir. Edoardo Charbon

Committee Members: prof. dr. ir. Edoardo Charbon

prof. dr. ir. Ryoichi Ishihara

dr. ir. Carl Jackson

dr. ir. Dennis Schaart
The realization of this project would not have been possible without the advice and support of many people.

First of all, I would like to thank my thesis advisor prof.dr.ir. Edoardo Charbon for his assistance and guidance during this entire project, for helping me develop my knowledge and experience in this area, and for offering me the opportunity to work in a very professional environment with great people. He was always supportive and patient whenever I needed to discuss the work or ask questions.

I would also like to thank Esteban Venialgo, who helped me many times and never hesitated to offer advice from his experience in this field, which also contributed to the realization of this project.

I also thank SensL for giving me the opportunity to contribute to this project, which has greatly developed my knowledge in this field.

I greatly appreciate the support of my family and all of my friends.

Andrada Alexandra Muntean
Delft, The Netherlands
21.11.2017
## Contents

<table>
<thead>
<tr>
<th>Section</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>Abstract</td>
<td>v</td>
</tr>
<tr>
<td>Acknowledgments</td>
<td>vii</td>
</tr>
<tr>
<td>1 Introduction</td>
<td>1</td>
</tr>
<tr>
<td>1.1 PET Systems</td>
<td>1</td>
</tr>
<tr>
<td>1.1.1 Operating principle</td>
<td>1</td>
</tr>
<tr>
<td>1.1.2 PET detectors</td>
<td>2</td>
</tr>
<tr>
<td>1.1.3 Multichannel Digital Silicon Photomultipliers</td>
<td>5</td>
</tr>
<tr>
<td>1.2 Detectors in LiDAR applications</td>
<td>6</td>
</tr>
<tr>
<td>1.3 Project requirements</td>
<td>7</td>
</tr>
<tr>
<td>1.4 Contributions</td>
<td>7</td>
</tr>
<tr>
<td>1.5 Thesis outline</td>
<td>8</td>
</tr>
<tr>
<td>2 Time-to-digital converters</td>
<td>9</td>
</tr>
<tr>
<td>2.1 Vernier Line</td>
<td>9</td>
</tr>
<tr>
<td>2.1.1 Design</td>
<td>12</td>
</tr>
<tr>
<td>2.1.2 Simulation results</td>
<td>14</td>
</tr>
<tr>
<td>2.2 Multi-path gated ring oscillator</td>
<td>15</td>
</tr>
<tr>
<td>2.2.1 General concept</td>
<td>17</td>
</tr>
<tr>
<td>2.2.2 Advantages and drawbacks</td>
<td>18</td>
</tr>
<tr>
<td>3 Multi-path gated ring oscillator TDC</td>
<td>21</td>
</tr>
<tr>
<td>3.1 Schematic level TDC characterization</td>
<td>21</td>
</tr>
<tr>
<td>3.1.1 Methods of phase noise reduction in ring oscillators</td>
<td>21</td>
</tr>
<tr>
<td>3.1.2 Delay stage characterization</td>
<td>29</td>
</tr>
<tr>
<td>3.2 Layout and post-layout TDC characterization</td>
<td>32</td>
</tr>
<tr>
<td>3.3 Anti-phased TDC</td>
<td>36</td>
</tr>
<tr>
<td>4 Decoupling and power supply noise suppression</td>
<td>41</td>
</tr>
<tr>
<td>5 Conclusion</td>
<td>47</td>
</tr>
<tr>
<td>5.1 Future work</td>
<td>47</td>
</tr>
</tbody>
</table>
## List of Figures

<table>
<thead>
<tr><th>Figure</th><th>Caption</th><th>Page</th></tr>
</thead>
<tbody>
<tr><td>1.1</td><td>Annihilation process in PET scanners.</td><td>2</td></tr>
<tr><td>1.2</td><td>PMT.</td><td>3</td></tr>
<tr><td>1.3</td><td>SiPM structure with fast output.</td><td>4</td></tr>
<tr><td>1.4</td><td>Analog SiPMs.</td><td>4</td></tr>
<tr><td>1.5</td><td>Digital SiPMs.</td><td>5</td></tr>
<tr><td>1.6</td><td>MD-SiPM.</td><td>6</td></tr>
<tr><td>2.1</td><td>Architecture concept.</td><td>9</td></tr>
<tr><td>2.2</td><td>Each SPAD with a TDC.</td><td>10</td></tr>
<tr><td>2.3</td><td>Each column of SPADs with a TDC.</td><td>10</td></tr>
<tr><td>2.4</td><td>One TDC for all SPADs.</td><td>11</td></tr>
<tr><td>2.5</td><td>Vernier delay-line based TDC [1].</td><td>11</td></tr>
<tr><td>2.6</td><td>Operating principle of Vernier delay-line based TDC [1].</td><td>12</td></tr>
<tr><td>2.7</td><td>TDC Time Diagram.</td><td>13</td></tr>
<tr><td>2.8</td><td>Vernier TDC block diagram.</td><td>13</td></tr>
<tr><td>2.9</td><td>Oscillation period variations with power supply voltage in different corners.</td><td>14</td></tr>
<tr><td>2.10</td><td>Operating principle of gated ring oscillator.</td><td>15</td></tr>
<tr><td>2.11</td><td>Gated ring oscillator TDC concept.</td><td>16</td></tr>
<tr><td>2.12</td><td>Conventional ring oscillator versus multi-path ring oscillator.</td><td>17</td></tr>
<tr><td>2.13</td><td>MGRO Architecture.</td><td>18</td></tr>
<tr><td>2.14</td><td>TDC - phase recycling.</td><td>18</td></tr>
<tr><td>2.15</td><td>MGRO phases - detailed diagram.</td><td>19</td></tr>
<tr><td>2.16</td><td>TDC timing diagram.</td><td>20</td></tr>
<tr><td>3.1</td><td>Temperature effect on ring oscillation period.</td><td>22</td></tr>
<tr><td>3.2</td><td>Oscillation period variations due to power supply changes.</td><td>22</td></tr>
<tr><td>3.3</td><td>DNL and INL in typical-typical.</td><td>23</td></tr>
<tr><td>3.4</td><td>Mutually coupled oscillators.</td><td>24</td></tr>
<tr><td>3.5</td><td>Phase noise shift of injection locking in the ring oscillator.</td><td>25</td></tr>
<tr><td>3.6</td><td>Concept of injection locking in MGRO.</td><td>25</td></tr>
<tr><td>3.7</td><td>Injection locking at various frequencies.</td><td>26</td></tr>
<tr><td>3.8</td><td>Injection locking through a single transistor.</td><td>26</td></tr>
<tr><td>3.9</td><td>Phase noise after injection at the third and fourth sub-harmonics.</td><td>27</td></tr>
<tr><td>3.10</td><td>Coupling of MGROs.</td><td>27</td></tr>
<tr><td>3.11</td><td>Coupling effect with white noise superimposed on the power supply.</td><td>28</td></tr>
<tr><td>3.12</td><td>Phase noise for three coupled MGROs.</td><td>28</td></tr>
<tr><td>3.13</td><td>Tri-state inverter output.</td><td>29</td></tr>
<tr><td>3.14</td><td>Rise and fall times vs. skews.</td><td>30</td></tr>
<tr><td>3.15</td><td>Parallel-plate capacitance model.</td><td>30</td></tr>
<tr><td>3.16</td><td>Distributed RC model.</td><td>31</td></tr>
<tr><td>3.17</td><td>Stage delay vs. interconnect length.</td><td>32</td></tr>
<tr><td>3.18</td><td>MGRO layout - first variant.</td><td>33</td></tr>
<tr><td>3.19</td><td>MGRO layout - second variant.</td><td>34</td></tr>
<tr><td>3.20</td><td>DNL and INL simulations in all corners.</td><td>35</td></tr>
<tr><td>3.21</td><td>LSB vs. ring power supply.</td><td>36</td></tr>
<tr><td>3.22</td><td>LSB shift with temperature changes.</td><td>36</td></tr>
<tr><td>3.23</td><td>Anti-phased TDC block diagram.</td><td>37</td></tr>
<tr><td>3.24</td><td>Starved cell with four controllable bias voltages.</td><td>38</td></tr>
<tr><td>3.25</td><td>Anti-phase TDC timing diagram.</td><td>38</td></tr>
<tr><td>4.1</td><td>Non-ideal power delivery system.</td><td>41</td></tr>
<tr><td>4.2</td><td>pMOS decap.</td><td>42</td></tr>
<tr><td>4.3</td><td>TDC power distribution network.</td><td>43</td></tr>
<tr><td>4.4</td><td>Circuit impedance for different numbers of on-chip decoupling capacitors.</td><td>44</td></tr>
<tr><td>4.5</td><td>Circuit impedance for different numbers of on-chip decoupling capacitors for the power supply of the MGRO.</td><td>45</td></tr>
<tr><td>4.6</td><td>Power spectral density of the supply net for the circuit without the ring oscillator - 3000 on-chip decaps.</td><td>45</td></tr>
<tr><td>4.7</td><td>Power spectral density of the ring oscillator supply net - 3000 on-chip decaps.</td><td>45</td></tr>
<tr><td>4.8</td><td>Power spectral density of the supply net for the circuit without the ring oscillator - 750 on-chip decaps.</td><td>46</td></tr>
<tr><td>4.9</td><td>Power spectral density of the ring oscillator supply net - 750 on-chip decaps.</td><td>46</td></tr>
</tbody>
</table>
## List of Tables

<table>
<thead>
<tr><th>Table</th><th>Caption</th></tr>
</thead>
<tbody>
<tr><td>1.1</td><td>TDC requirements.</td></tr>
<tr><td>2.1</td><td>Ring oscillator corner analysis.</td></tr>
<tr><td>3.1</td><td>TDC corner analysis.</td></tr>
<tr><td>3.2</td><td>LSB in all five corners.</td></tr>
<tr><td>3.3</td><td>TDC performance summary.</td></tr>
<tr><td>4.1</td><td>TDC power supply noise - 3000 on-chip decaps.</td></tr>
<tr><td>4.2</td><td>TDC power supply noise - 750 on-chip decaps.</td></tr>
</tbody>
</table>
1 Introduction

Recent years have seen increasingly rapid advances in the field of nuclear imaging. Research in this domain covers techniques such as Single-Photon Emission Computed Tomography (SPECT) and Positron Emission Tomography (PET). Other medical imaging techniques include MRI (magnetic resonance imaging), computed tomography and ultrasound. Nuclear medicine is based on injecting radioactive substances into the body in order to monitor and localize different types of cancer, neurological disease, heart disease, etc. The radionuclides and their location in the body can be identified by means of the radiation they emit, such as positrons and photons. Through nuclear medicine, different diseases can be discovered at an early stage. All these techniques are of major interest in medicine and use photodetectors to capture the emitted radiation, but there are also other fields in which low-light photosensors play a significant role, such as high energy physics, LiDAR (light imaging detection and ranging) and biophotonics. A proper design of the photosensors and of their readout circuitry is a major factor in data acquisition and interpretation. Depending on the application and its requirements, photosensors with different characteristics are used. The next sub-chapters focus on PET and LiDAR systems, as well as on the photodetectors of choice for these applications.

1.1 PET Systems

1.1.1 Operating principle

PET is a nuclear imaging technique that detects pairs of gamma rays produced by a positron-emitting radionuclide (the tracer), which is attached to a biologically active molecule and introduced into the body, usually by intravenous injection. The system uses the decay characteristics of the tracer to localize the affected tissue. One of the most common active molecules is fluorodeoxyglucose (FDG), a glucose analogue that is widely used to detect and diagnose different types of tumors. After the tracer is absorbed in the body, it emits a positron. The positron travels a short distance, usually around 1mm, while losing kinetic energy, until it interacts with an electron from the surrounding tissue and the two annihilate, producing two 511-keV gamma photons (see Figure 1.1). Because the two photons have the same energy and are emitted in approximately opposite directions, detecting both of them defines a line along which the annihilation took place, which gives information about the location of the affected tissue in the body. The gamma photons reach the detectors of the PET scanner, where they are converted into an electrical signal by photodetectors such as PMTs (photomultiplier tubes) or silicon-based photon sensors [2].
1.1.2 PET detectors

In order to achieve high-quality images, the detectors must have a high efficiency for detecting 511-keV photons as well as a very good spatial resolution. To detect 511-keV photons, scintillating crystals are usually used in combination with photon-counting photodetectors. When a gamma ray hits a scintillator, a shower of visible photons at around 420nm may be generated, which are readily detected by the photon-counting photodetector. Thus, the main requirement for the photodetector is the ability to detect 420-nm photons individually or in clusters. Apart from this feature, the timing resolution is another fundamental property. It describes the spread in the measured arrival times of the detected events, and it must be small enough to clearly distinguish between true and false events. True events occur when both photons resulting from the annihilation process are detected by the photodetectors within the coincidence timing window, and no other event is detected within that window. PET applications require photosensors that are small, robust, low-cost and tolerant to magnetic fields [2]. The next sections describe and compare different photodetectors for PET applications.
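The coincidence rule above can be illustrated with a minimal sketch. It assumes a simplified, non-overlapping window scheme: a window of width `window_ps` is opened at each unpaired event, and the pair is accepted only if exactly one other event falls inside it. The function name and the timestamps are hypothetical; real PET coincidence processors are considerably more elaborate, and this logic cannot by itself tell a true coincidence from a random one.

```python
from typing import List, Tuple


def find_coincidences(timestamps_ps: List[float], window_ps: float) -> List[Tuple[int, int]]:
    """Pair up detector events using a simplified coincidence-window rule.

    A window of width window_ps is opened at each unpaired event. The two
    events form a coincidence only if exactly one other event falls inside
    that window; singles (one event) and multiples (three or more) are rejected.
    """
    # Keep the original event indices while sorting by arrival time.
    events = sorted(enumerate(timestamps_ps), key=lambda e: e[1])
    pairs = []
    k = 0
    while k < len(events):
        t_open = events[k][1]
        m = k + 1
        # Collect every event that arrives within the open window.
        while m < len(events) and events[m][1] - t_open <= window_ps:
            m += 1
        if m - k == 2:
            pairs.append((events[k][0], events[k + 1][0]))
        k = m  # the next window opens at the first event outside this one
    return pairs


if __name__ == "__main__":
    # Hypothetical timestamps in ps: one valid pair, one single, one triple.
    ts = [0, 100, 5000, 10000, 10050, 10080]
    print(find_coincidences(ts, window_ps=500))
```

The window is kept non-overlapping so that each event is used at most once, which keeps the rejection of multiples easy to follow.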

1.1.2.1 PMTs

PMTs have been used since their invention in the 1930s, especially for single-photon detection and photon-counting applications. A PMT consists of three major blocks: the photocathode, the dynodes and the anode (see Figure 1.2). In the photocathode, photons are converted into electrons by the photoelectric effect and emitted into the vacuum tube. Different photocathode materials are used so that different wavelengths can be detected. Once an electron is emitted, it is accelerated towards the dynodes, which multiply it through secondary emission. The anode collects the multiplied secondary electrons emitted by the last dynode [3]. Large-area PMTs have been the traditional PET detectors for a very long time because of their stability, high gain and fast output response. However, the performance of PMTs is well known to be limited by their high sensitivity to magnetic fields and their bulkiness.
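The high gain of a PMT follows from the cascaded multiplication described above. In the usual idealized model, each dynode emits δ secondary electrons per incident electron, so n dynodes give a gain of δⁿ. The sketch below assumes this model and ignores collection losses; the values δ = 4 and n = 8 to 12 are illustrative assumptions and are not taken from this thesis.

```python
def pmt_gain(delta: float, n_dynodes: int) -> float:
    """Overall electron gain of an idealized PMT dynode chain.

    Each dynode emits `delta` secondary electrons per incident electron, so
    after n_dynodes stages one photoelectron becomes delta ** n_dynodes
    electrons at the anode (collection losses are ignored).
    """
    return delta ** n_dynodes


if __name__ == "__main__":
    # Illustrative secondary-emission ratio and dynode counts (assumptions).
    for n in (8, 10, 12):
        print(f"{n} dynodes: gain = {pmt_gain(4.0, n):.3g}")
```

With these assumed values, ten dynodes already give a gain of about 10⁶, which explains why PMTs produce output pulses large enough for simple readout electronics.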
There has been increasing interest in combining PET and MRI systems. The first approach to coping with the high magnetic fields of the MRI was to place the PMTs outside the main magnetic field. To bring the scintillation photons out of the MRI bore before they are detected by the photodetector, special techniques such as optical fiber coupling must be used. As a consequence, the system suffers from signal losses and a reduced signal-to-noise ratio (SNR) [4]. A better alternative to PMTs is semiconductor-based detectors, which are presented next.

### 1.1.2.2 SiPMs

SiPMs (silicon photomultipliers) are a better alternative to PMTs, which have long been the sensor of choice. Due to their robustness, compactness, operation at low bias voltages, low noise at the single-photon level and, very importantly, their insensitivity to magnetic fields, SiPMs are used in many fields, including nuclear medicine [5],[6].

Because of their insensitivity to magnetic fields and their compactness, SiPMs are considered well suited for PET-MRI systems. Studies over the past decade have produced important developments in SiPM-based PET scanners for small-animal and human imaging [7],[8],[9]. The current trend is towards fully integrated PET-MRI systems based on semiconductor PET detectors, both clinically and preclinically. In the preclinical area, commercial sequential PET-MRI systems are available, which combine a high-resolution PET scanner with a small-animal MRI system. Clinical PET-MRI systems were strongly influenced by the preclinical ones, and sequential PET-MRI systems are currently in clinical use. Measurements and experiments have indicated that PET and MRI can operate in this combination without significantly degrading system performance. However, extensive research is still being carried out on the integration of SiPMs in the PET detector module [10].

Initially, most of these detector modules were based on SiPM arrays with a total area of less than 20×20mm² because of the high noise of the SiPMs and the complexity of the readout circuits. Recently, the noise in SiPMs has been reduced and the photon detection efficiency has been improved through different manufacturing techniques. This has enabled the fabrication of large-area SiPM arrays (>50×50mm²) consisting of 3×3mm² SiPM pixels [4]. Figure 1.3 illustrates the general structure of SiPMs as well as the fast output developed by SensL. There are two main types of SiPMs: analog (A-SiPM) and digital (D-SiPM). Analog SiPMs are formed by an array of APDs (avalanche photodiodes) or SPADs (single-photon avalanche diodes) whose output currents are summed in one node, and the output is processed by off-chip components (Figure 1.4). The output of digital SiPMs, in contrast, is processed directly on-chip, which reduces the need for external components and improves the timing performance (Figure 1.5).

The timing resolution for single-photon detection is limited by the SPAD jitter, circuit noise and routing skew. When photons reach the SiPM, the first photon can arrive at any SPAD of the array, and since the signal paths from different SPADs to the readout have different lengths, the timing resolution is degraded by the routing skew [2], unless measures are taken to equalize the electrical paths. The output of an analog SiPM is generally coupled either to a current amplifier or to a passive load and a voltage amplifier through a pad and a PCB interconnect. This output path generally has a large capacitance that degrades the timing performance of the SiPM [11]. To counter these effects, several front-end circuits have been proposed, based on a number of techniques described in [12]. Although conventional D-SiPMs, unlike A-SiPMs, do not require extra off-chip components such as an ADC, TDC or preamplifier, their timing resolution is still strongly affected by the routing skew [2]. The implementation presented in [13] achieves balanced routing skew by using an on-pixel TDC. Furthermore, that system is able to detect multiple timestamps in a single gamma event. However, using one TDC per pixel to timestamp multiple photons simultaneously dramatically reduces the fill factor of the sensor [5]. A better alternative is the multichannel digital SiPM, which is described in the next section.

### 1.1.3 Multichannel Digital Silicon Photomultipliers

A new D-SiPM approach was proposed in [5]. Its principle is to share one TDC among several pixels, which preserves the fill factor while reducing the routing skew of the circuit. This type of SiPM is called the multichannel D-SiPM (MD-SiPM). The project aimed to create a fully digital, miniaturized, bimodal PET and ultrasound endoscopic probe, combining a time-of-flight PET detector with 200ps timing resolution with a commercial ultrasound biopsy endoscope for pancreatic cancer clinical studies. The endoscopic sensor is a miniaturized array of 9×18 MD-SiPMs coupled to a mini-scintillator [2]. The proposed MD-SiPM is presented in Figure 1.6. The approach uses multiple timestamps to reconstruct a single timemark for each gamma event. Simulation results indicate that the MD-SiPM is more tolerant to dark count rate (DCR) than D-SiPMs that use a single timestamp. By reducing the routing skew and using multiple timestamps, the timing resolution is also significantly improved without degrading the fill factor [5].
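To see why a 200ps time-of-flight resolution matters, note that the arrival-time difference of the two annihilation photons locates the annihilation point along the line of response: Δx = c·Δt/2. The minimal sketch below assumes that the 200ps figure quoted above can be treated as the coincidence timing resolution. The function names are hypothetical.

```python
SPEED_OF_LIGHT_MM_PER_PS = 0.299792458  # speed of light expressed in mm/ps


def tof_offset_mm(t1_ps: float, t2_ps: float) -> float:
    """Offset of the annihilation point from the midpoint of the line of response.

    t1_ps and t2_ps are the photon arrival times at detectors 1 and 2.
    A positive result means the annihilation happened closer to detector 1.
    """
    return SPEED_OF_LIGHT_MM_PER_PS * (t2_ps - t1_ps) / 2.0


def tof_localization_mm(timing_resolution_ps: float) -> float:
    """Position uncertainty along the line of response for a given timing resolution."""
    return SPEED_OF_LIGHT_MM_PER_PS * timing_resolution_ps / 2.0


if __name__ == "__main__":
    # Assumes the 200 ps figure is the coincidence timing resolution.
    print(f"{tof_localization_mm(200.0):.1f} mm")  # about 30 mm
```

The factor of two appears because a shift of the annihilation point by Δx shortens one photon path and lengthens the other by the same amount.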
1.2 Detectors in LiDAR applications

Apart from medical applications, a variety of photodetectors are used in the LiDAR field as well. LiDAR is a measuring technique in which the distance to a target is determined by illuminating it with a laser pulse and measuring the reflected light with a sensor. Differences in the return times and wavelengths of the detected light are then used to build a digital 3D profile of the target. Its main use is to create high-resolution maps for a variety of applications, such as geography, forestry, military surveying and geology. Different detectors have been used for LiDAR applications, such as SPADs, APDs and PIN photodiodes. However, recent research has increasingly focused on SiPM detectors and their advantages. Several studies have compared different types of detectors used for LiDAR applications and highlighted the benefits of SiPMs [14],[15],[16],[17].
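The ranging principle reduces to d = c·t/2, where t is the round-trip time of the laser pulse. The sketch below also shows the inverse relation: a timing resolution of Δt gives a range step of c·Δt/2. It assumes propagation in vacuum (the refractive index of air is ignored), and the helper names are hypothetical. The 65ps LSB used in the usage example is the value reported in the abstract, and the 100m target is the distance used in [16].

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def range_from_round_trip(t_round_trip_s: float) -> float:
    """Target distance from the round-trip time of the laser pulse (out and back)."""
    return SPEED_OF_LIGHT_M_PER_S * t_round_trip_s / 2.0


def round_trip_time(distance_m: float) -> float:
    """Round-trip time of the pulse for a target at distance_m."""
    return 2.0 * distance_m / SPEED_OF_LIGHT_M_PER_S


def range_resolution(timing_resolution_s: float) -> float:
    """Smallest resolvable range step for a given timing resolution (e.g. a TDC LSB)."""
    return SPEED_OF_LIGHT_M_PER_S * timing_resolution_s / 2.0


if __name__ == "__main__":
    print(f"100 m target: {round_trip_time(100.0) * 1e9:.1f} ns round trip")
    print(f"65 ps LSB: {range_resolution(65e-12) * 1e3:.2f} mm range step")
```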

The work in [14] compares two LiDAR systems, one based on SiPM detectors and the other on APD detectors. The prototypes were characterized in terms of TOF (time-of-flight) measurements, performed both indoors and outdoors under different weather conditions. The experimental results indicated a very good performance of the SiPM-based LiDAR system, which was able to detect the target at distances up to 360m, whereas the APD-based system could not detect it at 214m. The SiPMs also gave satisfactory results in the SNR measurements.

The effectiveness of SensL's high-sensitivity SiPMs for LiDAR is presented in [16]. SensL SiPMs allow the use of low-power lasers, for eye safety, while increasing the ranging capability. SensL has created a ranging model that simulates a SiPM-based ranging set-up under different conditions, which can then be analyzed with the LiDAR test bench. SensL's ranging demonstrator is an engineering prototype that tests the SiPM technology in different ranging applications using the results obtained from the ranging model. The model has been optimized for 100m targeting with SiPMs, resulting in a 10cm resolution. These results show that SiPM sensors have strong capabilities for ranging systems.

The work presented in [17] investigated the actual capabilities of SiPMs for LiDAR systems by detecting signals in both analog and photon-counting modes. A series of experiments compared the SensL 30035 and Hamamatsu S10362-33-100C detectors with the Hamamatsu R7400P PMT, which is widely used in LiDAR applications. The study was exploratory in nature and highlighted the capabilities of SiPMs in this area.

1.3 Project requirements

The purpose of this thesis is to explore and develop a simple, fully functional TDC in 350nm technology for the fast output of the analog SiPMs developed by SensL. The proposed digital analog SiPM has a DCR below $50\text{kcps/mm}^2$. The PDE at 420nm is 51% at an excess bias voltage of 6V, with an overall fill factor of 75% and an area of $3\times3\text{mm}^2$, while the pitch of the SPAD microcells is 35μm. The TDC occupies a small area of the chip and therefore represents a negligible loss of fill factor [11]. Defining the requirements is one of the most important steps in the project. The TDC requirements for this project are summarized in Table 1.1.

Table 1.1: TDC requirements.

<table>
<thead>
<tr>
<th>Parameters</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Technology</td>
<td>350nm</td>
</tr>
<tr>
<td>LSB</td>
<td>~50ps</td>
</tr>
<tr>
<td>Resolution</td>
<td>10 bits</td>
</tr>
<tr>
<td>Supply</td>
<td>3.3V</td>
</tr>
<tr>
<td>Clock</td>
<td>40MHz</td>
</tr>
<tr>
<td>Power</td>
<td>&lt;100mW</td>
</tr>
<tr>
<td>Dead time</td>
<td>&lt;5ns</td>
</tr>
</tbody>
</table>
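Table 1.1 and the sensor figures quoted above imply a few back-of-envelope numbers, worked out in the sketch below. The sketch assumes that the full-scale range of an N-bit TDC is 2^N × LSB. It counts microcells from the pitch alone and ignores edge structures and the TDC area. Its DCR figure is an upper bound derived from the per-area specification. The helper names are hypothetical.

```python
LSB_PS = 50        # target LSB from Table 1.1
N_BITS = 10        # target resolution from Table 1.1
CLOCK_HZ = 40e6    # reference clock from Table 1.1


def full_scale_range_ns(n_bits: int, lsb_ps: float) -> float:
    """Largest interval an n-bit TDC can represent: 2**n_bits codes of lsb_ps each."""
    return (2 ** n_bits) * lsb_ps / 1000.0


def microcell_count(side_mm: float, pitch_um: float) -> int:
    """Approximate number of SPAD microcells on a square SiPM (edges ignored)."""
    per_side = int(side_mm * 1000 // pitch_um)
    return per_side * per_side


def total_dcr_kcps(dcr_kcps_per_mm2: float, area_mm2: float) -> float:
    """Upper bound on the total dark count rate of the whole sensor."""
    return dcr_kcps_per_mm2 * area_mm2


if __name__ == "__main__":
    clock_period_ns = 1e9 / CLOCK_HZ
    print(f"Full-scale range: {full_scale_range_ns(N_BITS, LSB_PS):.1f} ns")
    print(f"Clock period:     {clock_period_ns:.1f} ns")
    # Sensor figures from Section 1.3: 3x3 mm^2, 35 um pitch, DCR < 50 kcps/mm^2.
    print(f"Microcells:       ~{microcell_count(3.0, 35.0)}")
    print(f"Total DCR bound:  {total_dcr_kcps(50.0, 9.0):.0f} kcps")
```

Under these assumptions, a 10-bit, 50ps TDC spans 51.2ns, which is slightly more than two periods of the 40MHz clock (25ns each).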

1.4 Contributions

- implementation of a fully functional TDC with a resolution better than 70ps
- implementation of a new variant of the multi-path gated ring oscillator that doubles the resolution at the cost of an approx. 17% area increase
- implementation of the readout circuit (15-bit serializer)
- corollary (but important) contribution: creation from scratch of an entire library in Cadence Virtuoso for the TDC design
1.5 Thesis outline

The thesis is organized in five chapters, including the introduction and the conclusion. Chapter 2 presents two TDC implementations, along with the advantages and disadvantages of each architecture. Chapter 3 characterizes the final TDC design in detail and introduces an alternative version of it. Chapter 4 presents the design of the TDC power distribution network with decoupling capacitors. Finally, Chapter 5 presents the conclusion of the thesis.
2 Time-to-digital converters

The concept proposed in this thesis is shown in Figure 2.1. SensL has developed a unique SiPM by adding a third terminal, the fast output, in addition to the anode and cathode. The fast output is formed by the sum of the signals of all microcells, as presented in Figure 2.1. The fast output signal can be used for ultra-fast timing measurements. Its extremely low capacitance (approx. 1pF, compared to approx. 100pF at the anode or cathode) makes it advantageous for the readout circuits [18].
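A first-order RC estimate shows why the 100× lower capacitance of the fast output helps timing. The sketch assumes a hypothetical 50Ω load, which the thesis does not specify. It also treats each terminal as a single RC node, whereas the real SiPM output is not a simple RC network, so the numbers are only indicative. The 10-90% rise time of a first-order node is τ·ln 9 ≈ 2.2τ.

```python
def rc_time_constant_ps(r_ohm: float, c_pf: float) -> float:
    """Time constant of a first-order RC node; ohm * pF gives ps."""
    return r_ohm * c_pf


def rise_time_10_90_ps(r_ohm: float, c_pf: float) -> float:
    """10-90% rise time of a first-order RC node, approximately 2.2 * tau."""
    return 2.2 * rc_time_constant_ps(r_ohm, c_pf)


if __name__ == "__main__":
    R_LOAD = 50.0  # hypothetical 50-ohm load, not specified in the thesis
    for name, c_pf in (("fast output", 1.0), ("anode/cathode", 100.0)):
        print(f"{name}: tau = {rc_time_constant_ps(R_LOAD, c_pf):.0f} ps, "
              f"rise = {rise_time_10_90_ps(R_LOAD, c_pf):.0f} ps")
```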

The primary objective of this thesis is to present the implementation of a TDC that brings benefits to the entire system, such as reduced internal parasitics, improved jitter, simplicity and compactness. Ideally, each SPAD would have its own TDC (see Figure 2.2), but this is not feasible, since a per-SPAD TDC would dramatically reduce the fill factor [5]. The goal is to give each column of the SiPM its own TDC (see Figure 2.3), but at this stage of the project we decided to implement one TDC for the entire SiPM (see Figure 2.4).

This chapter describes two TDC topologies implemented in 350nm technology that were studied and simulated before the final architecture was chosen, together with their advantages and drawbacks. Simplicity and good functionality were the most important aspects considered in the TDC design.

### 2.1 Vernier Line

The first approach for the TDC design was the Vernier line. Many works have demonstrated that a Vernier line can achieve sub-gate-delay resolution using only two delay lines, one for the start signal and one for the stop signal. As the range increases, the number of delay stages increases as well, which results in a larger area. The Vernier delay-line TDC concept is illustrated in Figure 2.5. The stages of the first delay line have a slightly larger delay than those of the second one. During the measurement, the start signal propagates through the first delay line while the stop signal propagates along the second one. Since the delay seen by the stop signal is smaller, the stop signal gradually "chases" the start signal and eventually catches up with it (see Figure 2.6). Detectors, such as flip-flops, detect the point at which the start and stop signals are in phase. Because the difference between the two stage delays can be made small, the resolution does not depend on the delay of a single stage, but on the difference between the delays of two stages. The resolution of the Vernier line is determined using the following formula [1]:

\[ T_{LSB} = t_{d1} - t_{d2}, \tag{2.1} \]

where \( t_{d1} \) is the propagation delay of one stage of the first (slower) delay line and \( t_{d2} \) that of one stage of the second (faster) line [1]. The required number of stages is [1]:

\[ N = T_{max}/T_{LSB}, \tag{2.2} \]

where \( T_{max} \) is the maximum time interval to be measured (the measurement range) and \( T_{LSB} \) is the resolution.
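Equations (2.1) and (2.2) can be checked with an ideal behavioral model of a Vernier line. The sketch makes several assumptions: there is no mismatch or jitter, and the flip-flop at stage k is clocked by the stop edge and outputs 1 if the start edge has already arrived. The number of ones therefore equals ⌊Δt / T_LSB⌋. The stage delays (250ps and 210ps) and the 640ps range are hypothetical values, chosen so that 16 stages result, as in the design below.

```python
import math
from typing import List


def vernier_stages(t_max_ps: float, t_lsb_ps: float) -> int:
    """Eq. (2.2): number of stages needed to cover a range of t_max_ps."""
    return math.ceil(t_max_ps / t_lsb_ps)


def vernier_thermometer(delta_t_ps: float, t_d1_ps: float, t_d2_ps: float,
                        n_stages: int) -> List[int]:
    """Ideal Vernier line: thermometer code for an interval delta_t_ps.

    The start edge enters the slow line (t_d1 per stage); the stop edge enters
    the fast line (t_d2 per stage) delta_t_ps later. At stage k the flip-flop
    clocked by the stop edge outputs 1 if the start edge has already arrived,
    i.e. if k * t_d1 <= delta_t + k * t_d2.
    """
    code = []
    for k in range(1, n_stages + 1):
        start_arrival = k * t_d1_ps
        stop_arrival = delta_t_ps + k * t_d2_ps
        code.append(1 if start_arrival <= stop_arrival else 0)
    return code


if __name__ == "__main__":
    T_D1, T_D2 = 250, 210           # hypothetical stage delays in ps
    t_lsb = T_D1 - T_D2             # Eq. (2.1): 40 ps
    n = vernier_stages(640, t_lsb)  # hypothetical 640 ps range -> 16 stages
    code = vernier_thermometer(130, T_D1, T_D2, n)
    print(code, "->", sum(code) * t_lsb, "ps")
```

In this model, a 130ps interval yields three ones (120ps), because the stop edge gains 40ps per stage and overtakes the start edge after the third stage.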

The following sub-chapters describe the TDC design concept along with the simulation results.
2.1.1 Design

The Vernier line was combined with a ring oscillator in order to achieve sub-gate-delay resolution without a large increase in area. The working principle is presented in Figure 2.7. The output signal of the comparator (CMP OUT) is the input signal of the ring oscillator, which has an oscillation period $T_{osc1}$. At the first rising edge of CMP OUT, the ring starts oscillating, and it keeps oscillating as long as the comparator output is high. When the signal goes low, the ring freezes at its next rising edge. The "slow" counter keeps track of the number of oscillations of the ring and determines the coarse bits of the TDC, while the Vernier line determines the fine bits in the form of a thermometer code. The Vernier line has two inputs, $V_{\text{start}}$ and $V_{\text{stop}}$: $V_{\text{start}}$ is the external stop signal, while $V_{\text{stop}}$ is generated at the first rising edge of the oscillator after the arrival of $V_{\text{start}}$. The Vernier line therefore measures the residual interval between the external stop event and the next oscillator edge, which the coarse counter alone cannot resolve. There are 4 fine bits, therefore 16 stages are used for the Vernier line. A conceptual diagram with the main block interconnections is presented in Figure 2.8.

With an on-chip decoder, the input of the thermometer-to-binary decoder changes continuously during the measurement, resulting in high power consumption and a risk of glitches. Both can be reduced by placing a register at the input of the decoder, so that the decoder input changes only once per conversion.
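The sketch below contrasts two common thermometer-to-binary decoding strategies. It also models the register-at-the-input idea from the paragraph above: the decoder only sees the code after `latch` is called once per conversion. The thesis does not state which decoding strategy was used. The class and function names are hypothetical, and the "bubble" (an isolated out-of-order bit) is an illustrative example of the glitches mentioned above.

```python
from typing import List


def decode_ones_count(therm: List[int]) -> int:
    """Thermometer-to-binary by counting ones (insensitive to bit order, so tolerates bubbles)."""
    return sum(therm)


def decode_first_zero(therm: List[int]) -> int:
    """Thermometer-to-binary by locating the first 0 in the code."""
    for i, bit in enumerate(therm):
        if bit == 0:
            return i
    return len(therm)


class RegisteredDecoder:
    """Latch the thermometer code once per conversion, then decode it.

    The decoder input changes only when `latch` is called, not every time
    a Vernier flip-flop toggles during the measurement.
    """

    def __init__(self, n_stages: int):
        self._reg = [0] * n_stages

    def latch(self, therm: List[int]) -> None:
        self._reg = list(therm)

    def output(self) -> int:
        return decode_ones_count(self._reg)


if __name__ == "__main__":
    clean = [1, 1, 1] + [0] * 13
    bubble = [1, 1, 0, 1] + [0] * 12  # hypothetical single-bit bubble
    print(decode_ones_count(bubble), decode_first_zero(bubble))
```

For the bubbled code, the ones-counter returns 3 while the first-zero decoder returns 2. This illustrates why ones-counting decoders are often preferred when bubbles can occur.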

For measurements of long time intervals, area is the main concern for Vernier lines. In this situation, Vernier loop structures can be used, in which the delay lines for the start and stop signals are each configured as a loop. Although this architecture might seem simple to implement, the loop structure is the biggest drawback of the looped Vernier TDC. Layout asymmetries and nonlinearities are already very hard to control in a conventional Vernier TDC, and matching the delay elements of two interconnected loop structures is even more difficult. As a consequence, the linearity of a looped Vernier TDC is worse than that of the conventional implementation [1].
The Vernier topology has advantages such as compactness and sub-gate-delay resolution, which can be easily obtained by increasing the number of delay stages. However, there are certain drawbacks associated with this architecture, such as mismatch between the Vernier delay stages, glitches in the thermometer-to-binary decoder, and conversion time and latency that depend on the measurement interval and resolution [1]. Vernier lines are well-known architectures with a very wide application range, but another major problem is their high sensitivity to PVT (process, voltage and temperature) variations and the limitation of the dynamic range by the number of delay stages [19]. The behavior of the Vernier line combined with the ring oscillator was further investigated, and the simulation results are presented in the next sub-chapter.

2.1.2 Simulation results

The aforementioned implementation was simulated and analyzed at the schematic level in order to gain deeper insight into the behavior and performance of the circuit.

The ring oscillator and the Vernier line were analyzed as two separate blocks in terms of oscillation period, power consumption and energy.

The settling time (the time needed for the ring to reach steady oscillation) and the oscillation period were measured in different corners for the maximum number of ring cycles (see Table 2.1). The longest settling time occurs in the SF (slow-fast) corner while, as expected, the longest oscillation period occurs in the SS (slow-slow) corner.

Figure 2.9 illustrates how the oscillation period decreases as the supply voltage increases. The maximum difference between the SS and FF (fast-fast) corners is 32.98ps. The measured average power of the ring oscillator for the maximum pulse width is 2.94mW, corresponding to an energy of 88.23pJ/conversion. All the aforementioned results were measured at the maximum number of cycles of the ring oscillator.

Table 2.1: Ring oscillator corner analysis.

<table>
<thead>
<tr>
<th>Corner</th>
<th>Settling Time [ps]</th>
<th>Oscillation period [ps]</th>
</tr>
</thead>
<tbody>
<tr>
<td>TT</td>
<td>230</td>
<td>368.3</td>
</tr>
<tr>
<td>SS</td>
<td>230.8</td>
<td>379.6</td>
</tr>
<tr>
<td>FF</td>
<td>229.3</td>
<td>356.5</td>
</tr>
<tr>
<td>SF</td>
<td>232.8</td>
<td>366.1</td>
</tr>
<tr>
<td>FS</td>
<td>227.7</td>
<td>368.7</td>
</tr>
</tbody>
</table>

Figure 2.10: Operating principle of gated ring oscillator [1].

For the Vernier line, the DNL and INL were measured in all five corners, with an average LSB of 39.33ps in the TT (typical-typical) corner. The measured average power of the Vernier line for the maximum impulse width is 741.78μW, corresponding to an energy of 652.77fJ/conversion, much lower than that of the ring oscillator.

All the aspects presented in sub-section 2.1.1 (transistor mismatch, high sensitivity to process variations, large area, glitches in the thermometer decoder) led to the choice of a simpler and more robust architecture for the final TDC design, which is introduced in the next sub-chapter. For the same reason, the analysis stopped at this stage, without connecting the ring and the Vernier line and analyzing them together.

2.2 Multi-path gated ring oscillator

The gated ring oscillator (GRO) topology differs from a traditional ring oscillator in its ability to freeze between two consecutive time measurements [20]. If, instead of freezing, the system were reset between measurements, the previous state could not be used to improve the resolution by accounting for the quantization error. The operating principle is presented in the signal diagram of Figure 2.10. Each signal represents a phase of a five-stage ring oscillator, which means that the oscillation period is ten times the delay of a single stage. When the freeze signal is high, the ring stops oscillating, and when the freeze signal goes low the oscillation resumes with the same phase. Since the next measurement continues from the previous state, the quantization error is carried over and can be taken into account. The relationship between the input time interval $T_{\text{in}}$ and the output conversion result $N_{\text{out}}$ is given by the following formula [21]:

$$T_{\text{in}}[k] = N_{\text{out}}[k]\,T_0 + T_{\text{stop}}[k] - T_{\text{start}}[k], \tag{2.3}$$

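where $T_0$ is the raw resolution of the TDC, and $T_{\text{start}}[k]$ and $T_{\text{stop}}[k]$ are the time residues at the beginning and at the end of the $k$-th measurement. As a result, the time residue at the end of one measurement is transferred to the next one (see Figure 2.11). This can be expressed by [21]: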

$$T_{\text{start}}[k] = T_{\text{stop}}[k - 1]. \tag{2.4}$$

Combining (2.3) and (2.4), the quantization error of each conversion, $T_{\text{error}}[k] = T_{\text{in}}[k] - N_{\text{out}}[k]\,T_0$, can be expressed as [21]:

$$T_{\text{error}}[k] = T_{\text{stop}}[k] - T_{\text{stop}}[k - 1]. \tag{2.5}$$
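The residue carry-over described by (2.3) to (2.5) can be checked numerically with the short Python sketch below. It assumes an ideal gated ring whose phase, expressed in units of $T_0$, advances only while the input is high; the interval values and the 34ps raw resolution are illustrative assumptions, and `reset_tdc` is a hypothetical counterpart in which the ring is reset before every measurement.

```python
import math
import random


def gro_tdc(intervals, t0, phase0=0.0):
    """Gated ring oscillator TDC: the ring phase is frozen, not reset,
    between measurements. Phase is kept in units of the raw resolution t0.
    Returns the counts N_out[k] and the errors T_in[k] - N_out[k]*t0."""
    phase = phase0
    counts, errors = [], []
    for t_in in intervals:
        start = phase
        phase += t_in / t0                      # ring runs only while gated on
        n_out = math.floor(phase) - math.floor(start)
        counts.append(n_out)
        errors.append(t_in - n_out * t0)        # = T_stop[k] - T_stop[k-1]
    return counts, errors


def reset_tdc(intervals, t0):
    """Same quantizer, but the ring is reset to phase 0 before every measurement."""
    counts = [math.floor(t_in / t0) for t_in in intervals]
    errors = [t_in - n * t0 for t_in, n in zip(intervals, counts)]
    return counts, errors


# Hypothetical example: 1000 identical 100ps intervals, t0 = 34ps.
intervals = [100.0] * 1000
_, e_gro = gro_tdc(intervals, 34.0)
_, e_rst = reset_tdc(intervals, 34.0)
print(f"accumulated error, GRO:   {sum(e_gro):8.2f} ps")  # stays within one t0
print(f"accumulated error, reset: {sum(e_rst):8.2f} ps")  # grows linearly
```

Because each error is the first difference $T_{\text{stop}}[k] - T_{\text{stop}}[k-1]$, the errors telescope and their running sum stays within one $T_0$, whereas with a reset the error for a constant input repeats with the same sign and accumulates. This bounded accumulated error is the time-domain counterpart of first-order noise shaping.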

As a consequence of this effect, a lower noise level is achieved through first-order noise shaping. Noise shaping reduces the low-frequency noise components and shifts the quantization noise power towards higher frequencies. Equation (2.5) represents the time-domain input-output relation of a first-order digital high-pass filter. Apart from this, the GRO structure improves the performance of the TDC by reducing the mismatch errors of the delay elements, which are also first-order shaped [21]. The GRO-TDC therefore achieves intrinsic scrambling of its quantization and mismatch errors, as well as first-order noise shaping. Conventional GROs offer these noise-shaping benefits, but at the cost of a resolution that depends on the delay per stage; multi-path gated ring oscillators (MGROs) have a higher oscillation frequency, hence a smaller delay per stage and a finer raw resolution [21]. The reduction of the effective delay per stage has also been reported in other works [23],[24],[25]. The difference between a traditional ring oscillator and a multi-path topology is presented in Figure 2.12. In a classic ring oscillator, the output of each inverter is connected only to the input of the next one, while in the MGRO each inverter has multiple inputs, one connected to the adjacent inverter and another to a different inverter in the ring. Another possibility to reduce the effective delay per stage is to use N coupled GROs, in which case the stage delay is reduced by a factor of N, where N is the number of coupled GROs. The main issue of this approach is to initialize and maintain a well-defined oscillation through the gating operation, since the ring freezes and restarts from its previous state without reset [22]. The MGRO is a better alternative to this approach. The next sub-chapters describe the design and implementation of the MGRO TDC.

2.2.1 General concept

The TDC architecture comprises a ring oscillator, a 6-bit counter and 9 phase detectors. Each stage of the MGRO consists of a tri-state inverter with three inputs (three parallel simple inverters of the same size). The three inputs of each stage are connected to different points along the ring (see Figure 2.13). This type of connection allows the output of a stage to start transitioning ahead of time, which significantly reduces the delay per stage. The use of three-input tri-state inverters sets the minimum number of delay stages in the ring to nine. As a consequence, 18 phase states are used for the fine bits: an oscillation period is completed only after the signal has passed through all nine inverters twice (see Figure 2.14), hence the factor of 18 (see Figure 2.15).

The 18 states can be represented on 5 bits after decoding, which together with the 6 bits from the counter give an approximately 10-bit result (18 × 64 = 1152 codes, about 10.2 bits) encoded with redundancy. In this case, no on-chip decoder is used and all 15 bits (9-bit phase code word + 6 binary coarse bits) are transferred through a serializer. The final result is calculated as:

\[ N_{\text{result}} = 18 \cdot N_{\text{coarse}} + N_{\text{fine}}, \] (2.6)

where \( N_{\text{coarse}} \) is the counter value and \( N_{\text{fine}} \) is the decoded value of the fine bits. The TDC operating principle is summarized in the timing diagram of Figure 2.16. The ring oscillates only while EN (the enable signal) is high. At the falling edge of EN, the ring freezes in its current state, which is then saved in a register. This value needs to be decoded to extract the fine bits (currently done off-chip). A counter keeps track of the number of oscillations through the ring to determine the coarse bits.
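Since the fine-bit decoding is currently performed off-chip, the following Python sketch only illustrates how (2.6) could be evaluated. It assumes, hypothetically, that after correcting the inversion of alternate phases the nine latched phases follow an ideal twisted-ring (Johnson) sequence of 18 states; the actual MGRO phase ordering may differ, and no bubble correction is included. All function names are illustrative.

```python
N_STAGES = 9
N_STATES = 2 * N_STAGES   # 18 phase states per oscillation period


def twisted_ring_sequence(n_stages=N_STAGES):
    """The 2*n_stages codes of an ideal twisted-ring (Johnson) sequence, in order."""
    code, states = [0] * n_stages, []
    for _ in range(2 * n_stages):
        states.append(tuple(code))
        code = [1 - code[-1]] + code[:-1]   # shift in the complement of the last bit
    return states


def decode_fine(code):
    """Map a 9-bit phase code (assumed twisted-ring ordered) to 0..17."""
    ones = sum(code)
    if code[0] == 1:
        return ones                                  # ring filling with ones: 1..9
    return 0 if ones == 0 else 2 * len(code) - ones  # ring draining: 10..17


def tdc_result(n_coarse, code):
    """Equation (2.6): N_result = 18 * N_coarse + N_fine."""
    return N_STATES * n_coarse + decode_fine(code)


states = twisted_ring_sequence()
print(states[5], decode_fine(states[5]))
print(tdc_result(3, states[5]))   # 59
```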

2.2.2 Advantages and drawbacks

This topology presents several advantages, such as a smaller area than other TDC architectures while still achieving a very good resolution in this technology. Simplicity is the key aspect of this design: it uses only a ring oscillator with nine delay stages and no additional calibration circuits to obtain a very good resolution (better than 70ps in the typical corner). The most difficult part of this design is keeping the delay lines in the ring symmetric, as the nonlinearities and the LSB depend on the layout structure. As a consequence, the layout is the most cumbersome part of this design.
Figure 2.15: MGRO phases - detailed diagram.
Figure 2.16: TDC timing diagram.
The MGRO architecture was analyzed at the schematic level as well as post-layout in Cadence Virtuoso and LTSpice. The following sub-chapters describe the results in detail.

3.1 Schematic level TDC characterization

The schematic-level analysis is the first step towards understanding the MGRO TDC behavior. A corner analysis was performed and the LSB was determined for all five corners (see Table 3.1). However, a Monte Carlo simulation is more representative of the real circuit. The Monte Carlo simulation (423 iterations) indicated an LSB between 32.80ps and 34.68ps.

Figure 3.1 illustrates the change of the oscillation period with temperature. The maximum oscillation period is 761.72ps, corresponding to a resolution of 42.31ps, while the minimum oscillation period is 528.97ps, corresponding to a resolution of 29.38ps. This translates into a shift of about 1.41ps/°C in the oscillation period. The dependence of the oscillation period on the power supply voltage is presented in Figure 3.2. The expected decreasing trend is visible, with a maximum variation of 182ps between the FF and SS corners.
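Because one oscillation period spans the 18 phase states, the LSB follows directly from the period as $T_{\text{osc}}/18$. The short Python calculation below applies this to the two periods of Figure 3.1 and, as an inference not stated in the text, also computes the temperature span implied by the quoted 1.41ps/°C coefficient (about 165°C). The function name is illustrative.

```python
N_STATES = 18  # phase states per oscillation period (Section 2.2.1)


def lsb_from_period(t_osc_ps, n_states=N_STATES):
    """LSB of the MGRO TDC: one oscillation period spans n_states codes."""
    return t_osc_ps / n_states


t_max, t_min = 761.72, 528.97      # oscillation periods from Figure 3.1 [ps]
tc_period = 1.41                   # quoted period temperature coefficient [ps/°C]
print(f"LSB at max period: {lsb_from_period(t_max):.2f} ps")
print(f"LSB at min period: {lsb_from_period(t_min):.2f} ps")
# Temperature span implied by the quoted coefficient (an inference, not a stated value)
print(f"implied span: {(t_max - t_min) / tc_period:.0f} °C")
```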

Through careful sizing of the transistors in the stage, the transfer characteristics can be adjusted so as to obtain a highly linear behavior, as shown by the DNL and INL simulation results in Figure 3.3. In the worst case, corresponding to the FS corner, the DNL does not exceed \(\pm0.5\text{LSB}\), while the INL is \(\pm1\text{LSB}\). The steady increase in the INL is a consequence of the oscillation period not being exactly equal to 18 LSBs.

3.1.1 Methods of phase noise reduction in ring oscillators

In general, ring oscillators that are placed close to each other and operate at similar frequencies will exhibit phase and frequency variations. Due to the various noise sources present in a system, any practical oscillator accumulates jitter, which translates into phase noise in the frequency domain [26].

Table 3.1: Schematic-level LSB of the MGRO TDC in the five process corners.

<table>
<thead>
<tr>
<th>Power supply [V]</th>
<th>Corner</th>
<th>LSB [ps]</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="5">3.3</td>
<td>TT</td>
<td>34.36</td>
</tr>
<tr>
<td>SS</td>
<td>37.88</td>
</tr>
<tr>
<td>FF</td>
<td>30.44</td>
</tr>
<tr>
<td>SF</td>
<td>34.89</td>
</tr>
<tr>
<td>FS</td>
<td>33.23</td>
</tr>
</tbody>
</table>

Recent studies [27] have demonstrated that two different phase noise suppression techniques, injection locking and coupling, can be used together to significantly reduce the effect of PVT variations while providing noise filtering that reduces the accumulated jitter. When the sensor has a large number of pixels, sensor uniformity and accurate timing information are very difficult to achieve, especially for long time measurements, due to temperature and voltage variations across the chip. A few solutions exist for robust TDCs. One is to use PLLs, which keep the oscillation frequency constant and reduce PVT variations; the main drawbacks of this approach are the power consumption and the non-linearities arising from the distribution network. A second solution is to use an on-pixel TDC, but at the cost of a small fill factor and limited synchronization between TDCs due to jitter and PVT variations. The work presented in [27] proposes the use of a local reference with minimal power consumption and complexity, which is independent of chip variations. The reported results suggest that multiple TDCs can be synchronized through simple and very robust mechanisms, so that even with very large frequency variations the system provides a high-quality reference signal and phase locking. Multiple oscillators are coupled to their neighbors through resistive elements, which are the key components of the coupling. It has been shown that, at the beginning of operation, each oscillator runs at a slightly different frequency and phase, but after coupling, phase alignment is reached. As a consequence, when the phase of one oscillator is disturbed, its neighbors pull it back to its initial oscillation frequency without propagating the error. Figure 3.4 depicts the coupling architecture principle as well as the misalignment self-correction of the coupled ring oscillators. Measurement results for an array of 8x8 coupled ring oscillators indicated an 18dB phase noise and a 14dB phase jitter improvement, with a power efficiency below 200µW/GHz [27]. The same principle can be applied to multiple TDCs, each dedicated to a column of SiPMs, as envisaged for this system. At the moment, a single TDC is placed on chip. Nevertheless, injection locking and coupling were applied to the MGRO, although not in the same manner as explained above, with the aim of studying how the MGRO responds to these effects. The conceptual plan as well as the implemented techniques are introduced and explained in detail in the following sub-chapters.
3.1.1.1 Injection locking

Considerable attention has been devoted to the impact of injection locking in ring oscillators [28],[26],[29],[30]. The method consists of injecting into the ring a signal from a separate source with better phase noise characteristics (such as a PLL), which shifts the oscillation frequency from its free-running value to the n-th harmonic of the injected signal. The principle is illustrated in Figure 3.5. The injection locking technique was applied to the MGRO by itself, without any additional circuitry. The injection is controlled by a voltage source that sends a narrow pulse train through three transistors which short-circuit the inputs of two consecutive stages, as depicted in Figure 3.6. A sinusoidal signal of various frequencies and white noise were superimposed on the power supply. The sinusoidal frequencies are close to the free-running frequency of the ring oscillator, while the amplitude of the signal is 100mV, which causes a variation of approximately 1LSB in the oscillation period. Figure 3.7 presents the effects of the sinusoidal noise superimposed on the power supply. The ring is locked at 580ps (1.724GHz).

Apart from the technique presented above, another locking approach was tested for the MGRO architecture. In this case, the injection-controlling source sent a narrow pulse train through a single minimum-size transistor shorting a single phase (Figure 3.8). This has the advantage of a shorter oscillation period due to the reduced load on the ring. A 660mV white noise was superimposed on the power supply. The carrier frequency is 1.67GHz. The signal was injected at the third and fourth sub-harmonics, but the latter did not result in a phase noise decrease (Figure 3.9) due to insufficient injected power.

3.1.1.2 Coupling effect

Jitter accumulation in ring oscillators can also be mitigated through coupling. This topic is also addressed in this work, albeit not in detail, to provide a basis for future implementations where multiple TDCs might be used, each dedicated to a column of SiPMs. All the simulations were run at schematic level using the LTSpice simulator.

Each pair of adjacent oscillators is coupled through a $\pi$ network, whose resistor and capacitances were calculated assuming that all the oscillators are connected in a line (see Figure 3.10). Assuming 16 TDCs in total, with a maximum delay of 1ns between the first and the last, the final values are $2.44k\Omega$ and 1.82fF. A white noise source was superimposed on the power supply. The white noise has an amplitude of 330mV (10% of Vdd) and a bandwidth of 2.5GHz, while the free-running frequency of the oscillator is only 1.61GHz. Figure 3.11 shows the coupling of three MGROs, connected to each other using the $\pi$ network and ideal switches that allow the coupling to be turned on and off. At the beginning, when disconnected, all three oscillate at different frequencies, but as soon as the coupling is turned on, they synchronize and maintain the same oscillation period until they are released again. The differences between the MGROs were obtained by changing the transistors' parameters, similar to a Monte Carlo analysis. Because only three oscillators were coupled, the phase noise simulations indicate only a small change between the coupled and uncoupled situations (see Figure 3.12). With more coupled oscillators, since the phase noise of the individual oscillators is uncorrelated, the system noise is reduced by $10\log_{10}N$ [27], where $N$ is the number of coupled oscillators.
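The $10\log_{10}N$ scaling can be evaluated directly, as in the Python snippet below. For the three coupled MGROs it gives at most about 4.8dB, for the 16 TDCs assumed in the $\pi$-network calculation about 12dB, and for an 8x8 array (N = 64) about 18.1dB, which roughly coincides with the 18dB phase noise improvement reported in [27]. This is an idealized bound assuming fully uncorrelated noise, not a simulation result; the function name is illustrative.

```python
import math


def coupling_noise_reduction_db(n_oscillators):
    """Phase-noise reduction for N coupled oscillators with uncorrelated noise: 10*log10(N)."""
    return 10.0 * math.log10(n_oscillators)


for n in (3, 16, 64):
    print(f"N = {n:2d}: {coupling_noise_reduction_db(n):5.2f} dB")
```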
In conclusion, the literature demonstrates the positive effect of injection locking and coupling on phase noise reduction. However, because these techniques require a relatively long settling time, and the proposed TDC operation freezes the ring after every measurement, they cannot be applied in this case. This study aimed to outline phase noise mitigation techniques for the MGRO TDC that can be used in future implementations.
The next sub-chapter examines in detail the behavior of the delay stage (tri-state inverter with three inputs) of the MGRO, since it is the key component of the ring oscillator.

3.1.2 Delay stage characterization

The delay stage is the most important component of the TDC architecture. In the MGRO, it consists of three inverters connected in parallel, with tri-state operation. This concept has the great advantage of significantly reducing the delay per stage, which increases the oscillation frequency.

For a better understanding of the operating principle and of the relationship between the three inputs and the delay time, a skew analysis was performed on a single three-input delay stage with minimum load. For this purpose, the skews between the three inputs were varied between 0 and 140ps and the fall time was measured for each section of the output. The sum of all section fall times is approximately equal to the fall time of the TDC output (see Figure 3.13). The same principle holds for the rise time of the oscillator. The dependence between the skews and the rise and fall times is depicted in Figure 3.14, where all the points in one band are at most 10ps apart from each other. The red dot indicates the state in which the ring oscillates during normal operation.

It is useful to translate the dependence of the delay on the input skew into a relationship with the actual wires implemented in the layout, because the input skew is set by the RC delay through them. A deeper understanding of the RC delay requires a proper analysis of the electrical wire. Parasitic components along the wires have a great impact on the circuit behavior, and several wire models can be used to estimate them.
The chosen model for estimating the parasitic capacitance is the parallel-plate model with fringing capacitance (see Figure 3.15, where $W$ and $L$ are the width and length of the wire, $d$ is the thickness of the dielectric layer and $h$ is the height of the wire). For $W/h$ smaller than 1.5, the fringing capacitance is no longer negligible (in this case $W/h = 0.8$), and it can increase the overall capacitance by a factor of more than 10 for small line widths [31]. The wire capacitance can be determined using (3.1) [31], where $d$ and $\epsilon_{di}$ are the thickness and the permittivity of the dielectric layer ($\text{SiO}_2$), and $w = W - \frac{h}{2}$ is a good approximation for the width of the parallel-plate capacitor.

$$C_{\text{wire}} = C_{\text{pp}} + C_{\text{fringe}} = \left(\frac{w \cdot \epsilon_{di}}{d} + \frac{2 \cdot \pi \cdot \epsilon_{di}}{\log \frac{d}{h}}\right) \cdot L. \quad (3.1)$$
The wire resistance is determined using (3.2) [31], where $R_0$ is the sheet resistance of the metal layer.

\[ R_{\text{wire}} = R_0 \cdot \frac{L}{W}. \tag{3.2} \]
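Equations (3.1) and (3.2) can be evaluated with the Python sketch below. It assumes SI units, a relative permittivity of 3.9 for SiO₂ and that $\log$ in (3.1) denotes the natural logarithm, as in the common textbook form of this model; the wire geometry (chosen so that $W/h = 0.8$) is hypothetical and does not correspond to the actual layout. As a cross-check of (3.2), the metal3 trace used later for the power distribution network (99mΩ/square, 1.65µm wide, 500µm long) gives 30Ω.

```python
import math

EPS0 = 8.854e-12            # vacuum permittivity [F/m]
EPS_SIO2 = 3.9 * EPS0       # SiO2 permittivity [F/m]


def wire_capacitance(W, L, d, h, eps_di=EPS_SIO2):
    """Eq. (3.1): parallel-plate plus fringing capacitance [F] (SI units).

    W, L: wire width and length; d: dielectric thickness; h: wire height.
    The log is taken as the natural logarithm and requires d > h.
    """
    w = W - h / 2.0                                      # effective plate width
    c_pp = w * eps_di / d                                # parallel plate, per metre
    c_fringe = 2.0 * math.pi * eps_di / math.log(d / h)  # fringing, per metre
    return (c_pp + c_fringe) * L


def wire_resistance(r_sheet, W, L):
    """Eq. (3.2): R = R_0 * L / W, with R_0 the sheet resistance [ohm/square]."""
    return r_sheet * L / W


# Hypothetical geometry with W/h = 0.8 (illustrative values only)
W, h, d, L = 0.5e-6, 0.625e-6, 1.0e-6, 100e-6
print(f"C_wire = {wire_capacitance(W, L, d, h) * 1e15:.1f} fF")
# Metal3 trace: 99 mOhm/square, 1.65um wide, 500um long
print(f"R_wire = {wire_resistance(0.099, 1.65e-6, 500e-6):.1f} ohm")
```

With this hypothetical geometry the fringing term is far larger than the parallel-plate term, in line with the remark that fringing dominates for $W/h < 1.5$.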

The distributed RC model is presented in Figure 3.16. Based on it, the total propagation delay of the network can be estimated by applying the Elmore delay formula [31].

\[ \tau = R_o \cdot \left( \frac{C_w}{2} + \frac{C'_w}{2} + \frac{C''_w}{2} \right) + \left( \frac{C_w}{2} + C_{\text{in}} \right) \cdot (R_o + R_w). \tag{3.3} \]

The dependence of the connection delay on the wire length is then expressed by (3.4) and depicted in Figure 3.17.

\[ \tau = L^2 \cdot 7.940 \cdot 10^{-5} + L \cdot 1.041 \cdot 10^{-5} + 7.375 \cdot 10^3 \cdot C'_w + 7.375 \cdot 10^3 \cdot C''_w. \tag{3.4} \]

It is now theoretically possible to determine the maximum admissible difference in length between the metal wires at the inputs for a desired skew. Unfortunately, in the layout implementation, additional elements alter the delay through the interconnect wires, such as vias, mutual coupling between nets and the presence of materials other than SiO$_2$ with different permittivities. For these reasons, such a design method can only provide rough guidelines when implementing the TDC layout.
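Bearing these limitations in mind, the rough guideline can be sketched in Python as follows. Because the fitted coefficients in (3.4) are given without units, the sketch uses instead the standard Elmore expression for a driver of resistance $R_{drv}$ feeding a distributed RC wire that ends in a gate load, $\tau = R_{drv}(C_w + C_{in}) + R_w(C_w/2 + C_{in})$, rather than the exact network of Figure 3.16, and inverts it to find the extra length on one input that produces a given skew. All parameter values and function names are hypothetical.

```python
import math


def elmore_delay(length, r_drv, r_per_len, c_per_len, c_in):
    """Elmore delay [s] of a driver (r_drv) feeding a distributed RC wire
    (r_per_len, c_per_len per um) that ends in a gate load c_in."""
    r_w = r_per_len * length
    c_w = c_per_len * length
    return r_drv * (c_w + c_in) + r_w * (c_w / 2.0 + c_in)


def length_for_delay(tau, r_drv, r_per_len, c_per_len, c_in):
    """Invert elmore_delay: positive root of qa*L^2 + qb*L + qc = 0."""
    qa = r_per_len * c_per_len / 2.0
    qb = r_drv * c_per_len + r_per_len * c_in
    qc = r_drv * c_in - tau
    if qc > 0:
        raise ValueError("tau is smaller than the delay of a zero-length wire")
    # Numerically stable form of (-qb + sqrt(qb^2 - 4*qa*qc)) / (2*qa)
    return -2.0 * qc / (qb + math.sqrt(qb * qb - 4.0 * qa * qc))


def max_length_mismatch(l_ref, skew, **p):
    """Extra wire length [um] on one input that adds `skew` seconds of delay."""
    return length_for_delay(elmore_delay(l_ref, **p) + skew, **p) - l_ref


# Hypothetical values: 5 kOhm driver, 0.198 ohm/um, 0.2 fF/um, 5 fF gate load
params = dict(r_drv=5e3, r_per_len=0.198, c_per_len=0.2e-15, c_in=5e-15)
print(f"{max_length_mismatch(50.0, 10e-12, **params):.2f} um for a 10 ps skew")
```

With these illustrative values the driver resistance dominates, so the delay grows almost linearly at about 1ps/µm and a 10ps skew budget corresponds to roughly 10µm of length mismatch; for longer or more resistive wires the quadratic $R_wC_w/2$ term becomes significant.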

In the pages that follow, the TDC layout and post-layout simulations are described and analyzed in detail.
3.2 Layout and post-layout TDC characterization

The previous schematic analysis does not reflect the real case because no parasitic components are taken into consideration, but its results outlined the TDC behavior and indicated the great potential of this architecture. The post-layout simulation results are introduced after a brief description of the layout of the core components of the TDC.

The main block of this architecture is the MGRO. Several ring variants were implemented in an attempt to obtain a constant load for each delay stage. The layout of the ring is the most cumbersome part due to the complexity of the connections between the delay stages. Since this is not a usual ring oscillator built from simple one-input, one-output inverters, one of the main obstacles was to connect all the stages so as to keep a constant wire length between the ring nodes. To adjust the MGRO delay, and hence the resolution, the power supply of the ring is separated from the power supply of the rest of the circuit. Several ring layouts were designed, of which the two most promising are described below.

In the first approach, all the delay stages were placed in a classic circular arrangement. The main advantage is that all three inputs of each inverter arrive in the same order. This topology has certain drawbacks: the tri-state inverters do not have the same load, and the area is too large, so the oscillation frequency is lower, which results in a coarser resolution. Also, the tri-state inverters do not all have the same orientation, which makes their operation significantly sensitive to the thermal variations and gradients in the circuit (see Figure 3.18). Particular attention was paid to carefully sizing the connections using two different layers, metal1 and metal3. The benefit of this approach is that the parasitic capacitances between the lines are reduced, since the distance between metal1 and metal3 is greater than between metal1 and metal2. However, the primary objectives (constant load and good resolution) could not be achieved with this design; therefore, a different design was implemented.

In the second approach, the delay elements were placed in a line, one after the other, so that a constant load is kept for each delay stage (all the delay stages are connected through lines of the same length and width). The same technique of using different metal layers for the interconnections was used (see Figure 3.19). This design is very compact and optimized for small area as well as low parasitic capacitances. The most important difference between the two designs is the balanced load between the delay stages in the second approach. Considering the advantages and drawbacks of the two implementations, the second approach was chosen for the TDC.

The rest of the components such as the counter, tri-state inverters and phase registers were optimized for small parasitic capacitances and small area. The remaining part of this sub-chapter presents and analyzes the post-layout simulation results of the entire TDC.

The DNL and INL post-layout simulation results are presented in Figure 3.20. The simulations were run over the entire range of the TDC with 5ps increments in the input pulse width, keeping both the TDC and the ring power supplies at 3.3V. For the TT corner, the DNL is ±0.55LSB, while the INL is ±1LSB. As expected, the smallest nonlinearities occur in the FF corner, but this situation is far from the real case. Closer to reality are the FS and SF corner simulations (FS: DNL +1.28/-1LSB, INL +2.21/-1.7LSB; SF: DNL +0.93/-1LSB, INL +1.15/-1.42LSB). Although these nonlinearities exceed the usual range, they are obtained without any TDC calibration. A possible INL calibration can be realized using a LUT (look-up table) and/or by adjusting the Vdd and Vdd ring values.
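For reference, DNL and INL can be extracted from such a uniform sweep with the code-density method, sketched in Python below. This is one common convention (code widths counted in sweep steps, partially covered end codes discarded, INL as the running sum of DNL) and is not necessarily the exact procedure used for Figure 3.20; the 65ps ideal quantizer in the usage example and the function name are hypothetical.

```python
import math
from collections import Counter


def dnl_inl_code_density(codes):
    """DNL/INL (in LSB) from the output codes of a uniform input sweep.

    Each code's width is taken as the number of sweep points that map to it.
    The first and last observed codes are dropped because the sweep usually
    covers them only partially. INL is the running sum of DNL.
    """
    hist = Counter(codes)
    inner = list(range(min(hist) + 1, max(hist)))
    if not inner:
        raise ValueError("sweep must span at least three codes")
    widths = [hist.get(code, 0) for code in inner]
    avg = sum(widths) / len(widths)          # average LSB, in sweep steps
    dnl = [w / avg - 1.0 for w in widths]
    inl, acc = [], 0.0
    for d in dnl:
        acc += d
        inl.append(acc)
    return inner, dnl, inl


# Hypothetical ideal TDC: LSB = 65ps, swept with 5ps steps as in the text.
step, lsb = 5.0, 65.0
codes = [math.floor(k * step / lsb) for k in range(400)]
_, dnl, inl = dnl_inl_code_density(codes)
print(max(abs(d) for d in dnl), max(abs(x) for x in inl))
```

Note that with a 5ps step the width of each code is only resolved to 5ps, i.e. to about 0.08LSB for a 65ps LSB, which bounds the precision of the extracted DNL.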

The TDC resolution can be adjusted through the ring oscillator power supply. The LSB is estimated by sweeping the input pulse width with a 5ps step; because of this finite step, a fitting curve is also plotted to emphasize the trend of the results. As shown in Figure 3.21, increasing Vdd improves the resolution up to a point beyond which the resolution remains constant. This indicates that the LSB cannot go below 55ps, even with a higher power supply. Table 3.2 summarizes the LSB results in all five corners.

One of the most important analyses is the LSB shift with temperature, since the temperature on the chip will not be constant and has a great impact on the system performance. The simulation results depicted in Figure 3.22 indicate an increase of the LSB with temperature, the reason being that the effect of parasitic components becomes dominant and decreases the oscillation frequency of the ring at high temperatures. By comparing Figure 3.21 and Figure 3.22, it can be seen that compensating the LSB at higher temperatures through the ring power supply is not only possible but also advisable.

In conclusion, the simulation results of the TDC with the MGRO indicate a very robust architecture, capable of an LSB better than 70ps without any calibration or additional circuitry. The only feature that allows small adjustments to the circuit is the separate power supply of the ring oscillator. The TDC performance is summarized in Table 3.3.

The next sub-chapter provides insights into a new TDC architecture based on the current implementation, presenting its conceptual working principle as well as its benefits.
3.3 Anti-phased TDC

The present research explores an alternative to the current TDC which is able to achieve half the LSB simply by introducing another set of latches in the design.
Table 3.3: TDC performance summary.

<table>
<thead>
<tr>
<th>Performance</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>LSB (std/anti-phase)</td>
<td>64.5 / 35 ps</td>
</tr>
<tr>
<td>Jitter</td>
<td>&lt;18ps (est.)</td>
</tr>
<tr>
<td>DNL/INL</td>
<td>±0.55 / ±1 LSB</td>
</tr>
<tr>
<td>Resolution</td>
<td>10bits</td>
</tr>
<tr>
<td>Supply</td>
<td>3.3V</td>
</tr>
<tr>
<td>Input</td>
<td>Single-ended</td>
</tr>
<tr>
<td>Output</td>
<td>125Mbps</td>
</tr>
<tr>
<td>Clock</td>
<td>40MHz</td>
</tr>
<tr>
<td>Power (peak/stand-by)</td>
<td>&lt;9mW / &lt;1mW</td>
</tr>
<tr>
<td>Area</td>
<td>24238µm²</td>
</tr>
</tbody>
</table>

Figure 3.23: Anti-phased TDC block diagram.

The diagram of this architecture is illustrated in Figure 3.23. The design comprises the same blocks as the MGRO TDC, plus two delay elements, which create a delay of half an LSB between phases, and a second set of phase registers. At the moment, the delay elements are implemented with starved cells. Their delays are changed independently by controlling four bias voltages (two for the NMOS and two for the PMOS transistors, see Figure 3.24), so that the fall and rise times can be adjusted to create a 0.5LSB delay. Since this is not a very robust implementation, the starved cells will in the future be replaced with other delay circuits with a fixed delay.
Figure 3.24: Starved cell with four controllable bias voltages.

Figure 3.25: Anti-phase TDC timing diagram.

Figure 3.25 presents an overview of the anti-phased TDC timing behavior. Compared to the classic implementation of the MGRO TDC, two sets of latches, instead of a single one, are connected to the MGRO. At the falling edge of the input signal, the oscillation stops and the ring phases are latched. Because of the 0.5LSB delay between the two sets, the latched codes are either identical or differ by one state, which can be used to effectively double the resolution.
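One way to combine the two readings is sketched in Python below. It assumes that both latch sets have already been decoded to states 0 to 17 and that the second set lags the first by exactly 0.5LSB; under these assumptions the pair of codes identifies in which half of the LSB the ring was stopped, giving 36 fine states per period. The function name and the lag direction are hypothetical; if the second set leads instead, the two cases swap.

```python
import math

N_STATES = 18


def combine_anti_phase(code_a, code_b, n_states=N_STATES):
    """Half-LSB fine code from two decoded phase readings.

    code_a: decoded state of the first latch set (0..n_states-1).
    code_b: decoded state of the second set, assumed to lag by 0.5 LSB,
            so it equals code_a or code_a - 1 (mod n_states).
    Returns a value in 0..2*n_states-1.
    """
    if code_b == code_a:
        return 2 * code_a + 1        # ring stopped in the upper half of the LSB
    if code_b == (code_a - 1) % n_states:
        return 2 * code_a            # ring stopped in the lower half of the LSB
    raise ValueError("codes differ by more than one state")


# Ideal model: true phase phi in LSB units, second set sees phi - 0.5.
phi = 7.3
print(combine_anti_phase(math.floor(phi) % 18, math.floor(phi - 0.5) % 18))
```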

The readout circuit is a 24-bit serializer (18 fine bits and 6 coarse bits).
Preliminary schematic-level simulation results indicate an LSB of approximately 14ps, which represents half an LSB of the classic MGRO TDC (no parasitic components are included in these simulations). Moreover, the preliminary layout indicates an area increase of only 17% compared to the MGRO TDC. The results of this study indicate a great potential for the anti-phased TDC, but further research is needed to determine the effectiveness of this implementation.
Decoupling and power supply noise suppression

Power dissipation plays an important role in the power supply noise constraints of modern CMOS circuits. The design of the power distribution network has become a very complex process and requires careful modeling to ensure the correct functionality of the circuit [32]. Rapid device scaling and the increasing complexity of circuits play a critical role in the noise behavior of high-speed integrated circuits. In addition, the parasitic impedances of the on-chip interconnects have increased dramatically.

Fluctuations of the supply voltage appear as power supply noise, which degrades circuit operation. High average currents cause large ohmic IR voltage drops, while the fast switching of small transistors causes current transients that result in large inductive voltage drops equal to $L\frac{di}{dt}$. Unfortunately, the scaling of IC technology comes at the price of larger average and transient currents [33]. The main role of the power distribution network is to minimize the voltage drops while keeping the supply within a certain range (noise margins) [33], and a good model of the load circuit is necessary to ensure a correct design.

The general model of a non-ideal power delivery network is presented in Figure 4.1. The system consists of a power supply and a power load connected by non-ideal interconnect lines. The power supply is assumed to provide ideal supply and ground voltage levels, while the interconnect lines have finite parasitic resistances and inductances. Ohmic and inductive voltage drops develop across these parasitic components as the load $I(t)$ draws current from the network; therefore, the voltage levels across the load are no longer ideal [33].

A frequently used technique for reducing power supply noise, by lowering the impedance of the power distribution system, is placing decoupling capacitors (decaps) between the supply lines. Both on-chip and off-chip decoupling capacitors have long been used. Their role is to shunt high-frequency noise from the power supply to ground or the other way around. There are different methods of designing decoupling capacitors, each depending on the technology and circuit constraints. The two main parameters that need to be weighed when designing decoupling capacitors are the area constraints and the leakage power consumption [34]. Based on the research on on-chip decoupling capacitors presented in [34], the pMOS decap was chosen for the TDC circuit, mainly because it offers a good trade-off between area efficiency and dielectric leakage power (see Figure 4.2). The effective capacitance of the pMOS decap was determined in post-layout simulation by applying a sinusoidal AC test voltage on top of a 3.3V DC supply across the decap. The effective capacitance was calculated using (4.1) [34], where $f$ is the frequency of the test signal, $\theta$ is the phase difference between the voltage across and the current through the decap, $|V|$ and $|I|$ are the magnitudes of the AC voltage and current of the decap, and $|Z| = |V|/|I|$ is the magnitude of its impedance. The effective capacitance of a pMOS decoupling capacitor 700nm long and 2µm wide is 8fF.

\[
C_{\text{eff}} = \frac{1}{2 \cdot \pi \cdot f \cdot |Z| \cdot \sin \theta} = \frac{|I|}{2 \cdot \pi \cdot f \cdot |V| \cdot \sin \theta}. \tag{4.1}
\]
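
Equation (4.1) can be checked numerically with a short Python sketch. The decap is modeled here as an 8fF capacitor in series with a resistance; the 1kΩ series resistance, the 100MHz test frequency and the 0.1V amplitude are hypothetical values chosen for illustration, not the settings of the post-layout simulation. Because the series resistance only affects the real part of the impedance, the $\sin\theta$ factor in (4.1) removes it and the capacitance is recovered.

```python
import cmath
import math


def effective_capacitance(v_mag, i_mag, f, theta):
    """Eq. (4.1): C_eff = |I| / (2*pi*f*|V|*sin(theta)).

    v_mag, i_mag: AC voltage and current magnitudes of the decap [V], [A]
    f: frequency of the sinusoidal test signal [Hz]
    theta: phase difference between voltage and current [rad]
    """
    return i_mag / (2 * math.pi * f * v_mag * math.sin(theta))


# Hypothetical decap model: 8 fF in series with an assumed 1 kOhm resistance,
# probed with an assumed 0.1 V, 100 MHz sinusoid (not the thesis's settings).
C_TRUE = 8e-15
R_SERIES = 1e3
F_TEST = 100e6
V_AC = 0.1

omega = 2 * math.pi * F_TEST
z = R_SERIES + 1 / (1j * omega * C_TRUE)   # impedance of the series RC model
i_ac = V_AC / abs(z)                        # |I| = |V| / |Z|
theta = abs(cmath.phase(z))                 # phase between voltage and current

c_eff = effective_capacitance(V_AC, i_ac, F_TEST, theta)
print(f"C_eff = {c_eff * 1e15:.3f} fF")     # C_eff = 8.000 fF
```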

The decoupling capacitance required to keep the power supply fluctuations within a given range was determined using (4.2) [35], where $V_{dd}$ is the power supply voltage, $n$ is the ripple voltage amplitude expressed as a fraction of $V_{dd}$ (0.03 for a 3% ripple), $I$ is the average current, which can be determined from the average power, and $f_c$ is the clock frequency. Because the TDC circuit has two power supplies, an independent one for the ring oscillator and another one for the rest of the circuit, two decoupling capacitances were calculated. For a 3% ripple voltage, the required decoupling capacitance is 1.95pF for the ring and 6.05pF for the rest of the circuit. Dividing these values by the 8fF effective capacitance of one pMOS decap and rounding up gives the number of on-chip decaps needed: 244 for the power supply of the ring and 757 for the power supply of the rest of the circuit.

\[
C = \frac{I}{2 \cdot f_c \cdot V_{dd} \cdot n}. \tag{4.2}
\]
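
The decap counts follow directly from the required capacitances and the 8fF unit decap. A minimal Python sketch of this arithmetic is given below, with (4.2) written as a function; no load current or clock frequency values are assumed, since they are not listed in this section.

```python
import math

C_EFF_DECAP = 8e-15  # effective capacitance of one 700 nm x 2 um pMOS decap [F]


def required_decap(i_avg, f_c, vdd, n):
    """Eq. (4.2): C = I / (2 * f_c * Vdd * n), with n the ripple as a fraction of Vdd."""
    return i_avg / (2 * f_c * vdd * n)


def decap_count(c_required, c_unit=C_EFF_DECAP):
    """Smallest number of unit decaps whose total capacitance reaches c_required."""
    return math.ceil(c_required / c_unit)


# Required capacitances quoted in the text for a 3 % ripple
c_ring = 1.95e-12
c_rest = 6.05e-12

n_ring = decap_count(c_ring)
n_rest = decap_count(c_rest)
print(n_ring, n_rest)                          # 244 757
print(f"{3000 * C_EFF_DECAP * 1e12:.1f} pF")   # 24.0 pF for 3000 decaps
```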

With the effective capacitance and the required decoupling capacitances known, the next step is to design the power distribution network. The electrical characteristics of the interconnect play an important role in this step.
The proposed model for the TDC is presented in Figure 4.3. The parasitic capacitance between each power supply net and ground was extracted from post-layout simulations. With an effective capacitance of 8fF per decap and 3000 decaps, the total decoupling capacitance is 24pF for each power supply. The trace resistance was calculated for a metal3 wire with a sheet resistance of 99m$\Omega$/square (taken from the technology files), a width of 1.65$\mu$m and a length of 500$\mu$m, i.e. about 303 squares, which results in a trace resistance of 30$\Omega$. At low frequencies, the trace inductance can be described by (4.3) [33], where $T$ and $W$ are the thickness and width of the line, and $l$ is the length of the line in meters. The $\ln\gamma$ term is a function of the $T/W$ ratio; it is very small compared to the other terms (between 0 and 0.0025) and has a negligible effect [33]. The resulting trace inductance is 656.61pH.

\[
L = 0.2 \cdot l \cdot \left( \ln \frac{2 \cdot l}{T + W} + \frac{1}{2} - \ln\gamma \right) \mu H. \tag{4.3}
\]
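
The trace resistance and inductance can be reproduced with a few lines of Python. The metal3 thickness $T$ is not stated in the text, so the sketch uses a hypothetical $T = 0.67\mu$m, back-calculated so that (4.3) lands near the quoted 656.61pH; since $T$ only enters through a logarithm, a different thickness changes the result only slightly.

```python
import math

# Values from the text: metal3 sheet resistance, trace width and length
R_SHEET = 99e-3    # [ohm/square]
W = 1.65e-6        # trace width [m]
LENGTH = 500e-6    # trace length [m]
# Assumption: the metal3 thickness is not given in the text; 0.67 um is a
# hypothetical value back-calculated to roughly match 656.61 pH.
T = 0.67e-6        # trace thickness [m]


def trace_resistance(r_sheet, length, width):
    """Sheet resistance times the number of squares (length / width)."""
    return r_sheet * length / width


def trace_inductance(length, width, thickness, ln_gamma=0.0):
    """Eq. (4.3): L = 0.2 * l * (ln(2l / (T + W)) + 1/2 - ln(gamma)) in uH, l in m.

    ln_gamma is neglected by default (0 to 0.0025 according to the text).
    Returns the inductance in henry.
    """
    l_uh = 0.2 * length * (math.log(2 * length / (thickness + width)) + 0.5 - ln_gamma)
    return l_uh * 1e-6


r = trace_resistance(R_SHEET, LENGTH, W)
l_trace = trace_inductance(LENGTH, W, T)
print(f"R = {r:.2f} ohm, L = {l_trace * 1e12:.1f} pH")  # R = 30.00 ohm, L = 656.6 pH
```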

A bond wire inductance of 1nH/mm is a common assumption, and it was also used here for a 5mm long bond wire. The inductances, resistances and capacitances of the package cannot be neglected either. A QFN package was considered because it occupies a small area and has very small parasitic pin capacitances (C = 0.2pF). The PCB traces were assumed to be 3mm long, since the board decoupling capacitors are placed very close to the IC pads, and 0.3mm wide. The PCB trace inductance was estimated with a common PCB trace inductance calculator [36] and is 5nH.
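
To illustrate why the package and board inductances matter, a first-order estimate can sum the series inductances quoted above and compute the frequency at which they resonate with the on-chip decap bank. This Python sketch rests on strong simplifications (the QFN lead inductance, all resistances and the ground return path are ignored), so it is only an order-of-magnitude aid and not a substitute for the model in Figure 4.3.

```python
import math

L_TRACE = 656.61e-12   # on-chip metal3 trace, from (4.3) [H]
L_BOND = 1e-9 * 5      # 1 nH/mm bond wire, 5 mm long [H]
L_PCB = 5e-9           # 3 mm x 0.3 mm PCB trace [H]
C_DECAP_UNIT = 8e-15   # one pMOS decap [F]

# Simplification (assumption): QFN lead inductance, all resistances and the
# ground return path are ignored, so this is only a rough estimate.
L_SERIES = L_TRACE + L_BOND + L_PCB


def lc_resonance(l, c):
    """Resonance frequency 1 / (2*pi*sqrt(L*C)) of a lumped LC pair."""
    return 1 / (2 * math.pi * math.sqrt(l * c))


# Resonance of the series inductance with n on-chip decaps in parallel
resonances = {n: lc_resonance(L_SERIES, n * C_DECAP_UNIT) for n in (300, 750, 3000)}
for n, f_res in resonances.items():
    print(f"{n:5d} decaps: f_res = {f_res / 1e6:.0f} MHz")
```

In this simplified picture, quadrupling the number of decaps halves the resonance frequency and also halves the characteristic impedance $\sqrt{L/C}$ of the LC pair.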

Board capacitors are necessary to reduce the impedance at low frequencies. Multilayer ceramic capacitors (MLCCs) are frequently used for this purpose, and to reach a low target impedance, several capacitors with different self-resonant frequencies are placed in parallel. Accordingly, four parallel board decaps of 100nF, 10nF, 1nF and 100pF are used in the model. Their values are spaced by one order of magnitude in order to avoid large anti-resonance peaks. Their characteristics, such as ESR and ESL, were obtained with SimSurfing [37], a tool that lists the characteristics of Murata products.
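
The behavior of the parallel MLCC bank can be sketched by modeling each capacitor as a series ESR-ESL-C branch. The capacitance values below are those of the model, but the equal ESR of 50mΩ and ESL of 0.5nH are hypothetical placeholders, not the SimSurfing data used in this work.

```python
import math

# Board decap values from the text; ESR and ESL are hypothetical placeholders,
# NOT the SimSurfing data used in the thesis.
CAPS = [100e-9, 10e-9, 1e-9, 100e-12]   # [F]
ESR = 0.05                              # assumed ESR per capacitor [ohm]
ESL = 0.5e-9                            # assumed ESL per capacitor [H]


def self_resonant_frequency(c, esl):
    """Series-resonance frequency of one MLCC: 1 / (2*pi*sqrt(ESL*C))."""
    return 1 / (2 * math.pi * math.sqrt(esl * c))


def branch_impedance(f, c, esr, esl):
    """Series R-L-C model of a single MLCC at frequency f."""
    w = 2 * math.pi * f
    return complex(esr, w * esl - 1 / (w * c))


def bank_impedance(f, caps=CAPS, esr=ESR, esl=ESL):
    """Parallel combination of all MLCC branches (sum of admittances)."""
    admittance = sum(1 / branch_impedance(f, c, esr, esl) for c in caps)
    return 1 / admittance


srfs = [self_resonant_frequency(c, ESL) for c in CAPS]
for c, f_srf in zip(CAPS, srfs):
    print(f"{c * 1e9:7.1f} nF: SRF = {f_srf / 1e6:6.1f} MHz, "
          f"|Z_bank| = {abs(bank_impedance(f_srf)) * 1e3:.1f} mOhm")
```

With equal ESL, one decade of capacitance moves the self-resonant frequency by a factor of $\sqrt{10}$. Sweeping `bank_impedance` between two adjacent self-resonant frequencies exposes the anti-resonance peak that forms where the larger capacitor is already inductive while the smaller one is still capacitive.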

Once the entire model of the power distribution network was built, the target impedance was derived from (4.4) [33], where $V_{dd}$ is the power supply voltage, $\text{ripple}$ is the admissible voltage ripple as a fraction of $V_{dd}$ and $I_{load}$ is the load current.

\[
Z_{target} = \frac{V_{dd} \cdot \text{ripple}}{I_{load}}. \tag{4.4}
\]

Two target impedances were calculated: $299.72\,\Omega$ for the power supply of the ring and $208.08\,\Omega$ for the power supply of the rest of the circuit. The main goal is to keep the impedance of the power distribution network below these target values. To this end, the circuit impedance was simulated for different numbers of on-chip decoupling capacitors, as illustrated in Figure 4.4. According to these results, at least 750 on-chip decaps are necessary for the supply of the rest of the circuit, because with 300 on-chip decaps the circuit impedance exceeds the target impedance at 500MHz. For the ring power supply, Figure 4.5 indicates that 300 on-chip decaps are sufficient; however, the impedance at 500MHz is then very close to the target impedance, so 750 decaps are a better choice. Because enough silicon area was available, the total number of decoupling capacitors was finally set to 3000 for each supply net, which follows the rule of thumb $Area_{decap} \approx 3 \times Area_{circuit}$.
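
The target-impedance check described above amounts to comparing a simulated impedance sweep against (4.4). A minimal Python sketch is shown below; the sweep points are hypothetical illustration values, not data read from Figure 4.4 or Figure 4.5.

```python
def target_impedance(vdd, ripple, i_load):
    """Eq. (4.4): Z_target = Vdd * ripple / I_load (ripple as a fraction of Vdd)."""
    return vdd * ripple / i_load


def violations(sweep, z_target):
    """Frequencies of a (frequency [Hz], |Z| [ohm]) sweep where |Z| exceeds z_target."""
    return [f for f, z_mag in sweep if z_mag > z_target]


Z_TARGET_RING = 299.72   # [ohm], ring oscillator supply (from the text)
Z_TARGET_REST = 208.08   # [ohm], supply of the rest of the TDC (from the text)

# Hypothetical sweep points for illustration only; NOT read from Fig. 4.4 or 4.5.
example_sweep = [(100e6, 40.0), (300e6, 150.0), (500e6, 230.0), (800e6, 120.0)]

print(violations(example_sweep, Z_TARGET_REST))  # [500000000.0]
print(violations(example_sweep, Z_TARGET_RING))  # []
```

In practice, the same check would be repeated for each candidate number of on-chip decaps, keeping the smallest number for which the list of violating frequencies is empty.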

To evaluate the performance of the designed power distribution network, the power spectral density (PSD) of each supply net was computed from the simulation results (see Figure 4.6 and Figure 4.7).

These room-temperature simulations show a PSD reduction of around 20dB at 800MHz (close to the oscillation frequency) when decoupling capacitors are used. Based on these results, the RMS noise voltage and the average power were computed; they are presented in Table 4.1.
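
One common way to obtain an RMS noise voltage from a simulated PSD is to integrate the one-sided PSD over frequency and take the square root; the integral itself is the mean-square noise voltage. The Python sketch below assumes the PSD is available as samples in V²/Hz (or in dB relative to 1V²/Hz) and uses the trapezoidal rule; the flat PSD is a hypothetical input, and the integration method used for Table 4.1 is not stated in the text. It also shows what a 20dB reduction would mean if it applied uniformly across the band.

```python
import math


def db_to_linear(psd_db):
    """Convert a PSD given in dB (re 1 V^2/Hz) to V^2/Hz."""
    return 10 ** (psd_db / 10)


def rms_from_psd(freqs, psd):
    """RMS noise voltage: square root of the trapezoidal integral of the PSD.

    freqs: ascending frequencies [Hz]; psd: one-sided PSD samples [V^2/Hz].
    """
    mean_square = 0.0
    for k in range(1, len(freqs)):
        mean_square += 0.5 * (psd[k] + psd[k - 1]) * (freqs[k] - freqs[k - 1])
    return math.sqrt(mean_square)


# Hypothetical flat PSD of 1e-12 V^2/Hz from 0 to 1 GHz (illustration only)
freqs = [k * 1e6 for k in range(1001)]
psd = [1e-12] * len(freqs)
v_rms = rms_from_psd(freqs, psd)

# A PSD 20 dB lower everywhere is 100x less noise power, i.e. 10x lower V_rms
psd_reduced = [s * db_to_linear(-20) for s in psd]
v_rms_reduced = rms_from_psd(freqs, psd_reduced)
print(f"{v_rms * 1e3:.2f} mV -> {v_rms_reduced * 1e3:.2f} mV")  # 31.62 mV -> 3.16 mV
```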

The same simulations were repeated with 750 on-chip decaps, since the target impedance simulations indicate that 750 on-chip decaps per supply net are sufficient. The resulting PSDs are presented in Figure 4.8 and Figure 4.9, and the RMS noise voltages are reported in Table 4.2.
Figure 4.5: Circuit impedance for different numbers of on-chip decoupling caps for the power supply of the MGRO.

Figure 4.6: Power spectral density of the supply net for the circuit without the ring oscillator - 3000 on-chip decaps.

Figure 4.7: Power spectral density of the ring oscillator supply net - 3000 on-chip decaps.
Table 4.1: TDC power supply noise - 3000 on-chip decaps.

<table>
<thead>
<tr>
<th></th>
<th>Vdd</th>
<th>Vn (rms) [mV]</th>
<th>Vdd RING</th>
<th>Vn (rms) [mV]</th>
</tr>
</thead>
<tbody>
<tr>
<td>No decaps</td>
<td>8.5</td>
<td>104.6</td>
<td>4.7</td>
<td>107.2</td>
</tr>
<tr>
<td>With decaps</td>
<td>5.6</td>
<td>62.3</td>
<td>1.75</td>
<td>52</td>
</tr>
</tbody>
</table>

Table 4.2: TDC power supply noise - 750 on-chip decaps.

<table>
<thead>
<tr>
<th></th>
<th>Vdd</th>
<th>Vn (rms) [mV]</th>
<th>Vdd RING</th>
<th>Vn (rms) [mV]</th>
</tr>
</thead>
<tbody>
<tr>
<td>No decaps</td>
<td>8.5</td>
<td>104.6</td>
<td>4.7</td>
<td>107.2</td>
</tr>
<tr>
<td>With decaps</td>
<td>9.27</td>
<td>84.9</td>
<td>3.56</td>
<td>75.25</td>
</tr>
</tbody>
</table>

Figure 4.8: Power spectral density of the supply net for the circuit without the ring oscillator - 750 on-chip decaps.

Figure 4.9: Power spectral density of the ring oscillator supply net - 750 on-chip decaps.
The aim of the present work was to design and evaluate a fully functional TDC for analog SiPMs, implemented in 350nm technology. The TDC is a first step towards the on-chip digitization of an analog SiPM's fast output, which reduces the capacitive load and improves the overall timing performance. In addition, the system has the benefits of being backwards compatible, compact and simple. Simulations of the current implementation indicate an LSB of 65ps in the typical corner, with a DNL of $\pm 0.55\text{LSB}$ and an INL of $\pm 1\text{LSB}$.

5.1 Future work

This research serves as a basis for future concepts and improvements of the TDC design. One of them, the anti-phased TDC, has already been introduced in this work and indicates that a better resolution (around 35ps) may be achievable in this technology. Its full potential cannot be assessed yet; further investigations are needed to determine its efficacy.

The current TDC implementation can be improved by adding calibration circuits, such as an external PLL-based frequency control feedback loop to stabilize its oscillation frequency, or adjustable biasing circuits to control the delay of each stage. Whether these additional circuits are necessary will be determined in the future, based on the circuit performance.

At the moment, a single TDC is used for the entire SiPM chip, but the goal is to use one TDC for each column of SiPMs. Future research should therefore focus on the mutual interaction between ring oscillators, as well as on noise suppression techniques such as coupling and injection locking, which were presented in Chapter 3.
Bibliography


