# DLL Based Single Slope ADC For CMOS Image Sensor Column Readout

Jia Guo (4038614)

Supervisor: prof. dr. ir. Albert J.P. Theuwissen

Submission to

The Faculty of Electrical Engineering,

Mathematics and Computer Science

In Partial Fulfillment of the Requirements

For the Degree of

MASTER OF SCIENCE

In Electrical Engineering

Delft University of Technology

August 2011





## COMMITTEE MEMBERS:

prof. dr. ir. Albert J.P. Theuwissen

prof. dr. ir. Ronald Dekker

prof. dr. Paddy French

dr. ir. Michiel Pertijs

dr. Munir Abdalla Mohamed

## ACKNOWLEDGMENTS

Almost two years have passed since I left my country to study abroad. I spent the first year in Delft, the Netherlands, following courses, and moved to Leuven, Belgium, in the second year for my master thesis. These two years were happy ones, even though they were full of challenges. Here I would like to acknowledge the many people who accompanied me and shared my happiness, and especially those who offered help and encouragement when I felt lost in difficult times.

First and foremost, I want to express my sincere gratitude to prof. dr. ir. Albert J.P. Theuwissen, my supervisor at TU Delft for his guidance and support of my thesis. Most importantly, he is the tutor that introduced me to the wonderful world of CMOS image sensors.

I would also like to thank Munir Abdalla Mohamed, my daily supervisor at imec, for his continuous support of my master study and research, and for proposing this interesting topic. I am happy to see that he has recently recovered from surgery and is back with us.

In addition, I would like to thank the group leader Francesco Cannillo, who acted as my temporary daily supervisor while Munir was in hospital. His broad knowledge left a deep impression on me, and his hard-working spirit inspired me.

Many thanks go to David San Segundo Bello. He is very kind and always willing to help. I thank him for his patient guidance and detailed explanations. Every time I see him, I remind myself: "Always give people more than they expect."

I would also like to acknowledge Srinjoy Mitra, Nick van Helleputte, and all the other engineers in the group, for sharing their knowledge and offering help. My thanks also go to Youngcheol Chae and Yue Chen from the Electronic Instrumentation Lab at TU Delft. I used to have short conversations with them when I was in the Netherlands. Even though what they said was difficult for me to understand at the time, it proved to be quite useful later, once I had gained a better understanding.

In addition, I want to thank all my friends in Belgium and the Netherlands: Cheng Ma, Lu Zhang, Guanyu Yi, Xiaoqiang Zhang, Zhichao Lu, Song Liu and Yan Li for the happy life with them.

My deepest gratitude goes to my beloved family, my parents, for their support, understanding and, of course, their enormous love. Words are far from enough to express my gratitude, and since English is a foreign language for them, they cannot read this paragraph. Nevertheless, I would like to express my love to them here, and I will prove it through my actions throughout my life.

## ABSTRACT

This thesis presents the design of a Delay Locked Loop (DLL) based Single Slope ADC for the column readout of CMOS image sensors.

For the column readout of CMOS image sensors, several architectures can be used, and among them the Single Slope ADC (SSADC) is the most popular. However, the readout speed of the SSADC is limited, and several architectures have been proposed to increase it. In this work, a DLL based Single Slope ADC is proposed that increases the readout speed by a factor of 16. For Correlated Double Sampling (CDS), an architecture with two comparators and an XOR gate is implemented, which also aims to increase the readout speed.

This ADC is implemented in TSMC 0.18 µm 1P6M CMOS technology. The DLL is designed with a start-controlled Phase Frequency Detector (PFD), a fully differential Charge Pump (CP) and fully differential Delay Cells (DC). A multi-stage comparator with an auto-zero technique to minimize the offset is also designed, which guarantees low Fixed Pattern Noise (FPN). Digital circuits such as the ripple counter and the cyclic thermometer-code to binary-code encoder are also included.

This ADC achieves 12-bit resolution with a 3 µs readout time. The total power consumption for 330 columns is 82 mW, with a FoM of 0.182 for the column level ADC.

Keywords: Single Slope ADC, DLL, Auto zero, Comparator, FPN, CDS

## Contents

| Section   | Title                                       |
|-----------|---------------------------------------------|
| Chapter 1 | Introduction                                |
| 1.1       | Introduction of CMOS image sensor           |
| 1.1.1     | Photon sensitive element: Photodiode        |
| 1.1.2     | CMOS pixel architectures                    |
| 1.1.3     | Noise in CMOS image sensors                 |
| 1.2       | Motivation                                  |
| 1.3       | Organization                                |
| 1.4       | References                                  |
| Chapter 2 | Background                                  |
| 2.1       | Readout structures of CMOS image sensor     |
| 2.1.1     | Chip level ADC                              |
| 2.1.2     | Column level ADC                            |
| 2.1.3     | Pixel level ADC                             |
| 2.2       | Architectures for column level ADC          |
| 2.2.1     | Ramp ADC                                    |
| 2.2.2     | SAR ADC                                     |
| 2.2.3     | Cyclic ADC                                  |
| 2.2.4     | Sigma Delta ADC                             |
| 2.3       | DLL based Single Slope ADC                  |
| 2.4       | References                                  |
| Chapter 3 | Delay Locked Loop Design                    |
| 3.1       | Phase Detector                              |
| 3.1.1     | False locking or harmonic locking problem   |
| 3.1.2     | Start-controlled PFD design                 |
| 3.2       | Charge pump                                 |
| 3.2.1     | Non ideality in Charge Pump                 |
| 3.2.2     | Architectures of Charge Pump                |
| 3.2.3     | Fully differential Charge Pump design       |
| 3.3       | Delay Cell                                  |
| 3.3.1     | Architectures of Delay Cell                 |
| 3.3.2     | Delay Cell design                           |
| 3.4       | Simulation Results of DLL                   |
| 3.5       | References                                  |
| Chapter 4 | Column Circuit Design                       |
| 4.1       | Comparator design                           |
| 4.1.1     | Design considerations                       |
| 4.1.2     | Auto zero technique to suppress the offset  |
| 4.1.3     | Comparator design                           |
| 4.1.4     | Amplifier design                            |
| 4.1.5     | Performance of the comparator               |
| 4.2       | Digital circuit design                      |
| 4.2.1     | Counter design                              |
| 4.2.2     | Encoder design                              |
| 4.3       | Conclusion                                  |
| 4.4       | References                                  |
| Chapter 5 | Top View                                    |
| 5.1       | Correlated Double Sampling                  |
| 5.1.1     | Analog CDS                                  |
| 5.1.2     | Digital CDS                                 |
| 5.1.3     | CDS with XOR gate                           |
| 5.1.4     | CDS design                                  |
| 5.2       | Non-linearity due to the clock skew         |
| 5.2.1     | Misalignment problem                        |
| 5.2.2     | Misalignment problem in this design         |
| 5.3       | Overall performance                         |
| 5.4       | References                                  |
| Chapter 6 | Conclusion and Future Work                  |
| 6.1       | Conclusion                                  |
| 6.2       | Future Work                                 |
| Appendix  |                                             |

## List of Figures

| Figure      | Caption                                                                                                        |
|-------------|----------------------------------------------------------------------------------------------------------------|
| Figure 1-1  | Photodiode structure and output voltage in the integrating mode [1.2]                                          |
| Figure 1-2  | Passive pixel structure [1.3]                                                                                  |
| Figure 1-3  | 3T APS structure [1.3]                                                                                         |
| Figure 1-4  | Timing of 3T APS                                                                                               |
| Figure 1-5  | Pinned photodiode 4T APS structure [1.3]                                                                       |
| Figure 1-6  | Timing of 4T APS                                                                                               |
| Figure 1-7  | Effect of FPN on image quality [1.4]                                                                           |
| Figure 2-1  | Chip level ADC [2.2]                                                                                           |
| Figure 2-2  | Column level ADC [2.2]                                                                                         |
| Figure 2-3  | Pixel level ADC [2.2]                                                                                          |
| Figure 2-4  | Single Slope ADC and the timing diagram [2.1]                                                                  |
| Figure 2-5  | Multi Ramp Single Slope ADC and the timing diagram [2.5]                                                       |
| Figure 2-6  | Two-step Single Slope ADC and the timing diagram [2.6]                                                         |
| Figure 2-7  | SAR ADC architecture [2.7]                                                                                     |
| Figure 2-8  | Cyclic ADC architecture [2.7]                                                                                  |
| Figure 2-9  | Sigma Delta ADC [2.10]                                                                                         |
| Figure 2-10 | DLL based Single Slope ADC                                                                                     |
| Figure 2-11 | Timing of the DLL based Single Slope ADC                                                                       |
| Figure 3-1  | DLL architecture                                                                                               |
| Figure 3-2  | DLL in (a) Correct locking (b) Harmonic locking                                                                |
| Figure 3-3  | Phase Frequency Detector (a) Conventional tri-state PFD (b) Start-controlled PFD                               |
| Figure 3-4  | PFD at 100MHz when the reference clock leads for 2ns                                                           |
| Figure 3-5  | Resettable dynamic DFF                                                                                         |
| Figure 3-6  | PFD outputs for in phase at 100MHz square wave input                                                           |
| Figure 3-7  | Single ended charge pumps (a) With drain switching (b) With gate switching (c) With source switching           |
| Figure 3-8  | Single ended charge pump architectures (a) With current steering (b) With active output buffer (c) With NMOS switches |
| Figure 3-9  | Fully differential Charge Pump [3.4]                                                                           |
| Figure 3-10 | Fully differential Charge Pump with CMFB                                                                       |
| Figure 3-11 | Delay cell composed of current starved inverter [3.8]                                                          |
| Figure 3-12 | Delay cell by Maneatis [3.6]                                                                                   |
| Figure 3-13 | Delay cell with fully differential current starved inverter (a) Block diagram (b) Schematic                    |
| Figure 3-14 | Delay time versus control voltage                                                                              |
| Figure 3-15 | Differential control signals                                                                                   |
| Figure 3-16 | The jitter performance of the DLL in this work (a) Under clean supply (b) Under noisy supply                   |
| Figure 4-1  | Comparator: Multi-stage comparator                                                                             |
| Figure 4-2  | Auto zero: Input offset storage                                                                                |
| Figure 4-3  | Auto zero: Input offset storage (a) Offset storage phase (b) Comparison phase                                  |
| Figure 4-4  | Auto zero: Output offset storage                                                                               |
| Figure 4-5  | Auto zero: Output offset storage (a) Offset storage phase (b) Comparison phase                                 |
| Figure 4-6  | Architecture of the comparator in this design with the timing diagram                                          |
| Figure 4-7  | Auto zero: Combined input and output offset storage (a) Offset storage phase (b) Comparison phase              |
| Figure 4-8  | Schematic of the pre-amplifier                                                                                 |
| Figure 4-9  | Offset of the pre-amplifier (a) 1st stage (b) 2nd stage                                                        |
| Figure 4-10 | Schematic of the differential to single ended amplifier                                                        |
| Figure 4-11 | The offset of the comparator with auto zero                                                                    |
| Figure 4-12 | Delay time of the comparator with auto zero                                                                    |
| Figure 4-13 | Delay time versus common mode input voltage                                                                    |
| Figure 4-14 | Ripple counter [4.7]                                                                                           |
| Figure 4-15 | DLL outputs with the corresponding binary code                                                                 |
| Figure 5-1  | Analog Correlated Double Sampling (a) Separate CDS (b) Combined CDS with Comparator                            |
| Figure 5-2  | Digital CDS (a) Global counter & column latch (b) Column up/down counter                                       |
| Figure 5-3  | CDS with XOR gate [5.2]                                                                                        |
| Figure 5-4  | Overall architecture with CDS implemented                                                                      |
| Figure 5-5  | Timing diagram of the ADC with CDS implemented                                                                 |
| Figure 5-6  | Misalignment of 2-step ADC (a) Ideal case (b) Real case                                                        |
| Figure A-1  | Layout of the DLL                                                                                              |
| Figure A-2  | Layout of the column circuits                                                                                  |

## **List of Tables**

| Table     | Caption                                                               |
|-----------|-----------------------------------------------------------------------|
| Table 3-1 | DLL performance                                                       |
| Table 4-1 | Simulation results of the pre-amplifiers                              |
| Table 4-2 | Noise contribution of the transistors in the pre-amplifier            |
| Table 4-3 | Corner simulations of the delay time of the comparator with auto zero |
| Table 4-4 | Delay time at bright (Common mode input = 0.7V)                       |
| Table 4-5 | Delay time at dark (Common mode input = 1.5V)                         |
| Table 4-6 | Performance of the comparator                                         |
| Table 5-1 | Comparison with recently published column ADC                         |

## **Chapter 1** Introduction

Over the past decade, the CMOS Image Sensor (CIS) market has experienced rapid growth, thanks to the broad application areas of image sensors, such as security, aerospace, entertainment, and especially mobile phones.

Nowadays there are mainly two types of semiconductor-based image sensors on the market: Charge Coupled Devices (CCDs) and Complementary Metal Oxide Semiconductor (CMOS) image sensors (CIS).

The CCD was invented in 1969 by Boyle and Smith, who were awarded the Nobel Prize in 2009 for this contribution. The CCD was the dominant image sensor technology of the last century because of its simplicity and its higher performance compared with CIS at that time. However, in the past decade, thanks to the development of lithography technology, CMOS image sensors have gradually taken over the market from CCDs. It is also predicted that CMOS image sensors will completely take over the market from CCD imagers in the next 10 years [1.1].

Compared with CCDs, CMOS image sensors have the advantage that analog and digital signal processing blocks can easily be integrated on the same chip. This makes the sensor more compact and more functional, and it further reduces cost and power consumption.

## 1.1 Introduction of CMOS image sensor

This section gives some fundamental knowledge of CMOS image sensors, including the working principle of the photodiode, the architectures of the passive and active pixels, and the noise sources in CMOS image sensors.

### 1.1.1 Photon sensitive element: Photodiode

The photodiode is a fundamental part of the image sensor. It converts the incoming light intensity into an electrical signal by means of the photo-electric effect. Figure 1-1 illustrates the photodiode structure.



Figure 1-1 Photodiode structure and output voltage in the integrating mode [1.2]

When the photodiode is exposed to light, an incident photon generates an electron-hole pair if the photon energy is larger than the band gap of silicon. The electric field in the depletion region of the pn junction separates the electrons and holes and prevents their recombination. The electrons drift towards the n-doped region and the holes drift towards the p-doped region. These drifting electrons and holes generate a reverse current called the photocurrent.

The photocurrent is very small, which makes it difficult to measure directly, so most modern image sensors obtain the light intensity by integrating the photocurrent on a capacitor [1.3]. The junction capacitance of the photodiode serves as this capacitor, and the voltage across it is read out. The output voltage of the photodiode in the integrating mode is also shown in figure 1-1. The voltage across the capacitor decreases almost linearly with exposure time, at a rate proportional to the light intensity. The light intensity can therefore be obtained by subtracting the voltage at the end of the exposure from the voltage at the beginning.
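The integrating behaviour described above can be illustrated numerically. The following is a minimal sketch, assuming an ideal photodiode with a constant junction capacitance and a constant photocurrent; the reset voltage, capacitance, exposure time and photocurrents are illustrative values, not taken from this design.

```python
def photodiode_voltage(v_reset, i_photo, c_pd, t_exp):
    """Voltage across the photodiode after integrating i_photo for t_exp.

    Ideal model: constant capacitance, no dark current, no saturation.
    """
    return v_reset - i_photo * t_exp / c_pd


# Illustrative values (assumptions, not from the thesis).
V_RESET = 3.3    # reset voltage in volts
C_PD = 10e-15    # junction capacitance in farads
T_EXP = 10e-3    # exposure time in seconds

for i_photo in (50e-15, 100e-15, 200e-15):  # photocurrent in amperes
    v_end = photodiode_voltage(V_RESET, i_photo, C_PD, T_EXP)
    # The light information is the difference between start and end voltage.
    swing = V_RESET - v_end
    print(f"I_ph = {i_photo * 1e15:5.0f} fA -> swing = {swing * 1e3:6.1f} mV")
```

In this idealized model, doubling the photocurrent doubles the voltage swing for a fixed exposure time.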

### 1.1.2 CMOS pixel architectures

The pixel of a CMOS image sensor combines a photodiode with the circuits needed to output the incoming light intensity information. Pixel architectures can be classified as passive pixel sensors (PPS) and active pixel sensors (APS). In this section, a brief introduction of the PPS, the 3T APS and the 4T APS is given.

#### 1.1.2.1 Passive pixel sensors (PPS)

The PPS is the first generation of CMOS image sensor pixel architecture. It consists of only one photodiode, one row select transistor and two interconnect lines, as illustrated in figure 1-2. When the row select (RS) transistor is addressed, the voltage across the photodiode is read out via the column bus. The fill factor (the ratio of the light-sensitive area to the total pixel area) is large in this architecture because of its simplicity. However, the pixel suffers from a high noise level due to the mismatch between the small pixel capacitance and the large capacitance of the column bus [1.3].



Figure 1-2 Passive pixel structure [1.3]

#### 1.1.2.2 Photodiode Three Transistor (3T) Pixel

In the late 1960s, the 3T APS was proposed to improve the performance of the image sensor. It contains one photodiode, three transistors and four interconnect lines, as illustrated in figure 1-3. Compared to the PPS, the 3T APS adds an in-pixel amplifier (source follower) for readout.



Figure 1-3 3T APS structure [1.3]

The timing diagram of the 3T APS is shown in figure 1-4, and its working principle is as follows. During the exposure, the photon-generated electrons are collected by the photodiode in the integrating mode, and the voltage across the photodiode decreases. At the end of the exposure, the row select signal (RS) addresses a particular row of pixels, and the voltage across the photodiode is sampled. This sample, taken in frame $j$, contains the video signal $v_j$, the offset voltage $o$, and the reset noise (kTC noise) $n_{r,j}$. Then the photodiode is reset (RST) to the supply voltage (Vdd) for the next exposure, and after the reset the voltage of this node is sampled again. This second sample, which belongs to frame $j+1$, contains the offset voltage $o$ and the reset noise $n_{r,j+1}$.

Even though the fill factor of 3T APS is lower compared to PPS because of the area occupied by the extra transistors and the interconnect lines, the noise performance has been improved significantly. This is mainly because the in-pixel amplifier isolates the photodiode capacitor from the large column bus capacitor.



Figure 1-4 Timing of 3T APS

Introducing an in-pixel amplifier creates a problem of its own: mismatch of the amplifier between pixels causes Fixed Pattern Noise (FPN). An effective way to overcome this problem is Double Delta Sampling (DDS). As just explained, the voltage across the photodiode is sampled twice, and the reset sample from frame (j+1) is subtracted from the sample containing the video signal from frame (j) to obtain the light intensity. The two samples come from two subsequent frames because double sampling has to be performed within a short time, whereas the interval between the reset and the video sample of the same frame is the total integration time, which is quite long in low frame rate applications [1.3]. Because the two samples cannot be taken from the same frame, their reset noise is uncorrelated, and the subtraction removes the offset but increases the reset noise instead of cancelling it.

#### 1.1.2.3 Pinned Photodiode Four Transistor (4T) Pixel

The 4T APS, shown in figure 1-5, is nowadays the mainstream CMOS image sensor pixel architecture. Compared with the 3T APS, it employs a pinned photodiode and adds a transfer gate (TX) and a floating diffusion (FD) node to the basic 3T APS pixel.

The timing diagram of the 4T APS is shown in figure 1-6. First, a particular row of pixels is addressed, and the corresponding row select signal (RS) turns on. In this period the voltage of the floating diffusion node is available at the output. Then the reset signal (RST) resets the voltage of the floating diffusion node to the supply voltage. Immediately afterwards this voltage is read out; it contains the offset voltage $o$ and the reset noise (kTC noise) $n_{r,j}$ of frame $j$. When TX turns on, the charge generated by the incoming photons is transferred from the photodiode to the floating diffusion node, and this signal is then read out as the video signal. This sample is also obtained in frame $j$, and it contains the video signal $v_j$, the offset voltage $o$, and the reset noise (kTC noise) $n_{r,j}$. After this, the exposure of the next frame starts.



Figure 1-5 Pinned photodiode 4T APS structure [1.3]

The advantage of the 4T pinned photodiode APS over the 3T APS is that the reset signal and the video signal can be read out in the same frame. The reset noise is then correlated between the two successive samples, which makes it possible to suppress it by subtraction. Because the two samples are correlated, this double sampling is called Correlated Double Sampling (CDS).
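The difference between the 3T and 4T cases can be illustrated with a small Monte Carlo experiment. This is a minimal sketch, assuming Gaussian reset noise that is independent from one reset to the next and an idealized sample model of the form $v_j + o + n_{r,j}$; the function name and all numeric values are illustrative assumptions.

```python
import random
import statistics


def double_sampling(n_frames, signal, offset, sigma_reset, seed=1):
    """Return (cds, dds) lists of subtracted outputs over n_frames frames."""
    rng = random.Random(seed)
    cds, dds = [], []
    for _ in range(n_frames):
        n_r = rng.gauss(0.0, sigma_reset)       # reset noise of frame j
        n_r_next = rng.gauss(0.0, sigma_reset)  # reset noise of frame j+1

        video = signal + offset + n_r           # v_j + o + n_{r,j}

        # 4T pixel (CDS): the reset sample carries the same n_{r,j}.
        cds.append(video - (offset + n_r))
        # 3T pixel (DDS): the reset sample carries the independent n_{r,j+1}.
        dds.append(video - (offset + n_r_next))
    return cds, dds


cds, dds = double_sampling(n_frames=20000, signal=0.5, offset=0.1,
                           sigma_reset=1e-3)
print("CDS residual noise:", statistics.pstdev(cds))
print("DDS residual noise:", statistics.pstdev(dds))
```

Both schemes remove the offset, but only the correlated case removes the reset noise; with independent reset noise the subtraction raises the rms noise by a factor of about $\sqrt{2}$.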



Figure 1-6 Timing of 4T APS

### 1.1.3 Noise in CMOS image sensors

The noise sources in a CMOS image sensor can be classified into two categories: temporal noise sources and Fixed Pattern Noise (FPN) sources. Temporal noise is independent across pixels and varies from frame to frame. Sources of temporal noise include photon shot noise, pixel reset noise, thermal noise of the readout circuit and flicker noise. Fixed Pattern Noise (FPN) is a spatial noise that is visible to the viewer, and it does not change from frame to frame.

#### 1.1.3.1 Photon Shot Noise

Photon shot noise stems from fundamental physical laws rather than from circuit design or technology. It describes the statistical phenomenon that the number of photoelectrons generated and captured by the sensor follows a Poisson distribution. The magnitude of the photon shot noise is equal to the square root of the mean number of electrons generated in the photodiode [1.3].

$$\overline{N_{\rm shot}} = \sqrt{N_{\rm signal}} \tag{1-1}$$

$N_{\rm shot}$ is the shot noise in units of electrons, and $N_{\rm signal}$ is the number of photo-generated electrons.
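A short numerical illustration of equation (1-1) may help. This is a minimal sketch that assumes photon shot noise is the only noise source; the signal levels are illustrative.

```python
import math


def shot_noise_electrons(n_signal):
    """RMS photon shot noise in electrons, equation (1-1)."""
    return math.sqrt(n_signal)


def shot_limited_snr_db(n_signal):
    """SNR in dB when photon shot noise is the only noise source."""
    return 20.0 * math.log10(n_signal / shot_noise_electrons(n_signal))


for n_signal in (100, 10_000):  # illustrative signal levels in electrons
    print(n_signal, shot_noise_electrons(n_signal),
          round(shot_limited_snr_db(n_signal), 1))
```

Because the noise grows only with the square root of the signal, a hundredfold increase in collected electrons improves the shot-noise-limited SNR by 20 dB.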

#### 1.1.3.2 Reset Noise

For the APS pixel, the voltage across the junction capacitor needs to be reset every conversion cycle. However, this introduces reset noise (kTC noise) at the output, which can be expressed as:

$$V_{\text{reset}} = \sqrt{\frac{kT}{C}} \tag{1-2}$$

$k$ is the Boltzmann constant, $T$ is the absolute temperature, and $C$ is the junction capacitance of the photodiode. The magnitude of the reset noise can be large compared to other temporal noise sources such as thermal noise and 1/f noise [1.3]. Fortunately, correlated double sampling cancels the reset noise to a negligible level.
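To give a feeling for the magnitude of equation (1-2), the following minimal sketch evaluates it for an assumed capacitance of 5 fF at 300 K. Both values are illustrative and not taken from this design; the conversion to electrons uses $\sqrt{kTC}/q$.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def reset_noise_volts(c, temperature=300.0):
    """RMS reset noise voltage, equation (1-2)."""
    return math.sqrt(K_BOLTZMANN * temperature / c)


def reset_noise_electrons(c, temperature=300.0):
    """The same noise expressed as a charge, in electrons."""
    return math.sqrt(K_BOLTZMANN * temperature * c) / Q_ELECTRON


C_NODE = 5e-15  # assumed capacitance in farads (illustrative)
print(f"{reset_noise_volts(C_NODE) * 1e3:.2f} mV rms")
print(f"{reset_noise_electrons(C_NODE):.1f} electrons rms")
```

For these assumed values the reset noise is a little under 1 mV rms, or roughly 28 electrons.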

#### 1.1.3.3 Thermal noise

Thermal noise was discovered in 1928, when it was first measured by Johnson, which is why it is also called Johnson noise. It is the electronic noise generated by the random thermal motion of charge carriers in an electrical conductor. In the front end of a CMOS image sensor, this noise is generated by the source follower transistor.

$$\overline{V_T}^2 = 4kTR \tag{1-3}$$

$\overline{V_T}^2$ is the power spectral density of the thermal noise, and $R$ is the resistance; $k$ and $T$ are as defined for the reset noise. Because thermal noise has a flat power spectral density over frequency, the integrated noise power is proportional to the bandwidth, and the rms noise amplitude to its square root. To minimize the thermal noise, the sampling capacitor can be made large to reduce the noise bandwidth.
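The link between a larger sampling capacitor and lower thermal noise can be made explicit with a short worked equation. As a sketch, assume the noisy resistance $R$ and the sampling capacitor $C$ form a first-order low-pass filter, and let $\overline{v_n^2}$ (a symbol introduced here for illustration) denote the total noise power on the capacitor:

$$\overline{v_n^2} = \int_0^{\infty} \frac{4kTR}{1 + (2\pi f R C)^2}\,df = 4kTR \cdot \frac{1}{4RC} = \frac{kT}{C}$$

Under this assumption the equivalent noise bandwidth is $1/(4RC)$, so the integrated noise depends only on $C$, and its rms value $\sqrt{kT/C}$ has the same form as equation (1-2).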

#### 1.1.3.4 Flicker noise

Flicker noise is another dominant noise source in transistors. Unlike thermal noise, the power spectral density of flicker noise is inversely proportional to the frequency, which is why it is also called 1/f noise.

$$\overline{V_F}^2 = \frac{K_{1/f}}{C_{\text{ox}}WL} \cdot \frac{1}{f} \tag{1-4}$$

$\overline{V_F}^2$ is the power spectral density of the flicker noise, $K_{1/f}$ is the flicker noise coefficient, $C_{\text{ox}}$ is the gate capacitance per unit area, $W$ and $L$ are the width and length of the transistor, and $f$ is the frequency. Flicker noise arises because lattice defects at the interface between the silicon channel and the gate oxide of a CMOS transistor trap and release charge carriers [1.2]. Optimizing the process technology is one way to reduce flicker noise. In circuit design, as equation (1-4) shows, increasing the transistor area is the most frequently used method to lower the 1/f noise level.
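The area dependence in equation (1-4) can be checked numerically. The following is a minimal sketch, assuming the power spectral density is integrated between two band edges, which gives a total power of $\frac{K_{1/f}}{C_{\text{ox}}WL}\ln(f_{high}/f_{low})$; the coefficient, oxide capacitance, device sizes and band edges are illustrative assumptions, not process data.

```python
import math


def flicker_psd(k_f, c_ox, width, length, f):
    """Flicker noise power spectral density at frequency f, equation (1-4)."""
    return k_f / (c_ox * width * length * f)


def flicker_rms(k_f, c_ox, width, length, f_low, f_high):
    """RMS flicker noise from integrating the PSD between f_low and f_high."""
    return math.sqrt(k_f / (c_ox * width * length) * math.log(f_high / f_low))


# Illustrative values (assumptions, not process data).
K_F = 1e-25        # flicker noise coefficient in V^2*F
C_OX = 8e-3        # gate capacitance per unit area in F/m^2
BAND = (1.0, 1e6)  # integration band in Hz

small = flicker_rms(K_F, C_OX, 1e-6, 0.5e-6, *BAND)
large = flicker_rms(K_F, C_OX, 2e-6, 1.0e-6, *BAND)  # four times the area
print(f"small device: {small * 1e6:.1f} uV rms")
print(f"large device: {large * 1e6:.1f} uV rms")
```

Quadrupling the gate area reduces the noise power by a factor of four and therefore halves the rms flicker noise.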

#### 1.1.3.5 Fixed Pattern Noise

Fixed pattern noise refers to the static non-uniformities from pixel to pixel or column to column in the sensor array, and this generated "fixed pattern" does not change from frame to frame [1.2].

For a CMOS image sensor with column level ADCs, there are two main sources of FPN. One is the offset of the source follower in the pixel, which contributes to the pixel FPN. The other is the offset of the column level ADC, which causes the column FPN.

Figure 1-7 shows an image with negligible FPN, with 2.5% pixel FPN and with 2.5% column FPN. It can be seen that the human eye is highly sensitive to column FPN, which makes it critical to guarantee a low column FPN in CMOS image sensor design. Column FPN is defined as the standard deviation of the signal across the column circuits. Normally the FPN is specified to be lower than 0.1%, at which level it is hardly visible to the human eye.
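The definition of column FPN above can be turned into a small calculation. This is a minimal sketch, assuming the column signal is estimated by averaging each column of a uniformly illuminated (or dark) frame and that the result is expressed as a percentage of full scale; the normalization, the function name and the sample data are assumptions made for illustration.

```python
import statistics


def column_fpn_percent(frame, full_scale):
    """Column FPN as a percentage of full scale.

    frame is a list of rows, each row a list of pixel codes captured
    under uniform illumination (or in the dark).
    """
    n_cols = len(frame[0])
    # Averaging over rows reduces the contribution of temporal noise
    # and pixel FPN, leaving mainly the column-to-column differences.
    col_means = [statistics.fmean(row[c] for row in frame)
                 for c in range(n_cols)]
    return 100.0 * statistics.pstdev(col_means) / full_scale


# Tiny illustrative frame of 12-bit codes (made-up data).
frame = [
    [100, 104, 100, 96],
    [102, 106, 102, 98],
]
fpn = column_fpn_percent(frame, full_scale=4095)
print(f"column FPN = {fpn:.3f} % of full scale")
```

A real measurement would average many rows and frames; the two-row frame here only keeps the arithmetic easy to follow.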



Figure 1-7 Effect of FPN on image quality [1.4]

## **1.2 Motivation**

CMOS image sensor technology has made huge progress in the past decade. The pixel pitch is down to 1.12 µm thanks to the Back Side Illumination (BSI) technique, and the resolution of an image sensor can easily reach tens of megapixels. The Xbox Kinect, which is equipped with a 3D CMOS image sensor, also achieved great success in the past year, and this success seems set to continue. As the resolution of CMOS image sensors increases, the readout speed must also increase to maintain the same frame rate. In addition, many applications such as 3D Time-of-Flight (ToF) image sensors require a very high readout speed. However, the conventional Single Slope ADC architecture for column readout suffers from a long readout time, which makes it unsuitable for high speed applications.

In this work, a DLL based Single Slope ADC is designed for the column readout of CMOS image sensors. The ADC targets a new type of pixel architecture that requires a very short analog to digital conversion time, but it can also be used with the conventional 4T pinned photodiode APS architecture. It is designed for 12-bit resolution and a 3 µs readout time, and it fits into a 5.4 µm pixel pitch.
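The readout time problem can be made concrete with a rough estimate. This is a minimal sketch, assuming a conventional single slope ADC steps through all $2^N$ codes at one code per clock period, and assuming the factor of 16 quoted in the abstract corresponds to resolving 16 codes per clock period. The 100 MHz clock is a hypothetical value chosen for illustration, and the result is not the timing budget of this design.

```python
def single_slope_time(bits, f_clk, codes_per_clock=1):
    """Ramp duration in seconds for a 2**bits code range."""
    return (2 ** bits) / (codes_per_clock * f_clk)


BITS = 12
F_CLK = 100e6  # assumed counter clock in Hz (hypothetical)

t_conventional = single_slope_time(BITS, F_CLK)
t_dll = single_slope_time(BITS, F_CLK, codes_per_clock=16)

print(f"conventional: {t_conventional * 1e6:.2f} us")
print(f"16 codes per clock: {t_dll * 1e6:.2f} us")
```

Under these assumptions the ramp time drops from about 41 µs to about 2.6 µs, which is of the same order as the 3 µs readout time targeted in this work.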

## **1.3 Organization**

In the following chapters, some background knowledge together with the design and analysis of the column level ADC is given.

Chapter 2 briefly describes the readout methods of CMOS image sensors and discusses several published column ADC architectures. Chapter 3 covers the design of the Delay Locked Loop (DLL). Chapter 4 describes the other column level building blocks: the comparator, the encoder and the counter. Chapter 5 presents the top level design, including the Correlated Double Sampling (CDS) and the misalignment problem, together with the ADC performance and a comparison with recently published papers. Chapter 6 gives the conclusion and discusses future work.

## **1.4 References**

[1.1] Theuwissen, A., CMOS image sensors: State-of-the-art and future perspectives. ESSCIRC 2007, pp. 21–27.

[1.2] Snoeij, M., Analog Signal Processing for CMOS Image Sensors. PhD thesis, Delft University of Technology, 2007, pp. 17–53.

[1.3] Ma, C., Pixel ADC design for hybrid CMOS image sensor. Master thesis, Delft University of Technology, 2010, pp. 2–8.

[1.4] Dupont, B., Toward lower uncooled IR-FPA system integration cost. Proceedings of SPIE, 2007, 65421S.

## **Chapter 2** Background

Nowadays most modern sensors incorporate an Analog to Digital Converter (ADC) as the sensor interface for back-end signal processing, because signal processing and transport are more reliable and robust in the digital domain. The CMOS image sensor has two main properties that differentiate it from other sensors. First, it consists of a large array of light-sensitive pixels, which allows for a parallelized analog to digital conversion. Second, due to this large sensor array, the total data rate is much higher than that of most other sensors [2.1].

## 2.1 Readout structures of CMOS image sensor

The performance of the ADC in CMOS image sensor has an important impact on the overall performance, so it is critical to choose a proper structure for the readout. The ADC type incorporated in image sensors can be classified into three categories: chip level, column level and pixel level.

## **2.1.1 Chip level ADC**

Figure 2-1 illustrates the block diagram of the CMOS image sensor with a chip level ADC. The analog outputs from the pixel array are read out row by row via the column circuits and sent to the chip level CDS and ADC to obtain the digital data. Many early CMOS image sensors use this approach because of its simplicity. Another advantage of the chip level ADC is uniformity: the CDS amplifier and ADC are implemented at chip level and are shared by all the pixels.



Figure 2-1 Chip level ADC [2.2]

However, there are also two drawbacks. One is the readout speed: the chip level CDS and ADC have to process a large number of pixels, which leads to a low frame rate. Increasing the readout speed requires a large bandwidth, which means high power consumption. The achievable bandwidth is also limited, which in turn limits the readout speed, so it is especially difficult to reach a high frame rate in a high resolution CMOS image sensor with a chip level ADC. The other drawback is that this approach has a longer analog signal chain than the column-level or pixel-level ADC. Since the gain in each of the analog circuits is typically limited to one, each sub-circuit contributes significantly to the overall noise of the analog signal path. A shorter signal path would have a better noise performance [2.1].

### 2.1.2 Column level ADC

To meet the increasing demand on readout speed, the ADC moves from chip level to column level. Figure 2-2 illustrates the block diagram of a CMOS image sensor with column level ADC. Compared to the chip level ADC, one row of pixels can be digitized concurrently because the CDS and ADC are performed at the column level. This relaxes the requirement on the bandwidth of the ADC, so high speed operation can be achieved. This approach is widely used because it provides a good compromise between the fill factor (see the discussion on pixel level ADC), power consumption and speed.



Figure 2-2 Column level ADC [2.2]

However, there are also two potential drawbacks of this approach. One is the column FPN: it is difficult to guarantee column to column uniformity because of offset and gain mismatch. The other drawback is the area increase due to the layout of a large number of column level ADCs.

## 2.1.3 Pixel level ADC

To go further, the ADCs can be shifted down to pixel level, which means implementing an ADC in each pixel. This is illustrated in Figure 2-3. Compared to the chip level and column level ADC, implementing the pixel level ADC is the most radical way to parallelize and shorten the analog signal processing chain [2.1]. One advantage of this approach is the very high readout speed that can be achieved with this fully parallel analog to digital conversion. Another advantage is the lower FPN compared to the column level ADC. On the other hand, the drawback is also obvious: all the circuits need to be implemented inside the pixel, which leads to either a large pixel size or a low fill factor. A column level ADC can reach a frame rate as high as 2000 frame/s [2.3], while a pixel level ADC can reach 10000 frame/s [2.4].



Figure 2-3 Pixel level ADC [2.2]

## 2.2 Architectures for column level ADC

As mentioned in the previous section, column level ADC is the most widely used approach, when it comes down to high-speed applications, because it provides a good compromise between fill factor, power consumption and speed. Here in this section a brief introduction to the various column parallel ADC architectures is given.

To design the column level ADC, many specifications need to be considered: resolution, non-linearity, speed, FPN, area and power.

Resolution: The Dynamic Range (DR) of a sensor is determined by the maximum voltage swing and the noise of the front-end circuit, and it can be 60-70dB for CMOS image sensors. Therefore, the ADC is typically designed with a resolution of around 10-12 bits [2.1].
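As a rough cross-check of the resolution range quoted above, the dynamic range in dB can be converted to an equivalent number of bits. The sketch below assumes the ideal relation DR = 20·log10(2^N), about 6.02 dB per bit, and ignores any noise margin; the function name is illustrative and not taken from the thesis.

```python
import math


def bits_for_dynamic_range(dr_db: float) -> float:
    """Equivalent number of bits for a dynamic range given in dB.

    Assumes DR = 20 * log10(2**N), i.e. about 6.02 dB per bit.
    """
    return dr_db / (20.0 * math.log10(2.0))


# The 60-70 dB range quoted in the text
for dr_db in (60.0, 70.0):
    print(f"{dr_db:.0f} dB -> {bits_for_dynamic_range(dr_db):.1f} bits")
```

Under this assumption 60 dB corresponds to roughly 10 bits and 70 dB to somewhat less than 12 bits, which is consistent with the 10-12 bit range given above.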

Non-linearity: Because the non-linearity of the sensor itself and the front-end circuit is large (typically around 1%), the integral non-linearity (INL) performance of the ADC is not so critical. On the other hand, because of the human eye's high sensitivity to spikes in the image, the differential non-linearity (DNL) is very important.

Speed: Due to the working principle of different ADC architectures, the readout time may vary from several clock cycles to thousands of clock cycles. However, for some architectures the increased readout speed is at the expense of the other specifications, and it is important to choose a proper ADC architecture.

FPN: As mentioned in the previous chapter, the column FPN is very critical for the image sensor because of the sensitivity of human eyes. The FPN should be guaranteed to be under 0.1%.

Area: Since the column level ADCs are placed in the columns of the image array, the area and especially the width should be well controlled. The width requirement can be relaxed to twice the pixel pitch if the column ADCs are placed at both the top and the bottom of the pixel array.

Power: The power consumption is an important specification for ADCs, and this is also true for the column level ADC.

There are mainly three architectures used in column level ADC: Single Slope ADC, Cyclic ADC and Successive Approximation (SAR) ADC. The Sigma Delta ADC used as column level ADC for the high-speed readout was first published in 2010, which also shows very high performance. All these architectures will be discussed in this section.

#### **2.2.1 Ramp ADC**

#### 2.2.1.1 Single Slope ADC

The architecture of a single slope ADC is rather simple, which is only comprised of a ramp generator, a digital counter, comparators, and memory. The block diagram with the timing diagram is illustrated in figure 2-4.

This is the conventional architecture: a global counter is implemented and the generated digital codes are distributed over the columns. When the ramp voltage exceeds the input voltage, the memory latches the digital code from the counter.



Figure 2-4 Single Slope ADC and the timing diagram [2.1]

Column to column variation will generate the FPN due to the clock skew, and the FPN will become obvious if a high frequency clock is applied. By implementing the ripple counter within each column, the FPN due to clock skew will be cancelled by the CDS automatically.

The dominating drawback of this architecture is the limited readout speed, which needs  $2^{N}$  clock cycles and makes it not suitable for high speed application. There are several modified Single Slope ADC architectures to increase the readout speed, such as the Multi Ramp Single Slope ADC (MRSS) and two-step Single Slope ADC.
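The counting principle and the resulting conversion time can be sketched with a small behavioural model. This is a minimal illustration, assuming a rising ramp from 0 to the reference voltage in 2^N equal steps and an ideal, offset-free comparator; the function name and the voltages are hypothetical.

```python
from typing import Tuple


def single_slope_convert(v_in: float, v_ref: float, n_bits: int) -> Tuple[int, int]:
    """Return (digital code, clock cycles used) for an ideal single slope ADC.

    Assumes a rising ramp from 0 to v_ref in 2**n_bits equal steps.
    """
    steps = 2 ** n_bits
    lsb = v_ref / steps
    for count in range(steps):
        ramp = (count + 1) * lsb   # ramp value at the end of this clock cycle
        if ramp > v_in:            # comparator flips, memory latches the count
            return count, count + 1
    return steps - 1, steps        # input at or above full scale: code saturates


# A mid-scale input on a 4-bit converter
code, cycles = single_slope_convert(0.5, 1.0, 4)
print(code, cycles)
```

In this model an input near full scale needs all 2^N clock cycles, which is the speed limitation described above.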

#### 2.2.1.2 Multi Ramp Single Slope ADC

The block diagram and the corresponding timing diagram are illustrated in figure 2-5. In principle this is a kind of two-step ADC, because it divides the A to D conversion process into a p-bit coarse conversion and a q-bit fine conversion. In the coarse conversion phase, the input signal is compared with the large step coarse ramp, which decides the coarse bits of the signal and chooses the sub-ramp for the fine conversion. In the fine conversion phase, several ramp signals with the same slope but different offsets are applied concurrently, and the input signal from the pixel is compared with the corresponding ramp signal to get the fine bits. The readout time of this architecture is reduced to  $2^p + 2^q$  clock cycles.
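To make the saving concrete, the cycle counts of the two architectures can be compared directly. The sketch below uses the 12-bit resolution targeted in this work; the particular coarse/fine splits are hypothetical and only serve as an illustration.

```python
def single_slope_cycles(n_bits: int) -> int:
    """Clock cycles for a conventional single slope conversion."""
    return 2 ** n_bits


def multi_ramp_cycles(p_bits: int, q_bits: int) -> int:
    """Clock cycles for a p-bit coarse plus q-bit fine conversion."""
    return 2 ** p_bits + 2 ** q_bits


n_bits = 12
print("single slope:", single_slope_cycles(n_bits))
for p_bits in (2, 4, 6):
    q_bits = n_bits - p_bits
    print(f"p={p_bits}, q={q_bits}:", multi_ramp_cycles(p_bits, q_bits))
```

The count is smallest when p and q are equal, but as noted below the number of coarse bits is limited in practice by circuit complexity and power.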

There are also several issues with this architecture. The first one is the misalignment between the fine conversion and the coarse conversion. Another issue would be the mismatch between sub-ramps. The number of coarse conversion bits is also limited due to the exponentially increased circuit complexity and power consumption.



Figure 2-5 Multi Ramp Single Slope ADC and the timing diagram [2.5]

#### 2.2.1.3 Two-step Single Slope ADC

Using 2<sup>p</sup> sub ramps would introduce the mismatch problem and the corresponding increased power consumption. A two-step single slope ADC was proposed in [2.6] to solve these problems. Figure 2-6 shows the schematic and the timing diagram. The principle is to place a hold capacitor in the ADC to store the coarse voltage level in the coarse conversion phase, which would be added to the ramp signal in the fine conversion phase to obtain the fine bits.



Figure 2-6 Two-step Single Slope ADC and the timing diagram [2.6]

This improved two-step single slope architecture makes it possible to choose the number of coarse bits freely, which increases the readout speed further. Because the same comparator is used for the coarse and fine conversion, its offset does not cause a dead band. However, it is difficult to keep the voltage across the hold capacitor constant during the conversion phase, because of the voltage attenuation between the hold capacitor and the input capacitance of the comparator. To minimize this effect, the hold capacitor has to be made large, which increases the area. With this architecture an extra bit is also needed to solve the overrange problem.

#### 2.2.2 SAR ADC

The Single Slope ADC is the most widely used column level ADC architecture, but its readout time is 2<sup>N</sup> clock cycles, which limits the frame rate of the CMOS image sensor. With a SAR ADC, the readout time is only N clock cycles, which increases the readout speed significantly. However, this comes at the expense of area and power.

Figure 2-7 illustrates the block diagram of the SAR ADC, the working principle can be explained as follows: After the input signal is stored in the sample and hold circuit, the conversion cycle starts. In the register the MSB is set to 1 and the remaining bits to 0. The DAC will generate a value representing half of the reference value. Now the comparator determines whether the held signal value is over or under the output of DAC to keep or reset the MSB. In the same fashion, the next N-1 bits in the output register are determined [2.7].
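The bit-by-bit search described above can be sketched as follows. This is a minimal behavioural model, assuming an ideal DAC and comparator and a unipolar input between 0 and the reference voltage; the function name is illustrative.

```python
def sar_convert(v_in: float, v_ref: float, n_bits: int) -> int:
    """Ideal successive approximation conversion in n_bits comparison steps."""
    code = 0
    for bit in range(n_bits - 1, -1, -1):      # MSB first
        trial = code | (1 << bit)              # tentatively set this bit
        v_dac = v_ref * trial / (2 ** n_bits)  # ideal DAC output for the trial code
        if v_in >= v_dac:                      # comparator decision: keep the bit
            code = trial
    return code


print(sar_convert(0.3, 1.0, 4))
```

Each loop iteration corresponds to one clock cycle, so the conversion finishes in N cycles.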



Figure 2-7 SAR ADC architecture [2.7]

The disadvantage of the SAR ADC is that a Digital to Analog Converter (DAC) needs to be implemented, which occupies a large area and makes it difficult to fit the ADC into the narrow pixel pitch. Fortunately, because of its high speed, the SAR ADC can be shared by two or more columns, which relaxes the area limitation [2.2]. However, the readout speed then decreases proportionally, so there is a tradeoff between area and readout speed.

Another disadvantage of the SAR ADC is the mismatch of the capacitors in the DAC. To obtain a low FPN, smaller calibration capacitor banks can be used for compensation [2.8]. However, this increases the area and the circuit complexity.

## 2.2.3 Cyclic ADC

The SAR ADC explained in the previous section implements its search by comparing the input value to a set of values from the DAC. Since this DAC limits the accuracy and occupies a large area, the Cyclic ADC can be used instead, which keeps the reference voltage constant and avoids the DAC. This ADC needs N clock cycles to finish one conversion, the same as the SAR ADC. The block diagram is shown in figure 2-8, where it can be seen that the signal is modified by capacitive manipulation. First, the input signal is sampled, and then it is multiplied by two and compared with the reference signal to generate the MSB. Based on the MSB, either zero or the reference voltage is subtracted from the signal. The remainder is fed back to the S&H circuit and treated as the new input value for the next bit [2.7].
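The multiply, compare and subtract loop can be sketched in a few lines. This is a minimal behavioural model, assuming an ideal gain of exactly two, an ideal comparator and a unipolar input between 0 and the reference voltage; the function name is illustrative.

```python
def cyclic_convert(v_in: float, v_ref: float, n_bits: int) -> int:
    """Ideal cyclic (algorithmic) conversion, one bit per cycle, MSB first."""
    residue = v_in
    code = 0
    for _ in range(n_bits):
        residue *= 2.0                       # multiply the sampled signal by two
        bit = 1 if residue >= v_ref else 0   # compare with the constant reference
        if bit:
            residue -= v_ref                 # subtract the reference when the bit is 1
        code = (code << 1) | bit             # the remainder is reused for the next bit
    return code


print(cyclic_convert(0.8125, 1.0, 4))
```

Because the same gain stage is reused in every cycle, any error in the factor of two accumulates in the residue, which is why the gain accuracy discussed below matters.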



Figure 2-8 Cyclic ADC architecture [2.7]

The main problem with a cyclic ADC is that the gain of the amplifier needs to be accurate to obtain a low FPN. This requires accurate capacitor matching and amplifier settling, which results in increased power consumption. The 1.5-bit/cycle algorithm can be used to reduce the precision requirement of the comparator, which helps to lower the power consumption; the details are described in [2.9].

## 2.2.4 Sigma Delta ADC

In 2010 a Sigma Delta ADC for the high speed column readout of image sensors was proposed [2.10]. Before that, Sigma Delta ADCs were only applied to low-speed imaging with a large pixel pitch, because of the complexity of the modulator and the subsequent decimation filters.

Figure 2-9 illustrates the block diagram of the column readout circuit. The sigma delta modulator suppresses the noise by over-sampling, and the decimation filter is used to obtain the digital bits. The higher the order of the modulator, the fewer conversion cycles are needed.

A 2nd-order sigma delta modulator is implemented with inverter-based Switched Capacitor (SC) circuits. Thanks to these inverter-based SC circuits, the Sigma Delta modulator can be implemented with low power consumption within a small area. A compact decimation filter is implemented, and digital correlated double sampling (CDS) is used to remove device variation and offset.

Compared to the other architectures, the performance of the image sensor with the sigma delta ADC is improved and the details can be found in [2.10].



Figure 2-9 Sigma Delta ADC [2.10]

## 2.3 DLL based Single Slope ADC

The Single Slope ADC is still the most popular ADC architecture for the column readout of CMOS image sensors due to its satisfactory performance and its simplicity. The modified Single Slope ADCs mentioned above are able to improve the readout speed while maintaining this simplicity.

The Single Slope ADC can also be combined with a PLL to get high speed readout [2.11]. With this increased clock frequency by the PLL, the readout speed also increases proportionally. The additional problem is that the dynamic power consumed to drive this clock line is also increased.



Figure 2-10 DLL based Single Slope ADC

In this thesis a DLL based Single Slope ADC is presented which is able to increase the readout speed by 16 times. The block diagram and the timing diagram are illustrated in figures 2-10 and 2-11. Unlike a PLL, which multiplies the clock frequency, the DLL generates 16 delay lines that have the same frequency as the reference clock signal. However, the 16-bit digital code from the DLL must be processed by the encoder to get the fine 4-bit binary code.

This Analog to Digital conversion is divided into the coarse conversion and the fine conversion. The difference between this ADC and the two-step ADC is that in this ADC the coarse conversion and the fine conversion are processed concurrently instead of sequentially.
Within each column, one 8-bit ripple counter is implemented to obtain the 8 coarse bits. The delay lines from the DLL are distributed over the columns, and only 8 out of 16 delay lines are used because they are already sufficient to perform the encode function (See discussion in the encoder section of chapter 4).
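One way to see why 8 of the 16 delay lines can be sufficient is sketched below. This is only an illustration under stated assumptions and not the encoder of chapter 4: the phases are assumed to be ideal, evenly spaced by 1/16 of a period, with 50% duty cycle, so that phase k+8 is the complement of phase k. The function names and the segment numbering are hypothetical.

```python
N_PHASES = 16   # delay lines generated by the DLL
N_USED = 8      # delay lines routed to each column


def sample_phases(segment: int) -> list:
    """Latched values of phases 0..7 when the latch event falls in `segment`.

    `segment` (0..15) is the sixteenth of the clock period in which the
    comparator fires. Phase k is assumed to be the reference clock delayed
    by k/16 of a period, with 50% duty cycle.
    """
    half = N_PHASES // 2
    return [1 if (segment - k) % N_PHASES < half else 0 for k in range(N_USED)]


def encode_fine_bits(latched: list) -> int:
    """Map the 8 latched phase values to a 4-bit fine code (0..15)."""
    ones = sum(latched)
    if latched[0] == 1:
        return ones - 1                     # first half period: phases 0..s are high
    return N_USED + (N_USED - ones) - 1     # second half period: leading phases are low


for segment in range(N_PHASES):
    latched = sample_phases(segment)
    print(segment, latched, encode_fine_bits(latched))
```

Under these assumptions the 8 latched values form a Johnson-style code with 16 distinct states, so each sixteenth of the clock period maps to a unique 4-bit value.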



Figure 2-11 Timing of the DLL based Single Slope ADC

The timing can be explained as follows: When the rising edge of the Start signal comes, the ramp signal starts decreasing and the 8-bit ripple counter starts counting. At the moment the ramp signal crosses the input signal, the comparator output flips over. The comparator output disables the counter, which stops counting, and the 8 coarse bits are stored. Meanwhile, the comparator output also triggers the 8-bit memory, which latches the digital data from the DLL, and the outputs of the memory are fed directly to the encoder to obtain the 4 fine bits.
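The way the two results form one output word, and where the factor of 16 comes from, can be sketched as follows. This is a minimal illustration, assuming the 8 coarse bits are the upper bits and the 4 fine bits the lower bits of the 12-bit code, and ignoring the misalignment between coarse and fine conversion; the function names are hypothetical.

```python
def dll_single_slope_code(coarse_count: int, fine_code: int) -> int:
    """Combine the 8 coarse bits and the 4 fine bits into one 12-bit code."""
    if not 0 <= coarse_count < 2 ** 8:
        raise ValueError("coarse_count must fit in 8 bits")
    if not 0 <= fine_code < 2 ** 4:
        raise ValueError("fine_code must fit in 4 bits")
    return (coarse_count << 4) | fine_code


def ramp_clock_cycles(total_bits: int, fine_bits: int) -> int:
    """Clock cycles of the ramp when fine_bits are resolved by the DLL phases."""
    return 2 ** (total_bits - fine_bits)


print(dll_single_slope_code(200, 5))
print(2 ** 12 // ramp_clock_cycles(12, 4))   # speed-up over a plain 12-bit single slope
```

With 4 bits resolved inside each clock period, the ramp only needs 256 clock cycles instead of 4096, which corresponds to the 16 times speed-up stated above.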

Compared to the conventional Single Slope ADC, only the memory and the encoder are added within each column, and the increased power consumption due to this is negligible.

# **2.4 References**

[2.1] Snoeij, M., Analog Signal Processing for CMOS Image Sensors. PhD thesis, Delft University of Technology, 2007. Page: 73 – 86.

[2.2] Ma, C., Pixel ADC design for hybrid CMOS image sensor. Master thesis, Delft University of Technology, 2010. Page: 11 – 18.

[2.3] Furuta, M., A High-Speed, High-Sensitivity Digital CMOS Image Sensor With a Global Shutter and 12-bit Column-Parallel Cyclic A/D Converters, IEEE Journal of Solid-State Circuits, 2007. Volume: 42, Issue: 4. Page: 766 – 774.

[2.4] Kleinfelder, S., A 10000 frames/s CMOS digital pixel sensor. IEEE Journal of Solid-State Circuits, 2001. Volume: 36, Issue: 12. Page: 2049 – 2059.

[2.5] Snoeij, M., Multiple-Ramp Column-Parallel ADC Architectures for CMOS Image Sensors. IEEE Journal of Solid-State Circuits, 2007. Volume: 42, Issue: 12. Page: 2968 – 2977.

[2.6] Lim, S., A High-Speed CMOS Image Sensor with Column-Parallel Two-Step Single-Slope ADCs. IEEE Transactions on Electron Devices, 2009. Volume: 56, Issue: 3. Page: 393–398.

[2.7] Pelgrom, M., Analog to Digital Conversion. ET4369, Delft University of Technology. Page: 136 – 142.

[2.8] Krymski, A., et al., A High-Speed, 240-frames/s, 4.1-Mpixel CMOS sensor. IEEE Transactions on Electron Devices, 2003. Volume: 50, Issue: 1. Page: 130 – 135.

[2.9] Furuta, M., et al., A High-Speed, High-Sensitivity Digital CMOS Image Sensor with a Global Shutter and 12-bit Column-Parallel Cyclic A/D Converters. IEEE Journal of Solid-State Circuits, 2007. Volume: 42, Issue: 4. Page: 766 – 774.

[2.10] Chae, Y., A 2.1 M Pixels, 120 Frame/s CMOS Image Sensor With Column-Parallel sigma delta ADC Architecture. IEEE Journal of Solid-State Circuits, 2011. Volume: 46, Issue: 1. Page: 236 – 247.

[2.11] Yoshihara, S., A 1/1.8-inch 6.4M Pixel 60 frames/s CMOS Image Sensor With Seamless Mode Change. IEEE Journal of Solid-State Circuits, 2006. Volume: 41, Issue: 12. Page: 2998 – 3006.

# Chapter 3 Delay Locked Loop Design

A Delay Locked Loop (DLL) is composed of a Phase Detector (PD), a Charge Pump (CP), a Loop Filter, and a Voltage Controlled Delay Line (VCDL). The difference between a DLL and a Phase Locked Loop (PLL) is that the Voltage Controlled Oscillator (VCO) is replaced by the VCDL.

The block diagram of the DLL is illustrated in figure 3-1. Its operating principle can be explained as follows: The reference clock signal (Ref\_clk) propagates through the VCDL, and the output of the VCDL (Out\_clk) is sent to the Phase Detector to be compared with the reference clock. The PD generates two signals, UP and DN, which control the charge pump to charge or discharge the loop filter. The voltage across the loop filter (Vctrl) controls the VCDL and adjusts the delay time so that Ref\_clk and Out\_clk are locked in phase.
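The feedback action described above can be sketched with a first-order discrete-time model. This is a simplification under stated assumptions and not the simulated circuit: the phase detector, charge pump, loop filter and VCDL gain are lumped into a single dimensionless `loop_gain`, and the starting delay and gain values are hypothetical.

```python
def simulate_dll(t_clk: float, t_delay_start: float, loop_gain: float, n_cycles: int) -> list:
    """Return the VCDL delay after each reference cycle of a first-order loop.

    loop_gain lumps charge pump current, loop filter capacitance and VCDL
    gain into one factor; the loop settles for 0 < loop_gain < 2.
    """
    delays = [t_delay_start]
    for _ in range(n_cycles):
        phase_error = t_clk - delays[-1]                 # positive when the delay is too short
        delays.append(delays[-1] + loop_gain * phase_error)
    return delays


# 100 MHz reference (10 ns period), loop started from a shorter delay
delays = simulate_dll(t_clk=10e-9, t_delay_start=6e-9, loop_gain=0.1, n_cycles=100)
print(f"final delay: {delays[-1] * 1e9:.4f} ns")
```

In this model the delay error shrinks by a factor of (1 - loop_gain) per cycle, so the delay converges towards one clock period.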



Figure 3-1 DLL architecture

# 3.1 Phase Detector

The function of the Phase Detector is to generate two DC signals UP & DN which are proportional to the phase difference between two input signals to control the charge pump of PLL or DLL.

### 3.1.1 False locking or harmonic locking problem



Figure 3-2 DLL in (a) Correct locking (b) Harmonic locking

False locking, or harmonic locking, describes the situation in which the Ref\_clk signal and the Out\_clk signal of the DLL are locked with a delay other than one clock cycle. The correct locking and the 2<sup>nd</sup> order harmonic locking of a DLL with 4 delay cells are illustrated in figure 3-2. False locking is a common issue in DLL design and must be avoided because it causes malfunction. Consider a DLL using a PD with a capture range of  $\pm \pi$: if the initial delay of the VCDL is shorter than  $1/2 T_{CLK}$  or longer than  $3/2 T_{CLK}$, the DLL will try to lock to a false phase. This imposes a requirement on the initial delay  $T_{VCDL-INITIAL}$:

$$\frac{1}{2}T_{\text{CLK}} \le T_{\text{VCDL-INITIAL}} \le \frac{3}{2}T_{\text{CLK}}$$
(3-1)
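Condition (3-1) can be written as a simple check. In the sketch below the 10 ns period corresponds to the 100MHz clock used in the PFD simulations; the tested delay values and the function name are hypothetical.

```python
def initial_delay_ok(t_vcdl_initial: float, t_clk: float) -> bool:
    """True if the initial VCDL delay satisfies condition (3-1)."""
    return 0.5 * t_clk <= t_vcdl_initial <= 1.5 * t_clk


t_clk = 10e-9
for t_initial in (4e-9, 8e-9, 16e-9):
    print(f"{t_initial * 1e9:.0f} ns ->", initial_delay_ok(t_initial, t_clk))
```

Only the 8 ns case lies inside the allowed window; the other two would drive a PD with a capture range of ±π towards a false lock.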

The conventional Phase Detector (PD) greatly limits the performance of DLL, so lots of solutions are proposed to overcome this problem. The basic idea is to use a Phase Frequency Detector (PFD) which has a capture range of  $\pm 2\pi$ , and it is a better choice for wide range operations. However, this PFD only increases the capture range but the harmonic locking problem still exists. There are several solutions to solve the false locking problem, and among them using a start-controlled PFD is an effective solution [3.1].



# 3.1.2 Start-controlled PFD design

Figure 3-3 Phase Frequency Detector

(a) Conventional tri-state PFD (b) Start-controlled PFD

The start-controlled PFD proposed in [3.1] is implemented in this design. It effectively solves the harmonic locking problem and will not try to lock to zero delay. Figure 3-3(b) shows the block diagram. Compared with the conventional PFD shown in figure 3-3(a), only one rising-edge triggered DFF and one logic gate are added. These additional circuits generate the control signal Rdy, which solves the false locking problem.

Figure 3-4 shows the simulation result of the start-controlled PFD when the Ref\_clk is 2ns ahead of the Out\_clk. It can be explained as: Initially, the start signal is set at low which also sets the signal Rdy to low. Therefore, the VCDL delay time is initialized and set to its minimum value. When the start signal switches to high, the Rdy signal also switches to high after the rising edge of Ref\_clk. Due to the nature of the negative feedback architecture, the VCDL delay increases until it is equal to one clock cycle of the input signal. Since the start-controlled circuit forces the VCDL delay to its minimum value and causes the VCDL delay to increase until its delay equals one clock cycle, the DLL will not fall into false locking or harmonic locking problem [3.2]. Even if the false locking problem happened in some extreme conditions, the start signal provides the opportunity to restart the system to solve this issue.



Figure 3-4 PFD at 100MHz when the reference clock leads for 2ns

A conventional PFD uses static DFFs, which suffer from a long delay time to reset all the internal nodes. This long delay causes a dead zone in which the PFD does not respond to small changes in the phase of the input signals. Any dead zone directly translates to jitter in the DLL and must be avoided [3.2]. Using a dynamic DFF as shown in figure 3-5 allows high frequency operation, because its simpler structure reduces the number of gate delays.



Figure 3-5 Resettable dynamic DFF

Figure 3-6 illustrates the timing diagram when the Ref\_clk and the Out\_clk are locked in phase. The propagation delay from the inputs to a valid UP or DN output is only 128ps, and the total reset delay of the PFD is 388ps. The power consumption is  $37\mu$W.



Figure 3-6 PFD outputs for in phase at 100MHz square wave input

# 3.2 Charge pump

The charge pump uses switches to control the voltage across the loop filter depending on its input signals. Charge pumps can be classified into two categories: single ended and differential. A single ended charge pump benefits from its simple architecture, lower power consumption and smaller area, while a differential architecture is suited to high speed operation. Additionally, a differential charge pump has better immunity to supply noise and substrate noise, which is important in the mixed signal environment of a CMOS image sensor.

## 3.2.1 Non ideality in Charge Pump

Current and Timing Mismatch:

The current sources used to charge and discharge the loop filter are implemented by MOS transistors, which will generate the mismatch between these currents due to the process technology. In addition, the PMOS switch and NMOS switch would have certain switching speed difference. These kinds of mismatch would give rise to a change in the control voltage at each phase comparison. To minimize the phase error generated from the current mismatch, turn on time of current sources should be minimized.

#### Leakage Current:

Leakage current is another issue of the charge pump. The amount of leakage current can be as high as 1nA in sub micron CMOS [3.3]. The loop response for this DC leakage current is a difference between the UP and DN signals that would produce the same amount of current equal to leakage current over one period. In other words, charge pump outputs a certain phase offset to compensate for this leakage current [3.4]. This phase offset might be negligible for the cases that charge pump current is high and a large load capacitor is used.
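The size of this phase offset can be estimated from a charge balance over one clock period: the charge delivered by the pump during the offset must equal the charge lost to leakage. The sketch below assumes this simple balance; the 1nA leakage is taken from the text, while the 10 µA charge pump current, the 100 MHz clock and the function name are assumptions for illustration.

```python
def leakage_time_offset(i_leak: float, i_cp: float, t_clk: float) -> float:
    """Static timing offset (s) whose pump charge cancels the leakage per period.

    Charge balance: i_cp * t_offset = i_leak * t_clk.
    """
    return i_leak * t_clk / i_cp


t_clk = 10e-9                      # assumed 100 MHz reference clock
t_offset = leakage_time_offset(i_leak=1e-9, i_cp=10e-6, t_clk=t_clk)
print(f"time offset:  {t_offset * 1e12:.3f} ps")
print(f"phase offset: {360.0 * t_offset / t_clk:.4f} degrees")
```

With these assumed values the offset is on the order of a picosecond, which illustrates why it becomes negligible when the charge pump current is high.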

### Charge Injection:

When the switches are on, finite amount of charge is held in their channel. When the switches turn off, these charges will flow partially through the drain and the source of the device. The amount of charge that is injected into the load capacitance would give rise to the variation of the control voltage [3.4]. However this issue is not that obvious with a large load capacitor.

#### Clock Feed-through:

This is due to the coupling between the control signals and the analog signal node. Such coupling happens because of the gate to source and gate to drain overlap capacitances, and these capacitors sizes are related to the transistor area as shown in equation 3-2.

$$C_{gd} = C_{gs} = \frac{C_{gg}}{2} = \frac{WL \times C_{OX}}{2}$$
(3-2)

$C_{gd}$  and  $C_{gs}$  are the gate to drain and gate to source overlap capacitances, respectively.  $C_{gg}$  is the gate capacitance. W and L are the transistor width and length.  $C_{OX}$  is the gate capacitance per unit area.
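Equation 3-2 can be evaluated numerically to get a feeling for the magnitude of the coupling. The sketch below also estimates the voltage step on the analog node with a simple capacitive divider, which is a first-order assumption and not a result from this work; the transistor dimensions, oxide capacitance, load capacitance and swing are hypothetical values.

```python
def overlap_capacitance(width: float, length: float, c_ox: float) -> float:
    """C_gd = C_gs = W * L * C_ox / 2, following equation 3-2 (SI units)."""
    return width * length * c_ox / 2.0


def feedthrough_step(c_couple: float, c_load: float, v_swing: float) -> float:
    """Voltage step on the analog node from a capacitive divider (assumed model)."""
    return v_swing * c_couple / (c_couple + c_load)


# Hypothetical switch: W = 1 um, L = 0.18 um, C_ox = 8.5 fF/um^2
c_gd = overlap_capacitance(1e-6, 0.18e-6, 8.5e-3)
print(f"C_gd = {c_gd * 1e15:.3f} fF")
print(f"step = {feedthrough_step(c_gd, 1e-12, 1.8) * 1e3:.3f} mV")
```

The divider shows why a larger load capacitor reduces the disturbance on the control voltage for a given switch size.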

#### Charge Sharing:

The charge sharing occurs when the switches turn from high to low, and the voltage dependent parasitic coupling capacitors are maximized. These capacitors  $C_{gd}$  and  $C_{gs}$  explained above share the gate charge and conduct it to both source and drain. The glitches will be caused at the output nodes in charge pumps and this effect may become obvious when the high frequency clock is used.

### 3.2.2 Architectures of Charge Pump

Single-ended charge pumps are widely used since they do not require complex configurations. Moreover with the well known tri-state operation, single ended architectures offer lower power consumption compared to differential architectures. Single ended architectures typically have three types of switching locations: drain, gate and source switching.



Figure 3-7 Single ended charge pumps

(a) With drain switching (b) With gate switching (c) With source switching

Drain switching:

As illustrated in figure 3-7(a), the switches of this architecture are located at the drain of the current mirrors. Because the switches are connected to the loop filter, the clock feed-through and charge injection effect will affect the loop filter directly. When the signal DN turns the switch on, the drain voltage of M1 varies from ground to the voltage across the loop filter. During this period high peak currents may be generated due to the voltage difference of two series turn-on resistors from the current mirror M1, and the switch [3.3]. The same situation will also occur on the PMOS side.

#### Gate switching:

When the switches are placed at the gates of the current mirror, the configuration is called the gate switching. Here these two transistors can be guaranteed to work in saturation region, but the switching time is dependent on the trans-conductance of M3 and M4. This will be a limit for high speed operation because the bias current of these two transistors can't be scaled down [3.4].

#### Source switching:

The source switching configuration is shown in figure 3-7(c), in which the two current mirror transistors M1 and M2 are both in saturation. However, the voltage across the on-resistance of the PMOS and NMOS switches will generate the current mismatch between the charging and discharging currents.

Several modified architectures are also published, and they are designed based on the above typical configurations to improve the performance.



Figure 3-8 Single ended charge pump architectures

(a) With current steering (b) With active output buffer (c) With NMOS switches

A single ended charge pump with the current steering technique is illustrated in figure 3-8(a) [3.5]. With this architecture, faster switching can be achieved than with voltage switching, which makes it suitable for high speed operation.

Figure 3-8(b) presents another topology of single ended charge pump. An active amplifier is added here to set the drain voltage of M1 when the switch is off, which reduces the charge sharing effect when the switch turns on. If the parasitic capacitance is comparable to the value of the capacitor in the loop filter, this architecture is preferred [3.4].

In figure 3-8(c), only NMOS switches are used, which eliminates the intrinsic mismatch between NMOS and PMOS switches. However, no current flows through the current mirror when the signal DN is off, and turning the current mirror back on takes some time. This limits high speed operation [3.6].

## 3.2.3 Fully differential Charge Pump design

A fully differential charge-pump with current steering technique [3.4] is used here, and it is illustrated in figure 3-9. This architecture has the advantage of the current steering charge pump, which can steer the current between branches without turning off the current mirror.



Figure 3-9 Fully differential Charge Pump [3.4]

The charging and discharging of the differential load capacitors depends on the timing of the two control signals UP and DN. For example, when both UP and DN signal are on, the current provided by the PMOS current mirror directly flows to the ground, without charging or discharging the load capacitors. With this architecture, current sources are always on and in saturation, thus charge sharing effects due to the switching are minimized. However the common mode feedback is needed to fix the output common mode voltage.

Common Mode Feedback: The common mode output voltage is not defined in the fully differential architecture. For example, if the charging and discharging currents are not balanced, the common mode voltage may increase or decrease, so it is necessary to stabilize this voltage. The schematic of this fully differential charge pump with common mode feedback (CMFB) circuit [3.7] is shown in figure 3-10.



Figure 3-10 Fully differential Charge Pump with CMFB

The CMFB circuit only responds to variations of the common mode output voltage. If the common mode output voltage of the charge pump increases, the currents of M1 and M4 also increase, while the currents through M2 and M3 decrease. Since transistors M7 and M8 are connected as a current mirror, the current flowing through M8 is copied to M7. The charging current of the output node then exceeds the discharging current and the output node voltage Vfb increases. Vfb is fed back to the charge pump to decrease its current, which lowers the common mode voltage.

# 3.3 Delay Cell

### 3.3.1 Architectures of Delay Cell

Figure 3-11 illustrates a typical current starved inverter [3.8], which is a widely used delay cell architecture in DLL design. With this architecture the charging and discharging current of the inverter are governed by transistor M1 and M2. This provided current is smaller than the original charging and discharging current of the inverter itself, and that is the reason why it is called the current starved inverter. Considering this is not a differential architecture, it will be sensitive to the supply and substrate noise.



Figure 3-11 Delay cell composed of current starved inverter [3.8]

Figure 3-12 shows the delay cell by John G. Maneatis [3.6]. With this architecture the delay time is approximately linear with the control voltage, which makes it easier to obtain a good jitter performance, and the sensitivity to supply noise is also decreased. However, this architecture cannot guarantee a rail to rail output, and an extra output stage is needed to obtain a duty cycle of 50%.



Figure 3-12 Delay cell by Maneatis [3.6]

### 3.3.2 Delay Cell design

Normally a PLL or a DLL is designed with a fully differential charge pump, but most delay cells have only one control terminal. The differential output of the charge pump then needs to be converted to a single ended signal, which loses the benefit of the differential architecture. In this work, a delay cell with two control terminals is designed.

Figure 3-13 shows the architecture of the delay cell used in this design, which is almost the same as the one proposed in [3.9] except for the two output inverters. It is a fully differential current starved delay cell with a latch, and the two output inverters shape the output waveform to minimize the jitter.

Compared to the conventional current starved inverter, it is implemented as a fully differential cell without static power consumption. In addition, a latch consisting of X3 and X4 is introduced in the delay cell, which guarantees rail to rail switching. The delay time of this architecture is determined by the relative strength of the input inverters and the latch. The strength of the input inverters X1 and X2 is governed by transistors M1 and M2. The stronger they are, the shorter the delay time [3.9].



Figure 3-13 Delay cell with fully differential current starved inverter (a) Block diagram (b) Schematic

In most applications, the frequency tuning range of a VCDL is more than sufficient. Therefore, the control sensitivity should be minimized in order to reduce the phase noise [3.10]. The gate voltage has to be designed large to bias the device further into the triode region. This reduces the control sensitivity, improves the linearity, and also reduces the sensitivity to power supply and substrate noise [3.9]. Ideally the bias signal Vctrln should be designed close to the supply voltage, and Vctrlp close to ground. However, the performance then degrades or is even completely lost in the other process corners, especially in the slow-slow corner. Considering the performance in all corners, the delay cell is designed with these two control voltages close to half of the supply voltage. Figure 3-14 illustrates the delay time versus the control voltage, which is rather linear in this working region.



Figure 3-14 Delay time versus control voltage

# 3.4 Simulation Results of DLL

The overall performance is listed in table 3-1. This DLL achieves a good jitter performance while maintaining low power consumption. Because the DLL itself runs at 100MHz and generates 16 delay lines, the jitter performance is also simulated under noisy supply conditions (a DC supply with two superimposed 10mV amplitude sine waves at 100MHz and 1.6GHz, respectively).

### Table 3-1 DLL performance

| Parameter                                            | Value        |
|------------------------------------------------------|--------------|
| Tuning range                                         | 60MHz-150MHz |
| Rms jitter(@100MHz)                                  | 3.16ps       |
| Peak-Peak jitter(@100MHz)                            | 19.5ps       |
| Rms jitter under noisy supply voltage(@100MHz)       | 3.7ps        |
| Peak-Peak jitter under noisy supply voltage(@100MHz) | 23.0ps       |
| Duty cycle(@100MHz)                                  | 49.76%       |
| Lock in time(@100MHz)                                | 3.5µs        |
| Power consumption(@100MHz)                           | 2.1mW        |

The differential control voltage is shown in figure 3-15, with a zoomed-in view of the control signal Vctrln. In the locked state the voltage variation is only 120µV, which corresponds to a delay time variation of 0.3ps. This amount has a negligible effect on the jitter performance.



Figure 3-15 Differential control signals

The jitter performance is shown in figure 3-16. Because the reference clock is fixed, the edge to edge jitter is simulated. The rms jitter is only 3.16ps with a peak to peak jitter of 19.5ps. Under a noisy power supply, the DLL still provides a satisfactory jitter performance. The delay time variation due to the mismatch between delay cells is also simulated, and its $3\sigma$ value is 20ps.







Figure 3-16 (a) Under clean supply (b) Under noisy supply

The performance is also verified with a corner analysis, and the DLL functions well in all corners without false locking. Figure 3-17 shows the simulated differential control signals Vctrlp and Vctrln in all corners. Initially these two control signals are biased to the supply and ground, then the system starts charging or discharging the loop filter until Out\_ref and Ref\_clk are locked to one clock cycle. The two control signals are stabilized after $3.5\mu$s in the slow-slow corner and after $2\mu$s in the other 3 corners.



Figure 3-17 Corner analysis of the DLL

# 3.5 References

[3.1] Kim, C.H., A 64-Mbit, 640-MByte/s bidirectional data strobed, double-data-rate SDRAM with a 40-mW DLL for a 256-MByte memory system. IEEE Journal of Solid-State Circuits, 1998. Volume: 33, Issue: 11. Page: 1703 – 1710.

[3.2] Chang, R.C.-H., A Multiphase-Output Delay-Locked Loop With a Novel Start-Controlled Phase/Frequency Detector. IEEE Transactions on Circuits and Systems I: Regular Papers, 2008. Volume: 55, Issue: 9. Page: 2483 – 2490.

[3.3] Rhee, W., Design of high-performance CMOS charge pumps in phase-locked loops. ISCAS, 1999. Volume: 2. Page: 545 – 548.

[3.4] Donmez, A., Wideband PLL System as a Clock Multiplier, Master thesis, Delft University of Technology, 2009. Page: 31 – 48.

[3.5] Olsson, H., Design of a high speed low-voltage charge pump for wideband phase-locked loops. ICECS, 2003. Volume: 1. Page: 148 – 151.

[3.6] Maneatis, J.G., Low-jitter process-independent DLL and PLL based on self-biased techniques. IEEE Journal of Solid-State Circuits, 1996. Volume: 31, Issue: 11. Page: 1723 – 1732.

[3.7] Lah, L., A continuous-time common-mode feedback circuit (CMFB) for high-impedance current-mode applications. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 2000. Volume: 47, Issue: 4. Page: 363 – 369.

[3.8] Jeong, D.K., Design of PLL-based clock generation circuits. IEEE Journal of Solid-State Circuits, 1987. Volume: 22, Issue: 2. Page: 255 – 261.

[3.9] Dai, L., A low-phase-noise CMOS ring oscillator with differential control and quadrature outputs. 14th Annual IEEE International ASIC/SOC Conference, 2001. Proceedings. Page: 134 – 138.

[3.10] Wilson, W.B., A CMOS Self-calibrating Frequency Synthesizer. IEEE Journal of Solid-State Circuits, 2000. Volume: 35, Issue: 10. Page: 1437 – 1444.

# Chapter 4 Column Circuit Design

The circuits implemented at the column level are presented in this chapter, including the comparator and some digital circuits.

# 4.1 Comparator design

A comparator is an essential part of an ADC. The comparator performs the function of amplifying the difference of the input terminals to a digital decision.

### 4.1.1 Design considerations

Many aspects need to be considered during the design of a comparator, the first of which is the gain. For this design the input range is from 0.7V to 1.5V, which means that the LSB is 195µV for a 12-bit ADC. To amplify 1/2 LSB (98µV) to 1.8V, the gain of the comparator should exceed 85dB. Meanwhile the offset (3 $\sigma$ ) of the comparator should also be kept within 1/2 LSB.
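The gain requirement above follows from simple arithmetic, which can be checked with a short Python sketch. The variable names are illustrative and not part of the original design files; the numbers are the ones quoted in the text.

```python
import math

# Values quoted in the text.
V_MIN, V_MAX = 0.7, 1.5   # comparator input range in volts
N_BITS = 12               # ADC resolution
V_LOGIC = 1.8             # output level the comparator must reach in volts

lsb = (V_MAX - V_MIN) / 2**N_BITS          # about 195.3 uV
half_lsb = lsb / 2                         # about 97.7 uV
gain_db = 20 * math.log10(V_LOGIC / half_lsb)

print(f"LSB          = {lsb * 1e6:.1f} uV")
print(f"1/2 LSB      = {half_lsb * 1e6:.1f} uV")
print(f"Minimum gain = {gain_db:.1f} dB")
```

The result is slightly above 85dB, which is consistent with the requirement stated above.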

Next is the delay time of the comparator. A clock frequency of 100MHz is applied in this design, from which the DLL generates a 1.6GHz pseudo clock. Normally the delay time of the comparator needs to be within a period equivalent to 1/2 LSB, which is 312.5ps in this case (1 LSB is equivalent to 625ps). With $\tau$=RC=312.5ps, the bandwidth of the comparator needs to be larger than 500MHz. This cannot be achieved with high gain amplifiers, and it would also increase the noise level and the power consumption.

Fortunately it is not necessary to design such a high speed comparator, which would consume a large amount of power. For the image sensor application the absolute delay time is not that important because it is compensated by the CDS; what is crucial for the performance is a constant delay time. This means that the delay time does not have to be within 312.5ps, and only the delay variation needs to be controlled. The delay time variation directly relates to the FPN and should be kept within 0.1% of the total conversion time.
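The timing figures in the two paragraphs above can be reproduced with the following sketch. It assumes that the total conversion time is one full 12-bit ramp (4096 steps of 1 LSB each) and that the comparator behaves as a single-pole system; both are assumptions made for this illustration. Under these assumptions the 0.1% budget evaluates to 2.56ns, which matches the delay variation target listed in table 4-6.

```python
import math

F_REF = 100e6     # DLL reference clock in Hz
N_PHASES = 16     # delay lines generated per reference period
N_BITS = 12       # ADC resolution

t_lsb = 1 / (F_REF * N_PHASES)      # time equivalent of 1 LSB: 625 ps
tau = t_lsb / 2                     # 1/2 LSB: 312.5 ps
bandwidth = 1 / (2 * math.pi * tau) # single-pole bandwidth for tau = RC

t_conversion = 2**N_BITS * t_lsb    # assumed: one full-scale ramp
delay_budget = 1e-3 * t_conversion  # 0.1 % of the conversion time

print(f"1 LSB            = {t_lsb * 1e12:.1f} ps")
print(f"Bandwidth        = {bandwidth / 1e6:.0f} MHz")
print(f"Conversion time  = {t_conversion * 1e6:.2f} us")
print(f"Delay var budget = {delay_budget * 1e9:.2f} ns")
```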

The noise also needs to be considered. In the comparator design, both the thermal noise and flicker noise will affect the performance. The flicker noise is inversely proportional to the frequency, and the thermal noise will be the dominating noise source in a wideband comparator.

Normally comparators are designed as a pre-amplifier followed by a latch at the output. Even though a latch is power efficient for fast conversion, it needs a clock signal to control its operation. In this design the comparator works together with the DLL to make the digital decision, which means that the decision may happen between clock edges. A latch is therefore not suitable here, and a differential to single ended amplifier is a better choice for the output stage.

With the overall consideration of the above specifications, the multi-stage comparator architecture illustrated in figure 4-1 is chosen which can achieve high gain with low power consumption [4.1]. The first two stages are two fully differential pre-amplifiers, the third stage is a differential to single ended amplifier.



Figure 4-1 Comparator: Multi-stage comparator

### 4.1.2 Auto zero technique to suppress the offset

The mismatch between transistors due to the fabrication technology and layout influences the threshold voltage and drain current, which leads to an offset in a differential amplifier. The offset of an amplifier can be as large as tens of millivolts, which deteriorates the performance of the ADC. To obtain a satisfactory column ADC performance, especially a low column FPN, the offset of the comparator must be minimized. The offset can be expressed as:

$$\sigma = \frac{A_{\rm VT}}{\sqrt{W \times L}} \tag{4-1}$$

 $A_{VT}$ is a technology related parameter. W and L are the width and length of the transistor. The most direct way to minimize the offset is to increase the transistor size. However, the comparator needs to fit into the pixel pitch of 5.4µm, so the transistor size cannot be increased without limit. A DAC trimming method can be used to eliminate the offset, but then a DAC is needed in each column [4.2]. Another effective way to suppress the offset is the auto zero technique [4.3].

The basic idea of auto zero is to sample the unwanted quantity (noise and offset) and then subtract it from the instantaneous value of the contaminated signal, either at the input or at the output of the amplifier [4.3]. This cancellation can also be done at some intermediate node between the input and the output of the amplifier.

#### 4.1.2.1 Input offset storage (IOS)



Figure 4-2 Auto zero: Input offset storage

Figure 4-2 illustrates the auto zero architecture with input offset storage. $V_{cm}$ is the common mode voltage, and $V_{OS1}$ and $V_{OS2}$ are the offset voltages of the pre-amplifier and the latch, respectively. The auto zero process can be divided into the offset storage phase and the comparison phase. Figure 4-3(a) illustrates the offset storage phase: S1 is off, S2 and S3 are on, the pre-amplifier and the switches form a unity-gain feedback loop, and the charges containing the offset information are stored on the capacitor pair C1 and C2. $V_{C1}$ and $V_{C2}$ are the voltages across capacitors C1 and C2, and A is the gain of the pre-amplifier.

$$(V_{C1} - V_{C2} + V_{OS1}) \times A = -(V_{C1} - V_{C2}) \tag{4-2}$$

$$V_{C1} - V_{C2} = -\frac{A}{1+A} \times V_{OS1} \tag{4-3}$$

Figure 4-3 Auto zero: Input offset storage (a) Offset storage phase (b) Comparison phase

In the comparison phase, S1 is on, S2 and S3 are off, as shown in figure 4-3(b). The input voltages are applied to the pre-amplifier via the DC coupling capacitor pair. $V_{in}$ is the system differential input signal, and $V_{in} = V_{inp} - V_{inn}$. $V_{out}$ is the differential voltage between the pre-amplifier output nodes.

$$V_{out} = \left(V_{in} + V_{C1} - V_{C2} + V_{OS1} + \frac{\Delta Q}{C}\right) \times A \tag{4-4}$$

$$V_{out} = \left(V_{in} + \frac{V_{OS1}}{1+A} + \frac{\Delta Q}{C}\right) \times A = (V_{in} + V_{OS}) \times A \tag{4-5}$$

$$V_{\rm OS} = \frac{V_{\rm OS1}}{1+A} + \frac{\Delta Q}{C} \tag{4-6}$$

Taking the offset of the latch into account, the total offset $V_{OS}$ at the input is:

$$V_{\rm OS} = \frac{V_{\rm OS1}}{1+A} + \frac{\Delta Q}{C} + \frac{V_{\rm OS2}}{A} \tag{4-7}$$

With this approach, the remaining offset after auto-zero is related to the gain. To get a low offset, a large gain pre-amplifier is needed. In addition, the charge injection from the switches will also cause the offset. Fortunately, in the differential architecture only the mismatch between the injected charges on the capacitor  $\Delta Q$  will contribute to the offset.

What is critical for this architecture is the capacitor size, because

$$V_{in1} = V_{in} \times \left(\frac{C_C}{C_C + C_L}\right)$$
(4-8)

 $C_{\rm C}$ is the coupling capacitance, with $C_1 = C_2 = C_{\rm C}$. $C_{\rm L}$ is the input capacitance of the pre-amplifier. $V_{in1}$ is the differential voltage between the pre-amplifier input nodes. When the input signal is applied, the voltage across the capacitor is not constant during the comparison phase because of the voltage division between C1, C2 and the input capacitance of the pre-amplifier. To minimize this effect a large capacitor is needed, which is not suitable for column use.
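Equations 4-7 and 4-8 can be evaluated numerically to get a feeling for the trade-off. The sketch below is only an illustration: the function names are introduced here, and the example offsets, gain and capacitor values are hypothetical, since the text does not give them for a stand-alone input offset storage stage.

```python
def ios_residual_offset(v_os1, v_os2, gain, dq_over_c=0.0):
    """Input-referred offset after input offset storage, equation (4-7)."""
    return v_os1 / (1 + gain) + dq_over_c + v_os2 / gain


def coupling_attenuation(c_c, c_l):
    """Fraction of the input signal that reaches the pre-amplifier, equation (4-8)."""
    return c_c / (c_c + c_l)


# Hypothetical example values, chosen only for illustration.
V_OS1 = 10e-3   # pre-amplifier offset in volts
V_OS2 = 10e-3   # latch offset in volts
GAIN = 100      # pre-amplifier gain (40 dB)

residual = ios_residual_offset(V_OS1, V_OS2, GAIN)
print(f"Residual offset = {residual * 1e6:.0f} uV")

# A coupling capacitor 10x larger than the input capacitance still loses ~9 %.
for ratio in (1, 10, 100):
    print(f"C_C / C_L = {ratio:3d} -> attenuation {coupling_attenuation(ratio, 1):.3f}")
```

The loop shows why a large coupling capacitor is needed: the attenuation only approaches unity when $C_{\rm C}$ is much larger than $C_{\rm L}$.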

#### 4.1.2.2 Output offset storage (OOS)

Figure 4-4 illustrates the architecture of auto zero with the output offset storage approach. The process is almost the same as for the input offset storage approach; the difference is that the subtraction of the offset from the signal is done at the output node. This is illustrated in figure 4-5.



Figure 4-4 Auto Zero: Output offset storage

In the offset storage phase, figure 4-5(a), S1 is off, S2 and S3 are on. The input terminals are connected to the common mode node, thus the offset voltage  $V_{OS1}$  is amplified by the pre-amplifier and stored on capacitor pair C1 and C2.







Figure 4-5 (a) Offset storage phase (b) Comparison phase

During the comparison phase, S1 is on, S2 and S3 are off, C1 and C2 behave as the DC coupling capacitors and the voltage across them should be constant.

$$V_{out} = A \times (V_{in} - V_{OS1}) + V_{C1} - V_{C2} + \frac{\Delta Q}{C}$$
 (4-10)

$$V_{out} = A \times (V_{in} + \frac{\Delta Q}{A \times C})$$
(4-11)

$$V_{\rm OS} = \frac{\Delta Q}{A \times C} \tag{4-12}$$

Where $V_{in} = V_{inp} - V_{inn}$. Taking the offset of the latch into account, the total equivalent offset at the input is:

$$V_{\rm OS} = \frac{\Delta Q}{A \times C} + \frac{V_{\rm OS2}}{A}$$
(4-13)

With this approach and in the ideal case, the comparator does not have the remaining offset. However, in the real case the mismatch between switches will also contribute to the offset, even though it is A times smaller compared to the input offset storage approach.

The disadvantage of the output offset storage approach is that the gain of the preamplifier is limited. This is to avoid the saturation of the load capacitors because the offset voltage is amplified in this case:

$$A \times V_{OS1} \le \text{Output swing} \tag{4-14}$$

It is also important to choose a reasonable capacitor value with respect to attenuation. However, this attenuation occurs at the output of the pre-amplifier, so its equivalent at the input is divided by the gain and its effect is less pronounced than with the input offset storage approach.

There are further disadvantages of the output offset storage approach. Because the gain of each pre-amplifier is limited as mentioned above, a 4-stage amplifier is needed to achieve a gain of 85dB. In addition, each additional stage needs 2 extra capacitors, which increases the area.
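The gain limit of equation 4-14 and its effect on the number of stages can be sketched as follows. The offset and output swing used here are hypothetical values chosen only for illustration, and the stage count assumes identical stages that each run at the maximum allowed gain; none of these numbers are taken from the design.

```python
import math


def oos_residual_offset(v_os2, gain, dq_over_c=0.0):
    """Input-referred offset after output offset storage, equation (4-13)."""
    return dq_over_c / gain + v_os2 / gain


def max_stage_gain(v_os1, output_swing):
    """Largest gain that keeps the amplified offset within the swing, equation (4-14)."""
    return output_swing / abs(v_os1)


# Hypothetical example values, not taken from the design.
V_OS1 = 10e-3          # pre-amplifier offset in volts
OUTPUT_SWING = 0.2     # usable output swing in volts
TARGET_GAIN_DB = 85.0  # total gain requirement from section 4.1.1

a_max = max_stage_gain(V_OS1, OUTPUT_SWING)
a_max_db = 20 * math.log10(a_max)
stages = math.ceil(TARGET_GAIN_DB / a_max_db)

print(f"Max gain per stage = {a_max:.0f} ({a_max_db:.1f} dB)")
print(f"Stages for {TARGET_GAIN_DB:.0f} dB  = {stages}")
```

With these example numbers each stage is limited to about 26dB, so several cascaded stages, each with its own capacitor pair, are required to reach the total gain.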

### 4.1.3 Comparator design

In this design a comparator with the combined input and output offset storage architecture [4.4] is implemented. Figure 4-6 shows the circuit architecture and the timing diagram. With this architecture both the first and second stage pre-amplifiers store their offset on capacitors C1 and C2. Because the first and second stage pre-amplifiers are designed with a total gain of 78dB (37dB+41dB), the remaining offset contribution of the third stage amplifier is negligible. A1 and A2 are the gains of the 1st and 2nd stage pre-amplifiers.



Figure 4-6 Architecture of the comparator in this design with the timing diagram

The comparator uses a timing scheme that is a variant of a two-phase non-overlapping clock. In the offset storage phase the offset of the first pre-amplifier is stored first, then S3 is switched on and the offset of the second stage pre-amplifier is stored, as illustrated in figure 4-7(a). By storing the offset voltages of the first and second stages in succession, the duration of the offset storage period is reduced because co-settling instabilities are avoided [4.5].

$$V_1 - V_2 = -A_1 \times V_{OS1}$$
(4-15)

$$V_3 - V_4 = -A_2 \times (V_3 - V_4 + V_{OS2})$$
(4-16)

$$V_{C1} - V_{C2} = (V_3 - V_4) - (V_1 - V_2)$$
 (4-17)

$$V_{C1} - V_{C2} = A_1 \times V_{OS1} - \frac{A_2}{1 + A_2} \times V_{OS2}$$
 (4-18)



Figure 4-7 Auto zero: Combined input and output offset storage (a) Offset storage phase (b) Comparison phase

In the comparison phase, figure 4-7(b), the input signals together with the capacitor stored offset voltage will make the decision.  $V_{os1}$ ,  $V_{os2}$  and  $V_{os3}$  are the offset voltage of the 1st, the 2nd and the 3rd stage amplifiers.  $V_1$ ,  $V_2$ ,  $V_3$  and  $V_4$  are the voltage at node 1, 2, 3 and 4, respectively.  $V_{out2}$  is the differential voltage between the output nodes of the 2nd stage pre-amplifier. The remaining offset with this approach is:

$$V_{out2} = ((V_{in} + V_{OS1}) \times (-A_1) + (V_{C1} - V_{C2}) + V_{OS2} + \frac{\Delta Q}{C}) \times (-A_2)$$
(4-19)

$$V_{out2} = ((V_{in} + V_{OS1}) \times (-A_1) + (A_1 \times V_{OS1} - \frac{A_2}{1 + A_2} \times V_{OS2}) + V_{OS2} + \frac{\Delta Q}{C}) \times (-A_2)$$
(4-20)

$$V_{out2} = A_1 \times A_2 \times \left(V_{in} - \frac{1}{A_1 \times (1 + A_2)} \times V_{OS2} - \frac{\Delta Q}{A_1 \times C}\right) \tag{4-21}$$

$$V_{\rm OS} = -\frac{1}{A_1 \times (1 + A_2)} \times V_{\rm OS2} - \frac{\Delta Q}{A_1 \times C} \tag{4-22}$$

Where $V_{in} = V_{inp} - V_{inn}$. $A_1$ and $A_2$ are the gains of the 1st and 2nd stage pre-amplifiers, respectively. The first stage pre-amplifier is designed with a relatively small gain so that it does not saturate the capacitor pair. The second stage pre-amplifier can be designed with a larger gain. Together the two stages suppress the offset sufficiently, because their total gain is designed to exceed 60dB, which suppresses the offset by a factor of 1000. As mentioned above, the offset is stored at the intermediate node, so the attenuation does not cause a significant problem. Another advantage of this architecture is that only 2 capacitors are used, which keeps the area small. Additionally, the auto zero only slightly increases the conversion time compared to other offset cancellation methods such as DAC trimming.
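As a rough numerical check of equation 4-22, the magnitude of the $V_{OS2}$ term can be evaluated with the simulated stage gains and $3\sigma$ offsets reported for the two pre-amplifiers in table 4-1. This is a sketch, not a result from the thesis: the charge injection term is left out because neither $\Delta Q$ nor C is given, and only magnitudes are computed because the offsets are random quantities.

```python
def db_to_linear(gain_db):
    """Convert a voltage gain in dB to a linear ratio."""
    return 10 ** (gain_db / 20)


# Simulated values from table 4-1.
A1 = db_to_linear(37)      # 1st stage gain, about 70.8
A2 = db_to_linear(41)      # 2nd stage gain, about 112.2
V_OS1_3SIGMA = 2.57e-3     # 1st stage offset (3 sigma) in volts
V_OS2_3SIGMA = 5.37e-3     # 2nd stage offset (3 sigma) in volts

# Magnitude of the V_OS2 term of equation (4-22).
residual_vos2 = V_OS2_3SIGMA / (A1 * (1 + A2))

# Voltage that the coupling capacitors must hold for the 1st stage, A1 * V_OS1.
stored_first_stage = A1 * V_OS1_3SIGMA

print(f"Residual from V_OS2 = {residual_vos2 * 1e6:.2f} uV")
print(f"Stored A1 * V_OS1   = {stored_first_stage * 1e3:.0f} mV")
```

The second line illustrates why the first stage gain is kept moderate: the amplified first stage offset has to fit on the capacitor pair without saturating the stage output.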

### 4.1.4 Amplifier design

#### 4.1.4.1 Pre-amplifier design

The architecture of the pre-amplifier is shown in figure 4-8. The first two stages share the same architecture, but they are designed with different specifications, for two reasons. The first is the delay time and the FPN. For a long delay time, a small relative variation causes a large absolute variation of the delay time, so the delay time must be controlled to avoid a large column FPN. Because the first stage amplifier suffers from the longest delay, it is designed with a bandwidth as high as 3.5MHz. The second reason is noise. The input referred noise of the comparator mostly comes from the 1st stage pre-amplifier, and designing it with a large bias current lowers the noise level.

The amplifier in figure 4-8 is a fully differential amplifier with PMOS diode connected loads and cross coupled transistors. The diode connected transistor load acts as a positive resistor with an impedance of $1/g_{m4}$, while the cross coupled transistor acts as a negative resistor with an impedance of $-1/g_{m5}$. The negative resistor partially cancels the positive resistor, which results in an output impedance of $1/(g_{m4} - g_{m5})$.



Figure 4-8 Schematic of the pre-amplifier

The gain of the pre-amplifier is

$$\text{Gain} = g_m \times R_{out} = \frac{g_{m2}}{g_{m4} - g_{m5}} \tag{4-23}$$

Considering

$$g_{\rm m} = \frac{\mu C_{\rm OX} W \times V_{\rm gt}}{L}$$
(4-24)

 $g_m$ is the trans-conductance, $R_{out}$ is the output impedance, $\mu$ is the electron mobility, $C_{OX}$ is the gate capacitance per unit area, W and L are the width and length of the transistor, and $V_{gt}$ is the overdrive voltage. The diode connected transistor M4 and the cross coupled transistor M5 have the same gate voltage, so the gain of the amplifier is determined only by the size ratio of the two. If M4 and M5 are chosen to be identical, the gain is infinite in theory; however, this is never achieved in practice due to mismatch. Table 4-1 gives the performance of the pre-amplifiers, and the offset is shown in figure 4-9.







Figure 4-9 Offset of the pre-amplifier (a) 1st stage (b) 2nd stage

|            | 1st stage pre-amplifier | 2nd stage pre-amplifier |
|------------|-------------------------|-------------------------|
| Gain       | 37dB                    | 41dB                    |
| Bandwidth  | 3.5MHz                  | 1MHz                    |
| Power      | 85μW                    | 9μW                     |
| Offset(3σ) | 2.57mV                  | 5.37mV                  |

### Table 4-1 Simulation results of the pre-amplifiers
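Equation 4-23 shows that the gain depends strongly on how closely $g_{m5}$ approaches $g_{m4}$. The sketch below illustrates this sensitivity. The trans-conductance values are hypothetical and were chosen only to land in the same range as the gains in table 4-1; they are not the values used in the design.

```python
import math


def preamp_gain(gm2, gm4, gm5):
    """Gain of the pre-amplifier according to equation (4-23)."""
    if gm5 >= gm4:
        raise ValueError("gm5 must be smaller than gm4 for a positive load resistance")
    return gm2 / (gm4 - gm5)


# Hypothetical trans-conductances in siemens, for illustration only.
GM2 = 100e-6
GM4 = 20e-6

for gm5 in (16e-6, 18e-6, 19e-6):
    gain = preamp_gain(GM2, GM4, gm5)
    print(f"gm5/gm4 = {gm5 / GM4:.2f} -> gain = {gain:5.0f} ({20 * math.log10(gain):.1f} dB)")
```

Moving the ratio $g_{m5}/g_{m4}$ from 0.90 to 0.95 doubles the gain in this example, which indicates why mismatch between M4 and M5 limits how close to unity the ratio can be chosen in practice.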

### 4.1.4.2 Differential to single ended amplifier design

The third stage amplifier converts the differential input to a single ended output, and it should also guarantee a rail to rail output. Because the two preceding pre-amplifier stages already contribute a gain of more than 75dB, the gain of the output amplifier does not need to be large; a larger gain would only come at the expense of power.



Figure 4-10 Schematic of the differential to single ended amplifier

The symmetrical amplifier, as shown in figure 4-10, is designed in this work. The transconductance  $g_{m1}$  is determined by the two input transistors M1 and M2, and the output impedance  $R_{out}$  is determined by the length of the two output transistors M6 and M8.  $C_L$  is the load capacitance. Considering the fact that the bandwidth of the previous stage is 1MHz, this amplifier is designed with the same bandwidth to save power.

$$Gain = g_{m1} \times R_{out}$$
(4-25)

$$\text{Bandwidth} = \frac{1}{2\pi \times R_{out} \times C_L} \tag{4-26}$$

This amplifier architecture has a second pole, which does not cause instability because the amplifier does not work in a feedback loop in this design. Neither the offset nor the noise is critical, simply because it is the third stage amplifier. This amplifier achieves a gain of 40dB with a bandwidth of 1.6MHz. The power consumption is $9\mu$W.

### 4.1.4.3 Noise consideration:

The comparator is composed of a 3-stage amplifier. The noise mostly comes from the first stage amplifier, which is confirmed by the simulation results. The two dominant noise sources are the flicker noise and the thermal noise:

Thermal noise:

The equivalent input referred thermal noise can be expressed as:

$$V_{eq,t}^2 = 4kT \frac{2}{3g_m} \Delta f$$
(4-27)

 $V_{eq,t}^2$ is the power of the input referred thermal noise, k is the Boltzmann constant, T is the absolute temperature, $g_m$ is the trans-conductance, and $\Delta f$ is the noise bandwidth. To minimize the thermal noise there are two options: increase $g_m$ or decrease the bandwidth. In this design $g_m$ is maximized, because the bandwidth is critical for the FPN performance.

Flicker noise:

Equation 4-28 gives the flicker noise. An effective way to minimize the flicker noise is to increase the transistor area. However, the pixel pitch limits the available area.

$$V_{eq,1/f}^{2} = \frac{K_{1/f}}{2\mu C_{OX}^{2}WL} \left(\frac{\Delta f}{f}\right) \tag{4-28}$$

 $V_{eq,1/f}^2$  is the power of the input referred flicker noise,  $K_{1/f}$  is the flicker noise coefficient,  $\mu$  is the electron mobility,  $C_{ox}$  is the gate capacitance per unit area, W and L are width and length of the transistor,  $\Delta f$  is the noise bandwidth, and f is the frequency.

The noise bandwidth of a one pole amplifier is $\pi/2$ times the pole frequency [4.6], which equals 5.5MHz in this design. The input referred noise of the comparator within this bandwidth is 27µV. Since the flicker noise is also reduced by the auto zero, the real input referred noise is smaller than the simulation result. Table 4-2 shows the noise contribution of the transistors in the pre-amplifier.

| Transistor | Type    | Noise contribution |
|------------|---------|--------------------|
| M2         | Thermal | 23.24%             |
| M3         | Thermal | 23.24%             |
| M2         | Flicker | 15.10%             |
| M3         | Flicker | 15.10%             |
| M4         | Thermal | 4.60%              |
| M7         | Thermal | 4.60%              |
| M5         | Thermal | 4.45%              |
| M6         | Thermal | 4.45%              |

Table 4-2 Noise contribution of the transistors in the pre-amplifier
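The noise bandwidth quoted above and the thermal noise expression of equation 4-27 can be evaluated with a short sketch. The 3.5MHz pole frequency is the first stage bandwidth from table 4-1. The trans-conductance and temperature in the example call are assumptions made for illustration, since the text does not state them, so the printed noise voltage is not a design result.

```python
import math

K_BOLTZMANN = 1.380649e-23   # Boltzmann constant in J/K

F_POLE = 3.5e6               # 1st stage pre-amplifier bandwidth in Hz (table 4-1)
noise_bandwidth = math.pi / 2 * F_POLE   # about 5.5 MHz


def thermal_noise_rms(gm, bandwidth, temperature=300.0):
    """RMS input referred thermal noise of one transistor, equation (4-27)."""
    return math.sqrt(4 * K_BOLTZMANN * temperature * 2 / (3 * gm) * bandwidth)


print(f"Noise bandwidth = {noise_bandwidth / 1e6:.2f} MHz")

# Assumed gm of 100 uS at 300 K, for illustration only.
v_n = thermal_noise_rms(gm=100e-6, bandwidth=noise_bandwidth)
print(f"Thermal noise   = {v_n * 1e6:.1f} uV rms")
```

Because the noise voltage scales with the inverse square root of $g_m$, quadrupling $g_m$ halves the thermal noise for the same bandwidth.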

#### 4.1.5 Performance of the comparator

After the auto zero process the remaining offset  $(3\sigma)$  is only  $10\mu$ V, the delay time is 46.7ns, while the variation of the delay time  $(\sigma)$  is only 893ps. The Monte Carlo simulation with 100 iterations provides the distribution of the offset and delay time as shown in figure 4-11 and 4-12, respectively.



Figure 4-11 The offset of the comparator with auto zero



Figure 4-12 Delay time of the comparator with auto zero

The performance in the four corners is also verified, as shown in table 4-3. The largest delay variation ( $\sigma$ ) is 910ps in the Slow-Fast corner. The delay time ranges from 44.69ns (Fast-Fast) to 49.30ns (Slow-Slow), but this is not important because it is compensated by the correlated double sampling.

| Corners     | Delay   | Delay variation ( $\sigma$ ) |
|-------------|---------|------------------------------|
| Typical     | 46.77ns | 893ps                        |
| Fast - Fast | 44.69ns | 898ps                        |
| Slow - Slow | 49.30ns | 868ps                        |
| Fast - Slow | 47.56ns | 870ps                        |
| Slow - Fast | 45.86ns | 910ps                        |

Table 4-3 Corner simulations of the delay time of the comparator with auto zero

Figure 4-13 illustrates the delay time of the comparator versus the common mode input voltage. The delay time ranges from 43.3ns to 46.7ns, and the variation is only noticeable from 1.1V to 1.5V, where it is equivalent to 1.7ps/LSB. This generates a maximum non-linearity of only about 0.003LSB (1.7ps/625ps, where 625ps is the time equivalent of 1LSB in this design).



Figure 4-13 Delay time versus common mode input voltage
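The 1.7ps/LSB figure can be reproduced with the sketch below. It assumes that the whole delay change from 43.3ns to 46.7ns is spread evenly over the 1.1V to 1.5V part of the input range, which is an interpretation of the text rather than a stated fact.

```python
LSB = (1.5 - 0.7) / 2**12            # ADC step in volts, about 195 uV
T_LSB = 625e-12                      # time equivalent of 1 LSB in seconds

DELAY_MIN, DELAY_MAX = 43.3e-9, 46.7e-9   # delay range from the text in seconds
V_LOW, V_HIGH = 1.1, 1.5                  # span where the delay changes, in volts

codes_in_span = (V_HIGH - V_LOW) / LSB            # about 2048 codes
delay_per_lsb = (DELAY_MAX - DELAY_MIN) / codes_in_span
nonlinearity_lsb = delay_per_lsb / T_LSB

print(f"Codes in span   = {codes_in_span:.0f}")
print(f"Delay per LSB   = {delay_per_lsb * 1e12:.2f} ps")
print(f"Non-linearity   = {nonlinearity_lsb:.4f} LSB")
```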

Tables 4-4 and 4-5 list the delay time of the comparator with the common mode input voltage at 0.7V and 1.5V, corresponding to the bright environment and the dark environment, respectively. The delay variation in the four corners is also verified in these two cases.

| Corners     | Delay   | Delay variation ( $\sigma$ ) |
|-------------|---------|------------------------------|
| Typical     | 46.71ns | 958ps                        |
| Fast - Fast | 44.99ns | 949ps                        |
| Slow - Slow | 48.67ns | 952ps                        |
| Fast - Slow | 47.30ns | 927ps                        |
| Slow - Fast | 46.13ns | 984ps                        |

#### Table 4-4 Delay time at bright (Common mode input=0.7V)

#### Table 4-5 Delay time at dark (Common mode input =1.5V)

| Corners     | Delay                | Delay variation ( $\sigma$ ) |
|-------------|----------------------|------------------------------|
| Typical     | 43.31ns              | 697ps                        |
| Fast - Fast | 41.96ns              | 703ps                        |
| Slow - Slow | 44.69ns              | 696ps                        |
| Fast - Slow | 43.52ns              | 672ps                        |
| Slow - Fast | 42.83ns              | 734ps                        |

The overall performance of the comparator is given in table 4-6. It achieves a gain of 118dB with an input range of 0.7~1.5V. The delay time variation is small, which guarantees a low column FPN. The input referred rms noise of this comparator is $27\mu$V, which is much lower than the LSB of $195\mu$V. However, the dominant noise source, thermal noise, follows a Gaussian distribution, so it is the $3\sigma$ value of the noise that affects the non-linearity of the ADC. The $3\sigma$ value of the noise is $81\mu$V, which is also within 1/2LSB. The power consumption is well controlled, and the comparator fits into the column with the pixel pitch of $5.4\mu$m.

Table 4-6 Performance of the comparator

| Parameter                    | Target    | Achieved    |
|------------------------------|-----------|-------------|
| Gain                         | 85dB      | 118dB       |
| Input range                  | 0.7V~1.5V | 0.7V~1.5V   |
| Delay variation ( $\sigma$ ) | 2.56ns    | 893ps       |
| Noise                        | 33µV      | 27µV        |
| Power                        | na        | 103µW       |
| Area                         | 5.4µm×na  | 5.4µm×220µm |

# 4.2 Digital circuit design

Apart from the comparator, some digital circuits also need to be implemented at the column level. The counter and the encoder used in this design are discussed in this section.

### 4.2.1 Counter design

A ripple counter [4.7] is designed for the column use. This ripple counter has one clock input and two control signals, as illustrated in figure 4-14.



Figure 4-14 Ripple counter [4.7]

The counter is only enabled when the output from the comparator is high, and it will latch the result when the falling edge of the comparator arrives. After each conversion cycle, the counter is reset to zero for the next conversion. Compared to the synchronous counter, the load of the clock signal is 8 times smaller because it is only connected to one D Flip-Flop in each column. In addition, the performance is better because the clock skew will not cause the FPN in this case. 8 counter cells are connected in series, so the inverter of the conventional D Flip-Flop is not compulsory here as shown in figure 4-14. The 11T architecture of the delay cell with reset is used which would help to reduce the power and the area. The average power consumption is  $8.6\mu$ W, with the maximum power of  $13.7\mu$ W, and the minimum power of  $3.6\mu$ W.
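The behaviour described above can be illustrated with a small behavioural model in Python. It is a sketch of a generic ripple counter with enable and reset, not a model of the transistor-level cell in figure 4-14. The class and method names are introduced here for illustration, and it is assumed that each stage is clocked by the falling output of the previous stage.

```python
class RippleCounter:
    """Behavioural model of an asynchronous (ripple) counter with enable and reset."""

    def __init__(self, n_bits=8):
        self.bits = [0] * n_bits   # bits[0] is the stage driven by the clock

    def reset(self):
        """Clear all stages before the next conversion."""
        self.bits = [0] * len(self.bits)

    def clock(self, enable):
        """Apply one active clock edge; only the first stage sees the clock."""
        if not enable:
            return                  # comparator output low: hold (latch) the count
        for i in range(len(self.bits)):
            self.bits[i] ^= 1       # this stage toggles
            if self.bits[i] == 1:   # 0 -> 1: no edge is passed to the next stage
                break
            # 1 -> 0: the falling output clocks the next stage

    @property
    def value(self):
        return sum(bit << i for i, bit in enumerate(self.bits))


counter = RippleCounter()
comparator_output = [1, 1, 1, 1, 1, 0, 0, 0]   # comparator falls after 5 clock edges
for comp in comparator_output:
    counter.clock(enable=comp)
print(counter.value)
```

In this model the clock only ever touches the first stage, which corresponds to the reduced clock load mentioned above, and the count is held once the comparator output goes low.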

#### 4.2.2 Encoder design

The encoder implemented in this design encodes the 16-bit cyclic thermometer code into a 4-bit binary code.

| DLL outputs (upper 8 bits: TABCDEFG) | Binary code (HXYZ) |
|--------------------------------------|--------------------|
| 1000000001111111                     | 0000               |
| 1100000000111111                     | 0001               |
| 1110000000011111                     | 0010               |
| 1111000000001111                     | 0011               |
| 1111100000000111                     | 0100               |
| 1111110000000011                     | 0101               |
| 1111111000000001                     | 0110               |
| 1111111100000000                     | 0111               |
| 0111111110000000                     | 1000               |
| 0011111111000000                     | 1001               |
| 0001111111100000                     | 1010               |
| 0000111111110000                     | 1011               |
| 0000011111111000                     | 1100               |
| 0000001111111100                     | 1101               |
| 0000000111111110                     | 1110               |
| 0000000011111111                     | 1111               |

Figure 4-15 DLL outputs with the corresponding binary code

The 16-bit delay line output of the DLL is a cyclic thermometer code, as shown in figure 4-15. The upper 8 bits (TABCDEFG) are just the inverted signals of the lower 8 bits, so either the upper or the lower bits are sufficient to perform the encoding. In addition, using 8 bits instead of 16 saves power because only half of the delay lines are distributed over the columns. The upper 8 bits are used because they have a better jitter performance than the lower 8 bits due to the nature of the DLL. The encoding function is given by equations 4-29 to 4-32.

$$H = \overline{T} \tag{4-29}$$

$$X = D \oplus \overline{T} \tag{4-30}$$

$$Y = \overline{\overline{B \oplus C} \cdot \overline{C \oplus D} \cdot \overline{F \oplus G} \cdot \overline{G \oplus \overline{T}}} \tag{4-31}$$

$$Z = \overline{\overline{A \oplus B} \cdot \overline{C \oplus D} \cdot \overline{E \oplus F} \cdot \overline{G \oplus \overline{T}}} \tag{4-32}$$

The encoder only performs the encoding once per conversion cycle, so the power consumed is negligible. However, the area occupied is relatively large, and the 8-bit signal lines need careful routing in the layout.
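The mapping of figure 4-15 can be checked exhaustively with a short script. The sketch below is an illustration only: the function names are hypothetical, the 16-bit word is assumed to rotate by one position per fine step, and the Y and Z bits are written as a NAND of XNOR terms using the inverted T signal, which is one form that reproduces the table for all 16 codes.

```python
def dll_word(k):
    """16-bit cyclic thermometer word for fine code k (0..15)."""
    word = [1] + [0] * 8 + [1] * 7   # word for code 0000
    k %= 16
    return word[-k:] + word[:-k]     # rotate right by k positions


def dll_upper_bits(k):
    """Upper 8 DLL taps (T, A, B, C, D, E, F, G) for fine code k."""
    return dll_word(k)[:8]


def xnor(p, q):
    return 1 - (p ^ q)


def encode(t, a, b, c, d, e, f, g):
    """Encode the upper 8 taps into the 4-bit binary code HXYZ."""
    nt = 1 - t                        # inverted T
    h = nt
    x = d ^ nt
    y = 1 - (xnor(b, c) & xnor(c, d) & xnor(f, g) & xnor(g, nt))
    z = 1 - (xnor(a, b) & xnor(c, d) & xnor(e, f) & xnor(g, nt))
    return (h << 3) | (x << 2) | (y << 1) | z


for k in range(16):
    taps = dll_upper_bits(k)
    print("".join(map(str, dll_word(k))), format(encode(*taps), "04b"))
```

The script also makes it easy to confirm that the lower 8 bits of every word are the inverse of the upper 8 bits, which is why half of the delay lines are sufficient.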

### 4.3 Conclusion

The design of the column level circuits is discussed in this chapter. A multi-stage comparator architecture with the auto zero technique is implemented, and it meets the specifications with low power consumption within a small area. A ripple counter is implemented, which saves power and guarantees low FPN, as discussed further in the next chapter. The encoder is implemented using half of the delay lines, with the output signals synchronous to the clock signal to solve the misalignment problem; the details are also discussed in the next chapter.

### 4.4 References

[4.1] Wooley, B. VLSI data conversion circuits. EE315. Stanford University, 2001.
[4.2] Park, S., A 4GS/s 4b Flash ADC in 0.18μm CMOS. ISSCC 2006. Page: 2330 – 2339.
[4.3] Enz, C.C., Circuit techniques for reducing the effects of op-amp imperfections: auto zeroing, correlated double sampling, and chopper stabilization. Proceedings of the IEEE, 1996. Volume: 84, Issue: 11. Page: 1584 – 1614.

[4.4] Razavi, B., Design techniques for high-speed, high-resolution comparators. IEEE Journal of Solid State Circuits, 1992. Volume: 27, Issue: 12. Page: 1916 – 1926.

[4.5] Erik, P., A 60-MHz 150-μV Fully-Differential Comparator. Journal of stellar circuits. EE315. Stanford University. http://www.stanford.edu/~jsdaniel/comparator.pdf

[4.6] Shirai, E., CMOS Multistage Preamplifier Design for High-Speed and High-Resolution Comparators. IEEE Transactions on Circuits and Systems II: Express Briefs, 2007. Volume: 54, Issue: 2. Page: 166 – 170.

[4.7] Chae, Y., A 2.1 M Pixels 120 Frame/s CMOS Image Sensor with Column-Parallel sigma delta ADC Architecture. IEEE Journal of Solid-State Circuits, 2011. Volume: 46, Issue: 1. Page: 236 – 247.

## **Chapter 5** Top View

In this chapter some top level topics are discussed, including the Correlated Double Sampling (CDS) architecture and the misalignment between the coarse conversion and the fine conversion. The overall performance of this DLL based Single Slope ADC is given at the end of the chapter.

### 5.1 Correlated Double Sampling

To make this ADC architecture also applicable to the readout of pinned photodiode APS pixels, it is necessary to implement Correlated Double Sampling (CDS), which compensates for the mismatch of the source follower and for the kTC noise. The CDS can be implemented in the analog domain or in the digital domain.

#### 5.1.1 Analog CDS

As illustrated in figure 5-1, there are two methods to implement the CDS in the analog domain: the first is to use an extra subtraction circuit, and the other is to implement the CDS together with the comparator.

#### 5.1.1.1 Separate CDS

As shown in figure 5-1(a), the column signal processing is divided into two parts: CDS and column ADC. For this architecture, one additional amplifier is needed to implement the subtraction function. The conversion time increases only slightly, but this comes at the expense of power consumption and area.



Figure 5-1 Analog Correlated Double Sampling (a) Separate CDS (b) Combined CDS with Comparator

The architecture of figure 5-1(a) is a good CDS candidate for our ADC, but it is not chosen because of the mismatch between capacitors. For example, in the given technology the capacitor area needs to be larger than $600\mu m^2$ to achieve an accuracy of 0.1%, so the capacitor area would not be acceptable for a high resolution ADC.

#### 5.1.1.2 Combined CDS with comparator

With the architecture shown in figure 5-1(b) [5.1], no additional subtraction circuit is needed. The reset signal and the video signal are sampled on the two coupling capacitors in succession. Differential ramp voltages are applied via the bottom plates of the capacitors to perform the subtraction. The advantage of this architecture is that the same amplifier performs both the auto zero and the CDS, which saves power and area.

However, the attenuation of the input signal is a problem for this architecture. The differential ramp generator is another issue: any mismatch between the two differential ramp signals will deteriorate the performance.

#### 5.1.2 Digital CDS

The basic idea of digital CDS is to digitize the reset signal and the video signal of the pinned photodiode pixel separately, as shown in figure 5-2(a), which is the conventional digital CDS architecture. The latch stores the digitized reset and video signals in succession, and these are processed later to obtain the N-bit output.

Figure 5-2(b) shows the digital CDS implemented by means of an up/down counter. During the reset signal readout period, the counter works in the downwards direction. In the video signal readout period, the counter works in the upwards direction. With this column up/down counter implemented, the subtraction is done automatically and FPN due to the clock skew is not an issue.
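The subtraction performed by the up/down counter can be sketched behaviourally. The model below is only an illustration under assumptions: the function names are hypothetical, the ramp is ideal and rising, both levels are expressed as positions along the ramp, and an 8-bit ramp with a normalized full scale of 1.0 is assumed.

```python
def ramp_crossing_count(level, n_steps=256, full_scale=1.0):
    """Number of clock cycles until an ideal rising ramp reaches `level`."""
    lsb = full_scale / n_steps
    count = 0
    while count < n_steps and count * lsb < level:
        count += 1
    return count


def updown_cds(reset_level, video_level, n_steps=256, full_scale=1.0):
    """Digital CDS: count down during the reset ramp, up during the video ramp."""
    counter = 0
    counter -= ramp_crossing_count(reset_level, n_steps, full_scale)  # first ramp
    counter += ramp_crossing_count(video_level, n_steps, full_scale)  # second ramp
    return counter


print(updown_cds(0.10, 0.60))
print(updown_cds(0.12, 0.62))  # same signal swing with a common offset added
```

A common offset on both levels cancels in the difference, apart from a possible error of one count caused by quantizing the two levels separately. Two ramps are needed per conversion, which is why the conversion time is almost doubled.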

The advantage of digital CDS is that the circuit is quite simple, and the offset of the comparator is compensated automatically without extra circuits. There are also some disadvantages. The first is the conversion time: as can be seen from the timing diagram, the conversion time is almost doubled. In addition, digital CDS increases the quantization noise, because the signals are digitized twice in every conversion cycle.
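The increase in quantization noise can be estimated with the usual uniform quantization model. The expression below is a sketch that assumes the quantization errors of the reset and the video conversion are uniformly distributed and uncorrelated, so that their powers add:

$$\sigma_{q} = \frac{\text{LSB}}{\sqrt{12}}, \qquad \sigma_{q,CDS} = \sqrt{\sigma_{q}^{2} + \sigma_{q}^{2}} = \sqrt{2} \times \frac{\text{LSB}}{\sqrt{12}} \approx 0.41\,\text{LSB}$$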





#### 5.1.3 CDS with XOR gate

Both digital CDS methods mentioned above increase the conversion time. Another idea, proposed in [5.2], performs the CDS within one conversion cycle by adding one comparator and one XOR gate, as illustrated in figure 5-3. The counter starts when the ramp voltage reaches the reset signal, and it stops when the ramp voltage exceeds the video signal. In this way, the counter only counts in one direction and the CDS is implemented without increasing the readout time.



Figure 5-3 CDS with XOR gate [5.2]

This architecture also has some disadvantages. The power consumption and the area increase because two comparators are used within each column. Another issue is the offset of the comparator: it is not compensated automatically, so auto zero is needed with this architecture. Since both the rising and the falling edge of the comparator are used for the readout, the FPN due to the comparator offset is also doubled.
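The operation of the XOR-gated counter can be sketched behaviourally as follows. This is a simplified illustration, not the circuit of [5.2]: the function name is hypothetical, the comparators and the ramp are ideal, only the coarse count is modelled, and an 8-bit ramp with a normalized full scale of 1.0 is assumed.

```python
def xor_cds_count(reset_level, video_level, n_steps=256, full_scale=1.0):
    """Count clock cycles between the two comparator transitions on one ramp."""
    lsb = full_scale / n_steps
    count = 0
    for cycle in range(n_steps):
        ramp = cycle * lsb
        comp_reset = ramp >= reset_level      # comparator 1 output
        comp_video = ramp >= video_level      # comparator 2 output
        enable = comp_reset != comp_video     # XOR gate enables the counter
        if enable:
            count += 1
    return count


print(xor_cds_count(0.10, 0.60))
```

Only a single ramp of `n_steps` cycles is used, so the difference between the two levels is obtained without a second conversion phase.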

#### 5.1.4 CDS design

In this design the CDS with XOR gate [5.2] is implemented, because high priority is given to the conversion speed. The standard deviation of the delay time in this design is only 893ps, which corresponds to an FPN of 0.035%. Even though this is doubled, it is still within the specification of 0.1%. The architecture of the column readout circuit with CDS is illustrated in figure 5-4, and the corresponding timing diagram is shown in figure 5-5.
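The FPN figure can be reproduced with a short worked calculation. It assumes that the delay variation is referred to a ramp duration of 256 clock periods of 10ns (8 coarse bits at 100MHz), which is an inference that also matches the 2.56ns target for 0.1% FPN in table 4-6:

$$\text{FPN} \approx \frac{\sigma_{delay}}{T_{ramp}} = \frac{893\,\text{ps}}{256 \times 10\,\text{ns}} \approx 0.035\%, \qquad 2 \times 0.035\% = 0.07\% < 0.1\%$$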



Figure 5-4 Overall architecture with CDS implemented

During the pixel operation and signal sampling period, the counter is reset to 0 and the auto zero is performed at the same time. After that, the reset signal and the video signal are applied to the two comparators, together with the ramp signals. When the ramp signal exceeds the reset voltage, the counter starts counting. At the same moment the rising-edge sensitive latch1 stores the digital code from the DLL, which is sent to the encoder to obtain the lower 4 fine bits of the reset signal. Afterwards, the counter stops counting when the ramp signal reaches the video signal, and the upper 8 coarse bits are stored in the counter. At that moment, the falling-edge sensitive latch2 stores the lower 4 fine bits of the video signal.



Figure 5-5 Timing diagram of the ADC with CDS implemented

### 5.2 Non-linearity due to the clock skew

### 5.2.1 Misalignment problem

For a two-step ADC, misalignment between the fine conversion and the coarse conversion is always a serious problem. In the classical two-step ADC an extra bit and a correction circuit are normally needed to overcome this problem [5.8].

The ideal case is shown in figure 5-6(a): there is no mismatch between the coarse bits and the fine bits.






Figure 5-6 Misalignment of 2-step ADC (a) Ideal case (b) Real case

Figure 5-6(b) illustrates the real situation with clock skew. In the real case, for a very high frequency clock signal, the mismatch between the clock line and the delay lines from the DLL can be several clock cycles, which generates INL and DNL as large as tens of LSB, depending on the resolution of the fine conversion. The fine bits can also be designed to be synchronous to the clock signal; this avoids the large non-linearity, but some missing codes will be generated.

#### 5.2.2 Misalignment problem in this design

The misalignment is also a potential issue in this design. The clock signal and the DLL data lines are distributed over the columns, so the mismatch between them will contribute to the non-linearity of the ADC.

To minimize this misalignment it is important to guarantee that the loads of the clock line and the DLL data lines are identical. In this design each signal line drives only one DFF within each column, and in total 17 signal lines (Clock×3, DLL<1:7>×2) are distributed. In the layout, the top metal layer is used for the clock line and the DLL data lines with a large spacing in between, and the overlap area with other metal layers is minimized.

The clock line and the data lines from the DLL are distributed over all 330 columns, which is 1.8mm long for a pixel pitch of 5.4$\mu$m. For the given technology, the resistance of this metal line (metal 6) with a width of $2\mu$m is only $40\Omega$ for a length of 2mm, as follows from equation 5-1, where $R_{sheet}$ is the sheet resistance of the metal, and L and W are the metal length and width, respectively.

$$R = R_{sheet} \times \frac{L}{W} \tag{5-1}$$

For each of these clock lines and DLL data lines, the load capacitance ranges only from 1.4pF to 1.6pF with all the coupling capacitors and parasitic capacitors included. For simplicity, the lumped model of equation 5-2 is used to calculate the delay time of the metal line, and the estimated delay time ranges from 39ps to 45ps. The lumped model gives a pessimistic estimate, so the real delay time will be smaller.

$$\tau = 0.69 \times \text{RC} \tag{5-2}$$

Here $\tau$ is the delay time of the line (the 50% point of the lumped RC response), and R and C are the load resistance and capacitance. The clock frequency used is 100MHz, and with the DLL implemented the time step of the fine bits is 625ps. The delay time in this design is within 50ps, and the mismatch between different delay lines is only 6ps. As illustrated in figure 5-7, with the fine bits synchronous to the clock signal, the large non-linearity problem is avoided, and this 6ps mismatch will not cause missing codes.
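The delay estimate can be reproduced numerically from equations 5-1 and 5-2. The sketch below uses hypothetical function names, and the sheet resistance of 0.04 ohm per square is not given in the text but inferred from the quoted 40 ohm for a 2mm long, 2µm wide line.

```python
def line_resistance(r_sheet, length, width):
    """Equation 5-1: resistance of a metal line from its sheet resistance."""
    return r_sheet * length / width


def lumped_delay(resistance, capacitance):
    """Equation 5-2: 50% delay of a lumped RC network."""
    return 0.69 * resistance * capacitance


R_SHEET = 0.04  # ohm/square, inferred from 40 ohm for 2 mm x 2 um (assumption)

r_line = line_resistance(R_SHEET, length=2e-3, width=2e-6)
delays_ps = [lumped_delay(r_line, c) * 1e12 for c in (1.4e-12, 1.6e-12)]

# Fine time step: one 100 MHz clock period divided into 16 DLL phases
fine_step_ps = 1e12 / (100e6 * 16)

print(f"R = {r_line:.1f} ohm")
print(f"delay = {delays_ps[0]:.1f} ps to {delays_ps[1]:.1f} ps")
print(f"fine step = {fine_step_ps:.0f} ps")
```

With these inputs the estimated delay is roughly 39ps to 44ps, in line with the range quoted above and well below the 625ps fine step.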



Figure 5-7 Timing of the ADC in this design

### 5.3 Overall performance

#### Non-linearity:

The non-linearity of this ADC mainly comes from two parts: the comparator and the DLL. However, the jitter performance of the DLL is satisfactory, even when all non-idealities are taken into consideration (a noisy environment for the DLL and the maximum delay variation between the delay lines). Most of the non-linearity therefore comes from the noise of the comparator. With CDS implemented, the overall INL is within 0.65LSB and the DNL is within 0.95LSB.

#### Power consumption:

The total power is 82mW, of which 68mW is consumed by the comparators, 3mW by the column digital circuits, 2mW by the DLL and 9mW by the buffers. The figure of merit (FoM) of the column level ADC in this design is 0.182pJ/conversion, with the FoM defined as:

$$\text{FoM} = \frac{[\text{Power consumption per column}] \times [\text{Conversion time}]}{2^{\text{ENOB}}} \tag{5-3}$$

Here ENOB is the effective number of bits; in the calculation the resolution N is used instead of ENOB for simplicity. The power consumption per column is the total power consumption divided by the number of columns. Table 5-1 lists the performance of this work together with a comparison with recently published column level ADCs. Because most papers do not mention the power consumed by the ADC part, it is impossible to make an absolutely fair comparison. Considering that most of the power in a CMOS image sensor is consumed by the column level ADCs, the FoM of the listed papers in table 5-1 is calculated assuming that 50% of the total power is consumed by the ADC.

| Ref       | ADC (Type)  | ADC (Bit) | Pixel (H*V) | Pitch (µm) | Conversion Time (µs) | INL (LSB)   | DNL (LSB)   | FPN (%) | Power (mW) | FoM (pJ/conversion) |
|-----------|-------------|-----------|-------------|------------|----------------------|-------------|-------------|---------|------------|---------------------|
| [5.3]     | SAR         | 12        | 4112×2168   | 4.2        | 4.0                  | -15/+2      | -0.4/+0.7   | 0.02    | 1085/2     | 0.244               |
| [5.4]     | SAR         | 10        | 2353×1728   | 7          | 1.77                 | na          | na          | 1       | 700/2      | 0.350               |
| [5.5]     | Cyclic      | 13        | 640×428     | 5.6        | 4.6                  | -0.6/+3.2   | <0.5        | 0.01    | 297/2      | 0.195               |
| [5.6]     | Cyclic      | 12        | 514×530     | 20         | 0.5                  | -2.25/+20.6 | -0.81/+0.76 | 0.63    | 1000/2     | 0.115               |
| [5.7]     | Sigma Delta | 12        | 1238×1696   | 2.25       | 6.85                 | -0.8/+3.7   | -0.63/+0.55 | 0.01    | 180/2      | 0.089               |
| [5.8]     | 2-step SS   | 10        | 320×240     | 5.6        | 4.0                  | -1.61/+1.42 | -0.78/+0.53 | 0.13    | 36/2       | 0.292               |
| [5.9]     | MRSS        | 10        | 400×330     | 7.4        | 16.0                 | -1.0/+1.4   | na          | 0.1     | 52/2       | 1.230               |
| This work | DLLSS       | 12        | 400×330     | 5.4        | 3.0                  | -0.65/+0.65 | -0.95/+0.95 | 0.07    | 82         | 0.182               |

Table 5-1 Comparison with recently published column ADC
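The FoM values in table 5-1 can be reproduced with equation 5-3. The sketch below uses a hypothetical helper `fom_pj` and takes the resolution N in place of ENOB, as in the text. The number of columns is taken as the second number of the pixel entry in table 5-1; this is an inference from reproducing the listed values, not something stated explicitly.

```python
def fom_pj(adc_power_w, n_columns, t_conv_s, n_bits):
    """Equation 5-3 in pJ/conversion, using the resolution N instead of ENOB."""
    power_per_column = adc_power_w / n_columns
    return power_per_column * t_conv_s / 2 ** n_bits * 1e12


# (ADC power in W, number of columns, conversion time in s, resolution in bits)
entries = {
    "This work": (82e-3, 330, 3.0e-6, 12),
    "[5.3]": (1085e-3 / 2, 2168, 4.0e-6, 12),   # 50% of total power assumed
    "[5.7]": (180e-3 / 2, 1696, 6.85e-6, 12),   # 50% of total power assumed
}

for name, args in entries.items():
    print(f"{name}: {fom_pj(*args):.3f} pJ/conversion")
```

For the three entries shown, the results agree with the tabulated values to within rounding.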

### 5.4 References

[5.1] Snoeij, M., Analog Signal Processing for CMOS Image Sensors. PhD thesis, Delft University of Technology, 2007. Page: 120 – 122.

[5.2] Bogaerts, J., Analog to Digital conversion in pixel arrays, 2009. Patent No. US2009/0256735 A1.

[5.3] Matsuo, S., 8.9-Megapixel Video Image Sensor with 14-b Column-Parallel SA-ADC. IEEE Transactions on Electron Devices, 2009. Volume: 56, Issue: 11. Page: 2380 – 2389.

[5.4] Alexander I., A High-Speed, 240-Frames/s, 4.1-Mpixel CMOS Sensor. IEEE Transactions on Electron Devices, 2003. Volume: 50, Issue: 1. Page: 130 – 135.

[5.5] Park, J., A High-Speed Low-Noise CMOS Image Sensor with 13-b Column-Parallel Single-Ended Cyclic ADCs. IEEE Transactions on Electron Devices, 2009. Volume: 56, Issue: 11. Page: 2414 – 2422.

[5.6] Furuta, M., A High-Speed, High-Sensitivity Digital CMOS Image Sensor With a Global Shutter and 12-bit Column-Parallel Cyclic A/D Converters, IEEE Journal of Solid-State Circuits, 2007. Volume: 42, Issue: 4. Page: 766 – 774.

[5.7] Chae, Y., A 2.1 M Pixels, 120 Frame/s CMOS Image Sensor with Column-Parallel sigma delta ADC Architecture. IEEE Journal of Solid-State Circuits, 2011. Volume: 46, Issue: 1. Page: 236 – 247.

[5.8] Lim, S., A High-Speed CMOS Image Sensor with Column-Parallel Two-Step Single-Slope ADCs. IEEE Transactions on Electron Devices, 2009. Volume: 56, Issue: 3. Page: 393–398.

[5.9] Snoeij, M., Multiple-Ramp Column-Parallel ADC Architectures for CMOS Image Sensors, IEEE Journal of Solid State Circuits, 2007. Volume: 42, Issue: 12. Page: 2968 – 2977.

# **Chapter 6** Conclusion and Future Work

### 6.1 Conclusion

In this work a DLL-based Single Slope ADC is presented, which effectively increases the readout speed of the conventional Single Slope ADC. This ADC is designed for the column readout of CMOS image sensors with CDS implemented, and by simply removing the CDS it can also be used in other multi-channel applications.

A Delay Locked Loop (DLL) is designed which effectively eliminates the false locking problem. The phase frequency detector, the fully differential charge pump and the fully differential delay cell with two control terminals are implemented in this DLL. It shows good jitter performance with low power consumption, even in a noisy environment.

A 3-stage comparator with the auto-zero technique is also designed, which has low offset and small delay variation. This comparator makes it possible to achieve a low FPN, which is critical in CMOS image sensor design. The ripple counter and the encoder from cyclic thermometer code to binary code are also implemented in each column.

High priority is given to the readout speed in this design, which makes the analog CDS a good candidate. However, it is not chosen because of other considerations, such as the capacitor mismatch and the attenuation of the input signal. Instead, the CDS architecture with two comparators and one XOR gate is used. Even though the power consumption and the FPN are doubled, the ADC still gives a satisfactory performance.

This 12-bit ADC achieves a readout time of $3\mu$s, and it fits into a column with a pixel pitch of 5.4$\mu$m. The performance is comparable to the state of the art.

### 6.2 Future Work

Due to limited time and the planning of the EXEL group at imec, there is no tapeout plan for this design in the near future. In this work, the results are verified with post-layout simulations. Regarding the design itself, an external ramp generator has been used so far; implementing an on-chip ramp generator would make the system more compact and power efficient.

# Appendix

The layout of the DLL is illustrated in figure A-1, which occupies an area of  $200\mu$ m×60 $\mu$ m. The layout of the 330-column readout circuit is illustrated in figure A-2, and the area is 1.8mm ×1.05mm.



Figure A-1 Layout of the DLL

Figure A-2 Layout of the column circuits (labelled blocks: Comparator1, Comparator2, XOR, Latch1, Encoder1, Latch2, Encoder2, Counter)