# **Analog Signal Processing for CMOS Image Sensors**

Snoeij, MF

**Document Version:** Final published version

**Citation (APA):** Snoeij, MF. (2007). *Analog Signal Processing for CMOS Image Sensors*. [Dissertation (TU Delft), Delft University of Technology].

### **DISSERTATION**

for the degree of doctor at Delft University of Technology, by authority of the Rector Magnificus Prof.dr.ir. J.T. Fokkema, chairman of the Board for Doctorates, to be defended in public on Tuesday 4 September 2007 at 15:00

by

Martijn Fridus SNOEIJ

electrical engineer (elektrotechnisch ingenieur), born in Zaanstad

## This dissertation has been approved by the promotors:

Prof. dr. ir. A.J.P. Theuwissen

Prof. dr. ir. J.H. Huijsing

### Composition of the doctoral committee:

- Rector Magnificus, chairman
- Prof. dr. ir. A.J.P. Theuwissen, Technische Universiteit Delft, promotor
- Prof. dr. ir. J.H. Huijsing, Technische Universiteit Delft, promotor
- Prof. dr. ir. G.C.M. Meijer, Technische Universiteit Delft
- Prof. ir. A.J.M. van Tuijl, Universiteit Twente
- Prof. dr. B.J. Hosticka, Fraunhofer IMS Duisburg, Germany
- Dr. J.E.D. Hurwitz, Gigle Semiconductor, UK
- Dr. J. Solhusvik, Micron Technology Inc., Norway

Reserve member: Prof. dr. P.J.
French, Technische Universiteit Delft

Printed by PrintPartners Ipskamp, Enschede

ISBN: 978-90-9022129-8

Copyright © 2007 by M.F. Snoeij

All rights reserved. No part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written permission of the author.

PRINTED IN THE NETHERLANDS

# Table of Contents

| Chapter | Title |
|----|----|
| 1. | **Introduction** |
| | 1.1 History of Electronic Image Sensors |
| | 1.2 Challenges |
| | 1.3 Motivation and Objectives |
| | 1.4 Organization of the Thesis |
| | 1.5 References |
| 2. | **CMOS Imager Analog Signal Processing at a Glance** |
| | 2.1 Architectural Overview of CMOS Image Sensors |
| | 2.2 Photosensitive Elements |
| | 2.2.1 Photodiodes |
| | 2.2.2 Photogates |
| | 2.2.3 Pinned Photodiodes |
| | 2.3 Front-End Analog Signal Processing |
| | 2.3.1 Photodiode Front-End Readout Structure |
| | 2.3.2 Photogate/Pinned Photodiode Front-End Readout Structure |
| | 2.4 Back-End Analog Signal Processing |
| | 2.4.1 Column Circuit Readout |
| | 2.4.2 Chip-Level A/D Conversion |
| | 2.5 Advanced Analog Signal Processing Techniques |
| | 2.5.1 Sharing of Readout Circuitry Among Pixels |
| | 2.5.2 kT/C Noise Reduction through Soft and Active Reset |
| | 2.5.3 High Dynamic Range Readout |
| | 2.5.4 Column-Level and Pixel-Level A/D Conversion |
| | 2.6 References |
| 3. | **Front-End Readout Circuitry** |
| | 3.1 Front-End Readout Circuit Performance |
| | 3.1.1 Signal Swing |
| | 3.1.2 Linearity |
| | 3.1.3 Fixed Pattern Noise |
| | 3.1.4 Power Consumption |
| | 3.2 Front-End Temporal Noise Sources |
| | 3.2.1 Photon Shot Noise |
| | 3.2.2 Reset Noise |
| | 3.2.3 Thermal Noise |
| | 3.2.4 1/f Noise |
| | 3.2.5 Comparison of Noise Sources |
| | 3.3 1/f Noise Reduction Using Large-Signal Excitation (LSE) |
| | 3.3.1 1/f Noise in Deep-Submicron MOS Transistors |
| | 3.3.2 LF Noise Reduction using Large-Signal Excitation (LSE) |
| | 3.3.3 Application of LSE inside a CMOS Imager Front-End |
| | 3.4 LF Noise Measurements under Large Signal Excitation |
| | 3.4.1 Measurement IC |
| | 3.4.2 Measurement Results |
| | 3.5 Conclusion |
| | 3.6 References |
| 4. | **Column-Level Analog-to-Digital Conversion** |
| | 4.1 Why Column-Level A/D Conversion? |
| | 4.1.1 Chip-Level, Column-Level and Pixel-Level A/D Conversion |
| | 4.1.2 Architectural Comparison |
| | 4.2 Column-Level ADC Architectures |
| | 4.2.1 Column-Level ADC Architecture Requirements |
| | 4.2.2 Column-Parallel Single-Slope ADC Architecture |
| | … |
| | 4.4.2 Dynamic Column Switching Simulations |
| 5. | **A CMOS Imager with a Low-Power Column-Level Single-Slope ADC** |
| | 5.1 Sensor Overview |
| | 5.1.1 Design Goals |
| | 5.1.2 System-Level Overview |
| | 5.1.3 Column ADC Requirements |
| | 5.2 Column Comparator Design |
| | 5.2.1 Comparator Input Circuitry |
| | 5.2.2 Comparator Topology |
| | 5.2.3 Offset and Delay Compensation |
| | 5.2.4 Preamp Design |
| | 5.2.5 Regenerative Latch Design |
| | 5.3 Dynamic Column Switching Circuitry |
| | 5.4 Measurements |
| | 5.4.1 Comparator Measurements |
| | 5.4.2 DCS Measurements |
| | 5.5 References |
| 6. | **… Single-Slope ADC** |
| | 6.1 Sensor Overview |
| | 6.1.1 Design Goals |
| | 6.1.2 System-Level Overview |
| | 6.1.3 System-Level ADC Design Considerations |
| | 6.2 Column-Level Circuitry |
| | 6.3 Multiple Ramp Generator Design |
| | 6.3.1 Ramp Generator Concept |
| | 6.3.2 Resistor Ladder Switching Logic |
| | 6.3.3 Output Amplifier Offset Auto-Calibration |
| | 6.4 Measurement Results |
| | 6.4.1 Single-Slope Mode Measurements |
| | 6.4.2 MRSS Mode Measurements |
| | 6.4.3 MRMS Mode Measurements |
| | 6.5 References |
| 7. | **Conclusions** |
| | 7.1 Main Findings |
| | 7.2 Future Work |
| | 7.2.1 1/f Noise Reduction in CMOS Image Sensors |
| | 7.2.2 Improvements to the MRSS/MRMS prototype |
| | 7.2.3 Perceptual Effects of using a Companding ADC |
| | 7.2.4 Perceptual Effects of Dynamic Column Switching |
| | 7.3 References |
| | **Appendix A. Companding Quantization Calculation Method** |
| | **Summary** |
| | **Samenvatting** |

# 1. Introduction

"A picture is worth a thousand words." This simple truth has been known to mankind from the time of prehistoric cave drawings. Since then, techniques for creating images have continuously been refined over the ages, with the most recent step being the replacement of film-based cameras by digital cameras. The heart of a digital camera is an electronic device that converts optical information into electronic signals. Such an 'electronic eye' is called an image sensor or imager. These sensors have a large number of elements that convert light into electrical signals, which are subsequently processed by electronic readout circuits. The main goal of this thesis is to improve the quality of the image sensor by improving the readout circuitry.

This introductory chapter starts with a short historical overview of the two main types of electronic image sensors: the CMOS imager and the CCD imager. Next, the challenges in designing CMOS imagers are discussed. Based on these challenges, the motivation and goals of this thesis are presented. It is shown that in order to improve the quality of CMOS imagers, system-level changes to the readout circuitry are necessary. Such a system-level approach forms the core of this thesis. Finally, the structure of the thesis is presented.

# 1.1 History of Electronic Image Sensors

The ability to electronically record images, transport them over long distances, and then instantly display them by means of the television is clearly one of the most important inventions of the twentieth century.
The development of the television was partly made possible by the development of electronic devices that could process information in the form of electric signals. However, before information can be processed, it has to be acquired; therefore, a sensor that converts light into an electrical signal is necessary.

The first practical electronic image sensor was the Vidicon, or imaging tube. As the name implies, these early sensors were based on vacuum tube technology. The resulting cameras had the same drawbacks as vacuum tube radios: they were bulky, heavy, and consumed a lot of power (Figure 1-1a).

With the invention of the transistor in 1947, a new class of *solid-state* electronic devices was born. The invention of the integrated circuit in 1958 by Jack Kilby [1.1] and Robert Noyce [1.2] was the decisive breakthrough for solid-state electronics. The ability to put a multitude of transistors together on a tiny silicon chip meant that increasingly complex signal processing functions could be realized in a very small device.

Not long after the demonstration of the first integrated circuit, several research groups realized that it was also possible to integrate light-sensitive elements onto a chip. The first publication of such an attempt was in 1963 by Morrison of Honeywell [1.3], followed by Horton of IBM in 1964 [1.4] and Schuster of Westinghouse in 1966 [1.5]. All these early image sensors were made in the semiconductor processes available at the time, which were bipolar, NMOS or PMOS processes. The photosensitive elements used in these early imagers were photodiodes or phototransistors.

Although some improvements were made throughout the 1960s, these early solid-state imagers exhibited two major problems that impeded their commercial use. First of all, the limited lithographic resolution available in the semiconductor processes of that time severely limited the resolution of the resulting imagers.
Secondly, other technology-related limitations led to large non-uniformity between different pixels, a phenomenon usually called fixed-pattern noise (FPN).

In 1969, a different solid-state imaging device, called the Bucket-Brigade Device, was invented by Sangster and Teer of Philips Research [1.6]. The original application for this device was an analog delay line, but the inventors soon realized it could also be used as an imager.

Figure 1-1: a) Assembly line of the first color TV camera, 1954 (courtesy of RCA) b) Single-chip camera modules with the same functionality, 2004 (courtesy of Philips Semiconductors)

In 1970, Boyle and Smith of Bell Labs made an improved device which they called a Charge-Coupled Device or CCD. This name has become nearly synonymous with a solid-state imager, although, in theory, a CCD can be used for many different applications. Compared to the early imagers that were made in MOS or bipolar processes, the CCD had the advantage of being a relatively simple device, making it easier to realize an imager with a sufficiently high resolution on a chip. Moreover, CCDs were relatively free of FPN. Despite these advantages, it took more than a decade before the first commercial CCD imager came on the market, mainly because of fabrication and reliability problems.

The first major application of CCDs was in consumer video cameras, where their smaller size and lower power consumption, compared to imaging tubes, were key advantages. After their use in consumer products, CCDs were quickly adopted in professional TV broadcasting, and the classical imaging tube disappeared completely.

The success of the CCD imager led to a near abandonment of research into MOS-based image sensors. In the early 1990s, however, several groups led a resurgence in MOS imager research and development. Among these groups were the University of Edinburgh, Linköping University in Sweden and NASA's Jet Propulsion Laboratory (JPL).
The motivation for this research was that while CCDs had excellent performance, the specialized semiconductor process with which CCDs are fabricated made it very difficult to co-integrate large circuit blocks onto the same chip. Therefore, in order to create a complete camera system, at least two chips were necessary. However, if it were possible to realize an imager in a standard CMOS process, the signal processing could be integrated on a single chip, creating a camera-on-a-chip. Apart from the obvious benefit of creating smaller cameras, such miniaturization could also lead to lower cost and lower power consumption.

In the late 1990s, mobile telephony found very rapid adoption among consumers, creating a new high-volume market for portable electronic devices, where low power consumption and small system sizes are key requirements. Around 2000, the first mobile phones equipped with cameras became available. For this application, CMOS imagers are very well suited. Firstly, their power consumption is much lower than that of CCDs. Secondly, the complete camera can have smaller physical dimensions because the signal processing can be integrated on the same chip as the sensor (Figure 1-1b). This high-volume market has fuelled the rapid development of CMOS imagers.

Today, CMOS imaging is emerging as a mature technology alongside CCDs. Camera-equipped cell phones have more or less become a standard. The focus of CMOS imager development for this application is now on improving (perceived) image quality, and in particular, increasing the pixel count.

# 1.2 Challenges

Having discussed the history of CMOS imagers in the previous section, this section will take a brief look into the future. In particular, the challenges in designing future imagers will be discussed. Such a design is typically a system effort, where it is not possible to identify a single performance constraint or physical limit.
Instead, a set of constraints, comprising both physical limits and customer requirements, has to be met. Since it is difficult to mathematically define the relation between these design constraints, no widely accepted figure-of-merit has been defined for CMOS imagers. However, it is possible to identify a number of parameters that define CMOS imager performance. Three of these parameters have a significant impact on the requirements for the analog readout circuitry:

- Signal-to-noise ratio and dynamic range
- Number of pixels / 'resolution'
- Power consumption

In the following subsections, each of these performance parameters will be discussed.

### Signal-to-Noise Ratio and Dynamic Range

In many sensor interface systems, the maximum signal-to-noise ratio and dynamic range are nearly equal to each other. This is because the amount of noise in many systems can be considered constant, and therefore, the dynamic range, i.e. the ratio of the maximum over the minimum signal that the system can process, becomes equal to the maximum signal over the noise (since the noise limits the minimum signal that can be processed). However, the amount of noise in an image sensor is signal dependent because of the presence of photon shot noise, as will be explained in section 3.2.1. This noise source typically dominates at higher input signals, and therefore, the maximum signal-to-noise ratio will be less than the dynamic range.

In order to increase overall image quality, it is desirable to increase both dynamic range and signal-to-noise ratio. A lot of work has been done on increasing the dynamic range by increasing the maximum amount of input signal an imager can handle, for instance by using pixels with a logarithmic response [1.12], or using multiple capture [1.13]. A brief overview of these techniques will be given in sub-section 2.5.3. However, all these techniques only increase the dynamic range at the expense of signal-to-noise ratio.
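
The difference between dynamic range and maximum signal-to-noise ratio described above can be illustrated with a short numerical sketch. It assumes a simple noise model that is not taken from this thesis: a signal-independent noise floor added in quadrature to photon shot noise, whose rms value equals the square root of the number of signal electrons. The full-well capacity and noise floor below are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical sensor parameters (illustrative assumptions, not thesis data)
FULL_WELL_E = 20_000.0   # maximum signal, in electrons
NOISE_FLOOR_E = 10.0     # signal-independent noise floor, in electrons rms


def dynamic_range_db(full_well_e, noise_floor_e):
    """Ratio of the maximum signal to the noise floor, in dB."""
    return 20.0 * math.log10(full_well_e / noise_floor_e)


def snr_db(signal_e, noise_floor_e):
    """SNR in dB with shot noise and noise floor added in quadrature."""
    shot_noise_e = math.sqrt(signal_e)  # Poisson statistics: rms = sqrt(mean)
    total_noise_e = math.sqrt(shot_noise_e ** 2 + noise_floor_e ** 2)
    return 20.0 * math.log10(signal_e / total_noise_e)


dr_db = dynamic_range_db(FULL_WELL_E, NOISE_FLOOR_E)
peak_snr_db = snr_db(FULL_WELL_E, NOISE_FLOOR_E)

print(f"dynamic range: {dr_db:.1f} dB")
print(f"maximum SNR:   {peak_snr_db:.1f} dB")
```

With these assumed numbers, the maximum signal-to-noise ratio comes out far below the dynamic range, because at full well the shot noise is much larger than the noise floor.
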
Moreover, other problems (fixed-pattern noise, extra in-pixel circuitry) make the adoption of these techniques in mainstream applications unattractive. Therefore, it would be more beneficial if the dynamic range could be increased by reducing the noise in the imager signal as much as possible.

The amount of noise on the imager's output signal depends on a number of noise sources. Some of these are fundamental in nature (such as photon shot noise), others are technology dependent (such as dark current), and yet others are circuit related (thermal noise, 1/f noise). An excellent performance analysis can be found in [1.14], where it is shown that the amount of circuit noise actually exceeds that of the technology-related noise sources. However, at the start of this thesis work, it was unclear *which* circuit noise source was dominant, and whether such noise could be reduced using circuit techniques.

### Number of Pixels / 'Resolution'

The number of (mega)pixels an imager has is perhaps one of its most 'visible' performance parameters. Quite often, this number is assumed to be synonymous with the imager's resolution, i.e. its ability to resolve light variations in the spatial domain. However, this assumption is incorrect; apart from the number of pixels, two other parameters are of key importance to the resolution of CMOS imagers. First of all, some of the charge carriers generated in the silicon due to incident light can diffuse from underneath one pixel to another. The smaller the pixel size, the worse this effect becomes. Second, the optical system in front of the imager will also have a limited spatial resolution. Therefore, instead of the number of pixels, the so-called Modulation Transfer Function (MTF) is a correct measure of imager resolution. More information on how to determine MTF can, for instance, be found in [1.7].
In spite of being an incorrect performance parameter, the number of pixels is universally marketed as the sole performance parameter for imagers, in particular in consumer applications. As a result, customers are nowadays convinced that an 8 megapixel camera is four times better than a 2 megapixel camera. This market force has led to an interesting situation in imager design, in particular for low-cost sensors for mobile applications. In order to increase the number of pixels, either the chip size has to increase, or the pixel size has to shrink. It is very unattractive to increase the chip size, not only because a larger chip is more expensive, but also because it would require a larger optical format, and therefore, the camera as a whole would be larger, which is not acceptable in a mobile application. Therefore, shrinking the pixel size is the only way to produce an imager with a higher pixel count at the same cost.

In his 1997 overview paper [1.8], Fossum predicted that pixel size would stabilize in the year 2000 at about 5µm, due to practical limitations of the optics (Figure 1-2a). Since the minimum process feature size would continue to shrink after 2000, this would mean that more transistors could be integrated in the same pixel for added functionality. However, rather the opposite happened, as is illustrated in Figure 1-2b. For this graph, an average pixel size of imagers published at the International Solid-State Circuits Conference (ISSCC) and the International Electron Devices Meeting (IEDM) was computed for each year between 1998 and 2007. It is clear that pixel sizes have continued to drop well below 5µm. In order to enable this decrease in pixel size, the number of transistors per pixel was actually reduced, by sharing readout transistors between several pixels [1.9-1.10]. Using this transistor sharing method, pixel sizes as small as 1.75µm have recently been reported [1.11].
Although the optics of cameras in mobile applications have improved, it is still doubtful whether the use of such small pixels will increase the imager performance further. Nevertheless, the marketing forces that drive increasing pixel count have not changed, and therefore, it can be expected that the pixel count will keep increasing as long as it does not result in significantly *lower* imager performance.

Figure 1-2: a) Graph taken from Fossum [1.8], predicting a minimum pixel size of 5µm in 2000 b) Graph showing the average of published pixel sizes for each year from 1998-2007

### Power Consumption

Although the power consumption of the first generation of CMOS image sensors was already about an order of magnitude lower than that of CCDs, the recent focus on mobile, battery-powered applications has provided a strong motivation to further decrease power consumption. In a CMOS imager, the pixel array itself has a very low power consumption compared to that in a CCD imager, as there are no large CCD gates to charge and discharge during the readout process. In conventional CMOS imagers, most of the power is therefore consumed by the readout circuitry, in particular the analog-to-digital converter and digital circuitry [1.15].

# 1.3 Motivation and Objectives

As shown in the last section, the challenges in designing CMOS image sensors involve the improvement of three key performance parameters: the number of pixels, the signal-to-noise ratio and the power consumption. In improving each of these parameters, analog signal processing plays an important role.

Although the signal-to-noise ratio of a CMOS imager is partly defined by the properties of the light-sensitive element, it is usually the front-end analog circuit that determines the noise floor of the image sensor. Therefore, any noise reduction in the analog readout circuit would directly lead to a better CMOS imager.
While this challenge might seem simple, the fact that the analog readout circuit has to be partially implemented inside the pixel itself leads to severe design constraints, as the amount of available chip area is minimal.

While the increase in pixel count does not directly require an improvement of the analog readout circuit, it does have an important indirect impact. A higher pixel count requires an increase in the bandwidth of the signal processing chain, since more pixels need to be read out in the same amount of time. This higher bandwidth requirement can have two negative effects on the performance of the imager. Firstly, it can increase the total amount of noise in the analog signal processing chain. Since the CMOS imager readout structure requires the sampling of data, the total amount of in-band noise usually determines the noise performance. Therefore, if the signal bandwidth increases, it typically results in a higher total in-band noise, unless the noise density of the circuit can be lowered, which requires more power.

A second consequence of the higher pixel count is that, if no system-level changes are made to the analog readout circuit, its power consumption will have to be increased in order to increase the bandwidth. This is very undesirable, since, as mentioned, a third challenge in CMOS imager design is actually to *lower* power consumption. Therefore, because of the requirement, on the one hand, to increase pixel count and thus signal bandwidth, and, on the other hand, to lower power consumption, system-level improvements to the analog readout circuit are imperative. Such improvements should lead to a better power efficiency of the circuit, i.e. less power consumption per readout operation. As mentioned in the previous section, the A/D converter consumes most of the power in the analog readout circuit, and therefore, efforts to increase the power efficiency of the analog signal processing chain should be focused on the ADC.
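
The bandwidth argument above can be made concrete with a rough sketch. It assumes, for illustration only, a single readout chain whose bandwidth scales linearly with the pixel rate and whose noise is white, so that the rms in-band noise grows with the square root of the bandwidth. The frame rate, noise density and bandwidth factor are hypothetical values, not figures from this thesis.

```python
import math

# Hypothetical values, chosen only for illustration
FRAME_RATE_HZ = 30.0             # frames per second
NOISE_DENSITY_V_RTHZ = 10e-9     # white-noise density in V/sqrt(Hz)
BANDWIDTH_PER_PIXEL_RATE = 1.0   # assumed Hz of bandwidth per pixel/s


def pixel_rate(n_pixels, frame_rate_hz=FRAME_RATE_HZ):
    """Pixels that must be read out per second."""
    return n_pixels * frame_rate_hz


def in_band_noise_vrms(n_pixels):
    """RMS white noise integrated over the assumed readout bandwidth."""
    bandwidth_hz = BANDWIDTH_PER_PIXEL_RATE * pixel_rate(n_pixels)
    return NOISE_DENSITY_V_RTHZ * math.sqrt(bandwidth_hz)


# Compare a 2 megapixel and an 8 megapixel imager at the same frame rate
rate_ratio = pixel_rate(8e6) / pixel_rate(2e6)
noise_ratio = in_band_noise_vrms(8e6) / in_band_noise_vrms(2e6)

print(f"pixel rate increases by a factor of {rate_ratio:.1f}")
print(f"in-band noise increases by a factor of {noise_ratio:.1f}")
```

Under these assumptions, quadrupling the pixel count at a fixed frame rate doubles the in-band noise unless the noise density is halved, which is the trade-off against power consumption described in this section.
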
In conclusion, the focus of this thesis can be summarized in two goals:

- Reduce the noise of the analog readout circuit as much as possible to increase the overall noise performance of a CMOS imager.
- Significantly improve the power efficiency of the analog signal processing chain, in order to enable low-power high-resolution CMOS imagers.

# 1.4 Organization of the Thesis

The remainder of this thesis consists of six chapters. Chapter 2 provides an overview of the analog signal processing chain in conventional, commercially available CMOS imagers. First of all, the different photosensitive elements that form the input to the analog signal chain are briefly discussed. This is followed by a discussion of the analog signal processing chain itself, which is divided into two parts. First, the analog front-end, consisting of in-pixel circuitry and column-level circuitry, is discussed. Second, the analog back-end, consisting of variable gain amplification and A/D conversion, is discussed. Finally, a brief overview of advanced readout circuit techniques is provided.

In chapter 3, the performance of the analog front-end is analyzed in detail. It is shown that its noise performance is the most important parameter of the front-end. An overview of front-end noise sources is given and their relative importance is discussed. It will be shown that 1/f noise is the limiting noise source in current CMOS imagers. A relatively unknown 1/f noise reduction technique, called switched-biasing or large signal excitation (LSE), is introduced and its applicability to CMOS imagers is explored. Measurement results on this 1/f noise reduction technique are presented. Finally, at the end of the chapter, a preliminary conclusion on CMOS imager noise performance is presented.

The main function of the back-end analog signal chain is analog-to-digital conversion, which is described in chapter 4.
First of all, the conventional approach of a single chip-level ADC is compared to a massively parallel, column-level ADC, and the advantages of the latter will be shown. Next, the existing column-level ADC architectures will be briefly discussed, in particular the column-parallel single-slope ADC. Furthermore, a new architecture, the multiple-ramp single-slope ADC, will be proposed. Finally, two circuit techniques are introduced that can improve ADC performance. Firstly, it will be shown that the presence of photon shot noise in an imager can be used to significantly decrease ADC power consumption. Secondly, a column FPN reduction technique, called Dynamic Column Switching (DCS), is introduced.

Chapters 5 and 6 present two realizations of imagers with column-level ADCs. In chapter 5, a CMOS imager with a single-slope ADC is presented that consumes only 3.2µW per column. The circuit details of the comparator achieving this low power consumption are described, as well as the digital column circuitry. The ADC uses the dynamic column switching technique introduced in chapter 4 to reduce the perceptual effects of column FPN.

Chapter 6 presents an imager with a multiple-ramp single-slope architecture, which was proposed in chapter 4. The column comparator used in this design is taken from a commercially available CMOS imager. The multiple ramps are generated on chip with a low-power ladder DAC structure. The ADC uses an auto-calibration scheme to compensate for offset and delay of the ramp drivers.

Finally, chapter 7 presents the main conclusions of this thesis and gives suggestions for future work.

# 1.5 References

- [1.1] J.S. Kilby, US Patent no. 3,138,743, issued June 1964
- [1.2] R.N. Noyce, US Patent no. 2,981,877, issued April 1961
- [1.3] S. Morrison, "A new type of photosensitive junction device," *Solid-State Electron.*, vol. 5, pp. 485–494, 1963.
- [1.4] J. Horton, R. Mazza, and H. Dym, "The scanistor—A solid-state image scanner," in *Proc. IEEE*, vol. 52, pp.
1513–1528, 1964.
- [1.5] M. A. Schuster and G. Strull, "A monolithic mosaic of photon sensors for solid state imaging applications," *IEEE Trans. Electron Devices*, vol. ED-13, pp. 907–912, 1966.
- [1.6] F.L.J. Sangster, K. Teer, "Bucket-Brigade electronics", *IEEE Journal of Solid-State Circuits*, Vol. SC-4, pp. 131, 1969.
- [1.7] A.J.P. Theuwissen, *Solid-State Imaging with Charge-Coupled Devices*, Kluwer Academic Publishers, Dordrecht, 1995.
- [1.8] E.R. Fossum, "CMOS Image Sensors: Electronic Camera-On-A-Chip", *IEEE Transactions on Electron Devices*, Vol. 44, No. 10, pp. 1689-1698, Oct. 1997
- [1.9] H. Takahashi et al., "A 3.9µm Pixel Pitch VGA Format 10b Digital Image Sensor with 1.5-Transistor/Pixel", *IEEE International Solid-State Circuits Conference*, Vol. XLVII, pp. 108-109, Feb. 2004
- [1.10] M. Mori et al., "A 1/4in 2M Pixel CMOS Image Sensor with 1.75Transistor/Pixel", *IEEE International Solid-State Circuits Conference*, Vol. XLVII, pp. 110-111, Feb. 2004
- [1.11] K-B. Cho et al., "A 1/2.5 inch 8.1Mpixel CMOS Image Sensor for Digital Cameras", *IEEE International Solid-State Circuits Conference*, Vol. L, pp. 508-509, Feb. 2007
- [1.12] S.J. Decker, R.D. McGrath, K. Brehmer and C.G. Sodini, "A 256x256 CMOS imaging array with wide dynamic range pixels and column-parallel digital output", *IEEE Journal of Solid-State Circuits*, Vol. 33, pp. 2081-2091, Dec. 1998
- [1.13] O. Yadid-Pecht and E. Fossum, "Wide intrascene dynamic range CMOS APS using dual sampling", *IEEE Transactions on Electron Devices*, vol. 44, pp. 1721-1723, Oct. 1997
- [1.14] A.J. Blanksby and J. Loinaz, "Performance Analysis of a Color CMOS Photogate Image Sensor", *IEEE Transactions on Electron Devices*, Vol. 47, No. 1, Jan. 2000
- [1.15] B. Pain, Course on CMOS Digital Image Sensors, *SPIE Photonics West*, San Jose, 2001

# 2. CMOS Imager Analog Signal Processing at a Glance

This chapter gives an overview of the analog signal processing chain of a CMOS image sensor.
It follows the signal path from input to output, or 'from photons to bits'. The purpose of this chapter is to introduce the topic to the analog circuit designer who is not familiar with CMOS imagers. As such, it does not intend to give a complete overview of all the readout structures that have been published over the years, but rather introduces the reader to a typical structure as a basis for the rest of the thesis.

The chapter starts with a brief architectural overview of a typical image sensor in section 2.1. Next, a typical analog signal processing chain is described from input to output. In section 2.2, the photosensitive elements are discussed. Section 2.3 details the function of the front-end readout circuitry, while section 2.4 describes the back-end readout circuitry. Finally, section 2.5 provides a brief overview of improvements and/or alternative readout structures that have been proposed in recent years.

## 2.1 Architectural Overview of CMOS Image Sensors

CMOS image sensors are possibly among the most complex mixed-signal integrated circuits on the market today. They routinely contain several million transistors, and have a large quantity of both analog and digital circuitry. While a large number of variations are possible in terms of resolution, frame rate, readout features, etc., most analog signal processing chains follow a similar architecture, as depicted in Figure 2-1. In this simplified block diagram, only analog circuit blocks are shown, while digital driver/control blocks are omitted for clarity.

As can be seen in the figure, the analog signal processing chain can be divided into five main blocks. The first block consists of the photosensitive pixel array itself. Apart from photosensitive elements, this block also contains some analog readout circuits implemented in each pixel. The second block consists of a set of column circuits that are located outside the pixel array.
As its name implies, each column circuit is connected to a single column of the pixel array. The combination of in-pixel circuitry and column circuitry concurrently reads out a row of the pixel array. To this end, the row decoder outputs a control signal, which ensures that a single row of the pixel array is connected to the column circuits. The results of this readout operation are stored on capacitors in each column circuit.

Figure 2-1: Block diagram of the analog signal processing chain in a CMOS image sensor

The third block is the chip-level circuit. Here, "chip-level" implies that only a single circuit is used to read out all signals of the chip, rather than having a row of identical circuits (the column circuits) or an array of identical circuits (the pixel array). The chip-level circuit is consecutively connected to each column circuit and reads out the result of the front-end readout operation stored in the column circuit. To this end, the column decoder outputs a control signal, which connects one column circuit to the chip-level circuit. In the latter, an A/D converter digitizes the results, after which further digital processing can be performed.

While the analog readout system is physically separated into three blocks, the actual readout operation is a two-step process, as described above: first, the concurrent readout of a row of pixels and storage in the column circuits, and second, a consecutive readout of the column circuits. Since such a division into two parts is more convenient to describe the analog circuitry, it will be used throughout this thesis. To this end, the circuitry that performs the first readout operation will be called the *front-end readout circuitry*; it consists of both in-pixel electronics and a part of the column circuits up to the capacitors that store the results of the first readout.
The circuitry that performs the second readout operation will be called the *back-end readout circuitry*; it consists of the remaining part of the column circuitry and the chip-level circuitry.

Figure 2-2: Typical timing diagram of the readout of a CMOS imager with a resolution of m x n pixels

Although CMOS image sensors can theoretically have a random-access readout mode, in which each pixel can be individually accessed, in most cases a full image or frame is read out serially, as is illustrated in Figure 2-2. This can be done at a typical rate of about 30-50 frames/second, which means that the *frame time* is about 20-33ms. As described above, the imager is read out on a row-by-row basis; to this end, each frame time is divided into a number of line times. The amount of time available to read out a row of the imager equals the frame time divided by the number of rows. For a moderate resolution of about 500 rows or *lines*, this results in a *line time* of about 40-66µs.

During each line time, the two distinct readout steps are performed, as depicted in Figure 2-2. First, a front-end readout operation is performed, storing the outputs of a row of pixels into the column circuits. This operation can usually be performed within 3-5µs. The rest of the line time is used for the back-end readout. During this period, the chip-level circuit reads out the column circuits one by one. Assuming a moderate resolution of about 500 columns, the readout of each column circuit should be done in about 100ns. The chip-level circuit therefore needs to be able to operate at about 10MHz. This frequency is usually called the *pixel clock*, and is often also the clock frequency at which most of the imager operates.

## 2.2 Photosensitive Elements

As with all interface electronics, knowledge of the sensor itself is vital to be able to design a read-out front-end.
Therefore, a brief overview of the photosensitive elements used in CMOS imagers will be given, aimed at explaining the requirements on the readout circuitry. The basic concepts will be explained for the simplest photosensitive device, the photodiode. After this, the added functionality provided by the photogate and pinned photodiode will be explained.

### 2.2.1 Photodiodes

The basis of solid-state imaging is the photo-electric effect [2.1], which describes the interaction of electromagnetic radiation with matter. In the case of a solid-state imager, the electromagnetic radiation will be visible light and the matter will be a semiconductor. When a semiconductor is exposed to light, the incident photons can transfer part of their energy to individual silicon atoms, resulting in the generation of electron-hole pairs. The main condition is that the energy of the photons, which is inversely proportional to the wavelength of the light, should be higher than the bandgap of the semiconductor, as the photon should have enough energy to lift an electron from the valence band into the conduction band. Luckily for solid-state imaging, the most widely used semiconductor material, silicon, has a bandgap low enough (1.1eV) to allow visible light to generate electron-hole pairs.

Figure 2-3: *Generation of photocurrent in a p-n junction*

In order to detect the generated electron-hole pairs, the next step is to quickly separate the electrons from the holes, which would otherwise recombine within a short time. The simplest mechanism for separation is the electric field present inside the depletion region of the p-n junction of a diode (Figure 2-3). The electric field will cause the electrons to drift towards the n-doped silicon, while the holes drift towards the p-doped region. This results in a current across the p-n junction which flows in the reverse direction of the diode.
In conclusion, a photodiode is an ordinary p-n junction that is exposed to light; this incident light results in a reverse current, often called photocurrent, through the diode.

While some of the earliest solid-state imagers attempted to measure the photocurrent directly, all modern solid-state imagers work in integrating mode [2.2]: the photocurrent is integrated onto a capacitance, and the voltage change across the capacitance is read out. There are two reasons for this. First, in a typical imager, the number of generated electron-hole pairs will be very small, resulting in a current of less than 1pA. It is very difficult to design simple interface electronics that can accurately measure such a small current. Second, an imager needs an array of photodiodes, which all have to be read out by analog circuitry. It is quite difficult to read out all these photodiodes concurrently; instead, read-out is usually done on a row-by-row basis, as will be shown later. This implies that the readout circuit has to be time-shared among the photodiodes, and as a result, each photodiode has to be read out in a short time. Therefore, reading out the integrated photocurrent is easier, as the energy stored in the integrating capacitor is larger than the instantaneous energy generated by the photodiode. Moreover, in photography applications where a flash gun is used, a direct readout of photocurrent would imply that all pixels have to be read out during the 'flash', which is impractical.

Integrating the photocurrent can be done by using the photodiode's own capacitance. When the diode is reverse-biased, the p and n regions effectively function as the isolated plates of a capacitor. The photodiode can therefore be operated as follows (Figure 2-4). First, a voltage is applied to reverse-bias the diode using the reset switch shown in the figure.
This reset operation effectively samples the voltage $V_{bias}$ onto the parasitic capacitance of the diode, and therefore, the diode will stay in reverse bias when the external voltage source is removed. After the reset switch is opened at $t_0$, the biasing voltage will decrease if the diode is exposed to light, as this generates a photocurrent that is integrated onto the capacitor. Since the photocurrent is directly proportional to the amount of light, the resulting voltage decrease across the photodiode is, to first order, directly proportional to the amount of light and to the integration time. Therefore, by measuring the voltage over the diode after a certain integration time, a measure of light intensity is acquired.

Figure 2-4: a) Schematic of a photodiode in integrating mode. b) Plot of the voltage on the photodiode vs. time as photocurrent is integrated.

A number of noise sources limit the dynamic range and signal-to-noise ratio of a photodiode. While all of them will be discussed in chapter 3, one dominant noise source will be described here to explain the need for more complex photosensitive elements. As explained, the reset operation effectively samples the voltage $V_{bias}$ onto the photodiode capacitance $C_{pd}$. Just as in any other switched-capacitor circuit, this sampling operation exhibits sampling noise. As is well known, this sampling noise equals:

$$\overline{v_n} = \sqrt{\frac{kT}{C_{pd}}} \tag{2-1}$$

where k is Boltzmann's constant and T the absolute temperature. While this noise source is usually referred to as kT/C noise in the analog circuit design community, in the CMOS imager literature it is mostly referred to as reset noise, as the noise is introduced onto the photodiode when it is reset.

In image sensor design, in order to allow for comparisons between imagers, all noise sources are referred to the physical input of the sensor, which is the charge stored in the photodiode capacitance, usually expressed in a number of electrons.
This charge is related to the voltage over the photodiode as follows:

$$v_{pd} = \frac{q}{C_{pd}} \cdot e_{pd} \tag{2-2}$$

where $v_{pd}$ is the voltage over the photodiode, $e_{pd}$ the number of electrons stored in the photodiode capacitance, q the charge of an electron, and $C_{pd}$ the capacitance of the photodiode. The ratio of q and $C_{pd}$ is usually called the *conversion gain*, since it determines the 'gain' of the charge-to-voltage conversion that effectively takes place at the photodiode capacitance. By combining Eq. (2-1) with Eq. (2-2), the sampling noise can be expressed as a number of noise electrons rms $\overline{e_n}$:

$$\overline{e_n} = \frac{\sqrt{kTC_{pd}}}{q} \tag{2-3}$$

At first glance, this can look paradoxical to a circuit designer, as it now seems that the sampling noise increases with the capacitance instead of decreasing. However, the key insight is that the capacitance not only determines the noise level expressed in terms of charge, but also determines how this noise charge is converted into a noise voltage according to Eq. (2-2). As the charge-to-voltage conversion is inversely proportional to the capacitance, while the noise expressed in charge is only proportional to the square root of the capacitance, the noise voltage decreases with the square root of the capacitance.

However, as in any other analog circuit, the choice of capacitance is a trade-off between noise performance and other parameters. In CMOS imagers, the required (small) pixel size usually constitutes an upper limit to the capacitance. In a typical imager, the photodiode capacitance is in the order of 1-10fF, leading to a noise voltage of 1-3mV rms (or 18-40 electrons rms). This is usually the dominant noise source, which can considerably limit the dynamic range of a photodiode-based imager.

There is, however, a conceptual solution for the reset noise. By sampling the voltage across the photodiode immediately after it is reset, the reset noise can be measured.
Next, photocurrent is integrated onto the photodiode's capacitance for a certain period, after which the voltage across the photodiode is sampled again. This second sample then contains the signal voltage (i.e. the decrease in photodiode voltage that is proportional to light) and the reset voltage. By subtracting the first sample from the second, the reset noise is removed from the second sample. However, this solution is not practical for most imagers, as each pixel's reset voltage would have to be sampled before integrating its photocurrent, and this sample would need to be stored until after the integration period. Therefore, an analog or digital memory would be required that can store a full frame, which would consume a very large amount of chip area. To solve this problem, alternative photosensitive elements have been developed, which can read out the reset noise after the integration of photocurrent is completed.

### 2.2.2 Photogates

The problem of reset noise, as described in the previous section, can be solved by using a photogate as photosensitive element [2.3][2.10]. Figure 2-5 shows a cross-section of such a device. In a photogate, the electrical field that separates photon-generated electron-hole pairs is established by biasing the photogate at a positive voltage relative to the substrate. As a result, photon-generated electrons are attracted towards the photogate during charge integration, while the holes are pushed away. This creates a pocket of negative charge underneath the photogate. This charge is read out using a separate structure, consisting of a transfer gate and a so-called floating diffusion that are connected to the photogate.

Figure 2-5: Cross-section of a photogate including floating diffusion read-out.

The readout operation is performed as follows. Firstly, the floating diffusion is reset to a biasing voltage.
As with the photodiode, the floating diffusion can be considered to be a capacitance onto which a voltage is sampled, and therefore, reset noise is generated. This reset noise is sampled by the readout circuit for compensation. Next, the photon-generated charge is transferred from underneath the photogate into the floating diffusion by pulsing the photogate. This transfer of charge is very similar to what is done in a charge-coupled device (CCD) and can be done in a fast ($< 2\,\mu\mathrm{s}$) and nearly lossless fashion. As a result, the voltage across the floating diffusion is proportional to the amount of photon-generated charge plus the amount of sampling noise. By sampling this value and subtracting the first sample containing only reset noise from it, an output value can be acquired that is free from reset noise.

In conclusion, the photogate solves the reset noise problem because it has a floating diffusion capacitance onto which the photon-generated charge can be transported *quickly*. As a result, the reset voltage and the signal voltage on the floating diffusion capacitance can be sampled in quick succession, and therefore, no frame memory is required as would be the case with a photodiode readout. However, photogates have one distinct disadvantage: the presence of a gate on top of the photosensitive silicon significantly decreases the light sensitivity of the device.

### 2.2.3 Pinned Photodiodes

The above-mentioned problem of photogates, i.e. their decreased light sensitivity compared to photodiodes, was solved with the development of the pinned photodiode [2.4-2.5]. Figure 2-6 depicts a cross-section of a pinned photodiode. Compared to a normal photodiode, a very shallow p+ layer has been implanted near the silicon surface, thereby connecting (i.e. "pinning") the cathode of the photodiode to the substrate.
The resulting structure is read out in the same way as a photogate, by transferring the photon-generated charge from the pinned photodiode to the floating diffusion. As is obvious from the figure, the pinned photodiode solves the photogate's problem of lower optical sensitivity. The pinned photodiode has some other advantages over both photodiodes and photogates, in particular a lower dark current. The main drawback is that it is more difficult to fabricate. In a pinned photodiode, the depletion region from the n-/p-substrate junction should extend into the depletion layer of the p+/n- junction in order for the device to work properly, i.e. the n- region must be fully depleted. In order for this to happen, both the p+ and n- doping levels have to be accurately controlled. In spite of this process control difficulty, the pinned photodiode has become the most popular photosensitive element for high-quality CMOS imagers [2.6].

Figure 2-6: Cross-section of a pinned photodiode with floating diffusion readout structure

## 2.3 Front-End Analog Signal Processing

In this section, the front-end part of the analog signal processing chain will be discussed. This front-end consists of the in-pixel circuitry, as well as a set of column circuits that are implemented outside the pixel array. The function of the front-end is to read out the voltage generated by the photosensitive element used in the pixel, and store this output voltage in the column, where it can be read out by the analog back-end. This process will first be explained for a typical front-end circuit used for a photodiode. After that, the front-end for a photogate or pinned photodiode will be discussed.

### 2.3.1 Photodiode Front-End Readout Structure

Figure 2-7a shows a circuit diagram of a typical analog front-end for a photodiode [2.7-2.9]. Here, a single pixel from the pixel array and a single column circuit from the row of column circuits are depicted.
The pixel uses a photodiode as described in sub-section 2.2.1. Transistor M1 resets the photodiode, and precharges it to $V_{pixel}$. After this reset, any light on the pixel will generate a current in the photodiode that will decrease its precharged voltage, thus integrating the current. At the end of each integration period, the voltage decrease is read out and the photodiode is again reset to $V_{pixel}$, as indicated in Figure 2-7b.

In Figure 2-7c, the timing of the readout operation is shown in more detail. For this readout, two transistors M2 and M3 are integrated into the pixel. Because three transistors are used, a pixel with a photodiode is often called a *3T pixel*. Transistor M3 is used as a switch that connects the pixel circuit to the column circuit via control line *row select*. This control line connects not one, but a full row of pixels to the set of column circuits, as the readout is performed on a row-by-row basis. When a pixel is connected to a column circuit, transistor M2 inside the pixel is biased by current source $I_b$ inside the column circuit and functions as a source follower. The resulting single-transistor amplifier outputs the voltage across the photodiode onto the column bus with a gain close to unity.

Figure 2-7: a) Analog front-end circuit of a CMOS imager using a photodiode b) Timing diagram of the integration of photocurrent c) Detailed timing diagram of the front-end readout operation

An important problem of the front-end is that transistor M2 has to be small enough to fit inside a pixel, which means that its parameters will show a large spread, resulting in a large pixel-to-pixel mismatch. If uncorrected, such mismatch would lead to large offsets that would be visible in the image. Moreover, transistor M2 will also have a relatively high 1/f noise because of its small size.
To correct for this problem, double sampling is applied, which is implemented using capacitors C1 and C2 and switches S1 and S2 located inside the column circuit. Firstly, the light-dependent photodiode voltage is sampled using C1 and S1. This voltage contains the signal as well as offset and 1/f noise. Next, the photodiode is reset using transistor M1 and the resulting reset voltage is sampled onto capacitor C2 using switch S2. This reset voltage contains the offset and 1/f noise of the transistor; therefore, by subtracting this sample from the signal sample, the offset and 1/f noise are cancelled out.

It is crucial to understand that the above-described double sampling, which corrects for offset and 1/f noise, does not correct for the kT/C noise generated when the pixel is reset. As discussed in sub-section 2.2.1, each photodiode reset samples kT/C noise onto the photodiode capacitance. In our example, both the signal sample and the reset sample therefore contain reset noise, and it is often assumed that the subtraction of these samples cancels the reset noise. However, the subtraction does not cancel reset noise, as the reset noise in the two samples is not correlated. This can be understood by realizing that a reset operation is performed between the two sampling instants. With this reset operation, a new reset noise sample is taken, and therefore, the second sample contains a different reset noise sample from the first. As a result, uncorrelated kT/C noise is subtracted, which actually increases the reset noise by a factor of the square root of two. In order to distinguish the double-sampling operation in the 3T pixel structure described above from a 'true' correlated double sampling, where reset noise is compensated as well, this double-sampling operation is usually called double-delta sampling (DDS).
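The difference between subtracting correlated and uncorrelated reset noise can be illustrated numerically. The following is a minimal Monte Carlo sketch, not a model of any circuit in this thesis: it assumes Gaussian reset noise with the rms value of Eq. (2-1), a temperature of 300K, a sense-node capacitance of 5fF, and a Gaussian source-follower offset. The function names and all parameter values are illustrative assumptions.

```python
import math
import random
import statistics

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K
TEMPERATURE = 300.0         # assumed temperature in K (not stated in the text)
C_NODE = 5e-15              # assumed sense-node capacitance in F (within 1-10fF)


def reset_noise_sigma(capacitance, temperature=TEMPERATURE):
    """RMS reset (kT/C) noise voltage according to Eq. (2-1)."""
    return math.sqrt(K_BOLTZMANN * temperature / capacitance)


def simulate_double_sampling(n_pixels=20000, signal=0.1, offset_sigma=0.02, seed=1):
    """Return (single-reset sigma, rms error for DDS, rms error for CDS) in volts."""
    rng = random.Random(seed)
    sigma = reset_noise_sigma(C_NODE)
    dds_errors = []
    cds_errors = []
    for _ in range(n_pixels):
        offset = rng.gauss(0.0, offset_sigma)  # source-follower offset of this pixel
        reset_1 = rng.gauss(0.0, sigma)        # noise sampled by the first reset
        reset_2 = rng.gauss(0.0, sigma)        # noise sampled by a second, independent reset

        # 3T pixel (DDS): the signal sample still carries reset_1, while the
        # reset sample taken after the intermediate reset carries reset_2.
        signal_sample = offset + reset_1 - signal
        reset_sample = offset + reset_2
        dds_errors.append((reset_sample - signal_sample) - signal)

        # 4T pixel (CDS): both samples are taken after the same reset,
        # so they share the same reset noise value.
        reset_sample = offset + reset_1
        signal_sample = offset + reset_1 - signal
        cds_errors.append((reset_sample - signal_sample) - signal)

    return sigma, statistics.pstdev(dds_errors), statistics.pstdev(cds_errors)


if __name__ == "__main__":
    sigma, dds_rms, cds_rms = simulate_double_sampling()
    print(f"kT/C noise of one reset : {sigma * 1e3:.3f} mV rms")
    print(f"DDS residual noise      : {dds_rms * 1e3:.3f} mV rms")
    print(f"CDS residual noise      : {cds_rms * 1e3:.3f} mV rms")
    print(f"DDS / single reset      : {dds_rms / sigma:.3f}")
```

In both cases the offset drops out of the difference. Under these assumptions, the residual noise of the uncorrelated case should approach the square root of two times the single-reset value, while the correlated case leaves only floating-point rounding error.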
### 2.3.2 Photogate/Pinned Photodiode Front-End Readout Structure

The front-end readout of imagers equipped with photogates [2.10-2.12] or pinned photodiodes is very similar to the readout operation described above. Figure 2-8a depicts the front-end circuit with a pinned photodiode and Figure 2-8b shows the corresponding timing diagram. As can be seen from the figure, the readout circuit itself is identical to the photodiode readout. Because of the addition of the transfer gate, photogate or pinned photodiode pixels are often called *4T pixels*.

Figure 2-8: a) Analog front-end circuit of a CMOS imager using a pinned photodiode. b) Corresponding timing diagram of the readout operation

The difference with a 3T pixel front-end is in the timing of the readout. As explained in sub-section 2.2.2, the floating diffusion is reset immediately before the readout operation. In practice, it is advantageous to keep resetting the floating diffusion when it is not read out, as illustrated in the figure. This continuous reset can prevent an artefact called *blooming* when the sensor is exposed to a large amount of light. After the reset signal is made low, a first sample is taken using switch S1 and capacitor C1. This sample contains the kT/C noise generated by the reset of the floating diffusion, as well as the offset and 1/f noise of the source follower transistor M2. Next, the transfer gate is pulsed, which quickly transfers the integrated photo-charge from the pinned photodiode to the floating diffusion. After this transfer is complete, a second sample is taken using switch S2 and capacitor C2. The second sample contains the signal, plus the kT/C noise from the floating diffusion as well as the offset and 1/f noise from M2.

As was explained in sub-section 2.2.2, the crucial advantage of photogates and pinned photodiodes is the quick transfer of charge from the photosensitive element itself onto the floating diffusion.
This allows the reset of the floating diffusion to be performed before reading out the first of the two samples. As a result, the kT/C noise generated by the reset is correlated between the two subsequent readout samples, and is therefore cancelled together with the offset and 1/f noise of source follower M2. This results in a significantly lower noise level compared to a readout operation with a 3T pixel structure. Nonetheless, there are several noise sources and other non-idealities in a 4T pixel front-end that limit the performance of the sensor. These front-end performance limitations will be discussed in detail in chapter 3.

## 2.4 Back-End Analog Signal Processing

As described in the architectural overview of section 2.1, the function of the back-end of the analog signal processing chain is to read out the sampled signals inside the column and convert them into the digital domain. In this section, both sub-functions will be discussed. In sub-section 2.4.1, the analog readout of the column will be detailed. Subsequently, the A/D conversion will be briefly discussed in sub-section 2.4.2.

### 2.4.1 Column Circuit Readout

As described in section 2.3, the front-end readout circuit reads out the pixels on a row-by-row basis, reading out two samples per pixel that are stored on capacitors in each column circuit. These sampled voltages have to be read out from the column circuits and subtracted from each other to cancel offset, 1/f noise, and (when a 4T pixel is used) reset noise. Figure 2-9a depicts a simplified block diagram of the column and chip-level circuits. Each column circuit is consecutively connected to a common two-wire analog bus that connects it to the chip-level circuit. As discussed in section 2.1, the column decoder (not shown in the figure for clarity) outputs control signals to this end.
In Figure 2-9b, detailed column and chip-level circuits are shown that perform the column readout [2.11][2.13]. As explained in the previous section, switches S1 and S2 and capacitors C1 and C2 are used to sample the front-end outputs, while current source $I_{b1}$ biases the front-end of the readout circuit. The remainder of the column circuit, consisting of transistors M4 and M5 and switches S3-S5, is used for reading out the column. The column circuit is connected to the common output rail using switches S3 and S4, which are controlled by the *column select N* input generated by the column decoder (not shown). This connects bias currents $I_{b2}$ and $I_{b3}$, located inside the chip-level circuit, to transistors M4 and M5, which operate as source followers. These output the sampled voltages stored on C1 and C2 onto the common output rail. Differential amplifier A1 inside the chip-level circuit reads out the common output rail and subtracts the two output voltages. The output of amplifier A1 is sampled on sample-and-hold circuit S/H1.

Figure 2-9: a) Block diagram showing the consecutive readout of columns b) Detailed column and chip-level readout circuit c) Corresponding timing diagram

As explained in the previous section, the subtraction performed by amplifier A1 cancels the offset, 1/f noise and (in case of a 4T pixel front-end) reset noise of the front-end. Unfortunately, there is another source of offset in the circuit, caused by mismatch between source followers M4 and M5. This still leads to an offset error in the output sample stored on S/H1. To correct for this offset, another readout is performed with switch S5 closed. Since this switch shorts the source follower inputs together, only the differential offset voltage caused by the source followers' mismatch is output. This offset is stored on sample-and-hold circuit S/H2.
Finally, amplifier A2 subtracts the voltages stored in S/H1 and S/H2, thereby cancelling out the offset voltage of the source followers M4 and M5. This final output can be fed into the A/D converter, which will be discussed in the next section.

### 2.4.2 Chip-Level A/D Conversion

Since the analog chip-level readout circuit presented in the previous section condenses the double samples held in the parallel column circuits into a single analog output, a chip-level A/D converter used in CMOS imagers is not different from standard ADC architectures that are known in literature. Therefore, the A/D converter itself will not be discussed in detail here. The main requirement for the A/D converter is a resolution of about 10-12 bits, depending on the performance of the sensor and the interface electronics. As noted earlier, if a modest imager resolution of 500 x 500 pixels operating at 30 frames/second is considered, the column readout circuit that is connected to the ADC input should read out each column within 100ns. This means that the ADC should have a sampling speed of at least 10MSPS. The combination of modest resolution and relatively high speed favors the application of a Nyquist-rate ADC. In particular, the pipeline ADC architecture is very well suited for the application, as it enables a power-efficient readout while requiring relatively little chip area.

As discussed in section 2.1, the back-end readout circuit described here has to work at a relatively high speed. For a modest imager resolution of about 500 x 500 pixels at a frame rate of 30 images per second, the time available to read out all column circuits is roughly 50µs. Therefore, each column needs to be read out in 100ns, during which two sampling operations have to be performed, the result of which has to be digitized within the next 100ns.
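The timing budget quoted here follows from simple arithmetic on the frame rate and the array size. The sketch below reproduces it under stated assumptions: the front-end readout is taken as 5µs per row (the upper end of the 3-5µs range given in section 2.1), and the 2000 x 2000 array in the second call is a purely hypothetical example of a higher-resolution imager. The function name and dictionary keys are illustrative.

```python
def readout_budget(rows, columns, frame_rate, front_end_time):
    """Split one frame into line, back-end and per-column readout times (in seconds)."""
    frame_time = 1.0 / frame_rate
    line_time = frame_time / rows
    # Whatever remains of the line time after the front-end readout
    # is available for reading out the column circuits one by one.
    back_end_time = line_time - front_end_time
    if back_end_time <= 0:
        raise ValueError("front-end readout does not fit into the line time")
    column_time = back_end_time / columns
    return {
        "frame_time": frame_time,
        "line_time": line_time,
        "back_end_time": back_end_time,
        "column_time": column_time,
        "pixel_clock": 1.0 / column_time,
    }


if __name__ == "__main__":
    # First entry: the modest resolution used in the text.
    # Second entry: a hypothetical higher-resolution imager for comparison.
    for rows, columns in ((500, 500), (2000, 2000)):
        budget = readout_budget(rows, columns, frame_rate=30.0, front_end_time=5e-6)
        print(
            f"{rows} x {columns}: line time {budget['line_time'] * 1e6:.1f} us, "
            f"column time {budget['column_time'] * 1e9:.1f} ns, "
            f"pixel clock {budget['pixel_clock'] / 1e6:.1f} MHz"
        )
```

For the 500 x 500 case, the per-column time comes out in the same order as the rounded 100ns used in the text. The per-column time shrinks with the product of the number of rows and columns, so the required pixel clock grows quickly with resolution.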
While this timing is easily achievable with the number of pixels mentioned, the rapid development of ever-higher resolution imagers in recent years has made the chip-level readout structure more and more difficult to implement. Therefore, in chapter 4, an alternative readout structure will be discussed, where an A/D converter is located inside every column. This eliminates the need for a high-speed analog readout by a chip-level circuit and is one of the main focus points of this thesis.

## 2.5 Advanced Analog Signal Processing Techniques

In the previous two sections, an overview was given of a typical analog signal processing chain. While this structure forms a basis for understanding analog signal processing in CMOS imagers, many refinements and/or alternatives to the typical solution have been published over the years. In this section, a brief overview is provided of alternative and advanced analog signal processing techniques.

### 2.5.1 Sharing of Readout Circuitry Among Pixels

While the pixel circuit with a pinned photodiode features excellent performance, it requires 4 transistors in each pixel. These transistors decrease the amount of pixel area available for the light-sensitive part, and therefore reduce the *fill factor* of the pixel, i.e. the ratio of photosensitive area to total pixel area. In order to enable CMOS imagers with a higher resolution, the pixel size has been steadily decreasing to accommodate more pixels on the same chip area. This reduces the fill factor, and therefore reduces the sensitivity and dynamic range of the pixels. In 2004, Matsushita [2.14] and Canon [2.15] both presented a solution for this problem. In both cases, some of the transistors that are required for readout are shared among several pixels. This is illustrated in Figure 2-10.

Figure 2-10: Shared pixel circuitry concept: a) 1.75 transistor/pixel concept published by Matsushita b) 1.5 transistor/pixel concept by Canon
As can be seen from the figure, 4 pixels share some of the readout circuitry with one another. Each pixel has a pinned photodiode PD1..PD4 with a transfer gate M1..M4 similar to the structure shown in Figure 2-6. In contrast to a 4-transistor pixel however, the 4 pixels share one common floating diffusion (marked "FD" in the figure). This floating diffusion is reset with transistor M5 and read out with source follower M7. In the solution proposed by Matsushita [2.14], the source follower is connected to the column bus with another transistor M6 controlled by a row-select signal as discussed in section 2.3<sup>1</sup>. Therefore, as shown in Figure 2-10a, 7 transistors are required to read out 4 pixels, which means that only 1.75 transistors/pixel are necessary. In the Canon publication [2.15], the row select transistor M6 is removed (Figure 2-10b). In order to disconnect the source follower M7 from the column bus, the floating diffusion is discharged to a low voltage after each read out operation. This is done by making $V_{pixel}$ low and closing transistor M6 via the reset signal. The low voltage on the floating diffusion switches the source follower M7 off and thereby allows for another pixel to be connected to the column bus. The same method is also described in [2.16]. By removing the row select transistor, only 1.5 transistors/pixel are required. The reduced amount of circuitry per pixel allows for a higher fill factor and/or a smaller pixel size for a given processing technology. This advantage comes at the price of two potential disadvantages. Firstly, the pixel circuit sharing concept implies that not every pixel layout will be exactly the same. Instead, a block of 4 pixels will be repeated to create the pixel array. This can lead to mismatch between the pixels inside each block. 
In a color imager, this problem can be partially solved by matching the 4-pixel block to the color filters, thereby ensuring that each distinct pixel layout corresponds to a certain color. In this way, mismatch will mainly exist between different colors, which is not a problem as the digital color post-processing that is usually performed will balance out such mismatches. A shortcoming of this approach is that a conventional color filter pattern consists of only 3 colors (red, green and blue) while a 4-pixel block is used. This can be solved by treating the two green pixels in a pixel block as separate colors in the digital post-processing.

A second potential disadvantage is that the floating diffusion will be larger compared to a normal 4T pixel, since it is common to 4 pixels. This means that the associated capacitance ($C_{FD}$ in the figure) will be larger, which in turn reduces the conversion gain of the pixel. As a result, the voltage swing at the source follower will be lower, and therefore, the performance of the readout circuit, in particular the noise performance, is more critical.

<sup>1.</sup> Note that although transistor M6 is connected to the other side of the source follower than in the conventional 4T readout structure of Figure 2-8, its function is exactly the same.

# 2.5.2 kT/C Noise Reduction through Soft and Active Reset

As was mentioned in sub-section 2.2.1, the reset of the photodiode parasitic capacitance leads to a large amount of kT/C noise. This noise problem was one of the motivations for the development of photogates and pinned photodiodes in CMOS technology, as these allow for an easy compensation of reset noise through CDS. However, it is possible to reduce the amount of kT/C noise without the use of CDS, through the use of the *soft reset* or *active reset* techniques.
The soft reset method was more or less accidentally found in ordinary CMOS imagers with 3T pixels [2.17-2.18], of which the readout noise (expressed as rms voltage) was found to be less than $\sqrt{kT/C}$. An explanation for this lower than expected noise was given later, for instance in [2.19]. The phenomenon can be intuitively understood by examining in detail the voltages operating on the reset transistor in a pixel. As indicated in Figure 2-11a, the reset transistor is usually NMOS, since the use of a PMOS transistor would require a separate n-well inside each pixel, which would cost a lot of pixel area. In order to use the NMOS reset transistor M1 as a switch, it needs to operate in the triode region, which means that the gate-drain voltage $V_{gd1}$ needs to be higher than the threshold voltage of M1. Whether this requirement is met depends on the voltages $V_{reset}$ and $V_{pixel}$. In order to allow the imager to operate at a low supply voltage while keeping the signal swing of the pixel high, $V_{pixel}$ is often chosen too high to keep transistor M1 in the triode region. As a result, transistor M1 will operate in its weak inversion region, where its noise spectral density is different from that in the triode region. As is well known, the physical source of kT/C noise is the noise generated by the on-resistance of the switch; since this switch now has different noise properties, a lower amount of kT/C noise results. A much more thorough analysis in [2.19] shows that for typical cases the voltage noise will be about $\sqrt{kT/(2C)}$.

Figure 2-11: a) Limited voltage swing causing soft reset operation b) Active reset principle

A further noise reduction can be obtained by using the *active reset* technique [2.20-2.21]. A conceptual diagram of this technique is depicted in Figure 2-11b.
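As a numerical aside, the hard-reset level $\sqrt{kT/C}$ and the soft-reset level $\sqrt{kT/(2C)}$ can be compared directly. The following is a minimal sketch; the 5 fF photodiode capacitance and the 300 K temperature are assumed values chosen for illustration only and are not taken from the text.

```python
import math

BOLTZMANN = 1.380649e-23          # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def reset_noise_vrms(capacitance_f, temperature_k=300.0, soft_reset=False):
    """RMS reset noise voltage: sqrt(kT/C) for hard reset,
    sqrt(kT/(2C)) for soft reset (the typical case quoted from [2.19])."""
    divisor = 2.0 if soft_reset else 1.0
    return math.sqrt(BOLTZMANN * temperature_k / (divisor * capacitance_f))


def noise_in_electrons(v_rms, capacitance_f):
    """Convert a noise voltage on a capacitance to an equivalent charge in electrons."""
    return v_rms * capacitance_f / ELECTRON_CHARGE


if __name__ == "__main__":
    c_pd = 5e-15  # ASSUMED photodiode capacitance (5 fF), illustration only
    for label, soft in (("hard reset", False), ("soft reset", True)):
        v = reset_noise_vrms(c_pd, soft_reset=soft)
        print(f"{label}: {v * 1e3:.3f} mV rms, "
              f"{noise_in_electrons(v, c_pd):.1f} e- rms")
```

In this model the soft reset lowers the rms noise voltage by a factor of $\sqrt{2}$, independent of the capacitance chosen.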
In order to force a low-noise reset voltage onto the photodiode, a negative feedback loop is used that senses the voltage across the photodiode $V_d$ and adjusts the voltage at the gate of reset transistor M1 accordingly. A careful design of this feedback loop ensures that no excess noise is added to $V_d$. In [2.21], a voltage noise that is $\sqrt{5}$ to $\sqrt{6}$ times lower than the kT/C noise voltage is reported. The design features column-based amplifiers, which allows for an implementation of the active reset loop that only requires one extra transistor inside each pixel.

While soft and active reset can be valuable techniques to improve the performance of photodiode-based CMOS imagers, there are some obvious limitations. Most importantly, even a $\sqrt{5}$- to $\sqrt{6}$-fold reduction in kT/C voltage noise still leaves a considerable amount of reset noise. Moreover, apart from the lower reset noise, imagers based on pinned photodiodes have other advantages, in particular a lower dark current. As a result, pinned photodiodes have become the most popular light sensitive device in recent years.

# 2.5.3 High Dynamic Range Readout

In a typical CMOS imager, the dynamic range of each pixel output is limited to about 60-70dB. This is due to limitations to both noise and signal swing of the photosensitive element and front-end circuit, which will be discussed in detail in chapter 3. There are imager applications where a much higher dynamic range is required. In order to use CMOS image sensors in such applications, several methods were found that can increase the dynamic range. In this sub-section, the three most important categories of such techniques will be discussed.

A first method to increase the dynamic range is to create a pixel with a logarithmic response to light. In [2.22], this is done by connecting a load transistor to the photodiode as illustrated in Figure 2-12a.
The photocurrent $i_{ph}$ generated by the photodiode is not integrated, but directly converted to a voltage via the diode-connected load transistor M1. As this current is very small, the transistor operates in weak inversion, and therefore, the voltage $V_d$ depends logarithmically on the photocurrent. A disadvantage of this approach is that the sensitivity of the sensor is relatively low, as the sensor signal is not integrated.

Another approach to realize a logarithmic sensor response is depicted in Figure 2-12b [2.23]. Here, the photocurrent is integrated onto a capacitor formed by a charge sense diffusion via charge spill transistor M2. While the photocurrent will decrease the voltage across the integration capacitor, the gate level $V_a(t)$ is increased, generating an extra current that partly compensates for the photocurrent. As a result of this so-called *well capacity adjusting*, the output voltage $V_c$ depends logarithmically on the light intensity.

A disadvantage of all sensors with a logarithmic response to light is that some signal processing steps that are routinely performed in imagers are not effective. In particular, correlated double sampling (CDS) cannot be used, which can lead to high FPN. This problem is addressed in [2.24], where an imager is presented that features a pixel that can have both a linear and a logarithmic response. These responses can be combined into a single image with a high dynamic range. In [2.25], this concept is further refined with a pixel requiring fewer transistors.

Figure 2-12: a) photodiode with transistor load for direct photocurrent-to-voltage conversion b) integrating current approach using well-capacity adjusting

A second dynamic range enhancement method uses a system-level approach that is usually called *multiple capture* [2.26-2.29]. As the name implies, instead of one, several images are captured with different integration times.
As a result, the information about high light intensity regions of the captured scene is stored in an image with a short integration time, while the darker portions of the captured scene are stored in an image with a long integration time. These images are later combined to form a single, high-dynamic range image. The advantages of this approach are that no extra in-pixel circuitry is required, and, in contrast to the previously mentioned logarithmic sensors, CDS can be applied to compensate for FPN. The main problem of the multiple capture method is that much more information needs to be processed. Since there is a maximum time available to capture all images, this usually means that the analog signal processing chain needs to be faster than in an ordinary imager. This higher readout speed requirement is often realized by using an analog signal path with a parallelized ADC, such as a pixel-level ADC [2.27-2.28] or high-speed column-level ADC [2.29].

If an ordinary CMOS image sensor were used to capture a high dynamic range image, some of its pixels would saturate due to a very high light input. Instead of preventing this saturation from happening, such as with a logarithmic sensor or with multiple capture, the time required for the pixel to saturate can also be measured, as it is inversely proportional to light intensity. The main problem of this approach is to design an efficient readout circuit that can detect pixel saturation and convert the corresponding time information into the digital domain. In [2.30], each pixel detects saturation and subsequently signals this event to circuitry outside the pixel array. Therefore, the pixel readout is not in a fixed order and at a fixed time as in a typical image sensor, but instead, the readout is random and event based. A problem of this approach is that if a large number of pixels detect saturation in a short period of time, a proper time-to-digital conversion cannot be guaranteed.
In [2.31], this is elegantly solved by converting the time of saturation to an analog voltage in each pixel. This is done by a sample-and-hold capacitor that samples a ramp voltage at the moment pixel saturation is read out. The final image is composed of the voltage on the capacitor combined with an ordinary pixel voltage readout, resulting in an impressive dynamic range of 138dB.

In conclusion, several methods exist to increase the dynamic range of CMOS imagers beyond the typical 60-70dB. However, implementation of any of these techniques has a significant cost in terms of increased circuit complexity, increased chip area, increased power consumption, and/or decreased image quality. In particular, it is important to note that while the mentioned techniques increase the dynamic range, this usually does not increase the signal-to-noise ratio compared to typical imagers. As a result, the application of dynamic range enhancement techniques is so far limited to application areas where a high dynamic range is imperative, such as automotive or machine vision applications.

#### 2.5.4 Column-Level and Pixel-Level A/D Conversion

The typical readout structure introduced in sections 2.3 and 2.4 uses a single, chip-level A/D converter. As a result, a 2-step analog readout process is required to feed the analog signals into the A/D converter. While this approach was mostly used at the time this thesis work was started, there are alternative solutions. Firstly, it is possible to implement an A/D converter in each column circuit. This results in a shortening of the analog readout chain and a parallelization of the ADC, which can result in a higher overall readout speed. This increased speed comes at the cost of more chip area and the design problem of having to ensure uniformity between the parallel ADCs. A further parallelization can be achieved by implementing an ADC in each pixel.
In chapter 4, the issue of parallelization through column-level or pixel-level A/D conversion in a CMOS imager will be discussed in detail. It will be shown that for high-resolution mainstream applications, column-level A/D conversion is a good trade-off between increased read-out speed and lower power consumption on the one hand, and increased chip area and design complexity on the other.

# 2.6 References

- [2.1] A. Einstein, "Über einen die Erzeugung und Verwandlung des Lichtes betreffenden heuristischen Gesichtspunkt", *Annalen der Physik*, 4th series, Vol. 17, pp. 132-148, 1905
- [2.2] G.P. Weckler, "Operation of p-n junction photodetectors in a photon flux integration mode", *IEEE Journal of Solid-State Circuits*, vol. SC-2, pp. 65-73, 1967
- [2.3] S. K. Mendis, S. E. Kemeny, R. C. Gee, B. Pain, C. O. Staller, Q. Kim, and E. R. Fossum, "CMOS active pixel image sensors for highly integrated imaging systems", *IEEE Journal of Solid-State Circuits*, vol. 32, pp. 187-197, February 1997
- [2.4] B.C. Burkey et al., "The pinned photodiode for an interline transfer CCD imager", *Proceedings of IEDM*, pp. 28-31, 1984
- [2.5] R.M. Guidash et al., "A 0.6µm CMOS Pinned Photodiode Color Imager Technology", *Proceedings of IEDM*, pp. 927-929, 1997
- [2.6] A.J.P. Theuwissen, "The Hole Role", *Invited presentation at the 3rd Fraunhofer CMOS Image Sensor Workshop*, May 16-17, 2006, Duisburg
- [2.7] P. Noble, "Self-scanned silicon image detector arrays", *IEEE Transactions on Electron Devices*, vol. ED-15, pp. 202-209, April 1968
- [2.8] R. H. Nixon, S. E. Kemeny, C. O. Staller, and E. R. Fossum, "128 x 128 CMOS photodiode-type active pixel sensor with on-chip timing, control and signal chain electronics", *Charge-Coupled Devices and Solid-State Optical Sensors V, Proc. SPIE*, vol. 2415, pp. 117-123, 1995
- [2.9] E. Oba, K. Mabuchi, Y. Lida, N. Nakamura and H. Miura, "A 1/4 inch 330k square pixel progressive scan CMOS active pixel image sensor", *IEEE International Solid-State Circuits Conference*, vol. XL, pp. 180-181, February 1997
- [2.10] S. Mendis, S. Kemeny and E.R. Fossum, "CMOS active pixel image sensor", *IEEE Transactions on Electron Devices*, vol. 41, pp. 452-453, March 1994
- [2.11] R.H. Nixon, S.E. Kemeny, B. Pain, C.O. Staller and E.R. Fossum, "256 x 256 CMOS active pixel sensor camera-on-a-chip", *IEEE Journal of Solid-State Circuits*, vol. 31, pp. 2046-2050, December 1996
- [2.12] M.J. Loinaz, K.J. Singh, A.J. Blanksby, D.A. Inglis, K. Azadet, and B.D. Ackland, "A 200-mW, 3.3-V, CMOS color camera IC producing 352 x 288 24-b video at 30 frames/s", *IEEE Journal of Solid-State Circuits*, vol. 33, pp. 2092-2103, December 1998
- [2.13] A.J. Blanksby, M.J. Loinaz, "Performance analysis of a color CMOS photogate image sensor", *IEEE Transactions on Electron Devices*, vol. 47, pp. 55-64, January 2000
- [2.14] M. Mori, M. Katsuno, S. Kasuga, T. Murata, and T. Yamaguchi, "1/4-inch 2-Mpixel MOS image sensor with 1.75 transistors/pixel", *IEEE Journal of Solid-State Circuits*, vol. 39, pp. 2426-2430, December 2004
- [2.15] H. Takahashi et al., "A 3.9-µm pixel pitch VGA format 10-b digital output CMOS image sensor with 1.5 transistor/pixel", *IEEE Journal of Solid-State Circuits*, vol. 39, pp. 2417-2425, December 2004
- [2.16] K. Mabuchi et al., "CMOS image sensors comprised of floating diffusion driving pixels with buried photodiode", *IEEE Journal of Solid-State Circuits*, vol. 39, pp. 2408-2416, December 2004
- [2.17] O. Yadid-Pecht, B. Pain, C. Staller, C. Clark, and E. Fossum, "CMOS active pixel sensor star tracker with regional electronic shutter", *IEEE Journal of Solid-State Circuits*, vol. 32, pp. 285-288, February 1997
- [2.18] K. Singh, "Noise analysis of a fully integrated CMOS image sensor", *in Proc. SPIE*, vol. 3650, pp. 44-51, January 1999
- [2.19] H. Tian, B. Fowler, and A. El Gamal, "Analysis of temporal noise in CMOS photodiode active pixel sensor", *IEEE Journal of Solid-State Circuits*, vol. 36, pp. 92-101, January 2001
- [2.20] B. Fowler, M. Godfrey, J. Balicki, and J. Canfield, "Low noise readout using active reset for CMOS APS", *in Proc. SPIE*, vol. 3965, pp. 126-135, 2000
- [2.21] B. Pain, T.J. Cunningham, B. Hancock, G. Yang, S. Seshadri, and M. Ortiz, "Reset noise suppression in two-dimensional CMOS photodiode pixels through column-based feedback-reset", *IEEE International Electron Devices Meeting*, pp. 809-812, December 2002
- [2.22] S. Kavadias, B. Dierickx, D. Scheffer, A. Alaerts, D. Uwaerts and J. Boagaerts, "A logarithmic response CMOS image sensor with on-chip calibration", *IEEE Journal of Solid-State Circuits*, vol. 35, pp. 1146-1152, August 2000
- [2.23] S. Decker, R. McGrath, K. Brehmer, and C. Sodini, "A 256x256 CMOS imaging array with wide dynamic range pixels and column-parallel digital output", *IEEE Journal of Solid-State Circuits*, vol. 33, pp. 2081-2091, December 1998
- [2.24] G. G. Storm, J. E. D. Hurwitz, D. Renshaw, K. M. Findlater, R. K. Henderson, and M. D. Purcell, "Combined linear-logarithmic CMOS image sensor", *IEEE International Solid-State Circuits Conference*, vol. XVII, pp. 116-117, February 2004
- [2.25] K. Hara, H. Kubo, M. Kimura, F. Murao, and S. Komori, "A linear-logarithmic CMOS sensor with offset calibration using an injected charge signal", *IEEE International Solid-State Circuits Conference*, vol. XVIII, pp. 354-355, February 2005
- [2.26] O. Yadid-Pecht and E. Fossum, "Wide intrascene dynamic range CMOS APS using dual sampling", *IEEE Transactions on Electron Devices*, vol. 44, pp. 1721-1723, October 1997
- [2.27] D. Yang, A. El Gamal, B. Fowler, and H. Tian, "A 640x512 CMOS image sensor with ultrawide dynamic range floating-point pixel-level ADC", *IEEE Journal of Solid-State Circuits*, vol. 34, pp. 1821-1834, December 1999
- [2.28] W. Bidermann, A. E. Gamal, S. Ewedemi, J. Reyneri, H. Tian, D. Wile, and D. Yang, "A 0.18μm high dynamic range NTSC/PAL imaging system-on-chip with embedded DRAM frame buffer", *IEEE International Solid-State Circuits Conference*, vol. XLVI, pp. 212-213, February 2003
- [2.29] M. Mase, S. Kawahito, M. Sasaki, Y. Wakamori, and M. Furuta, "A wide dynamic range CMOS image sensor with multiple exposure-time signal outputs and 12-bit column-parallel cyclic A/D converters", *IEEE Journal of Solid-State Circuits*, vol. 40, pp. 2787-2795, December 2005
- [2.30] E. Culurciello, R. Etienne-Cummings, K. Boahen, "Arbitrated address event representation digital image sensor", *IEEE International Solid-State Circuits Conference*, vol. XLIV, pp. 92-93, February 2001
- [2.31] D. Stoppa, A. Simoni, L. Gonzo, M. Gottardi, and G. Dalla Betto, "A 138dB dynamic range CMOS image sensor with new pixel architecture", *International Solid-State Circuits Conference*, vol. XLV, pp. 40-41, February 2002

# Front-End Readout Circuitry

In this chapter, the front-end readout circuitry of a CMOS imager will be studied in detail. The front-end is defined as the in-pixel readout circuit, together with the in-column biasing and sample and hold circuitry. It is the most critical analog circuit of the CMOS imager, as it limits the overall performance of the sensor. Moreover, the amount of chip area available for the front-end is severely limited, since it is located both inside the pixel and column of the imager. This forms a major design constraint, as will be shown in this chapter.

In section 3.1, a number of performance aspects of the front-end circuitry will be discussed. By excluding other performance parameters, it will be shown that noise is its most important issue. In section 3.2, the noise in the front-end will be discussed in detail, and the different noise sources that are present in a CMOS imager front-end will be compared.
It will be shown that because of the limited chip area available, 1/f noise is the dominant noise source. Section 3.3 describes a new and relatively unknown technique to reduce 1/f noise, called Large-Signal Excitation or Switched-Biasing. To evaluate the effectiveness of this technique in an imager, a custom measurement IC was made. Section 3.4 describes this chip and presents the measurement results. Finally, in section 3.5, conclusions will be drawn on the performance of the front-end that have an important impact on the remainder of the thesis.

#### 3.1 Front-End Readout Circuit Performance

In this section, the signal swing, linearity, offset and power consumption of the front-end circuit will be discussed. By discussing these important performance aspects, it will become clear that none of them forms an essential limit on the performance of the front-end circuit. This leaves the last and most important performance parameter, noise, which will be discussed in more detail in section 3.2.

### 3.1.1 Signal Swing

In order to determine the dynamic range of any analog circuit, two parameters are of importance: its noise floor and its maximum signal swing. While the noise floor of the front-end requires a more detailed study (section 3.2), the maximum signal swing of the image sensor will be discussed here. In Figure 3-1, the 3T pixel front-end circuit is shown, detailing the biasing voltages that limit the signal swing. In such a front-end, the maximum signal swing can be defined as the difference between the reset voltage and the maximum signal voltage. Note that a high light intensity on the sensor corresponds to a low output voltage, as the photon-generated current will decrease the voltage $V_d$ across the diode. In a conventional imager design, a so-called 'hard' reset is performed, which means that the reset switch M1 has a sufficiently low on-resistance to ensure that the voltage on the photodiode equals the pixel supply voltage $V_{pixel}$.
However, this means that NMOS transistor M1 must be in the triode region during the reset operation, which is not a trivial requirement since it has to switch a high voltage. While it might seem better to use a PMOS transistor as a reset switch, this is usually not done for a simple reason: a PMOS transistor would require an n-well inside each pixel, which would require too much chip area. As a result, $V_{g1}$ has to be increased above the pixel supply voltage $V_{pixel}$ to ensure that M1 properly operates as a switch. Usually, $V_{g1}$ is increased as much as possible, taking into account the gate-oxide breakdown voltage of the process. After this, $V_{pixel}$ is chosen such that $V_{gs1}$ is high enough to ensure a proper switch function of M1. Therefore, $V_{pixel}$ can be expressed as follows:

$$V_{pixel} = V_{g1} - (V_{th1} + V_{ovd}) \tag{3-1}$$

where $V_{th1}$ is the threshold voltage of M1 (including body effect), and $V_{ovd}$ is the overdrive voltage necessary to ensure that M1 is properly switched on.

Figure 3-1: Circuit diagram of a 3T pixel front-end detailing the bias voltages limiting the signal swing

After the photodiode is reset, the voltage $V_d$ is read out via source follower M2, which means that the reset voltage at the output of the front-end can be expressed as:

$$V_{out,reset} = V_{pixel} - V_{gs2} \tag{3-2}$$

In a typical 0.18µm process that has thick-oxide transistors capable of handling 3.3V, $V_{g1}$ would be about 3.5V and $(V_{th1}+V_{ovd})$ would be about 1.5V, resulting in a $V_{pixel}$ of about 2.0V, and $V_{gs2}$ would be about 0.8V. As a result, the reset voltage at the output would be about 1.2V.

The lowest possible output voltage is determined by the signal swing of source follower M2. This source follower is biased by transistor M4, which functions as a current source. Therefore, the output voltage should remain at least a saturation voltage $V_{sat4}$ above ground in order for the source follower to operate properly.
The maximum signal swing can therefore be written as:

$$|V_{out,reset} - V_{out,signal}|_{max} = V_{pixel} - V_{gs2} - V_{sat4} \tag{3-3}$$

Therefore, if a typical value for $V_{sat4}$ of 0.2V is assumed, the maximum signal swing at the output of the source follower is roughly 1V.

In order to calculate the maximum voltage swing at the photodiode node, the voltage gain of the source follower needs to be taken into account. Even if the effects of limited output resistance of both the source follower M2 and the current source M4 are neglected, the gain is less than unity, because of the body effect. The voltage gain $A_V$ can be expressed as [3.1]:

$$A_V = \frac{g_{m2}}{g_{m2} + g_{mb2}} \tag{3-4}$$

where $g_{m2}$ is the transconductance of M2 and $g_{mb2}$ is the back gate transconductance. This typically results in a gain of about 0.8, and therefore, for an output voltage swing of 1V, the voltage swing at the photodiode node will typically be about 1.25V.

In deep-submicron processes, the drop of two threshold voltages in the front-end leads to a problem. Since the maximum supply voltage is below 2V, the signal swing will be zero if normal transistors are used. This is routinely solved by using high-voltage transistors, which are usually available in such processes to enable 3.3V digital I/O. Moreover, some CMOS processes that are optimized for imaging include a processing step to lower the threshold voltage of the in-pixel NMOS transistors.

Since 4T pixel front-ends have essentially the same readout circuit as the 3T pixel circuit shown in Figure 3-1, the maximum signal swing is essentially the same. In some cases, the limited charge transfer efficiency of pinned photodiodes can lead to a further limitation of the signal swing.

In conclusion, the signal swing in the imager front-end is limited by the use of NMOS transistors. This limits the swing to a typical value of about 1V.
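The swing budget of Eqs. (3-1) to (3-3), together with the source follower gain, can be sketched as a short calculation using the typical values quoted in this sub-section. The function and parameter names are illustrative assumptions introduced here, not notation from the thesis.

```python
def signal_swing(v_g1, v_th1_plus_ovd, v_gs2, v_sat4, sf_gain):
    """Return (v_pixel, v_out_reset, swing_out, swing_photodiode) in volts."""
    v_pixel = v_g1 - v_th1_plus_ovd    # Eq. (3-1): highest usable pixel supply
    v_out_reset = v_pixel - v_gs2      # Eq. (3-2): reset level at the output
    swing_out = v_out_reset - v_sat4   # Eq. (3-3): current source M4 needs V_sat4
    swing_pd = swing_out / sf_gain     # refer the output swing back to the photodiode
    return v_pixel, v_out_reset, swing_out, swing_pd


if __name__ == "__main__":
    # Typical 0.18 um thick-oxide values as quoted in the text.
    v_pixel, v_reset, swing_out, swing_pd = signal_swing(
        v_g1=3.5, v_th1_plus_ovd=1.5, v_gs2=0.8, v_sat4=0.2, sf_gain=0.8)
    print(f"V_pixel         = {v_pixel:.2f} V")
    print(f"V_out,reset     = {v_reset:.2f} V")
    print(f"output swing    = {swing_out:.2f} V")
    print(f"photodiode swing= {swing_pd:.2f} V")
```

The calculation makes the two stacked voltage drops explicit: roughly 2.3V of the 3.5V gate drive is lost in M1 and M2 before any signal swing is available.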
While an increase in this swing could lead to a higher dynamic range, supply voltages in the order of 2-3V would limit the maximum dynamic range increase to a few dB. Therefore, in current CMOS processes, the voltage swing should not be considered to be the most significant performance limitation of the front-end readout circuit.

### 3.1.2 Linearity

As the front-end readout is performed with a simple source follower, its linearity performance is relatively poor compared to most other sensor interface circuits. As was shown in the previous sub-section, the body effect has an effect on the gain of the source follower. More precisely expressed, the body effect causes the threshold voltage of an MOS transistor to vary, which can be expressed as [3.1]:

$$V_{TH} = V_{TH0} + \gamma \left(\sqrt{|2\phi_F + V_{SB}|} - \sqrt{|2\phi_F|}\right) \tag{3-5}$$

where $V_{TH0}$ is the threshold voltage for $V_{SB} = 0$, $\gamma$ is the body effect coefficient, $\phi_F = (kT/q) \ln(N_{sub}/n_i)$, $N_{sub}$ is the doping concentration of the substrate, $n_i$ is the intrinsic concentration of electrons in silicon, and $V_{SB}$ is the source-bulk voltage. In an NMOS source follower, the source-bulk voltage equals the output voltage and the gate-source voltage equals the voltage difference between input and output. Therefore, it is obvious from equation 3-5 that the square root dependency between the source-bulk voltage and the threshold voltage will cause a non-linearity in the gain of the source follower. By numerically evaluating the equation for some typical semiconductor parameters, the non-linearity was found to be 0.35% for an input voltage swing of 1V.

In order to keep the required chip area to the minimum, the in-pixel transistor that is used as a source follower will have a minimum length and a near-minimum width. As a result, several short-channel effects will have an effect on the linearity of the source follower.
A simulation of a transistor with a W/L of $1\mu m/0.18\mu m$ in a typical $0.18\mu m$ process shows a non-linearity of about 0.5% for an input swing of 1V.

While such non-linearity would be considered unacceptable in some applications, in the context of a CMOS imager it is not an issue. This is because the sensor itself exhibits significant non-linearity. As explained in sub-section 2.2.1, the photodiode will be operated in integrating mode: the photo-generated current is stored onto the parasitic capacitance of the photodiode itself. Therefore, the photocurrent $i_{ph}$ can be expressed as:

$$i_{ph} = -C_D \frac{dV_R}{dt} \tag{3-6}$$

where $V_R$ is the reverse bias voltage across the photodiode and $C_D$ is its capacitance. In order to have a linear relation between photocurrent and voltage drop across the photosensitive element, the capacitance $C_D$ should be independent of $V_R$. However, this is not the case, as can be understood intuitively by considering the fact that in a reverse biased p-n junction, the distance between the 'plates' of the capacitor is determined by the width of the depletion layer. This width is obviously dependent on the reverse bias voltage, and therefore the capacitance changes with voltage.

In order to quantify the non-linearity caused by the voltage-dependent capacitance, the photodiode can be approximated as a one-sided p-n junction. Therefore, the capacitance $C_D$ depends on the reverse bias voltage as follows [3.2]:

$$C_D(V_R) = \frac{A_D}{2} \sqrt{\frac{2q\varepsilon_{si}N_A}{V_R + V_{bi}}} \tag{3-7}$$

where $A_D$ is the area of the device, $q$ is the charge of an electron, $\varepsilon_{si}$ is the dielectric constant for silicon, $N_A$ is the acceptor doping concentration of the lightly doped side of the junction, and $V_{bi}$ is the built-in potential of the p-n junction. By combining Eqs. (3-6) and (3-7), and solving the resulting differential equation, an expression for the voltage over the photodiode as a function of time is obtained:

$$V_R(t) = \left[ \sqrt{V_{res} + V_{bi}} - \frac{i_{ph}}{A_D \sqrt{2q\varepsilon_{si}N_A}} \cdot t \right]^2 - V_{bi} \tag{3-8}$$

where the reset voltage $V_{res}$ equals $V_R(0)$. It is obvious from equation 3-8 that the voltage across the photodiode does not depend linearly on the photocurrent. For photogates or pinned photodiodes, the capacitance of the floating diffusion determines the linearity; since the floating diffusion can also be considered as a one-sided p-n junction, the same equations apply.

Figure 3-2 shows a plot of $V_R(t)$, which was acquired by applying some typical process parameters in Eq. (3-8). In this graph, the maximum non-linearity is 9% for a voltage swing of about 1V (based on a line fitted to the curve). It should be noted that the approximation of the photodiode as a one-sided junction might not be accurate for some doping concentrations, but it is nonetheless clear that the non-linearity is at least 1%. In [3.3], a measured non-linearity of 1% is reported. Therefore, as long as the non-linearity of the readout circuit remains below 1%, it can be considered of little importance to the overall performance.

Figure 3-2: Reverse bias voltage across the photodiode $V_R$ vs. time according to Eq. (3-8)

#### 3.1.3 Fixed Pattern Noise

In imaging, the term *Fixed Pattern Noise* refers to static non-uniformities between different pixels or columns of the imaging array. Therefore, "noise" in this context does not relate to random fluctuations in the time domain, but rather to random fluctuations in the *spatial* domain, resulting in a 'fixed pattern' that is visible regardless of the image captured. These spatial variations can be divided into two components: an offset and a gain variation.
The main problem with fixed pattern noise is that it creates artifacts in the image that are highly visible to the human eye. This is illustrated in Figure 3-3. Here, pixel as well as column-level FPN was simulated by adding random offsets to portions of a test image with a resolution of 510 x 409 pixels. For both pixel and column FPN, a Gaussian-distributed offset with a $\sigma$ of 5% of full scale was added. As can be seen in the image, pixel FPN results in a granular, 'snow' effect, while column FPN results in clearly visible stripes. Moreover, while the amount of pixel and column FPN is equal, the column FPN is much more visible than the pixel FPN. This is an important observation that will be the basis of the dynamic column switching technique introduced in section 4.4. While it is hard to quantify the uniformity requirements based on perceptual observations, a generally accepted specification is about 0.5% of full scale for pixel FPN and 0.1% of full scale for column FPN [3.4].

Figure 3-3: Simulated effects of pixel and column FPN.

The main source of pixel-level non-uniformities in the analog front-end is the offset of the source follower (transistor M2 in Figure 3-1). However, as explained in section 2.3, a double sampling scheme is applied to correct for this offset. The residual offset is negligible: for instance, in [3.5] a residual pixel offset of 0.09% has been reported. The residual offset of the double sampling is caused by charge injection mismatch of the sampling switches. As these switches are implemented inside the column, their charge injection mismatch actually results in column offsets. However, by minimizing the switch size, it is not a problem to reduce the resulting offset to less than 0.1% of full scale (this is equivalent to about 1mV). Apart from offset, gain mismatch between the pixels also leads to FPN. However, FPN effects caused by gain variations are not as visible as offset variations.
In [3.5], a pixel gain mismatch of 0.36% has been reported, which is well below visible levels. Finally, it should be noted that apart from the front-end readout circuit, there are other factors that can also cause FPN, in particular dark current. However, since such effects do not relate to the readout circuit, their discussion is outside of the scope of this work. In conclusion, the use of double sampling in the pixel front-end effectively compensates for non-uniformities that can cause FPN, and therefore, FPN is not a critical performance parameter of the front-end circuit.

#### 3.1.4 Power Consumption

While the front-end readout circuit has a very large transistor count compared to other parts of the analog signal processing chain, its power consumption is nonetheless insignificant compared to the back-end readout circuit. There are two reasons for this. Firstly, the read-out operation is performed on a row-by-row basis, which means that at any given time the vast majority of pixels do not consume any power. Secondly, even the active row of front-end readout circuits is switched on for less than 10% of the time, as most time is used to read out the columns one-by-one (see section 2.1).

The bias current source in the column (transistor M4 in Figure 3-1) defines the power consumption of the front-end circuit. This power consumption is set such that the voltage across the sampling capacitors settles within the required time. Since the output resistance of the source follower is roughly equal to the inverse of its transconductance, the time constant $\tau$ with which the output settles can be expressed as:

$$\tau = \frac{C}{g_m} \tag{3-9}$$

where C is the total capacitance at the output node and $g_m$ is the transconductance of the source follower.
In a typical case, the sampling time available would be about $1\mu s$, which means that $\tau$ should be about $0.2\mu s$ for proper settling, while C would be about 3pF (1pF sampling capacitance + 2pF parasitic capacitance of the column bus). This means that the required $g_m$ would be $15\mu S$. As this value is quite low, it can be assumed that the source follower transistor is in weak inversion, and therefore $g_m$ is roughly equal to 20 times the bias current. As a result, a minimum bias current of $0.75\mu A$ is needed per column. For an imager with a resolution of 500 columns where the front-end is operational for 10% of the time, the total average current consumption equals $37.5\mu A$, which is negligible compared to that of the back-end readout circuit. Therefore, power consumption is not a critical performance parameter of the front-end readout circuit.

# 3.2 Front-End Temporal Noise Sources

In the previous section, signal swing, linearity, fixed-pattern noise and power consumption of the front-end circuit were discussed, and it was shown that none of these parameters constitute a practical limit on the front-end circuit's performance. This leaves noise as the defining performance parameter of the front-end circuit. In this section, the different physical noise sources present in the front-end circuit will be described. In addition, noise generated inside the photosensitive element itself will also be discussed, as it defines a practical upper limit to the noise performance of the front-end. At the end of the section, the noise sources will be compared and a dominant noise source will be identified.

#### 3.2.1 Photon Shot Noise

Photon shot noise is the most fundamental of all the noise sources in imagers, as it relates to fundamental physical laws, rather than to IC technology or circuit design. It is caused by the fact that energy and matter have a fundamentally discrete nature, as described by the theory of quantum mechanics.
In an imager, the quantized nature of energy manifests itself in the form of discrete photons that interact with the silicon lattice to create discrete electrons. Even when the light intensity incident on the imager is perfectly constant, the number of incident photons, and thus the number of generated electrons inside a photosensitive element, is a random variable. Therefore, if an array of photosensitive elements is exposed to a perfectly uniform light source, the amount of charge integrated in each photosensitive element will have a random Poisson distribution [3.6]. The magnitude of the noise that is thus generated is equal to the square root of the mean number of electrons stored in the photodiode. Therefore, the rms noise voltage at the sensor node equals:

$$\overline{V_{psn}} = \frac{q}{C_{pe}} \cdot \sqrt{e_{pe}} \tag{3-10}$$

where $V_{psn}$ is the photon shot noise expressed in volts rms, q is the charge of an electron, $C_{pe}$ is the capacitance associated with the photosensitive element (i.e. either the photodiode capacitance or floating diffusion capacitance) and $e_{pe}$ is the mean number of photo-generated electrons inside the photosensitive element. The ratio of q and $C_{pe}$ is usually called the *conversion gain*, as it defines the gain of the conversion from charge into voltage at the input of the readout circuit.

While equation 3-10 seems to suggest that an increase in the capacitance $C_{pe}$ would improve the sensor performance, this is not true. Note that the capacitance of the photosensitive element $C_{pe}$ does not only determine the noise, but also the signal voltage at the sensor node, as expressed in equation 2-2:

$$v_{pe} = \frac{q}{C_{pe}} \cdot e_{pe} \tag{3-11}$$

Therefore, increasing $C_{pe}$ will also decrease the sensitivity (expressed in voltage) of the sensor, thus exactly cancelling out the reduction in voltage noise.
Therefore, in order to understand how to minimize the effect of photon shot noise, the signal-to-noise ratio should be calculated. Since the sensor output signal is directly proportional to the number of captured electrons $e_{pe}$, the signal-to-noise ratio can be expressed as:

$$\frac{signal}{noise} = \frac{e_{pe}}{\sqrt{e_{pe}}} = \sqrt{e_{pe}} \tag{3-12}$$

Since all photosensitive elements have a maximum amount of charge at which the photosensitive element will saturate, the maximum signal-to-noise ratio attainable equals the square root of the saturation charge (expressed in electrons). Therefore, the maximum signal-to-noise ratio of the sensor can be increased by increasing the saturation charge of the sensor. Unfortunately, such an increase is in contradiction with the desire to make smaller pixels, as a smaller photosensitive element can usually store less charge.

An important and unusual property of photon shot noise is its dependence on signal level. Most noise sources have a constant magnitude independent of signal, thereby constituting a minimum level the signal should have to be detectable, often called a 'noise floor'. However, photon shot noise increases with the square root of the signal level. This is an important property that can be exploited in A/D converters, as will be discussed in section 4.3.

#### 3.2.2 Reset Noise

As already discussed in sub-section 2.2.1, the operation of photosensitive elements in integrating mode requires a periodic reset operation, which leads to reset noise. As explained in chapter 2, the reset noise can be cancelled if the front-end readout circuit is able to take two correlated samples of the reset noise, where one sample also contains the signal. Since this requires a frame memory in a photodiode-based imager, photogate and pinned photodiodes were developed, where a floating diffusion capacitance is reset before taking both readout samples.
In this case, the correlated double sampling is effective in reducing the reset noise to negligible levels along with pixel-level offsets, as reported in [3.5]. In a photodiode-based front-end however, the double sampling operation required to correct for pixel offsets actually increases the reset noise: since two uncorrelated noise samples are subtracted, the noise voltage increases by a factor of the square root of two. Unless soft or active reset methods are used, as explained in sub-section 2.5.2, the reset noise equals $\sqrt{2kT/C}$ (in rms voltage). For a typical photodiode capacitance of 10fF, this results in a reset noise of 910µV rms.

#### 3.2.3 Thermal Noise

Thermal noise, also known as 'white' noise or Johnson noise, is caused by the random thermal motion of charge carriers in a conductor. In the front-end of an imager, this noise is generated inside the channel of the source follower transistor. As is well known in analog CMOS design literature [3.1], the thermal noise in the drain current of a MOS transistor in saturation can be written as:

$$\overline{I_n^2} = 4kT\gamma g_m \tag{3-13}$$

where $g_m$ is the transconductance of the transistor and $\gamma$ is a scaling factor depending on transistor length and technology. For transistors with a long channel, $\gamma$ is equal to 2/3, while for short-channel transistors in modern CMOS processes it can be as high as 2.5. As can be seen from the formula, for noise calculations a MOS transistor can be regarded as a resistor with resistance equal to $1/g_m$ that has a noise density that is a factor of $\gamma$ different from a normal resistor. Furthermore, according to Eq. (3-9), the bandwidth of the front-end is determined by a first-order filter formed by $g_m$ and the load capacitance C.
Therefore, the total thermal noise contribution of the front-end only depends on the load capacitance, and similar to sampling noise, the voltage noise at the output can be approximately expressed as:

$$\overline{V}_n \approx \sqrt{\frac{\gamma kT}{C}} \tag{3-14}$$

An important observation here is that, unlike the other noise sources described in this section, the thermal noise of the front-end can be decreased by modifying the circuit design, i.e. by increasing the sampling capacitance C.

#### 3.2.4 1/f Noise

Apart from thermal noise, 1/f noise is the other main source of circuit noise generated inside a MOS transistor. While thermal noise relates to the well-understood effect of thermal motion of charge carriers in a conductor, 1/f noise is still subject to active research. There are probably several distinct physical effects causing 1/f noise. The most important and generally accepted cause is the presence of lattice defects at the interface of the silicon channel of the MOS transistor and the gate oxide [3.7]. These defects, or 'traps', can capture a charge carrier from the channel, and release this charge after a while, leading to random channel current variations. The number of traps is highly dependent on the 'cleanness' of the oxide-silicon interface, and is thus highly dependent on technology. The amount of 1/f noise generated inside an MOS transistor can be described as a voltage source in series with the channel with a spectral density of approximately [3.1]:

$$\overline{V_n^2} = \frac{K}{C_{ox}WL} \cdot \frac{1}{f} \tag{3-15}$$

Figure 3-4: Noise power spectrum showing the effect of 1/f noise increase due to process scaling

Here, K is a technology dependent parameter, $C_{ox}$ is the gate capacitance per unit of gate area, W and L the width and length of the transistor, and f the frequency.
Note that the only design variable that is available to the circuit designer is the gate area WL; unlike thermal noise, 1/f noise (expressed as a voltage) is in first order not dependent on bias current.

The noise voltage expressed by Eq. (3-15) is only correct for transistors in the saturation region. This is because the physical mechanism causing 1/f noise is fluctuations in the channel current. Therefore, if the transistor is operating in its triode region, the 1/f noise voltage is much lower, as the effective channel resistance is much lower. Because of this, 1/f noise generated by the in-pixel reset and row-select switches can be neglected. However, the in-pixel source follower transistor does contribute a significant amount of 1/f noise. Since it is located inside the pixel, increasing the gate area WL to reduce 1/f noise is not possible as it would either increase the pixel size or decrease the light sensitive area.

In section 2.3, the front-end readout operation was described, and it was explained how the use of double sampling in this readout can cancel the offset and 1/f noise of the front-end. However, this double-sampling operation is only effective when the 1/f noise is correlated between the samples. In the frequency domain (Figure 3-4), this corresponds to the double sampling frequency being at least twice as high as the 1/f corner frequency $f_{c1}$, i.e. the frequency where 1/f noise starts to dominate [3.8].

Since 1/f noise is highly technology-dependent, the most important question is how 1/f noise changes as process feature sizes decrease. Intuitively, the inversely proportional relation between 1/f noise and gate area WL in Eq. (3-15) already leads to an expected increase of 1/f noise. This is of course a very simplistic assumption; even in the approximate model of Eq. (3-15), all of WL, $C_{ox}$ and K can be expected to change with technology scaling.
However, more accurate predictions [3.18] also show that 1/f noise will increase as process feature sizes decrease. Moreover, apart from a 1/f noise increase due to downscaling, the introduction of high-k dielectric materials as gate insulation in deep-submicron processes is expected to further increase 1/f noise [3.9][3.18].

Because of the 1/f noise increase, the corresponding 1/f corner frequency will increase from $f_{c1}$ to a higher frequency $f_{c2}$. On the other hand, the frequency at which the correlated double sampling is performed $(f_{ds})$ unfortunately does not increase, since a certain minimum amount of time is required to transfer the signal charge to the floating diffusion. This charge transfer time unfortunately does not scale down with smaller device geometries, as it is related to the magnitude of electric fields inside the photogate or pinned photodiode. As a result, the double sampling frequency will become lower than the 1/f corner frequency, and therefore only part of the 1/f noise will be cancelled by the double sampling operation. The exact amount of residual 1/f noise strongly depends on process technology and is somewhat difficult to calculate. In [3.10], a residual 1/f noise of $340\mu V$ was reported.

#### 3.2.5 Comparison of Noise Sources

To evaluate the relative importance of each noise source, Table 3-1 provides an overview with an estimate of the magnitude of all noise sources.

| | Photodiode front-end (3T) | Photogate/pinned photodiode front-end (4T) |
|-------------------|------------------------------|--------------------------------------------------|
| photon shot noise | 0-3.5mV | 0-7mV |
| reset noise | 800μV | - |
| thermal noise | 150μV | 150μV |
| 1/f noise | 350μV | 350μV |

Table 3-1.
Estimated magnitude of 3T and 4T pixel noise sources

For the noise figures in this table, the following estimates were used: for photon shot noise, the saturation charge was estimated to be 80,000 electrons for a photodiode and 20,000 for a photogate or pinned photodiode. For both cases, the maximum pixel voltage swing was estimated to be 1V, the column sampling capacitors were estimated to be 1pF and the scaling factor $\gamma$ (Eq. (3-13)) was estimated to be 2.5.

As can be seen in the table, the lower saturation charge of 4T pixels leads to a higher maximum photon shot noise voltage, as can be understood from Eq. (3-12). However, this higher amount of photon shot noise only occurs at maximum light intensity. The higher noise floor of the photodiode front-end on the other hand results in a lower dynamic range, which, for imagers, is usually defined as the ratio between the maximum signal level and noise level in the absence of a signal. This means that the photon shot noise is not taken into account when calculating the dynamic range. With the noise figures in Table 3-1, the dynamic range of the photodiode front-end would be 61dB, while the dynamic range of the pinned photodiode would be 68dB.

As can be seen from the table, reset noise is the dominant noise source in photodiode front-ends, while 1/f noise generated by the source follower is the dominant noise source in 4T pixel front-ends [3.10]. The latter is an important observation, as it shows that when photogates or pinned photodiodes are used, the front-end circuit noise actually dominates over noise generated by the sensor itself. Therefore, in the next section of this chapter, 1/f noise will be studied in more detail, and a circuit technique will be introduced that can reduce 1/f noise in imager front-ends. A measurement circuit will be presented that can evaluate the effectiveness of the proposed 1/f noise reduction technique, and corresponding measurement results will be shown.
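The figures quoted for Table 3-1 can be cross-checked with a few lines of arithmetic. The following is a minimal sketch rather than part of the original analysis: the function names are hypothetical, a temperature of 300K is assumed, and the reset, thermal and 1/f noise magnitudes are simply the estimates taken from the table. The maximum photon shot noise follows from Eq. (3-10) with the conversion gain set by the 1V swing and the saturation charge, and the dynamic range uses the root-sum-square of the signal-independent noise sources.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K


def max_shot_noise_v(full_well_electrons, swing_v=1.0):
    """Photon shot noise voltage at saturation, following Eq. (3-10)."""
    # Conversion gain q/C_pe, derived from the assumed swing at saturation.
    conversion_gain = swing_v / full_well_electrons  # volts per electron
    return conversion_gain * math.sqrt(full_well_electrons)


def reset_noise_v(capacitance_f, temperature_k=300.0):
    """Reset noise after double sampling of two uncorrelated samples: sqrt(2kT/C)."""
    # 300 K is an assumption; the text does not state a temperature.
    return math.sqrt(2.0 * K_BOLTZMANN * temperature_k / capacitance_f)


def noise_floor_v(*sources_v):
    """Root-sum-square of uncorrelated, signal-independent noise sources."""
    return math.sqrt(sum(v * v for v in sources_v))


def dynamic_range_db(swing_v, floor_v):
    """Ratio of maximum signal to noise floor, expressed in dB."""
    return 20.0 * math.log10(swing_v / floor_v)


# Maximum photon shot noise for the assumed saturation charges.
print(max_shot_noise_v(80_000))  # 3T photodiode, about 3.5 mV
print(max_shot_noise_v(20_000))  # 4T pixel, about 7 mV

# Reset noise for the 10 fF photodiode example of sub-section 3.2.2.
print(reset_noise_v(10e-15))     # about 910 uV

# Noise floors built from the Table 3-1 estimates (reset, thermal, 1/f).
floor_3t = noise_floor_v(800e-6, 150e-6, 350e-6)
floor_4t = noise_floor_v(150e-6, 350e-6)
print(dynamic_range_db(1.0, floor_3t))  # about 61 dB
print(dynamic_range_db(1.0, floor_4t))  # about 68 dB
```

Under these assumptions the sketch reproduces the 3.5mV and 7mV shot noise limits and the 61dB and 68dB dynamic range figures; photon shot noise is deliberately left out of the noise floor, in line with the definition of dynamic range given above.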
# 3.3 1/f Noise Reduction Using Large-Signal Excitation (LSE)

In this section, a new circuit technique to reduce 1/f noise in MOS transistors will be introduced, called Large-Signal Excitation (LSE). Before introducing this technique, 1/f noise will be described in more detail in sub-section 3.3.1. In particular, it will be shown what the main 1/f noise model, the McWhorter model, predicts for small transistors in deep-submicron processes, such as the transistors used in CMOS imager readout front-ends. Next, in sub-section 3.3.2, the LSE technique will be introduced. Finally, in sub-section 3.3.3, the application of LSE in CMOS imagers will be discussed.

#### 3.3.1 1/f Noise in Deep-Submicron MOS Transistors

In spite of over 50 years of research into 1/f noise phenomena in electronic devices, there is still discussion about the exact physical mechanisms that give rise to 1/f noise in an MOS transistor. However, as already explained in sub-section 3.2.4, it is generally accepted that lattice defects in the interface between silicon substrate and gate oxide play the most important role [3.7]. These defects or so-called 'traps' will capture charge carriers from the channel, and release them into the channel again after a while. As a result, the channel current fluctuates in a random fashion.

It was McWhorter [3.11] who first showed that the trapping/detrapping process can lead to a 1/f type spectrum. To this end, he described the behavior of each single trap as a so-called random telegraph signal (RTS), i.e. a signal that is randomly fluctuating between two states. If the power spectral density (PSD) of such a signal is plotted, it yields a so-called Lorentzian spectrum, as depicted in Figure 3-5a. The corner frequency that can be seen in this PSD depends on the statistical properties of the RTS noise, which in turn is related to the physical properties of the trap.
An MOS transistor will usually contain a large number of traps; assuming that these traps do not interact with one another, the PSDs of the individual traps can be added to yield the PSD of the noise generated by the transistor. McWhorter showed that if all traps inside the transistor generate an RTS with the same amplitude, and the corner frequency of the corresponding PSDs is exponentially distributed, then a 1/f noise spectrum will result. This is intuitively illustrated in Figure 3-5b. The resulting noise model is called the McWhorter or $\Delta N$ model, where $\Delta N$ symbolizes the fluctuation of the number of carriers in the channel.

Figure 3-5: a) PSD of a random telegraph signal b) A combination of a large number of RTS spectra yields a 1/f spectrum, as postulated by McWhorter

Apart from the $\Delta N$ model, another school of thought considers 1/f noise to be caused by fluctuations in the *mobility* of charge carriers in silicon, which is called the $\Delta\mu$ model. In 1969, Hooge [3.12] showed that homogeneous semiconductor samples suffer from bulk 1/f noise, which was later related to mobility fluctuations. Whereas p-channel MOSFETs are reported to show behavior in accordance with the $\Delta\mu$ model, n-channel MOSFETs more often behave according to the $\Delta N$ model. In 1990, Hung [3.13-3.14] proposed a unified model that includes both mentioned models, as well as the fluctuations in mobility caused by (and correlated to) fluctuations in the number of charge carriers. This model, if provided with correct parameters, agrees well with measurement results on large MOS transistors, and has therefore become the standard for modern circuit simulators.

The McWhorter model makes an interesting prediction for small area transistors in deep-submicron processes. While the model assumes the presence of a large number of traps inside each transistor, small transistors in deep-submicron processes might contain only a few or even only one trap per gate.
Assuming that the model is correct, this will have two consequences. First, the effect of a single trap inside a gate will become visible. As a result, small transistors will no longer exhibit a 1/f noise spectrum, as such a spectrum is the result of a *large* number of traps. Second, if only a few traps inside a transistor determine its behavior, a large spread in noise magnitude can be expected: some transistors will be 'lucky' and have only one trap, or even none, in their substrate/gate oxide interface, while others will be 'unlucky' and have many traps [3.15-3.16].

Both of these predictions have been confirmed by measurements on small transistors in deep-submicron processes [3.17-3.18]. An example of such a measurement is shown in Figure 3-6a, where the current fluctuation of a transistor with a gate area of $0.18\mu\text{m}^2$ is shown [3.18]. As can be seen from the figure, the current fluctuation clearly has the character of an RTS. The corresponding power spectral density (PSD) is plotted in Figure 3-6b. As expected, this is a Lorentzian spectrum. Therefore, small transistors in deep-submicron processes do not have a real 1/f noise spectrum. Instead, it is more correct to refer to this noise as low-frequency (LF) noise [3.17-3.18], which will be done in the rest of this chapter.

Figure 3-6: a) Measurement of RTS noise in a transistor with a gate area of $0.18\mu m^2$ b) Corresponding PSD

In conclusion, both the McWhorter model and experimental results lead to the prediction that the noise generated by small transistors inside CMOS imager front-end readout circuits will not exactly have a 1/f spectrum, but rather a Lorentzian-like spectrum. Furthermore, apart from the expected mean increase of the LF noise predicted by Eq. (3-15), a considerable spread in LF noise magnitude between transistors can be expected, as only a few traps determine the noise generation inside each transistor.
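The McWhorter argument can be illustrated numerically. The sketch below is only an illustration under stated assumptions, not a reproduction of any measurement or of the original model derivation: each trap is given a Lorentzian PSD of equal total power (arbitrary units), and the corner frequencies are spread uniformly on a logarithmic axis, which is how the exponential spread of corner frequencies is interpreted here. All function names and the chosen frequency range are hypothetical.

```python
import math


def lorentzian_psd(f, f_corner):
    """PSD of one RTS in arbitrary units: flat below f_corner, 1/f^2 above.

    The prefactor normalises the total power (integral over f from 0 to
    infinity) to one, so every trap has the same amplitude.
    """
    return (2.0 / (math.pi * f_corner)) / (1.0 + (f / f_corner) ** 2)


def log_spaced_corners(f_min, f_max, per_decade):
    """Corner frequencies spread uniformly on a logarithmic axis (assumption)."""
    decades = math.log10(f_max / f_min)
    count = int(round(decades * per_decade)) + 1
    return [f_min * 10 ** (i / per_decade) for i in range(count)]


def summed_psd(f, corners):
    """Non-interacting traps: the individual PSDs simply add."""
    return sum(lorentzian_psd(f, fc) for fc in corners)


def loglog_slope(psd, f1, f2):
    """Average slope of the PSD between f1 and f2 on log-log axes."""
    return (math.log10(psd(f2)) - math.log10(psd(f1))) / (
        math.log10(f2) - math.log10(f1)
    )


# Large transistor: many traps with corners from 10 mHz to 100 MHz.
many_traps = log_spaced_corners(1e-2, 1e8, per_decade=20)
# Small transistor: a single trap with a 1 kHz corner frequency.
single_trap = [1e3]

slope_many = loglog_slope(lambda f: summed_psd(f, many_traps), 1e4, 1e6)
slope_single = loglog_slope(lambda f: summed_psd(f, single_trap), 1e4, 1e6)

print(f"many traps:  slope = {slope_many:.2f}")    # roughly -1, i.e. 1/f
print(f"single trap: slope = {slope_single:.2f}")  # roughly -2, Lorentzian roll-off
```

With many traps the summed spectrum falls off with a slope close to -1 over the whole band, whereas a single trap shows the flat-then-steep Lorentzian shape, which is the qualitative difference between large and small transistors described above.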
#### 3.3.2 LF Noise Reduction using Large-Signal Excitation (LSE)

Large-Signal Excitation (LSE), also called 'switched biasing', is a relatively unknown technique to reduce LF noise in MOS transistors. The effect was first published by Bloom and Nemirovsky in 1991 [3.19]. However, no analog circuit designs using this technique were published until the effect was independently observed in a ring oscillator at the University of Twente in the Netherlands in 1998 [3.20]. Further research led to measurement results on a variety of processes and greatly added to the understanding of the phenomenon in deep-submicron transistors [3.17][3.18][3.20-3.22].

By applying the LSE technique, the LF noise of an MOS transistor can be reduced by periodically switching it 'on' and 'off', as shown in Figure 3-7. This can be done by manipulating the bias voltage at the gate of the transistor. Measurement results show that when the transistor is switched 'off', the gate-source voltage should be well below threshold in order for the LF noise reduction to occur. If a duty cycle of 50% between the 'on' and 'off' state is assumed, an LF noise reduction of 6dB compared to steady-state operation would be expected, as the device only produces noise 50% of the time. However, in [3.20] measurements on large NMOS transistors from commercially available HEF4007 logic ICs showed a total noise reduction of up to 14dB. Therefore, on top of the 6dB expected from the duty cycle, the LSE technique can reduce the LF noise in a large MOS transistor by up to 8dB. This was confirmed with measurements on a 0.8µm process [3.21].

Figure 3-7: Principle of the LSE technique

In [3.22], measurement results on minimum size transistors in a 0.18µm process were presented. As predicted by the McWhorter model, these devices exhibit a spread in steady-state LF noise magnitude of nearly two orders of magnitude. Moreover, a large spread in the effect of LSE was shown.
While LSE did reduce the LF noise on average, there were some devices where the LF noise was actually increased. This result is of great importance for the application of LSE in CMOS imagers. Obviously, the application of LSE is not possible in circuits where the transistor has to be switched on continuously. However, there are several applications, such as ring oscillators and sampled data systems, where the transistors are switched off anyway. Inside the CMOS imager front-end, this is also the case, as will be shown in the next sub-section. While the application of LSE is very simple, the explanation of the phenomenon requires an in-depth study of the semiconductor physics involved. In [3.19], it was suggested that the noise reduction is caused by the cycling of the transistor between inversion and accumulation. When the transistor is in accumulation, the occupancy of the traps changes significantly, and this change reduces the initial noise when the transistor is switched on again. However, in [3.18] and [3.23], measurements are presented where LSE is performed by changing a transistor's source voltage, instead of its gate voltage. Therefore, the transistor is not switched between inversion and accumulation, as accumulation requires the gate voltage to be low compared to the substrate voltage. These results are crucial for the application of LSE in CMOS imager front-ends, as will be shown in the next section. In [3.18], a more sophisticated model is presented that explains the LF noise behavior under LSE. It is based on the classical Shockley-Read-Hall model [3.24], which describes trapping and detrapping of holes and electrons. The assumptions made to construct the model are supported by experimental results, and the LF noise magnitudes predicted by the model correspond well to measurements. However, it should be noted that the model does require parameters of the traps inside a MOSFET, which are dependent on process technology. 
These parameters can be acquired from measurements in the particular process that is to be used, so as to enable quantitative predictions for the effectiveness of the LF noise reduction. To this end, sub-section 3.4.2 presents measurement results that provide quantitative predictions of the effectiveness of applying LSE inside a CMOS imager readout front-end.

#### 3.3.3 Application of LSE inside a CMOS Imager Front-End

As indicated in the last sub-section, there are two requirements for applying LSE in an analog circuit. First of all, the circuit and its application should allow the transistor to be switched off for part of the time. In CMOS imager front-ends, this is not a problem. Here, the MOS transistor that contributes the performance-limiting LF noise is the in-pixel source follower, which is not used while the pixel is integrating photocurrent. Second, it should be possible to either lower the gate voltage or increase the source voltage in order to reduce the LF noise. In the CMOS imager front-end, manipulating the gate is difficult since it is directly connected to the photosensitive element. Any large-signal excitation directly at the sensor node would cause large errors on the signal. However, the source of the source follower can be easily accessed, as it is connected to the column bus via the row select transistor.

Applying LSE via the source of the source follower results in the circuit diagram depicted in Figure 3-8a [3.25]. As can be seen in the figure, no additional in-pixel circuitry is required. The only change to the front-end circuit is the addition of switch S3 inside the column circuit. Apart from this, some changes in the timing of the front-end circuit have to be made, as outlined in Figure 3-8b. Before reading out the pixel, the source of source follower M2 is connected to the supply voltage VDD via the column bus. To this end, S3 connects the column bus to VDD while the in-pixel row-select switch (M3) is closed.
This creates a suitable 'off' state for the source follower and should therefore lower the LF noise. After this off-period, switch S3 is opened, and the pixel can be read out in normal fashion, as explained in sub-section 2.3.2.

Figure 3-8: a) Implementation of LSE in a 4T CMOS imager front-end b) Corresponding timing diagram

While the concept of LSE application in the CMOS imager is quite simple, at the time this concept was first developed some essential questions remained unanswered:

• Is the application of LSE via the source of a MOS transistor just as effective as via the gate? At the time LSE in CMOS imagers was first considered, all existing publications only provided measurement results on the effect of LSE at the gate of the device.

• Even though CDS does not properly correct for LF noise in the CMOS imager front-end, it is obvious that its application is still essential to correct for offset and reset noise. Therefore, how does a combination of CDS and LSE perform? Most published LSE measurement results were done at low frequencies. Such low-frequency LF noise would be corrected by CDS without LSE as well.

• How large would the spread in LF noise from pixel to pixel be in a modern CMOS process used for imaging? Few measurement results existed at the time. A related question would be whether an average improvement of all pixel front-ends at the cost of a deterioration of some pixels would be acceptable.

To answer the above questions, relevant measurement results were needed. While the immediate implementation of LSE inside an imager according to Figure 3-8 might have seemed straightforward, this was not done, since it is very difficult to distinguish LF noise from other noise sources in an imager. To avoid this problem, a custom measurement IC was built that was specifically designed to measure LF noise while applying LSE. In the next section, this IC and the measurement results acquired with it will be described.
## 3.4 LF Noise Measurements under Large Signal Excitation

#### 3.4.1 Measurement IC

In order to evaluate the noise reduction that Large-Signal Excitation can achieve inside a CMOS imager front-end, a custom measurement IC was realized in cooperation with the University of Twente [3.25]. The goal of this measurement IC was to enable separate measurements of LF noise while applying LSE and correlated double sampling. The main design challenge was therefore to design a circuit such that it is sensitive to LF noise, but insensitive to other noise sources.

In Figure 3-9a, a simplified circuit diagram of the measurement IC is depicted. While an imager has a single-ended signal path, the measurement IC uses a fully differential signal path to decrease the sensitivity of the measurement circuit to ambient noise sources. Therefore, two transistors M1 and M2 are used of which the LF noise is measured. The transistors are biased to operate as source followers via current sources $I_1$ and $I_2$. The noise of the transistors is read out via switches S3 using a differential amplifier A1. Since the LF noise of M1 and M2 is uncorrelated, the amplifier will read out a LF noise voltage equal to $\sqrt{2}$ times the LF noise voltage of a single transistor. Since A1 has a gain of 100x, the LF noise is amplified, facilitating further off-chip processing and measurement.

Figure 3-9: a) Simplified circuit diagram of the LF noise measurement IC b) Corresponding timing diagram

As explained in the previous section, the application of CDS is essential in CMOS imagers; therefore, a combined measurement using both CDS and LSE should be performed to predict the effect of LSE in a CMOS imager. To this end, CDS can be performed off-chip in the digital domain, which allows for a comparison between the effect of LSE with and without CDS. The timing diagram showing the application of LSE in the measurement circuit is shown in Figure 3-9b.
Transistors M1 and M2 are switched off by connecting their sources to a high voltage $V_{switch}$ via switches S1. This pulls the sources of transistors M1 and M2 to a high voltage, and therefore should reduce the LF noise once the transistors are switched on. Since this switching operation is common-mode, any residual transients at the input of the amplifier are attenuated by its common-mode rejection ratio, which is another advantage of a differential signal path. Switches S2 are added to speed up the readout by amplifier A1: when the amplifier is not connected to transistors M1 and M2, it is connected to a common-mode voltage $V_{cm}$. This ensures that the input voltage of amplifier A1 remains inside its common-mode range. Finally, the LF noise can be read out using amplifier A1 by closing switches S3.

The bias voltages $V_{switch}$, $V_{ddut}$, $V_{gdut}$ are connected off-chip to allow for flexibility in the bias voltages during measurements. Similarly, bias current sources $I_1$ and $I_2$ can be controlled from the outside to allow for measurements at different currents. The measurement IC was realized in an industrial 0.35 µm process. In the implemented circuit, an analog multiplexer (not shown in the figure for clarity) is added that allows different transistors to be tested. As a result, six different transistors could be measured, with device sizes ranging from a W/L of 0.5 µm/0.35 µm to 5 µm/0.35 µm. A micrograph of the realized measurement IC is depicted in Figure 3-10.

Figure 3-10: Micrograph of the measurement IC

#### 3.4.2 Measurement Results

Using the IC described in the last sub-section, LF noise measurements under LSE conditions were performed at Twente University [3.18][3.25]. For these measurements, the biasing conditions of the CMOS imager environment were replicated as much as possible.
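The off-chip post-processing implied by the description of the measurement IC can be sketched as follows. This is an illustration under stated assumptions, not the processing actually used for the measurements: the function names and sample values are hypothetical, and only the amplifier gain of 100 and the factor $\sqrt{2}$ for two uncorrelated transistors are taken from sub-section 3.4.1.

```python
import math

AMP_GAIN = 100.0            # gain of differential amplifier A1 (from sub-section 3.4.1)
PAIR_FACTOR = math.sqrt(2)  # two uncorrelated transistors in the differential pair

def rms(values):
    """Root-mean-square of the values after removing their mean (offset)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def single_transistor_noise(first, second=None):
    """Noise referred to one transistor, in the unit of the input samples.

    first, second: amplifier output samples, one per switching cycle.
    If second is given, digital CDS (second - first) is applied before the RMS.
    """
    data = first if second is None else [b - a for a, b in zip(first, second)]
    return rms(data) / (AMP_GAIN * PAIR_FACTOR)

# Hypothetical amplifier output samples in volts (not measured data).
s1 = [0.010, -0.020, 0.015, -0.005]
s2 = [0.012, -0.018, 0.013, -0.007]
print(single_transistor_noise(s1))      # without CDS
print(single_transistor_noise(s1, s2))  # with CDS
```

Performing the subtraction digitally means that the same recorded samples can be evaluated both with and without CDS, which is the comparison made in the results that follow.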
In Figure 3-11, a scatter plot is depicted that compares LF noise measurements in steady-state conditions with measurements of LF noise while applying LSE at a frequency of 100Hz. The transistor is biased at 10µA; when applying LSE, the noise of the transistor is measured 0.5µs after turn-on. In these measurements, no CDS was applied. Each dot in the figure represents measurements on one transistor; in total, 41 transistors were measured. On the horizontal axis, the steady-state LF noise is plotted, while the LF noise under LSE is plotted on the vertical axis. Therefore, if the steady-state noise equals the noise under LSE in a transistor, the corresponding dot in the scatter plot would lie on the line y=x (the dotted line in the figure). As can be seen in the figure, this is not the case; most dots are below the line y=x, indicating that the noise under LSE is lower than the steady-state noise. Also, it is obvious that there is a large spread of about two orders of magnitude between the 'noisiest' and the 'quietest' transistor. Finally, while most transistors have a lower noise under LSE, some transistors actually exhibit a *higher* noise when applying LSE (dots above the line y=x). On average, the decrease in LF noise is 1.4dB.

Figure 3-11: Scatter plot of the measurement results comparing LF noise under steady-state conditions and when LSE is applied. No CDS is applied

As explained in the previous sub-section, the application of CDS is essential in the CMOS imager front-end to reduce reset noise and offsets. Therefore, the same transistors were measured while applying CDS and LSE concurrently according to the timing diagram of Figure 3-8b: the transistors were switched off via the source before both CDS samples are taken. The CDS was performed off-chip in the digital domain; the first CDS sample is taken 0.5µs after turning on the transistor, the second sample is taken 3µs after turn-on. The measurement results are shown in Figure 3-12.
As can be seen in the figure, most dots in the scatter plot are now located on or above the line y=x, indicating that the LF noise actually increases when applying LSE. This result is disappointing, as it indicates that applying LSE in an imager in a manner suggested in sub-section 3.3.3 would not lead to a reduction in LF noise.

Figure 3-12: Scatter plot similar to Figure 3-11, but CDS was applied along with LSE in these measurements

In [3.18], the measurement results of concurrent application of LSE and CDS are explained using the LF noise model proposed in the same work. It shows that the measurement results are in accordance with the model. The explanation can be intuitively summarized as follows: for the application of CDS it is essential that the LF noise in both samples is equal in order to be cancelled. With the application of LSE as indicated in Figure 3-8b, this is not the case. While the first sample is taken immediately after the transistor has been switched back on, the second sample is taken after the transistor has been switched on for a longer period. The resulting inequality in 'bias history' for both samples leads to a LF noise that is unequal between the samples, and is therefore not cancelled out properly by the CDS.

## 3.5 Conclusion

The measurement results presented in the previous section lead to conclusions that are of importance to the remainder of this thesis. The following conclusions can be drawn about noise in the CMOS imager front-end readout circuit:

- LF noise measurement results show that the application of LSE in a CMOS imager front-end, in a manner suggested in sub-section 3.3.3, does not lead to a decrease in LF noise of the front-end. The reason for this is the unequal 'bias-history' of the two CDS samples. A possible solution to this problem would be to switch off the source follower between the first and second sample as well.
However, such a switch transient might lead to cross talk onto the floating diffusion, thereby corrupting the signal.
- Even if the problem of the LF noise increase due to the concurrent application of LSE and CDS can be solved, the LF noise measurements without application of CDS show only a modest improvement of 1.4dB on average. As was shown in Table 3-1, the LF noise in the imager front-end is the dominant noise source, estimated to be 350μV. The biggest noise source after LF noise is thermal noise, which is estimated to be 150μV. An average noise decrease of 1.4dB (a factor of $10^{-1.4/20} \approx 0.85$ in voltage) would mean that the LF noise is only reduced to about 300μV, and therefore, LF noise would remain the dominant noise source.
- Apart from the small average decrease of LF noise due to LSE, the large *increase* in LF noise due to the application of LSE in some transistors is also a concern. In Figure 3-11, two out of the 41 measured transistors show a LF noise increase in excess of a factor 10. In a CMOS imager, the large number of pixels will certainly lead to a significant number of pixels that have a similar or worse noise increase. Such very noisy pixels can lead to visible artifacts in an image, since the human visual system is very sensitive to individual pixels that 'stand out' in an area with uniform lighting (e.g. a picture with a white wall in the background, on which some pixels are darker than the rest).

Based on the points mentioned above, it can be concluded that LF noise remains a major performance limiter in the CMOS imager front-end. While large signal excitation is an interesting technique worthy of further investigation, it does not lead to a significant LF noise decrease in CMOS imager front-ends. Because of this noise limitation, it does not seem possible to further improve the performance of the analog front-end readout circuit of a CMOS imager through the use of circuit or system-level techniques.
Therefore, in the next chapters, the focus will be on the second goal of this thesis of increasing the power efficiency of the analog signal processing chain, and in particular, the A/D converter.

## 3.6 References

- [3.1] B. Razavi, *Design of Analog CMOS Integrated Circuits*, Boston: McGraw Hill, 2001
- [3.2] D. A. Neamen, *Semiconductor Physics and Devices*, Chicago: Irwin, 1997
- [3.3] J. Janesick and J.T. Andrews, "Fundamental Performance Differences between CMOS and CCD Imagers", *Proceedings of SPIE*, vol. 6276, June 2006
- [3.4] D. Sacket, "CMOS Pixel Device Physics", *2005 IEEE ISSCC Circuit Design Forum: Characterization of Solid-State Image Sensors*, Feb. 2005
- [3.5] A.J. Blanksby, M.J. Loinaz, "Performance analysis of a color CMOS photogate image sensor", *IEEE Transactions on Electron Devices*, vol. 47, pp. 55-64, January 2000
- [3.6] J.R. Janesick, "Scientific CCDs", *Optical Engineering*, vol. 26, pp. 692-714, Aug. 1987
- [3.7] S. Kogan, *Electronic Noise and Fluctuations in Solids*, Cambridge, UK: Cambridge Univ. Press, 1996
- [3.8] C. Enz and G. Temes, "Circuit techniques for reducing the effects of op-amp imperfections: autozeroing, correlated double sampling, and chopper stabilization", *Proceedings of the IEEE*, vol. 84, pp. 1584-1614, November 1996
- [3.9] G. D. Wilk, R. M. Wallace, and J. M. Anthony, "High-κ gate dielectrics: Current status and materials properties considerations", *Journal of Applied Physics*, vol. 89, no. 10, pp. 5243–5275, May 2001
- [3.10] Keith Findlater et al., "SXGA Pinned Photodiode CMOS Image Sensor in 0.35µm Technology", *IEEE International Solid-State Circuits Conference*, vol. XLVI, pp. 218-219, Feb. 2003
- [3.11] A.L. McWhorter, *1/f noise and related surface effects in germanium*, PhD dissertation, MIT, Cambridge, MA, 1955
- [3.12] F.N. Hooge, "1/f noise is no surface effect", *Physica*, vol. 29A, no. 3, pp. 139-140, Apr. 1969
- [3.13] K.K. Hung, P.K. Ko, C. Hu, and Y.C. Cheng, "A unified model for the flicker noise in metal-oxide-semiconductor field-effect transistors", *IEEE Transactions on Electron Devices*, vol. 37, no. 3, pp. 654-665, Mar. 1990
- [3.14] K.K. Hung, P.K. Ko, C. Hu, and Y.C. Cheng, "A physics-based MOSFET noise model for circuit simulators", *IEEE Transactions on Electron Devices*, vol. 37, no. 5, pp. 1323-1333, May 1990
- [3.15] R. Brederlow, W. Weber, D. Schmitt-Landsiedel, and R. Thewes, "Fluctuations of the low frequency noise of MOS transistors and their modeling in analog and RF-circuits", *IEDM Technical Digest*, pp. 159–162, 1999
- [3.16] Xinyang Wang, Padmakumar R. Rao, Adri Mierop, Albert J.P. Theuwissen, "Random Telegraph Signal in CMOS Image Sensor Pixels", *IEDM Technical Digest*, pp. 115-117, 2006
- [3.17] A.P. van der Wel, E.A.M. Klumperink and B. Nauta, "Effect of switched biasing on *1/f* noise and random telegraph signals in deep-submicron MOSFETs", *Electronics Letters*, vol. 37, no. 1, pp. 55-56, Jan. 4th 2001
- [3.18] A.P. van der Wel, *MOSFET LF noise under large signal excitation: measurement, modelling and application*, PhD thesis, University of Twente, Enschede, The Netherlands, 2005 (ISBN no. 90-365-2173-4)
- [3.19] I. Bloom and Y. Nemirovsky, "*1/f* noise reduction of metal-oxide-semiconductor transistors by cycling from inversion to accumulation", *Applied Physics Letters*, vol. 58, no. 15, pp. 1664-1666, April 1991
- [3.20] S.L.J. Gierkink, E.A.M. Klumperink, A.P. van der Wel, G. Hoogzaad, A.J.M. van Tuijl and B. Nauta, "Intrinsic *1/f* device noise reduction and its effect on phase noise in CMOS ring oscillators", *IEEE Journal of Solid-State Circuits*, vol. 34, no. 7, pp. 1022-1025, July 1999
- [3.21] E.A.M. Klumperink, S.L.J. Gierkink, A.P. van der Wel and B. Nauta, "Reducing MOSFET *1/f* noise and power consumption by switched biasing", *IEEE Journal of Solid-State Circuits*, vol. 35, no. 7, pp. 994-1001, July 2000
- [3.22] A.P. van der Wel, E.A.M. Klumperink, S.L.J. Gierkink, R.F. Wassenaar and H. Wallinga, "MOSFET *1/f* noise measurement under switched bias conditions", *IEEE Electron Device Letters*, vol. 21, no. 1, pp. 55-56, Jan. 2001
- [3.23] A.P. van der Wel, E. Klumperink, J. Kolhatkar, E. Hoekstra, M.F. Snoeij, C. Salm, H. Wallinga and B. Nauta, "Low Frequency Noise Phenomena in Switched MOSFETs", *IEEE Journal of Solid-State Circuits*, vol. 42, no. 3, pp. 540-550, March 2007
- [3.24] W. Shockley and W. T. Read, Jr., "Statistics of the recombinations of holes and electrons", *Physical Review*, vol. 87, no. 5, pp. 835–842, September 1952
- [3.25] M.F. Snoeij, A.P. van der Wel, A.J.P. Theuwissen and J.H. Huijsing, "The Effect of Switched Biasing on *1/f* Noise in CMOS Imager Front-Ends", *IEEE Workshop on CCDs and Advanced Image Sensors*, pp. 68-71, Karuizawa, Japan, June 2005

# 4 Column-Level Analog-to-Digital Conversion

Analog-to-digital conversion is one of the essential functions of any modern sensor interface circuit. This is because nearly all modern electronic devices perform data processing, transportation and/or data storage in the digital domain, since it is more reliable and more robust than in the analog domain. This digitization is taken so much for granted that consumers are nowadays told that electronic devices *are* digital, e.g. the term "digital camera". However, this classification is incorrect, as the output signal of an imager is still an analog signal.

In the context of A/D conversion, a CMOS imager has two main properties that differentiate it from other sensors. Firstly, it consists of a large array of light-sensitive pixels, which allows for a parallelized analog-to-digital conversion. Secondly, due to this large sensor array, the total data rate is much higher (> 1MSPS) than that of most other sensors. These two properties have a profound impact on A/D converter design.

As explained in chapter 2, most of the early CMOS imagers contain only a single, chip-level ADC.
However, it is possible to use a large number of parallel A/D converter channels, leading to column-level or even pixel-level ADCs. This will be discussed in section 4.1. It will be shown that for most mainstream imagers with a high pixel count (>3Megapixel), the column-level ADC is preferable, as it provides a good compromise between chip area and power consumption. In section 4.2, the architectures suitable for use as a column-level ADC will be discussed. The often-used column-level single-slope architecture will be described, and a new architecture will be introduced: the multiple-ramp single-slope (MRSS) ADC. In section 4.3, it will be shown that the presence of photon shot noise in imaging signals can be advantageously used to increase the speed and/or reduce the power consumption of the ADC. The implementation of this technique in column-level single-slope or MRSS ADCs will be described. Finally, in section 4.4, a circuit technique will be introduced that reduces the perceptual effect of column FPN in CMOS imagers. This can facilitate the application of column-level ADCs considerably, as their main drawback is the potential for column non-uniformities.

## 4.1 Why Column-Level A/D Conversion?

#### 4.1.1 Chip-Level, Column-Level and Pixel-Level A/D Conversion

In chapter 2, an overview of a conventional CMOS image sensor was given. This section will give a more detailed overview of the system-level design of the analog signal processing chain of CMOS imagers. As will be shown, the main choices involve the location of the A/D converter in the readout chain, and, related to this, to what extent (if any) the A/D conversion should be performed in parallel.

Figure 4-1 depicts a block diagram of a CMOS imager equipped with a single, chip-level, ADC.
As explained in chapter 2, this conventional readout structure uses 3 stages of analog signal processing: in-pixel amplification, biasing and analog storage in the column circuits, and chip-level amplification and A/D conversion. Most of the early camera-on-a-chip products made in CMOS [4.1-4.2] were equipped with chip-level ADCs. One reason for this approach might be that they evolved naturally out of the first CMOS imager prototypes that had only a single, serial analog output [4.3-4.5]. However, the main reason is probably the relative simplicity of the architecture.

Figure 4-1: Block diagram of a CMOS imager equipped with a single chip-level ADC

One of the problems in the design of the analog readout circuitry is the perceptual effect of non-uniformities between pixels or columns of the image, as already illustrated in Figure 3-3 in chapter 3. This implies that any analog functionality implemented in a pixel or column circuit should be designed such that uniformity between pixels or column circuits is ensured. In the ADC architecture of Figure 4-1, the more complex analog blocks, i.e. the CDS (correlated double-sampling) amplifier and A/D converter, are implemented at the chip-level. As a result, the uniformity of these analog functions is automatically ensured.

The chip-level ADC architecture has two potential drawbacks. Firstly, the chip-level CDS amplifier and ADC have to operate at a high speed to be able to read out the complete imaging array. As the resolution of CMOS imagers increases into the megapixel range, such circuits must operate at speeds in the hundreds of megasamples/second. As will be explained later, this high speed can have a negative effect on power consumption. Secondly, this read-out approach has a longer analog signal chain compared to the column-level or pixel-level A/D conversion approach.
Since the gain in each of the analog circuits is typically limited to one, each sub-circuit will significantly contribute to the overall noise of the analog signal chain. Therefore, a shorter analog signal path might well improve the noise performance of an imager.

In order to cope with the increasing readout bandwidth, the A/D conversion function can be moved into the column circuits, resulting in a readout architecture with a column-level ADC [4.6-4.14]. This is illustrated in Figure 4-2. As can be seen in the figure, both correlated double-sampling and A/D conversion are now performed at the column level. As a result, a full row of pixel outputs can be digitized concurrently. The results of this parallel A/D conversion are stored in a digital column memory.

Figure 4-2: Block diagram of a CMOS imager equipped with column-level ADCs

The integration of the A/D conversion at the column level results in several hundreds to a few thousands of parallel ADCs. This can drastically increase the total readout bandwidth, even though each column-level ADC will usually be much slower than a chip-level ADC. Furthermore, since the CDS operation can typically be combined with the column A/D converter itself, the analog signal processing chain is shorter than that of an imager with a chip-level ADC. Finally, while the analog signal path is parallelized at the column level, in practice some supporting ADC circuitry can still be implemented centrally and shared among the column ADCs [4.8]. This can decrease power consumption and make it easier to ensure uniformity between the column ADCs.

There are two main drawbacks of column-level ADCs. Firstly, column-to-column non-uniformities can become a serious design issue, as most of the analog functionality is moved into the column. As explained in sub-section 3.1.3, such column non-uniformities can severely degrade the perceptual image quality, since the human visual system is very sensitive to column artifacts.
Secondly, because of the parallelization, imagers with column-level ADCs will require more chip area than imagers with a chip-level ADC, which can increase costs.

The most radical way to parallelize and shorten the analog signal processing chain is to implement an ADC in each pixel [4.20-4.24]. This architecture, sometimes called a Digital Pixel Sensor (DPS), is illustrated in Figure 4-3. Because of the full parallelization of the A/D conversion function, imagers with pixel-level ADCs can achieve very high readout speeds. For instance, in [4.23] an imager is presented that achieves a continuous frame rate of 10000 frames/s. Although the parallelization of all analog functions can lead to non-uniformities, this is less problematic than in column-level ADCs, since pixel-level non-uniformities are less visible to the human eye.

Figure 4-3: Block diagram of a CMOS imager equipped with pixel-level ADCs

The main drawback of pixel-level A/D conversion is obvious: implementing all analog functionality in each pixel requires a lot of in-pixel circuitry. Moreover, a digital pixel sensor also requires much more wiring inside the pixel array to transport the digital output signals. This increase in circuitry and wiring leads to an increased pixel size and a lower fill factor. For instance, the imager in [4.23] uses 37 transistors and 16 wires per pixel, which has a size of 9.4µm x 9.4µm and a fill factor of only 15%. This larger pixel size leads to an increase in both chip size and the size of the optics required in front of the imager.

#### 4.1.2 Architectural Comparison

In this sub-section, a qualitative comparison between the readout approaches of the previous sub-section will be made to determine which of these approaches is preferable. While it would be preferable to make an exact comparison based on quantitative data, such as the power consumption of the readout architecture, this is very difficult for two reasons.
Firstly, the choice of the readout architecture depends on the CMOS imager resolution and application. As mentioned in the previous sub-section, CMOS imagers with pixel-level ADCs are ideally suited for high-speed imaging, but unattractive for low-cost image sensors used in portable applications. Secondly, since the speed of the individual ADC channels is very different in the three possible readout architectures, it can be expected that different ADC topologies should be used for each readout architecture. This complicates exact comparisons in power consumption and chip area, unless actual ADC designs are made with the same system specifications.

Table 4-1. Comparison of analog signal processing chains for a 5 Megapixel imager with a frame rate of 30Hz. Each signal processing chain is estimated to consume 100mW.

|                | chip-level ADC architecture | column-level ADC architecture | pixel-level ADC architecture |
|----------------|-----------------------------|-------------------------------|------------------------------|
| number of ADCs | 1                           | 2592                          | ≈5M (2592 x 1944)            |
| speed per ADC  | 150MSPS                     | 58kSPS                        | 30SPS                        |
| power per ADC  | 100mW                       | 39μW                          | 20nW                         |
| total power    | 100mW                       | 100mW                         | 100mW                        |

In order to simplify the comparison, an imager with a resolution of 2592x1944 pixels (5 Megapixel) that operates at 30 frames/s will be considered as a typical design example. These numbers correspond well with the specifications of imagers for mainstream mobile applications, which form the largest market for CMOS imagers. To get an impression of the power consumption of each ADC, the design target for the overall ADC power consumption can be estimated at 100mW. From this estimation, the required speed and power consumption of each ADC can be calculated, as is done in Table 4-1. In a pixel-level ADC architecture, each ADC should operate at 30 samples/second while consuming no more than 20nW.
The latter is obviously not realistic in a standard CMOS process, where leakage current can easily be tens of nano-amperes. Moreover, the pixel-level ADC approach requires a lot of in-pixel circuitry. As a result, the smallest pixel with built-in ADC reported in literature [4.24] has a pixel size of 7µm x 7µm, which is 16x larger in area compared with the state-of-the-art for a normal pixel [4.19]. For a 5Megapixel imaging array, this would result in a total pixel array size of 18.1mm x 13.6mm. While the minimization of chip area is not a primary goal in this thesis, it is obvious that such a large pixel array is not feasible in mainstream applications, as it would increase the costs very significantly. Therefore, it can be concluded that pixel-level ADCs are not feasible for mainstream imaging. The remaining comparison between the chip-level and column-level ADC architectures is more difficult to make. For this system-level comparison, the power efficiency, i.e. the power consumption required for a certain noise performance and speed, is decisive. As far as noise is concerned, the column-level architecture has the advantage of a shorter signal path. Since the gain in the readout circuits is usually close to unity, each additional circuit in the signal processing chain can add a significant amount of noise. A shorter signal path is therefore preferable for noise. Another advantage of a column-level architecture is the lower readout speed in each ADC. This reduces the noise bandwidth of the circuits, and can therefore reduce the total amount of noise. A first approach to compare the column-level to chip-level architectures is to compare imagers available in literature. To this end, a figure-of-merit (FOM) for ADCs can be used to express power efficiency of ADCs. An often-used definition for a FOM for ADCs is: $$FOM = \frac{power}{f_s \cdot 2^{ENOB}} \tag{4-1}$$ where $f_s$ is the sampling frequency of the ADC and ENOB is the effective number of bits. 
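Equation (4-1) is straightforward to evaluate numerically. The sketch below is illustrative only: the function name is hypothetical, and the example combines the 100mW and 150MSPS chip-level design target of Table 4-1 with an assumed ENOB of 10 bits, which is not a figure given in the text.

```python
def adc_fom(power_w, sample_rate_hz, enob_bits):
    """Figure of merit of Equation (4-1), in joules per conversion step."""
    return power_w / (sample_rate_hz * 2 ** enob_bits)

# Chip-level design target from Table 4-1; the ENOB of 10 bits is an assumption.
fom = adc_fom(power_w=0.100, sample_rate_hz=150e6, enob_bits=10)
print(f"FOM = {fom * 1e12:.2f} pJ/conversion")  # prints "FOM = 0.65 pJ/conversion"
```

A lower FOM indicates a more power-efficient converter, since less energy is spent per conversion step.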
A practical complication in case of an ADC used as an integrated part of CMOS imagers is that detailed specifications for power consumption and sampling frequency are usually not provided in publications. Therefore, the total power consumption of the imager will be used instead, and the sampling frequency will be estimated by multiplying the pixel count of the imager with the reported frame-rate. Using this approach, 7 CMOS imagers that were published between 2003 and 2007 were compared, as is detailed in Table 4-2.

Table 4-2. Literature comparison of the power efficiency of imager ADCs

| paper ref. | architecture | pixel count | frame rate (fps) | ADC res. (bits) | total power (mW) | FOM (pJ/conv) |
|------------|--------------|-------------|------------------|-----------------|------------------|---------------|
| [4.12]     | column       | 1.3M        | 30               | 11              | 75               | 0.931         |
| [4.15]     | column       | 8.3M        | 60               | 10              | 760              | 1.49          |
| [4.16]     | chip         | 2M          | 90               | 12              | 650              | 0.85          |
| [4.14]     | column       | 2.8M        | 60               | 12              | 580              | 0.85          |
| [4.17]     | column       | 6.4M        | 60               | 10              | 360              | 0.92          |
| [4.18]     | column       | 3.4M        | 15               | 12              | 215              | 1.03          |
| [4.19]     | chip         | 8.0M        | 11               | 11              | 400              | 2.22          |

Although the majority of imagers published in recent years have a column-level ADC, the power efficiency comparison is not conclusive. In particular, in [4.16], an imager with chip-level ADC is presented that has a power efficiency that equals the best column-level ADC shown in [4.14]. Moreover, since all power efficiency figures in this comparison are based on the total power consumption of the imagers, rather than the ADC power consumption, the computed numbers are probably inaccurate.

Another approach is to perform an analytical comparison between the chip-level and the column-level ADC architecture. The main difficulty here is that power consumption is hard to estimate analytically, since it is strongly related to implementation details.
Therefore, an accurate comparison would require measurements or at least simulations of two complete ADC implementations. However, a general analog design consideration can be made. Regardless of implementation details, the realization of any analog function needs a certain amount of signal gain, and therefore, active elements in the form of MOS transistors. The transistor parameter that best defines its potential gain is its transconductance. The 'cost' for this transconductance is a bias current, and therefore power consumption. Therefore, an appropriate parameter for the power efficiency of a transistor is its $g_m/I_D$ ratio.

MOS transistors are most power efficient when they are operating in weak inversion. In this operating region, the $g_m/I_D$ ratio is the highest, typically between 20 and 25 (V<sup>-1</sup>). In order for the transistor to operate in this region, the current density in the channel should be low. For a given transconductance, this means that the transistor channel should be large enough to keep the current density down. However, this means that the parasitic capacitance on the transistor nodes will be relatively large. This capacitance poses a fundamental speed limit for operation in weak inversion.

To evaluate this fundamental speed limit, a number of simulations were performed in a $0.18\mu m$ CMOS process using the circuit depicted in Figure 4-4. It contains a simple PMOS source follower with an output capacitive load formed by an identical transistor. While PMOS transistors are usually slower than NMOS transistors, their lower 1/f noise makes them more suited for low-noise analog gain stages. In the simulations, the bias current was fixed at $10\mu A$, the transistor length was fixed at $0.3\mu m$, and the transistor width was varied. These parameters represent a situation where a certain power budget is available, which fixes the bias current.
For this fixed current, a designer would then try to optimize the other performance parameters, being the bandwidth and $g_m$ of the circuit.

Figure 4-4: Simulation circuit to assess the speed limitation of transistor operation in weak inversion

The results of these simulations are shown in Figure 4-5, where the -3dB frequency of the circuit was plotted versus the transistor width W and the $g_m/I_D$ ratio. As explained, the highest $g_m/I_D$ ratio is attained for a large transistor width, but this also results in a large parasitic capacitance, and therefore a low bandwidth. If, on the other hand, a large bandwidth is needed, a designer is forced to reduce the transistor width, which reduces the parasitic capacitance, but also lowers the $g_m/I_D$ ratio.

Figure 4-5: Graph showing the transistor width W and achieved power efficiency $g_m/I_D$ versus frequency for the circuit of Figure 4-4

In a chip-level architecture, the required ADC speed for our target imager is 150MSPS. Since the sampled nature of the imager readout path necessitates a switched-capacitor approach, the speed of the amplifiers in the readout circuit should be at least 5 times the data rate or 750MHz. Based on the simulation results of Figure 4-5, the $g_m/I_D$ ratio can be expected to be about 15 (V<sup>-1</sup>). The latter figure is probably an upper bound, since the capacitive load of the circuitry can be expected to be higher in practice than the simple circuit of Figure 4-4. This higher capacitive load would require a further reduction in transistor width, and this would lead to a further reduction in the $g_m/I_D$ ratio. In contrast, the required circuit bandwidth in a column-level ADC is only several hundred kHz. Therefore, it should be easily possible to design circuitry for a column-level ADC that operates in weak inversion.
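The speed and bandwidth figures used in this comparison follow from simple arithmetic on the 5 Megapixel example imager of Table 4-1, as the sketch below illustrates. The function and constant names are hypothetical, and applying the same factor of 5 between sample rate and amplifier bandwidth to the column-level and pixel-level cases is an assumption made here for illustration; the text states it only for the chip-level case.

```python
COLUMNS, ROWS = 2592, 1944  # 5 Megapixel example imager of Table 4-1
FRAME_RATE = 30.0           # frames per second
TOTAL_POWER_W = 0.100       # power budget for the complete A/D conversion function
SETTLING_FACTOR = 5.0       # amplifier bandwidth / sample rate (assumed for all cases)

def per_adc_budget(num_adcs):
    """Return (sample rate in S/s, power in W, amplifier bandwidth in Hz) per ADC."""
    sample_rate = COLUMNS * ROWS * FRAME_RATE / num_adcs
    return sample_rate, TOTAL_POWER_W / num_adcs, SETTLING_FACTOR * sample_rate

for name, n in (("chip", 1), ("column", COLUMNS), ("pixel", COLUMNS * ROWS)):
    rate, power, bandwidth = per_adc_budget(n)
    print(f"{name:>6}: {rate:.3g} S/s, {power:.3g} W, {bandwidth:.3g} Hz")
```

Under these assumptions the chip-level amplifier needs a bandwidth of roughly 750MHz, whereas a column-level amplifier needs only about 300kHz, in line with the "several hundred kHz" quoted above.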
Based on the simple circuit model of Figure 4-4, it can therefore be concluded that in high-resolution imagers, the column-level architecture should offer a better power/speed ratio than a chip-level ADC architecture. The lower speed required of each column-level ADC allows the transistors in the column circuits to operate in weak inversion, where they are most power efficient. Whether this theoretical advantage of a higher transistor $g_m/I_D$ ratio can really be translated into a more power efficient ADC design obviously depends on how well the ADC architecture can translate transistor performance into system performance. This challenge will be discussed in the next section, where a new, more power efficient ADC architecture will be introduced.

## 4.2 Column-Level ADC Architectures

In the previous section, it was estimated that an analog signal chain using parallel, column-level ADCs should be the most power-efficient readout structure for high-resolution imagers. The next problem is to design a power-efficient ADC architecture that is suitable for implementation in the column. In this section, such architectures will be discussed. Firstly, the requirements for such an ADC will be discussed in sub-section 4.2.1. These requirements result in a number of suitable architectures. The most often used architecture, the column-level single-slope ADC, is discussed in sub-section 4.2.2. While this architecture has many advantages, it has the important drawback of a low A/D conversion speed. In sub-section 4.2.3, a new architecture, the multiple-ramp single-slope ADC, is introduced that provides a significantly faster conversion speed, while preserving the key benefits of the single-slope ADC.

### 4.2.1 Column-Level ADC Architecture Requirements

The requirements for column-level ADC architectures can be split up into two categories. Firstly, there are a number of general requirements similar to any other ADC, such as speed and resolution.
Secondly, there are a number of requirements specific to the implementation of this ADC architecture in the column of an imager. In this sub-section, the two categories of requirements will be discussed.

The general requirements for the column-level ADC are relatively modest. As already outlined in Table 4-1, the required conversion rate for a column ADC is only around 60kSPS. While this number depends on the pixel count of the imager, the required conversion speed only increases proportionally to the square root of the pixel count due to the column-parallel readout. Therefore, based on conversion rate alone, nearly all known ADC architectures would be feasible for use as a column-level ADC.

In chapter 3, it was shown that the dynamic range of a CMOS imager is determined by the maximum voltage swing and noise of the front-end circuit. In typical CMOS imagers, this dynamic range is about 60-70dB depending on the photosensitive element used. Therefore, the ADC resolution typically required for a CMOS imager is around 10-12 bits. Furthermore, from the discussion of the front-end linearity (sub-section 3.1.3) it can be concluded that the integral non-linearity (INL) performance of the A/D converter is not critical, as long as it is better than the non-linearity of the front-end circuit (typically about 1%). On the other hand, the differential non-linearity (DNL) of the converter is generally considered important, as DNL errors can be highly visible to the human eye. Therefore, monotonicity of the converter is of importance.

There are three specific requirements related to the implementation of the ADC as a column-level massively-parallel readout structure. First of all, the uniformity between the column ADCs is very important for the perceptual image quality. As explained in sub-section 3.1.3, any column-level non-uniformities lead to vertical stripes that are highly visible to the human eye.
Because of this, the column-to-column non-uniformity should be less than 0.1% in order to obtain an acceptable image quality. For the column-level ADC, this means that the maximum offset and gain mismatch should be less than 0.1% of full scale.

A second important requirement is that the column ADC implementation should require little chip area. The column-level readout circuit should fit underneath each pixel column, and should have the same width as the pixel. For a high-resolution imager the pixel pitch approaches 2µm, which makes it very challenging to lay out the column circuit at the same width. By placing column-level circuits both above and below the pixel array and connecting half of the pixel columns to the upper and the other half to the lower column circuits, the layout width can be increased by a factor of two. Nonetheless, implementing the ADC in such a narrow area is challenging, and therefore, simplicity of the column circuit is paramount.

The third requirement for a column-level ADC is low power consumption. While this requirement itself is not specific to column-level ADCs, it does favor specific design choices in column-level architectures. Since a large number of parallel ADCs are involved, it is very favorable for power consumption if some of the circuitry needed for these ADCs can be *shared*, and therefore implemented only once. This not only reduces power consumption, but also reduces uniformity problems. The column-parallel single-slope ADC and the new multiple-ramp single-slope (MRSS) ADC that will be described in the next two sub-sections both have such a partly centralized implementation.

Based on the above-described requirements, the following ADC architectures have been used in literature:

- Cyclic ADC [4.7][4.13]
- Successive-approximation ADC [4.6][4.10]
- Single-slope ADC [4.8][4.9][4.11][4.12][4.14]

Of these three, the single-slope ADC is by far the most popular.
The reason for this is that the single-slope ADC fits the requirements for a column-level ADC very well. Compared to the other two architectures, it requires less in-column analog circuitry. A column-parallel single-slope ADC only requires a comparator in each column, compared to a comparator plus DAC for a successive approximation ADC, or an amplifier, comparator and sample-and-hold circuit for a cyclic ADC. This simple column circuit reduces the required chip area and makes it relatively easy to ensure column-to-column uniformity. Finally, a part of the single-slope ADC, the ramp generator, can be implemented centrally and shared between the ADCs, thus lowering power consumption. Because of these advantages, the column-parallel single-slope ADC architecture will be described in detail in the next sub-section.

### 4.2.2 Column-Parallel Single-Slope ADC Architecture

In Figure 4-6a, a block diagram of the column-parallel single-slope ADC architecture is shown. As can be seen from the figure, some circuit blocks, i.e. the ramp generator and digital counter, are implemented centrally. This central implementation is a key advantage of the column-parallel single-slope ADC, as it reduces the required circuitry in each column to the minimum. As can be seen in the figure, the only analog circuit that is required in each column is a comparator. Note that in the figure, only two column circuits are shown; in reality, the number of column circuits equals the number of columns in the pixel array, and is therefore typically several hundreds to a few thousands.

Figure 4-6: a) Block diagram of a column-parallel single-slope ADC b) Corresponding timing diagram

The operation of the ADC is further illustrated with the timing diagram of Figure 4-6b. The ramp generator in the central circuit block outputs a ramp voltage $V_{ramp}$ that spans the entire input voltage range of the ADC. A digital n-bit counter runs synchronously with the ramp generator.
In each column, the centrally generated ramp voltage is compared to the column input voltage. When the comparator detects that the input voltage equals $V_{ramp}$, its output triggers the digital memory in the column, which stores the digital number output by the central digital counter. As this counter runs synchronously with the ramp generator, the digital number stored in the column memory is proportional to the voltage at the input of the column comparator.

The simple column circuit not only reduces chip area, but compared to other ADC architectures also makes it easier to ensure uniformity. Any errors in the ramp generator affect all columns equally and therefore cannot cause any non-uniformity. The main in-column error sources are offset and delay of the comparator. These can be corrected with dynamic offset cancellation techniques. In chapter 5, a CMOS imager with a low-power column-parallel single-slope ADC will be described. There, the design of an in-column comparator will be discussed in detail.

The main disadvantage of the column-parallel single-slope ADC is its low conversion speed. The required A/D conversion time $T_{conv}$ can be written as:

$$T_{conv} = \frac{2^n - 1}{f_{ck}} \tag{4-2}$$

where n is the desired resolution of the ADC and $f_{ck}$ is the clock frequency of the counter. As a resolution of 10-12 bits is usually required in an imager, this means that the total conversion time is 1023-4095 clock periods. In comparison, both the successive approximation and cyclic ADC architectures only require n clock periods for an n-bit conversion, but this much faster conversion comes at the expense of a much more complicated analog column circuit. In the next sub-section, a new column-parallel ADC architecture is introduced that offers a significantly lower conversion time than the single-slope architecture, while preserving the key benefit of the simple column circuit.
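The operation described in this sub-section can be summarized in a small behavioral model. The sketch below is an idealized illustration in Python, not the circuit of chapter 5: the comparators are assumed to have no offset or delay, the ramp is modeled as a staircase that rises by one LSB per counter clock, and the function names are hypothetical.

```python
# Idealized behavioral sketch of a column-parallel single-slope ADC.
# Assumptions: ideal comparators (no offset, no delay) and a staircase ramp
# that rises by one LSB per counter clock period.


def column_parallel_single_slope(v_columns, v_min, v_max, n_bits):
    """Convert all column voltages using one shared ramp and one shared counter."""
    levels = 2 ** n_bits
    lsb = (v_max - v_min) / levels
    latched = [None] * len(v_columns)           # per-column digital memory

    for count in range(levels):                 # central n-bit counter
        v_ramp = v_min + (count + 1) * lsb      # central ramp generator
        for col, v_in in enumerate(v_columns):  # one comparator per column
            if latched[col] is None and v_ramp > v_in:
                latched[col] = count            # comparator trips: store counter value

    # Inputs at or above full scale never trip the comparator; clamp them.
    return [levels - 1 if code is None else code for code in latched]


def conversion_time_s(n_bits, f_ck_hz):
    """Conversion time according to eq. (4-2)."""
    return (2 ** n_bits - 1) / f_ck_hz


codes = column_parallel_single_slope([0.0, 0.5, 0.999, 1.0], 0.0, 1.0, 10)
print(codes)
print(conversion_time_s(10, 1.0), conversion_time_s(12, 1.0))  # in clock periods for f_ck = 1
```

The inner loop over the columns represents circuitry that operates in parallel on chip. Only the outer loop over the counter values costs conversion time, which is why the conversion time of eq. (4-2) does not depend on the number of columns.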
### 4.2.3 Multiple-Ramp Single-Slope ADC Architecture

The multiple-ramp single-slope ADC architecture has been proposed [4.25][4.26] to overcome the main disadvantage of the single-slope ADC, which is its slow conversion speed. The key insight leading to this new architecture is a comparison between the existing single-slope and successive approximation architectures. They bear a great resemblance, as illustrated in Figure 4-7. In both cases, a number of comparisons are made between a dynamic reference signal and the analog input voltage, and a digital output proportional to the analog input voltage is generated based on the outcome of the comparisons. In case of a single-slope ADC, the dynamic reference generator outputs a ramp voltage. While this approach is simple and robust, it requires $2^n$ comparisons for an *n*-bit conversion, and is therefore slow. The successive approximation ADC requires only *n* comparisons, by using a dynamic reference generator whose output depends on the result of previous comparisons. The drawback of this approach for a column-parallel ADC is that it requires *feedback* between the comparator and dynamic reference generator, and therefore, the dynamic reference voltage becomes dependent on the input signal. In a column-parallel structure, where there are several hundreds of comparators, this necessitates the implementation of a reference generator in each column, instead of a single, centrally implemented dynamic reference generator as is done in a column-parallel single-slope ADC.

Figure 4-7: Common principle of the single-slope and successive approximation ADC architecture

The multiple-ramp single-slope (MRSS) architecture offers a compromise between the successive approximation and single-slope architecture. It has a faster conversion speed than the single-slope ADC, but without requiring a reference generator in each column.
The basic concept of the MRSS architecture is that the ramp voltage, which spans the entire input voltage range in a single-slope architecture, is divided into m sub-ramps that each span 1/m of the input range. If each column comparator is connected to the correct sub-ramp (i.e. the sub-ramp whose range contains the input signal), all m sub-ramps can be output concurrently, resulting in a shorter conversion time compared to a single-slope architecture.

In Figure 4-8a, a block diagram of the multiple-ramp single-slope ADC architecture is depicted. The dynamic reference generator outputs m different ramp voltages. Each column circuit has a set of switches that connects one of the m ramps to the input of the comparator. Compared to the single-slope architecture, the MRSS architecture only requires the addition of analog switches, as well as some extra digital memory and logic in each column.

In Figure 4-8b, the operation of the MRSS architecture is further illustrated with a timing diagram. The A/D conversion is subdivided into a coarse and a fine phase. In the coarse phase, all comparators are connected to the coarse ramp generator. Using this ramp voltage, a coarse single-slope A/D conversion is performed that determines in which of the m intervals of the input range each input voltage falls. The results of this coarse conversion are stored in the memory in each column. Next, the coarse conversion result is fed back into the analog switches, which connect the correct sub-ramp to the comparator. Note that this is equivalent to the feedback present in a successive approximation architecture, but in this case, the dynamic reference generator is not dependent on the input signal. After the correct ramp is connected, the fine conversion phase is performed by outputting all m sub-ramps concurrently.
This fine A/D conversion phase is again essentially a single-slope conversion, but since each comparator is connected to the sub-ramp corresponding to its input signal, the ramps only have to span 1/m times the ADC input range, and therefore, the conversion can be much faster. The result of the fine conversion is stored in the column memory. The final digital output is a combination of the coarse and fine conversion phase results. If the number of ramps m is equal to a power of two, i.e. $m = 2^p$, then the total A/D conversion time $T_{conv}$ can be expressed as:

$$T_{conv} = \frac{2^p - 1}{f_{ck}} + \frac{2^q - 1}{f_{ck}} \tag{4-3}$$

where $f_{ck}$ is the clock frequency of the counter, p and q are integers with p+q=n, and n is the resolution of the conversion. Therefore, the theoretical minimum conversion time occurs for p=q, which would result in a conversion time of 62 clock periods for a 10-bit resolution or 126 clock periods for a 12-bit resolution. However, such a choice for p and q would imply that 32 or 64 sub-ramps would be required, which would be difficult to implement. Nonetheless, it is clear from eq. (4-3) that the conversion time can be significantly shorter compared to a single-slope architecture. In chapter 6, an imager with an MRSS ADC is described that achieves a 3.3x reduction in conversion time compared to a single-slope ADC.

Figure 4-8: a) Block diagram of a multiple-ramp single-slope ADC b) Corresponding timing diagram

Despite the advantages of the MRSS architecture of a higher conversion speed and a simple column circuit, there are disadvantages as well. Compared to the single-slope architecture, there are two additional problems. Firstly, if each of the m sub-ramps spans exactly 1/m of the input range, errors in the coarse conversion phase can result in the comparator being connected to the wrong sub-ramp for the fine conversion phase, resulting in dead bands in the digital outputs.
This problem can be solved by creating some overlap between the different sub-ramps. Subsequently, some simple digital processing is necessary to correctly combine the digital outputs stored in the coarse and fine memory. Secondly, good matching of the sub-ramps is of critical importance to the performance of the ADC. The slopes of the sub-ramps should match very well, and their offsets with respect to one another should be well-defined. Apart from static matching, it is also important to consider dynamic effects. Since the number of comparators that is connected to each ramp can change dynamically, care must be taken that such a dynamically changing load does not lead to mismatch between the ramps. However, since the multiple ramp generator only needs to be implemented once in the central circuit block, an increased complexity of this block is less of a concern, as the chip area is not as limited as for the column-level circuits.

In chapter 6, the implementation of a CMOS imager with a column-parallel multiple-ramp single-slope ADC will be described. This implementation includes a precision multiple ramp generator of which the sub-ramps have intrinsically matched slopes, and of which the sub-ramp offsets are controlled through the use of an auto-calibration algorithm.

## 4.3 Exploitation of Photon Shot Noise in Imager A/D Conversion

Imager output signals have a unique property in that they contain a noise source that is dependent on the signal level: photon shot noise. In this section, it will be shown that the presence of such a noise source in the signal can be exploited in the A/D converter to significantly reduce power consumption and/or conversion time. In sub-section 4.3.1, the principle, which is independent of the ADC architecture, will be explained. Sub-section 4.3.2 then presents a method to calculate the required number of quantization steps. Subsequently, sub-sections 4.3.3 and 4.3.4 will describe the application of this principle to single-slope and multiple-ramp single-slope architectures, respectively.
### 4.3.1 Principle of Photon Shot Noise Exploitation

Figure 4-9 shows a conceptual diagram of the response to light of a CMOS imager front-end along with the noise sources present in its output. The sensor's output signal increases proportionally to the amount of incident light, until a certain maximum input value is reached for which the output saturates. As discussed in section 3.2, there are a number of noise sources present in the front-end circuit output. Most of these sources, such as 1/f noise and thermal noise, are independent of the incident light and can therefore be plotted as a constant line, or a *noise floor*. However, if there is light incident on the sensor, the output signal also contains photon shot noise, which is proportional to the square root of the signal. Since the axes of Figure 4-9 have logarithmic scales, the photon shot noise can be plotted as a line with half the slope of the signal response. As can be seen in the figure, the constant noise sources are dominant at low light intensity, while the photon shot noise dominates for higher light intensity.

Figure 4-9: Conceptual logarithmic plot of the sensor's response to light and corresponding noise sources

When designing an A/D converter for a CMOS imager, the required resolution is usually based on the amount of constant noise (i.e. the noise floor) present in the front-end signal. Since these constant noise sources determine the dynamic range of the sensor, the ADC resolution is typically chosen such that the quantization noise is less than the constant front-end noise sources. While this maximizes the sensor's performance for low light inputs, it essentially means that the ADC overperforms for higher light inputs. As the light intensity increases, the photon shot noise becomes dominant, while the quantization noise of the ADC stays constant.
Therefore, the amount of quantization noise can be increased for high sensor inputs without compromising the quality of the output image, which is equivalent to reducing the ADC resolution for high inputs [4.27]. This, in turn, is equivalent to a reduction in the total number of quantization levels of the ADC, which can lead to a reduction in power consumption in some architectures. The resulting ADC essentially has a companding characteristic, similar to the logarithmic µ-law [4.28] or A-law [4.29] quantization schemes used in telephony applications.

### 4.3.2 Companding Quantization Calculation Method

From a theoretical point of view, it would be best to exactly match the resolution decrease of the ADC to the photon shot noise increase, which results in a resolution decrease proportional to the square root of the input signal. However, there are a number of digital post-processing steps commonly performed in imagers, such as white-balancing and color interpolation, which require digitized sensor outputs that are linearly dependent on the input. If the quantization step of the imager ADC has a square-root dependency, it would be quite difficult to restore a linear dependency in the digital domain. Similarly, the logarithmic A-law/µ-law quantization schemes as used in telephony are equally unsuited. Therefore, a more practical tracking of the photon shot noise should be employed that allows for an easy digital reconstruction of the ADC output to a digital code that is linearly dependent on the sensor input. In this sub-section, two practical approaches are provided, along with a calculation method that can determine how many quantization steps are required for a given ratio between photon shot noise and quantization noise.

Figure 4-10: Conceptual plot of photon shot noise exploitation in imager signals
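To give a feeling for the numbers involved, the ratio between quantization noise and photon shot noise can be evaluated with a few lines of Python. This is a minimal sketch under stated assumptions: the quantization error is taken to be uniformly distributed, so that its RMS value is the step size divided by $\sqrt{12}$, and the saturation charge (25,000 electrons) and resolution (12 bits) are the example values used for Table 4-3 later in this sub-section. The function names are hypothetical.

```python
import math

N_SAT = 25_000  # saturation charge in electrons (example value of Table 4-3)
N_BITS = 12     # initial ADC resolution in bits (example value of Table 4-3)


def quantization_noise_e(step_lsb, n_sat=N_SAT, n_bits=N_BITS):
    """RMS quantization noise in electrons for a step of `step_lsb` LSBs.

    Assumes a uniformly distributed quantization error (step / sqrt(12)).
    """
    lsb_e = n_sat / 2 ** n_bits  # size of one LSB, expressed in electrons
    return step_lsb * lsb_e / math.sqrt(12)


def shot_noise_e(n_sig):
    """RMS photon shot noise in electrons for a signal of n_sig electrons."""
    return math.sqrt(n_sig)


def noise_ratio(step_lsb, n_sig):
    """Quantization noise divided by photon shot noise."""
    return quantization_noise_e(step_lsb) / shot_noise_e(n_sig)


for n_sig in (100, 1_000, 10_000, 25_000):
    ratios = ", ".join(f"{noise_ratio(k, n_sig):.3f}" for k in (1, 2, 4, 8))
    print(f"N_sig = {n_sig:>6} e-: ratio for steps of 1, 2, 4, 8 LSB = {ratios}")
```

Under these assumptions, a step of 1 LSB near saturation yields quantization noise that is only about 1% of the photon shot noise, while even a step of 8 LSBs stays below 10%. This is the headroom that a companding characteristic exploits.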
Figure 4-10 displays the simplest way to create a companding quantization characteristic that allows for a simple digital reconstruction to a linear digital output [4.30]. The quantization step sizes are not continuously increased, but are doubled several times along with the increase of the input signal. As a result of this *binary* quantization step scheme, it is very simple to create a digital output that linearly depends on the analog input, since it only requires multiplications by a factor of 2. This concept can be refined by increasing the quantization steps in integer multiples instead of powers of two, i.e. the quantization step becomes 2, 3, 4, 5, etc. times the initial step [4.31]. This *integer* quantization step scheme offers a better matching of the quantization noise to the photon shot noise, and thus fewer required quantization steps, at the expense of a slightly more complicated digital reconstruction.

There are a number of factors determining the allowable decrease in ADC quantization steps. Firstly, as was discussed in sub-section 3.2.1, the saturation charge of the sensor determines the maximum signal-to-noise ratio achievable by the pixel. While a larger amount of charge will increase the photon shot noise in absolute terms, the relative amount of noise actually decreases, since photon shot noise only increases with the square root of the charge, while the signal output increases directly proportional to the charge (Eq. (2-2)). Therefore, the maximum signal-to-noise ratio increases with the saturation charge, and thus, the number of required quantization steps also increases with the saturation charge. Secondly, the initial resolution of the ADC for low signal levels is of importance, as it determines for which signal levels a quantization step increase can be allowed. Finally, the ratio between the maximum amount of quantization noise and the amount of photon shot noise is a design parameter.
If more quantization noise is allowed, fewer quantization steps are required in the ADC. On the other hand, a higher ratio of quantization noise might become visible in the image, as there might not be enough photon shot noise to render the effects of quantization invisible. In this work, a quality factor r will be used to define the required ratio between quantization noise and photon shot noise<sup>1</sup>:

$$r = \frac{\overline{e}_{qns}(k)}{\overline{e}_{phs}(N_{sig})} \tag{4-4}$$

where $\overline{e}_{qns}(k)$ is the amount of quantization noise, depending on the integer step size k, and $\overline{e}_{phs}(N_{sig})$ is the amount of photon shot noise, depending on the number of signal electrons $N_{sig}$. Therefore, r is the maximum allowed ratio of quantization noise to photon shot noise: the quantization noise should be at least a factor 1/r smaller than the photon shot noise. Using this quality factor, the required number of quantization steps can be derived, which is further detailed in appendix A. In Table 4-3, the results of two such calculations are shown.

<sup>1</sup> Note that the quality factor r is defined here in terms of the ratio between photon shot noise and quantization *noise*. In [4.31], a similar parameter $M_{shot}$ is defined as the minimum factor between photon shot noise and the quantization *step*. This makes a difference of a factor of the square root of 12, or 10.8dB.

| step size (LSBs) | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | total |
|------------------|-----|-----|-----|-----|-----|-----|-----|-----|-------|
| binary step      | 204 | 305 | -   | 611 | -   | -   | -   | 105 | 1225  |
| integer step     | 204 | 127 | 119 | 115 | 112 | 110 | 109 | 105 | 1001  |

Table 4-3. Required quantization steps for the binary and integer step scheme, using $N_{sat}$=25,000, n=12 bits and r=0.1 (see Appendix A)

For these calculations, a sensor saturation charge of 25,000 electrons is assumed, the ADC resolution is chosen to be 12 bits (i.e.
the initial resolution for small input signals) and the quality factor r is chosen to be 0.1. The latter is a conservative setting, ensuring that the quantization noise will not be visible in the image. In the table, results are shown for both the binary quantization step and the integer quantization step scheme. From this table, one important conclusion can immediately be drawn: the use of a companding quantization scheme to exploit the presence of photon shot noise can lead to a significant reduction in the number of quantization steps required in an ADC. A normal 12-bit ADC has 4096 quantization levels; as can be seen in the table, this number can be reduced by a factor of 3 to 4.

In Figures 4-11 through 4-13, the effect of companding is further evaluated by plotting the number of required quantization steps as a function of the three parameters that determine it: the saturation charge, the initial ADC resolution, and the quality factor r. In all these graphs, one input variable is plotted on the x-axis, while the other two input variables are kept at the standard values used for the examples in Table 4-3 ($N_{sat}$=25,000, n=12 bits and r=0.1).

Figure 4-11 depicts a graph showing the dependency of the required number of quantization steps on the saturation charge of the sensor. As can be seen in the figure, the number of quantization steps drops with decreasing saturation charge. This is not surprising, as the relative amount of photon shot noise increases for decreasing saturation charges. Since the saturation charge is related to pixel size, which in many sensor implementations is decreasing to allow for a higher pixel count, this means that the application of a companding quantization scheme can become increasingly useful in future imager designs. For instance, in [4.19], an imager with a $1.75\mu m$ x $1.75\mu m$ pixel is presented that has a saturation charge of only 7000 electrons.
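The step counts of Table 4-3 can be approximated with a simple greedy allocation, sketched below in Python. The derivation of Appendix A is not reproduced here, so the rule used in this sketch is an assumption: starting from zero, each quantization step takes the largest allowed size k for which the quantization noise $k \cdot LSB/\sqrt{12}$ does not exceed r times the photon shot noise at the lower edge of that step. The function name and the choice to evaluate the shot noise at the lower edge are hypothetical.

```python
import math


def companding_step_counts(n_sat, n_bits, r, allowed_steps):
    """Greedy sketch: count the quantization steps used for each step size.

    n_sat         -- saturation charge in electrons
    n_bits        -- initial ADC resolution (size of the smallest step)
    r             -- maximum ratio of quantization noise to photon shot noise
    allowed_steps -- allowed step sizes in LSBs, e.g. (1, 2, 4, 8)
    """
    lsb_e = n_sat / 2 ** n_bits   # one LSB expressed in electrons
    full_scale = 2 ** n_bits      # input range in LSBs
    counts = {}
    level = 0                     # lower edge of the current step, in LSBs

    while level < full_scale:
        shot_e = math.sqrt(level * lsb_e)  # shot noise at the lower edge
        ok = [k for k in allowed_steps
              if k * lsb_e / math.sqrt(12) <= r * shot_e]
        step = max(ok, default=min(allowed_steps))  # fall back to the smallest step
        counts[step] = counts.get(step, 0) + 1
        level += step

    return counts


binary = companding_step_counts(25_000, 12, 0.1, (1, 2, 4, 8))
integer = companding_step_counts(25_000, 12, 0.1, range(1, 9))
print("binary step: ", binary, "total:", sum(binary.values()))
print("integer step:", integer, "total:", sum(integer.values()))
```

With the inputs of Table 4-3, this greedy rule arrives at the same totals as the table (1225 steps for the binary scheme and 1001 for the integer scheme). For other inputs, agreement with the method of Appendix A should not be taken for granted.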
In Figure 4-12, a graph is depicted that shows the number of quantization steps versus the (initial) ADC resolution. This plot shows two interesting aspects of companding. First of all, it is clear from the figure that companding is not useful for an ADC resolution of less than 10 bits, as it does not significantly decrease the required number of quantization steps. The reason for this is simple: a lower resolution implies that the quantization noise will be higher, thereby reducing the input signal range for which photon shot noise dominates. At the high resolution end, another property can be seen: the number of required quantization steps stabilizes for a binary quantization step scheme, or even drops for an integer step scheme. The reason for this is that an increased initial ADC resolution leads to a quantization scheme that is better able to track the photon shot noise.

Figure 4-11: Graph showing the relation between the sensor's saturation charge and the required number of quantization steps

Figure 4-12: Graph showing the relation between the ADC resolution and the required number of quantization steps

By changing the quality factor r, the number of required quantization steps is obviously changed. This is illustrated in Figure 4-13. The actual choice for r is difficult to make based on quantitative observations alone. Instead, the perceived image quality to the human eye in the particular application of the CMOS imager will play an important role. Such perceptual quality estimation is outside the scope of this thesis; nonetheless, it is obvious from Figure 4-13 that the use of a companding quantization scheme can lead to a strong reduction in the required number of quantization steps, even for a conservative choice for r.

Figure 4-13: Graph showing the relation between the quality factor r and the required number of quantization steps
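The digital reconstruction to a linear output code, which motivates the binary and integer step schemes, can be sketched as follows. The Python function below is an illustration only and not the reconstruction logic of the implemented imagers. It assumes that the ADC outputs the index of the quantization step that was reached, and it uses the binary-scheme step counts of Table 4-3 as segment boundaries. The names `BINARY_SEGMENTS` and `expand_to_linear` are hypothetical.

```python
# (step size in LSBs, number of steps) for the binary scheme of Table 4-3
BINARY_SEGMENTS = [(1, 204), (2, 305), (4, 611), (8, 105)]


def expand_to_linear(code, segments=BINARY_SEGMENTS):
    """Map a companded step index to a linear value in LSBs."""
    if code < 0:
        raise ValueError("code must be non-negative")
    base = 0        # linear value at the start of the current segment
    first_code = 0  # companded code at the start of the current segment
    for step, count in segments:
        if code < first_code + count:
            # Within a segment the relation is linear: one multiplication
            # (a bit shift for the binary scheme) and one addition.
            return base + step * (code - first_code)
        base += step * count
        first_code += count
    raise ValueError("code lies outside the companded range")


for code in (0, 203, 204, 205, 509, 1224):
    print(code, "->", expand_to_linear(code))
```

The same function handles the integer step scheme if the segment list is replaced by the integer-step row of Table 4-3. Only the multiplication then no longer reduces to a bit shift, which is the slightly more complicated reconstruction mentioned above.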
The concept of a companding quantization scheme that was introduced in this sub-section can in principle be applied in any ADC architecture. However, there is only an advantage in applying companding if it is possible to translate the reduction in quantization steps into a reduction of power consumption and/or an increased speed of the ADC. In the next two sub-sections, the implementation of companding in a single-slope and a multiple-ramp single-slope ADC will be discussed. It will be shown that in both cases, the application of companding can lead to a significant increase in conversion rate, which can be translated into a reduction in power consumption.

### 4.3.3 Application in Single-Slope ADCs

As explained in the previous sub-sections, the presence of photon shot noise in imager signals enables the use of an ADC that has larger quantization steps for high input signals. The use of larger quantization steps, which results in a reduction in the total number of steps, should lead to lower power consumption or higher speed of the ADC. In a single-slope ADC, the size of a quantization step is determined by the increase in ramp voltage during a clock period of the counter that runs synchronously with the ramp. In order to change the quantization step, either the clock frequency of the counter or the slope of the ramp voltage can be changed. The first solution is not attractive, since the clock frequency is linked to the speed of the comparators. A variable counter speed would therefore either lead to comparators that are not fast enough for some input signals, or comparators that are overperforming for other input signals, which would mean that the comparators use more power than is strictly necessary. Therefore, a better solution is to change the slope of the ramp voltage. In Figure 4-14, the implementation of a companding quantization scheme by changing the slope of the ramp is illustrated.
Along with the companding scheme, a timing diagram of a normal, linearly quantizing single-slope ADC is depicted to allow for a graphical comparison. To implement the companding quantization, the slope of the ramp voltage is increased in a number of steps, resulting in a ramp that is piece-wise linear. As discussed in the previous sub-section, the quantization step should either be doubled several times, or be increased in integer multiples, in order to enable an easy digital reconstruction to a linear output code. Therefore, the slope of the ramp generator should be increased accordingly: the slope is either doubled, or increased in integer multiples of the initial slope. It is obvious from Figure 4-14 that the implementation of companding has the advantage of a much shorter A/D conversion time for the same comparator speed. This higher speed can also be converted into a lower power consumption by lowering the clock frequency, which allows the column comparators to operate at a lower speed and therefore lower power.

Figure 4-14: *Timing diagram of a companding quantization scheme in a single-slope ADC*

The implementation of a companding quantization scheme does not require any changes to the column-level circuit. The only analog circuit that needs to be changed is the ramp generator, which has to be able to deliver ramp voltages with varying slopes. Furthermore, some digital circuitry has to be added that can restore a linear digital code at the ADC output. Such circuitry requires little area, and can easily be added at the chip level.

The main potential drawback of implementing a companding scheme is in the transition between one slope of the ramp and the next. Firstly, care must be taken in the design of the ramp generator that such a transition is glitch-free. Secondly, the presence of comparator delay might cause errors. In a single-slope ADC with a normal, linear ramp, this delay leads to a constant voltage offset.
However, when a piece-wise linear ramp is applied, the delay causes a voltage error that changes with the slope of the ramp, which can result in non-linearities. Moreover, the comparator delay itself might also change with the slope of the ramp, introducing further errors.

# 4.3.4 Application in MRSS ADCs: the Multiple-Ramp Multiple-Slope (MRMS) ADC

The implementation of a companding quantization scheme in an MRSS ADC is similar to the implementation in a single-slope ADC. In an MRSS ADC, the quantization step is determined by the increase of each of the concurrent sub-ramp voltages during one counter clock period of the fine conversion phase. Therefore, as in the single-slope ADC, the quantization step can be adjusted by changing the slope of each of the sub-ramps. This is illustrated in Figure 4-15. During the fine conversion phase, the individual sub-ramp voltages have different slopes: the higher the ramp voltage, the higher the input voltage, and therefore, the higher the slope. This results in an ADC that not only has multiple ramps, but also uses multiple slopes. Therefore, the combination of an MRSS ADC with companding will be called a *Multiple-Ramp Multiple-Slope (MRMS) ADC* in the remainder of the thesis.

Figure 4-15: *Timing diagram of a companding quantization scheme in an MRSS ADC*

A disadvantage of applying companding in a single-slope ADC is that the slope of the ramp voltage needs to be increased during the A/D conversion, leading to potential glitches. In an MRMS ADC, there are several ramp voltages, and therefore, a companding quantization scheme can be implemented by assigning different slopes to the ramp voltages. As a result, the slope of the ramps does not need to be increased *during* the A/D conversion, which prevents potential problems of glitches and comparator delay at a transition point between one slope and the other.
Since the ramps do not have the same slope in the fine conversion phase, the coarse ramp has to be adapted accordingly to ensure that each comparator is connected to the correct sub-ramp in the fine phase, as is illustrated in the figure.

Although the assignment of a constant slope to each sub-ramp has the advantage of being more robust, it does somewhat limit the advantage of companding. If the quantization steps are sized on the quality factor of Eq. (4-4) alone, the result is a different number of quantization steps for each step size, as can be seen in Table 4-3. If such a quantization scheme were applied directly in an MRMS ADC, each sub-ramp would have a different number of quantization steps, which reduces the speed advantage of running the ramps concurrently. The companding quantization scheme therefore has to be adjusted slightly to fit onto the multiple-ramp architecture, which probably results in slightly more quantization steps than strictly necessary. Furthermore, companding can only shorten the conversion time in the fine phase, as the time required to perform the coarse A/D conversion is determined solely by the number of sub-ramps.

Nonetheless, companding can significantly shorten the A/D conversion time. For instance, in chapter 6, measurements of a 10-bit MRMS ADC are shown, where a 21% reduction in conversion time compared to an MRSS ADC is achieved. If a higher-resolution MRMS ADC were designed, the advantage over MRSS would be larger, since companding is more advantageous at higher resolution. As in the single-slope architecture, such a speed advantage can be translated into reduced power consumption by lowering the clock frequency of the system and redesigning the comparators for a lower speed and lower power.
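The piece-wise linear ramp of sub-section 4.3.3 and its digital reconstruction can be illustrated with a short behavioural sketch. It is not the implementation used in this thesis: the segment table below is a hypothetical example (step sizes that double, with arbitrarily chosen step counts) and is not taken from Table 4-3, and all function names are made up for illustration. Comparator delay and glitches at the slope transitions are ignored.

```python
# Hypothetical segment table (step size in LSB, number of steps); these
# values are illustrative only and are NOT taken from Table 4-3.
SEGMENTS = [(1, 128), (2, 64), (4, 64), (8, 64)]


def ramp_levels(segments):
    """Ramp voltage (in LSB) at the end of each counter clock period."""
    levels, level = [], 0
    for step, count in segments:
        for _ in range(count):
            level += step          # slope of this segment = step per clock
            levels.append(level)
    return levels


def convert(v_in, levels):
    """Counter value latched when the ramp first exceeds v_in (in LSB)."""
    for code, level in enumerate(levels):
        if level > v_in:
            return code
    return len(levels) - 1         # clip at full scale


def reconstruct(code, segments):
    """Map a companded counter value back to a linear output code."""
    base = 0
    for step, count in segments:
        if code < count:
            return base + code * step
        code -= count
        base += step * count
    return base


def step_at(v_in, segments):
    """Quantization step (in LSB) that applies at input level v_in."""
    base = 0
    for step, count in segments:
        base += step * count
        if v_in < base:
            return step
    return segments[-1][0]


LEVELS = ramp_levels(SEGMENTS)
LINEAR_CLOCKS = LEVELS[-1]         # a linear ramp needs about one clock per LSB
COMPANDED_CLOCKS = len(LEVELS)     # the companded ramp needs one clock per step
```

With this hypothetical table, the companded ramp covers the same input range in 320 clock periods instead of 1024, while the reconstruction error at any input stays below the local quantization step.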
# 4.4 Reduction of Column FPN using Dynamic Column Switching (DCS)

One of the potential drawbacks of column-level ADCs is the generation of visual artifacts caused by column-to-column non-uniformities. Such column Fixed-Pattern Noise (FPN) is highly visible to the human eye, and thereby strongly reduces the perceived image quality. In this section, a circuit technique is introduced that can reduce column FPN. This technique, called Dynamic Column Switching (DCS), focuses on reducing the perceptual effects of column non-uniformities. It can significantly relax the required uniformity of the column circuit, thus facilitating the implementation of column-level ADCs in CMOS imagers.

### 4.4.1 Principle of Dynamic Column Switching (DCS)

In sub-section 3.1.3, the visual effects of spatial non-uniformities were discussed. These non-uniformities can be regarded as noise in the spatial domain, and are commonly called Fixed-Pattern Noise (FPN). It was shown there by means of the simulated image depicted in Figure 3-3 that column FPN creates visual artifacts that are much more visible than pixel FPN. This insight is the key behind the DCS technique introduced in this sub-section.

In existing column-level ADC designs available in literature [4.6-4.14], uniformity is ensured by relying exclusively on circuit techniques such as auto-zeroing, which reduce the *magnitude* of column ADC non-uniformities. The dynamic column switching technique [4.32-4.33] described here, on the other hand, reduces the *perceptual* effect of column non-uniformities. This is done as follows: a switching matrix is placed between the pixel array and the column circuit, as depicted in Figure 4-16a. At the beginning of each line time, the state of the switching matrix is changed by a pseudo-random generator before the pixel array is read out by the column circuits.
By inserting such a switching matrix at the input of the column circuits, each column ADC is used to read out not just one, but several columns of the imaging array, thus spreading the non-uniformity of the column circuit over several array columns. Since the switching matrix randomly changes the order in which the pixel outputs are sampled, some extra circuitry is necessary to restore the original pixel output order. This can easily be done on-chip in the digital domain.

Figure 4-16: a) Block diagram of a CMOS imager with DCS implementation (gray blocks) b) Partitioning of the switching matrix in unit switching cells

A crucial part of the design is the switching matrix between the imaging array and the column ADCs. It is impractical to design a switching matrix that can connect *any* column of the imaging array to *any* ADC channel, as this would require a very complex set of switches and wires. Therefore, the switching array is partitioned into smaller unit switching cells that connect just n columns of the imaging array to n column ADC channels, all of them controlled by the same control lines, as illustrated in Figure 4-16b. The choice of n is a design trade-off: a larger switching cell leads to a better spread of the non-uniformities of the column ADC channels, at the expense of a higher switching cell complexity, which means more chip area. Since it is difficult to predict the perceptual effect of DCS analytically, the choice of n is mainly based on simulation results. Such simulations will be described in the next sub-section.

Finally, it should be noted that the proposed technique has some limitations. Firstly, since the magnitude of the column ADC non-uniformities is not reduced, applying DCS effectively increases the pixel FPN, as column non-uniformities are transformed into pixel(-like) non-uniformities. This means that the initial column non-uniformities should be smaller than the expected pixel FPN.
Secondly, the required division of the switching array into small unit switching cells renders the technique ineffective for non-uniformities that are strongly correlated between adjacent columns. For instance, DCS would not be effective against an offset gradient that goes from one side of the column to the other. However, in a proper column ADC design, the main source of non-uniformities is process spread, which can usually be considered to have a Gaussian distribution. In such cases, the column FPN can be expected to decrease by a factor equal to the square root of n, where n is the number of columns being switched, since it is essentially Gaussian noise that is averaged. However, such a mathematical analysis does not account for the perceptual effects of DCS. In the next sub-section, such perceptual effects will be studied using simulations.

# 4.4.2 Dynamic Column Switching Simulations

As discussed in the previous sub-section, the complexity of the switching matrix is the main design variable in applying DCS. This switching matrix should be divided into unit switching cells that connect n imaging array columns to n column ADCs, where a larger n is expected to yield better results at the cost of a higher switching cell complexity. Using Matlab simulations, the perceptual effects of DCS for different values of n were evaluated. Results of such simulations are depicted in Figures 4-17 through 4-20. In all these figures, a Gaussian distributed column FPN with a σ of 3% of full scale was added throughout the sample image, which has 520 x 388 pixels. While this is far too much column FPN to yield an acceptable image quality, even with DCS, it is very well suited for comparative purposes, as all effects become more visible. In every image shown, DCS was applied only to the left half of the image, enabling a direct comparison within one image. In Figure 4-17, the visual effects are shown for the simplest switching matrix, with n=2, which was already shown in [4.34].
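The square-root-of-n prediction of the Gaussian model can be checked numerically with a small sketch. This is not the Matlab simulation used for the figures: it only computes the residual column-averaged offset, not the perceptual effect, and it assumes that every unit switching cell applies a uniformly random permutation at each line time, which a practical switching cell may not realize. The function names, array sizes and seed are arbitrary choices for illustration.

```python
import random
import statistics


def column_means(offsets, n_rows, cell_size, rng, use_dcs):
    """Average ADC offset seen by each array column over all rows."""
    n_cols = len(offsets)
    sums = [0.0] * n_cols
    for _ in range(n_rows):
        for start in range(0, n_cols, cell_size):
            channels = list(range(start, start + cell_size))
            if use_dcs:
                # New pseudo-random switch state at every line time
                # (assumption: all permutations are equally likely).
                rng.shuffle(channels)
            for k, channel in enumerate(channels):
                sums[start + k] += offsets[channel]
    return [s / n_rows for s in sums]


def residual_column_fpn(n_cols=2000, n_rows=100, cell_size=4,
                        sigma=0.03, seed=1):
    """Return (column FPN without DCS, column FPN with DCS)."""
    # n_cols is assumed to be a multiple of cell_size.
    rng = random.Random(seed)
    offsets = [rng.gauss(0.0, sigma) for _ in range(n_cols)]
    plain = statistics.pstdev(
        column_means(offsets, n_rows, cell_size, rng, use_dcs=False))
    dcs = statistics.pstdev(
        column_means(offsets, n_rows, cell_size, rng, use_dcs=True))
    return plain, dcs


if __name__ == "__main__":
    plain, dcs = residual_column_fpn()
    print(f"column FPN without DCS: {plain:.4f}")
    print(f"column FPN with DCS   : {dcs:.4f} (ratio {dcs / plain:.2f})")
```

For a 4 x 4 cell, the ratio is expected to be close to 1/sqrt(4) = 0.5. The per-pixel error magnitude is unchanged in this model; it is only redistributed from column-like into pixel-like non-uniformity.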
Although the n=2 switching matrix does decrease column FPN, the FPN remains quite visible in the left half of the image, as only 2 adjacent columns are alternated to spread the column FPN. Much better results can be obtained with more complex switching cells that switch n=3 through 5 inputs to outputs, as depicted in Figures 4-18 through 4-20. As expected, DCS is more effective for more complex switching schemes, although the increase in effectiveness becomes progressively smaller for higher n. This corresponds well with the Gaussian noise model, which predicts that the decrease in column FPN should be proportional to the square root of n. Even for n=3, the simulation shows very acceptable results given the very high initial column FPN, with the exception of some wider residual stripes that are visible. The explanation for such stripes is as follows: since the column FPN is only spread over 3 columns, the chosen set of columns may sometimes have an average offset that differs significantly from the overall average of the image.

Figure 4-17: Sample image with 3% Gaussian column FPN. DCS is simulated on the left half of the image using a 2x2 unit switching cell

Figure 4-18: Sample image with 3% Gaussian column FPN. DCS is simulated on the left half of the image using a 3x3 unit switching cell

Figure 4-19: Sample image with 3% Gaussian column FPN. DCS is simulated on the left half of the image using a 4x4 unit switching cell

Figure 4-20: Sample image with 3% Gaussian column FPN. DCS is simulated on the left half of the image using a 5x5 unit switching cell

Figure 4-21: Block diagram illustrating an improved switching block, made by interleaving two 3x3 unit switching cells

While it is possible to decrease this residual effect by increasing n, this is unattractive as the complexity of the switching matrix rapidly increases. Instead, an alternative solution can be found. Two switching cells of n=3 are interleaved with one another, i.e.
one circuit is connected to columns k, k+2 and k+4, while the second is connected to columns k+1, k+3 and k+5, as depicted in Figure 4-21. This results in more spatial spreading of residuals, since any 3 columns that are averaged by a unit switching cell are not adjacent to one another, but are interleaved with another set of 3 columns. This reduces the visibility of any residual column FPN, as shown in Figure 4-22. Here, again 3% column FPN is added throughout the image, while DCS is applied in the left half of the image using 3 x 3 unit switching cells that are interleaved. To further evaluate the perceptual effects for a lower amount of column FPN, simulations using the same interleaved 3 x 3 switching cell were performed for 2% and 1% column FPN, as depicted in Figures 4-23 and 4-24. As can be seen from the figures, residual column FPN is hardly visible for an initial column FPN of 2%, while a column FPN of 1% is rendered invisible using DCS.

Figure 4-22: Sample image with 3% Gaussian column FPN. DCS is simulated on the left half of the image using the improved switching cell of Figure 4-21

Figure 4-23: DCS simulation with the same parameters as Figure 4-22, but with 2% column FPN in the image

Figure 4-24: DCS simulation with the same parameters as Figure 4-22, but with 1% column FPN in the image

In conclusion, the simulated images show very promising results, as DCS can lead to a very significant reduction of the visibility of column FPN, even with a very simple switching matrix. This means that the technique can considerably relax the required uniformity of the column circuits. However, a silicon implementation is needed to confirm the simulation results. Therefore, a prototype CMOS imager with a DCS implementation was designed for testing purposes, which will be described in detail in chapter 5.

#### 4.5 References

- [4.1] M.J. Loinaz, K.J. Singh, A.J. Blanksby, D.A. Inglis, K. Azadet, and B.D.
Ackland, "A 200-mW, 3.3-V, CMOS color camera IC producing 352 x 288 24-b video at 30 frames/s", *IEEE Journal of Solid-State Circuits*, vol. 33, pp. 2092-2103, December 1998
- [4.2] S. Smith et al., "A Single-Chip CMOS 306 x 244-Pixel NTSC Video Camera and a Descendant Coprocessor Device", *IEEE Journal of Solid-State Circuits*, vol. 33, pp. 2104-2111, December 1998
- [4.3] S. Mendis, S. Kemeny and E.R. Fossum, "CMOS active pixel image sensor", *IEEE Transactions on Electron Devices*, vol. 41, pp. 452-453, March 1994
- [4.4] R.H. Nixon, S.E. Kemeny, B. Pain, C.O. Staller and E.R. Fossum, "256 x 256 CMOS active pixel sensor camera-on-a-chip", *IEEE Journal of Solid-State Circuits*, vol. 31, no. 12, pp. 2046-2050, December 1996
- [4.5] E. Oba, K. Mabuchi, Y. Iida, N. Nakamura and H. Miura, "A 1/4 inch 330k square pixel progressive scan CMOS active pixel image sensor", *IEEE International Solid-State Circuits Conference*, vol. XL, pp. 180-181, February 1997
- [4.6] Z. Zhou, B. Pain, and E.R. Fossum, "CMOS active pixel sensor with on-chip successive approximation analog-to-digital converter", *IEEE Transactions on Electron Devices*, vol. 44, no. 10, pp. 1759-1763, October 1997
- [4.7] S. Decker, R.D. McGrath, K. Brehmer, and C.G. Sodini, "A 256 x 256 CMOS imaging array with wide dynamic range pixels and column-parallel digital output", *IEEE Journal of Solid-State Circuits*, vol. 33, no. 12, pp. 2081-2091, December 1998
- [4.8] W. Yang, O-B. Kwon, J-I. Lee, G-T. Hwang and S-J. Lee, "An integrated 800 x 600 CMOS imaging system", *IEEE International Solid-State Circuits Conference*, vol. XLII, pp. 304-305, February 1999
- [4.9] U. Ramacher et al., "Single-chip video camera with multiple integrated functions", *IEEE International Solid-State Circuits Conference*, vol. XLII, pp. 306-307, February 1999
- [4.10] B. Mansoorian, H. Yee, S. Huang and E. Fossum, "A 250mW 60 frames/s 1280 x 720 pixel 9b CMOS digital image sensor", *IEEE International Solid-State Circuits Conference*, vol. XLII, pp. 312-313, February 1999
- [4.11] T. Sugiki et al., "A 60 mW 10b CMOS image sensor with column-to-column FPN reduction", *IEEE International Solid-State Circuits Conference*, vol. XLIII, pp. 108-109, February 2000
- [4.12] K. Findlater et al., "SXGA pinned photodiode CMOS image sensor in 0.35µm technology", *IEEE International Solid-State Circuits Conference*, vol. XLVI, pp. 218-219, February 2003
- [4.13] M. Mase, S. Kawahito, M. Sasaki, Y. Wakamori, and M. Furuta, "A wide dynamic range CMOS image sensor with multiple exposure-time signal outputs and 12-bit column-parallel cyclic A/D converters", *IEEE Journal of Solid-State Circuits*, vol. 40, no. 12, pp. 2787-2795, December 2005
- [4.14] Y. Nitta et al., "High-Speed Digital Double Sampling with Analog CDS on Column Parallel ADC Architecture for Low-Noise Active Pixel Sensor", *IEEE International Solid-State Circuits Conference*, vol. XLIX, pp. 500-501, February 2006
- [4.15] I. Takayanagi et al., "A 1¼ inch 8.3M pixel digital output CMOS APS for UDTV application", *IEEE International Solid-State Circuits Conference*, vol. XLVI, pp. 216-217, February 2003
- [4.16] L. Kozlowski et al., "A progressive 1920×1080 imaging system-on-chip for HDTV cameras", *IEEE International Solid-State Circuits Conference*, vol. XLVIII, pp. 358-359, February 2005
- [4.17] S. Yoshihara et al., "A 1/1.8-inch 6.4MPixel 60 frames/s CMOS image sensor with seamless mode change", *IEEE International Solid-State Circuits Conference*, vol. XLIX, pp. 492-493, February 2006
- [4.18] J. Yang et al., "A 3Mpixel low-noise flexible architecture CMOS image sensor", *IEEE International Solid-State Circuits Conference*, vol. XLIX, pp. 496-497, February 2006
- [4.19] K-B. Cho et al., "A 1/2.5 inch 8.1Mpixel CMOS Image Sensor for Digital Cameras", *IEEE International Solid-State Circuits Conference*, vol. L, pp. 508-509, February 2007
- [4.20] B. Fowler, A. El Gamal, and D.X.D. Yang, "A CMOS area image sensor with pixel-level A/D conversion", *IEEE International Solid-State Circuits Conference*, vol. XLI, pp. 226-227, February 1994
- [4.21] D.X.D. Yang, B. Fowler, and A. El Gamal, "A Nyquist-Rate Pixel-Level ADC for CMOS Image Sensors", *IEEE Journal of Solid-State Circuits*, vol. 34, no. 3, pp. 348-356, March 1999
- [4.22] D.X.D. Yang, A. El Gamal, B. Fowler, and H. Tian, "A 640 x 512 CMOS Image Sensor with Ultrawide Dynamic Range Floating-Point Pixel-Level ADC", *IEEE Journal of Solid-State Circuits*, vol. 34, no. 12, pp. 1821-1834, December 1999
- [4.23] S. Kleinfelder, S. Lim, X. Liu, and A. El Gamal, "A 10kframe/s 0.18µm CMOS Digital Pixel Sensor with Pixel-Level Memory", *IEEE International Solid-State Circuits Conference*, vol. XLIV, pp. 88-89, February 2001
- [4.24] W. Bidermann et al., "A 0.18µm High Dynamic Range NTSC/PAL Imaging System-on-a-chip with Embedded DRAM Frame Buffer", *IEEE International Solid-State Circuits Conference*, vol. XLVI, pp. 212-213, February 2003
- [4.25] M.F. Snoeij, P. Donegan, A.J.P. Theuwissen, K.A.A. Makinwa, and J.H. Huijsing, "A CMOS Image Sensor with a Column-Level Multiple-Ramp Single-Slope ADC", *IEEE International Solid-State Circuits Conference*, vol. L, pp. 506-507, February 2007
- [4.26] L. Lindgren, "A New Simultaneous Multislope ADC Architecture for Array Implementations", *IEEE Trans. on Circuits and Systems II*, vol. 53, no. 9, pp. 921-925, September 2006
- [4.27] O-B. Kwon et al., "A Novel Double Slope Analog-to-Digital Converter for a High-Quality 640x480 CMOS Imaging System", *IEEE Int. Conference on VLSI and CAD*, pp. 335-338, October 1999
- [4.28] B. Smith, "Instantaneous Companding of Quantized Signals", *Bell System Technical Journal*, vol. 36, pp. 653-709, May 1957
- [4.29] C.L. Dammann, L.D. McDaniel, and C.L. Maddox, "D2 Channel Bank Multiplexing and Coding", *Bell System Technical Journal*, vol. 51, pp.
1675-1700, October 1972
- [4.30] M.F. Snoeij, A.J.P. Theuwissen, and J.H. Huijsing, "A low-power Column-Parallel 12-bit ADC for CMOS Imagers", *IEEE Workshop on CCDs and Advanced Image Sensors 2005*, pp. 169-172, Karuizawa, Japan, June 2005
- [4.31] T. Otaka et al., "12-Bit Column-Parallel ADC with Accelerated Ramp", *IEEE Workshop on CCDs and Advanced Image Sensors 2005*, pp. 173-176, Karuizawa, Japan, June 2005
- [4.32] M.F. Snoeij, A. Theuwissen, K. Makinwa, and J.H. Huijsing, "A CMOS imager with column-level ADC using dynamic column FPN reduction", *IEEE International Solid-State Circuits Conference*, vol. XLIX, pp. 498-499, February 2006
- [4.33] M.F. Snoeij, A.J.P. Theuwissen, K.A.A. Makinwa, and J.H. Huijsing, "A CMOS imager with column-level ADC using dynamic column fixed-pattern noise reduction", *IEEE Journal of Solid-State Circuits*, vol. 41, pp. 3007-3015, December 2006
- [4.34] T. Anaxagoras, S. Triger, N.M. Allinson, R. Turchetta, "APS Column Fixed Pattern Noise Reduction", *IEEE Workshop on CCDs and Advanced Image Sensors 2005*, pp. 76-79, June 2005

# A CMOS Imager with a Low-Power Column-Level Single-Slope ADC

In this chapter, a CMOS imager with a low-power column-level single-slope ADC is presented. The prototype imager is implemented in a 0.18 µm CMOS process, and has 340 column-level ADCs for read-out. The column ADC features an optimized low-power column comparator design that consumes only 3.2 µW and requires only 19 transistors. Furthermore, it is the first imager that demonstrates the effectiveness of the Dynamic Column Switching (DCS) technique, which was introduced in section 4.4, to reduce the visibility of column non-uniformities.

Section 5.1 provides a system-level overview of the sensor. This is followed by a detailed description of the column-level comparator in section 5.2, as it is the most performance-critical block of the ADC. In section 5.3, the column circuitry required to implement DCS is described.
Finally, section 5.4 discusses measurement results on this sensor.

#### 5.1 Sensor Overview

#### 5.1.1 Design Goals

The main focus of the imager design presented in this chapter is to reduce the power consumption of a column-level single-slope ADC as much as possible. As was shown in section 4.2.2, the key analog component of this architecture is the column-level comparator, as it both consumes most of the power and is the most performance-critical circuit block in a column-level single-slope ADC. In order to minimize power consumption of the comparator, a 'minimalist' approach was taken. The analog design focused only on meeting the primary requirements for the comparator, i.e. ensuring it has enough gain and speed, while using the minimum amount of circuitry. In order to meet other requirements, such as reducing comparator offset, system-level compensation techniques were used, as will be explained in this chapter. Since most of the extra circuitry associated with these compensations is digital, it requires little power, and will shrink with succeeding CMOS generations. An important example of such a system-level technique is the Dynamic Column Switching (DCS) technique introduced in section 4.4. The imager presented in this chapter is the first to use DCS as a method of masking ADC offset.

#### 5.1.2 System-Level Overview

In Figure 5-1, a block diagram of the realized image sensor [5.1-5.3] is depicted, along with the essential board-level circuitry that is required to operate the sensor. The imager is implemented in a single-poly 4-metal 0.18 µm CMOS process from Philips Semiconductors, and is based on an existing image sensor. Therefore, some circuit blocks, such as the column and row decoders, as well as the pixel array itself, were re-used from the existing design. For the prototype, standard 3T pixels with n-well photodiodes were used. The imager has a resolution of 680 x 512 pixels and has a pixel pitch of 5.6 µm. The chip size is 5.4mm x 4.5mm.
To facilitate the layout of the column ADC, the column pitch was designed to be twice the pixel pitch (11.2 µm). In order to read out the pixel array, it was intended to place column ADCs both above and below the imaging array. Unfortunately, practical floor planning problems during the layout of this prototype prevented the placement of column ADCs above the imaging array, which reduced the number of pixels that can be read out to 340 x 512 pixels. The prototype uses a 1.8V supply voltage for the analog circuitry and a 2.8V supply for the pixel, digital and I/O circuitry.

Figure 5-1: Block diagram of the realized sensor along with the required board-level circuitry

A switching matrix is implemented at the column inputs to enable the use of DCS. As a result, any non-uniformities occurring after the switching matrix will be spread out by the DCS operation. The remainder of the column is partitioned into several blocks: the front-end biasing circuitry (biasing the in-pixel source followers), the column comparator, a digital memory, and a column decoder. As is usual in a column-level ADC, the front-end sample-and-hold operation is implemented together with the column comparator, and is therefore not displayed as a separate circuit block. Figure 5-2 shows a chip micrograph of the imager, and Table 5-1 summarizes the prototype specifications.

Figure 5-2: Chip micrograph of the prototype imager

Table 5-1.
Prototype specifications

| Parameter                    | Value            |
|------------------------------|------------------|
| Technology                   | 1P4M 0.18μm CMOS |
| Die size                     | 5.4mm x 4.5mm    |
| Supply Voltage               | 2.8V/1.8V        |
| Pixel Pitch                  | 5.6μm            |
| Pixel Type                   | 3T               |
| Fill Factor                  | 47%              |
| Number of pixels (on layout) | 680 x 512        |
| Number of pixels (read out)  | 340 x 512        |
| Target frame rate            | 30fps            |
| Column ADC pitch             | 11.2μm           |
| ADC resolution               | 10b              |
| ADC clock frequency          | 20MHz            |
| ADC LSB voltage              | 600μV            |

A number of circuit blocks were implemented off-chip to ensure flexibility in the measurements of this research prototype. As indicated in Figure 5-1, most of the digital control was implemented at circuit board level inside an FPGA. The ramp generator was also located off-chip, and was implemented by using a DAC controlled by the same FPGA.

### **5.1.3** Column ADC Requirements

The target frame rate of the image sensor is 30 frames/second, as indicated in Table 5-1. Combined with the imager resolution of 512 rows, this results in a line time of 65 µs. During this time, a row of pixels should be read out, and subsequently, an A/D conversion should be performed. With a target ADC resolution of 10b, 1023 clock periods are required to perform an A/D conversion. Therefore, the ADC clock frequency is chosen to be 20MHz. As a result, there are 1300 clock periods in each line time; apart from the A/D conversion itself, 277 clock periods can be used for front-end readout and other overhead. The pixel output voltage swing is estimated at 600 mV, resulting in an LSB voltage of about 600 µV.

The integral non-linearity (INL) of the ADC is not very critical, as long as it is lower than the expected non-linearity of the photodiodes, which, as detailed in section 3.1.2, is at least 1%. On the other hand, the differential non-linearity (DNL) of the ADC should remain below 0.5LSB to guarantee monotonicity.
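The timing budget above is plain arithmetic, which the following sketch reproduces from the values in Table 5-1. The function and key names are made up for illustration. The sketch works with unrounded intermediate values, so its clock counts differ by a few periods from the figures in the text, which start from the rounded 65 µs line time.

```python
def adc_timing_budget(frame_rate_hz=30.0, n_rows=512, resolution_bits=10,
                      f_clk_hz=20e6, swing_v=0.6):
    """Line time, clock budget and LSB size of a column single-slope ADC."""
    line_time_s = 1.0 / (frame_rate_hz * n_rows)
    conversion_clocks = 2 ** resolution_bits - 1   # one clock per ramp step
    clocks_per_line = line_time_s * f_clk_hz
    return {
        "line_time_s": line_time_s,
        "conversion_clocks": conversion_clocks,
        "clocks_per_line": clocks_per_line,
        "overhead_clocks": clocks_per_line - conversion_clocks,
        "lsb_v": swing_v / conversion_clocks,
    }


if __name__ == "__main__":
    budget = adc_timing_budget()
    print(f"line time        : {budget['line_time_s'] * 1e6:.1f} us")
    print(f"clocks per line  : {budget['clocks_per_line']:.0f}")
    print(f"conversion clocks: {budget['conversion_clocks']}")
    print(f"overhead clocks  : {budget['overhead_clocks']:.0f}")
    print(f"LSB voltage      : {budget['lsb_v'] * 1e6:.0f} uV")
```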
As mentioned in sub-section 5.1.1, the design goal is to decrease the power consumption of the ADC as much as possible. As a benchmark, the goal is therefore to reduce the power consumption of the column-level ADC below that of comparable chip-level ADCs. Since most chip-level ADCs use a pipeline architecture, the power consumption of the column-level ADC should be compared to the state of the art for pipeline ADCs, which, for the process in which the design is realized, is about 0.6mW/MSPS. Because of the required frame rate of 30 frames/second, the combined throughput of the column-level ADCs should be about 6MSPS. Therefore, in order to have a lower power consumption than a comparable chip-level ADC, the total power consumption of the column-level ADCs should be less than 3.5mW, which is about 10μW per column.

# 5.2 Column Comparator Design

#### **5.2.1** Comparator Input Circuitry

As explained in chapter 2, the output of the front-end read-out circuit consists of two voltages that are sampled onto capacitors. These voltages have to be subtracted from each other, and the result has to be converted into the digital domain. While a chip-level ADC architecture usually employs a separate amplifier to do the subtraction, in a single-slope column-level ADC the subtraction can be combined with the A/D conversion itself. For this design, an input configuration similar to [5.4] is used, as is depicted in Figure 5-3a. The front-end is read out by biasing the pixel via biasing source $I_h$, and subsequently storing the signal and reset voltages from the pixel onto capacitors *C1* and *C2*, as is shown in the timing diagram of Figure 5-3b. During this front-end sampling operation, the ramp generator outputs a constant reference voltage to enable the storage of signals onto the sampling capacitors. Finally, as already discussed in the previous section, a switching matrix is implemented between the pixel circuit and the column-level ADC, to implement dynamic column switching.
More details about this switching matrix will be discussed in section 5.3.

Since the voltages sampled on capacitors *C1* and *C2* have to be subtracted, it is the differential voltage across the capacitors that has to be converted into the digital domain. This is done as follows: the ramp generator supplies a differential ramp voltage, with opposite polarity to the differential voltage on capacitors *C1* and *C2*. As a result, the comparator will trigger when the ramp voltage equals the differential voltage across the capacitors, and a corresponding digital counter value can be stored. The above-described configuration eliminates the need for the separate subtracting amplifier that is required in a chip-level ADC architecture (chapter 2), thus reducing chip area and power consumption.

Figure 5-3: a) Front-end readout circuit. During the A/D conversion the input signals stored onto C1 and C2 are subtracted. b) Corresponding timing diagram

# 5.2.2 Comparator Topology

The main factor determining the topology of the comparator is the amount of gain that is needed. This can be derived from the LSB voltage of the ADC, which is 600 µV in this design. Since the comparator should be able to distinguish between input signals that are a single LSB voltage apart, it should provide enough gain to amplify the LSB voltage to digital output levels. Assuming that a digital gate connected to the output of the comparator needs a few hundred millivolts of input voltage swing, the comparator gain should be at least 60dB.

Figure 5-4: a) Block diagram of the analog column circuit b) Associated timing diagram

In order to realize such gain, there are two distinct classes of circuits available [5.5]. The first class of circuits consists of linear gain stages, as implemented inside operational amplifiers. Such circuits could for instance consist of a differential pair with a load, or a single common-source stage with load.
The second class of circuits consists of regenerative latches, also called 'clocked' comparators or voltage sense amplifiers. These require a clocking input, which periodically resets the regenerative latch. After the reset, some positive-feedback circuit, for instance a pair of cross-coupled transistors, is activated, which slews to one of two possible states depending on the input signal. A large variety of such circuits exists in literature [5.6-5.8][5.14]. Compared to gain stages, regenerative latch circuits offer the advantage of a higher speed-to-power ratio. On the other hand, the presence of a clocking signal can possibly lead to cross talk problems with other analog circuits. In particular, regenerative latches are known to exhibit a so-called 'kick-back effect': at the clock edge, some charge can be kicked back into the input. This effect is a concern in relation to the input circuit presented in the previous section. As the comparator input consists of capacitors holding a signal charge, it is obvious that such a kick-back effect is likely to cause signal degradation. Moreover, a large number of comparators is interconnected via the common ramp generator output. If charge is injected into this common node, it can lead to severe column-to-column cross talk effects. Because of this potential for cross talk, it can be concluded that a linear input stage is preferred. The remainder of the comparator can be implemented either with gain stages or with regenerative latches; because of the better power efficiency, regenerative latches are chosen in this design. Since regenerative latches can have a very high effective gain, a single latch, combined with a gain stage at the input, should easily be able to provide the required gain of 60dB. This, combined with the input circuitry presented in the previous section, results in the analog column circuit as shown in Figure 5-4a. 
The function of capacitors *C3* and *C4*, as well as switches *S3* and *S4*, will be discussed in the next section.

# 5.2.3 Offset and Delay Compensation

As discussed in chapter 4, offset and delay of the column comparator are the main sources of non-uniformity in a column-level single-slope ADC. In CMOS image sensors, the absolute value of either offset or delay is not a concern, but the *variation* of offset or delay between columns on the same imager is a major problem. This is caused by the properties of the human visual system, which, on the one hand, is not very sensitive to the absolute amount of light in an image, but, on the other hand, is very sensitive to relative variations within an image. Therefore, column-to-column offset variations in an image will be highly visible to the human eye. Based on the perceptual effects of column FPN, the non-uniformities should be less than 0.1% of full scale [5.9]. In this design, this is equivalent to an offset variation of less than $600\mu V$ and a delay variation of less than one clock period (50ns). While the application of DCS can somewhat relax these requirements, its effectiveness was not yet certain during the design phase, and therefore, the comparator had to be designed to comply with the stringent offset requirements.

In the comparator design, it is important to realize the statistical consequences of the fact that several hundreds of comparators are integrated onto a chip. Even if the projected standard deviation of the offset is well below the required offset, a small number of comparators may still have an offset larger than required. Therefore, it is not sufficient to use the common practice of designing for a $3\sigma$ offset value, as this corresponds to 0.3% of the comparators having an offset larger than this value. Instead, a margin of at least $4\sigma$ is required (corresponding to 0.006% of the comparators exceeding the specification).
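The percentages quoted for the 3σ and 4σ design margins follow from the two-sided tail of a Gaussian distribution. The sketch below reproduces them, assuming the comparator offsets are normally distributed; the column count of 340 is taken from the number of columns read out in this prototype and is used here only to illustrate the expected number of out-of-spec columns per chip.

```python
import math

def fraction_outside(k_sigma):
    """Two-sided Gaussian tail probability P(|x| > k_sigma * sigma)."""
    return math.erfc(k_sigma / math.sqrt(2.0))

def expected_outliers(n_columns, k_sigma):
    """Expected number of columns whose offset exceeds the k-sigma limit."""
    return n_columns * fraction_outside(k_sigma)

SPEC_OFFSET = 600e-6   # maximum allowed offset (one LSB), from the text
N_COLUMNS = 340        # ASSUMPTION: columns read out in this prototype

for k in (3, 4):
    print(f"{k} sigma: {100 * fraction_outside(k):.4f}% of comparators, "
          f"{expected_outliers(N_COLUMNS, k):.3f} columns per chip on average")

# Largest standard deviation that keeps the specification at the 4-sigma point.
sigma_max = SPEC_OFFSET / 4
print(f"maximum offset sigma: {sigma_max * 1e6:.0f} uV")
```

Under the Gaussian assumption, a 3σ design leaves roughly one out-of-spec column per chip at this column count, whereas a 4σ design makes such a column rare.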
This means that the standard deviation of the offset should be less than 150µV. This can only be realized using dynamic offset cancellation techniques [5.10]. From the two well-known dynamic offset cancellation techniques, chopping is not feasible, because of the sampling capacitors at the input of the comparator. Any chopping at this node would lead to severe signal degradation, as switching charge would be injected into the capacitors. Therefore, some form of auto-zeroing should be used. There are two different ways to auto-zero the comparator. Firstly, an analog circuit-level auto-zero can be implemented, which is usually done by adding capacitors to the circuit onto which the offset is sampled and subsequently subtracted. However, it is very difficult to implement an analog auto-zero in a regenerative latch, as the offset is not readily available in the form of an analog voltage. A second method, usable in any ADC, is a system-level or digital auto-zero implementation. This is done by performing a second A/D conversion with a known input signal. The digitized output will consist of the known input signal plus the offset of the comparator, and therefore, the offset can be corrected in the digital domain [5.11]. There are two main drawbacks of this auto-zero method. Firstly, the second A/D conversion adds more quantization noise to the final digitized output, effectively reducing the resolution with half a bit. Secondly, such a system-level auto-zero can take a significant amount of time, since the time required to perform the second A/D conversion is proportional to the maximum expected offset. This can require a speed increase of the comparator in order to maintain the specified conversion speed of the A/D converter. In this design, the preamp stage has a circuit auto-zero, which is implemented with capacitors C3 and C4 and switches S3 and S4, as is depicted in Figure 5-4a. 
These realize the classical output-offset storage concept [5.12], as is further illustrated in the timing diagram in Figure 5-4b. Before the A/D conversion, the input of the preamp is shorted using switches S1 and S2 while switches S3 and S4 connect one side of capacitors C3 and C4 to a reference voltage $V_{cm}$. As a result, the (amplified) preamp offset is present at the output of the preamp, where it is sampled onto capacitors C3 and C4 when switches S3 and S4 are opened. The offset of the preamp is thus cancelled by the voltages stored on capacitors C3 and C4. Furthermore, the common-mode input voltage of the regenerative latch is set to $V_{cm}$ with this operation.

Although the circuit auto-zero compensates for the offset of the preamp, the offset of the regenerative latch remains uncorrected. Based on extensive simulations, the regenerative latch offset is expected to be 40mV. This results in an input-referred offset of up to 2mV (at a minimum preamp gain of 20), which is still too high. Therefore, a second, system-level auto-zero is performed. This results in a hybrid offset compensation scheme: part of the offset is compensated at circuit level, while the residual offset is compensated with a system-level auto-zero.

This approach mitigates the drawbacks of both the analog and system-level auto-zero. If only an analog circuit auto-zero were used, an extra gain stage would be needed, in order to have enough gain in front of the regenerative latch to reduce its input-referred offset. On the other hand, if only a system-level auto-zero were used, it would take a long time, as the single-slope A/D conversion time is proportional to the expected offset voltage. The approach taken here allows most of the offset to be cancelled at the circuit level, without requiring extra analog power consumption; the small residual offset can then be measured by a quick A/D conversion and cancelled at system-level.
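The system-level auto-zero can be illustrated with a strongly idealised behavioural model of a single-slope column ADC. The sketch below is not the implemented circuit: the reference level, the per-column offsets and the comparator delays are hypothetical values, and the comparator is modelled as tripping on the first clock at which the ramp reaches the (offset-shifted) input, with the counter latched a fixed number of clocks later.

```python
LSB_UV = 600  # LSB voltage in microvolts, from the text

def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)

def single_slope_code(v_in_uv, offset_uv, delay_clocks):
    """Counter value latched by an idealised single-slope column ADC."""
    trip_clock = ceil_div(v_in_uv + offset_uv, LSB_UV)
    return trip_clock + delay_clocks

def auto_zeroed_code(v_in_uv, v_ref_uv, offset_uv, delay_clocks):
    """Signal conversion minus a second conversion of a known reference."""
    signal = single_slope_code(v_in_uv, offset_uv, delay_clocks)
    reference = single_slope_code(v_ref_uv, offset_uv, delay_clocks)
    return signal - reference

V_IN_UV = 300_000   # hypothetical signal level
V_REF_UV = 6_000    # hypothetical known reference input
# Hypothetical (offset in uV, delay in clock periods) for three columns.
COLUMNS = [(2_000, 12), (-1_500, 10), (400, 14)]

raw = [single_slope_code(V_IN_UV, off, dly) for off, dly in COLUMNS]
corrected = [auto_zeroed_code(V_IN_UV, V_REF_UV, off, dly) for off, dly in COLUMNS]
print("raw codes:      ", raw)
print("corrected codes:", corrected)
```

In this model both the offset and the fixed delay appear identically in the two conversions, so they drop out of the difference; what can remain is a quantization residue of at most one LSB. The length of the second conversion only has to cover the reference level plus the largest expected offset, which is why reducing the offset at circuit level first keeps this step short.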
An added advantage of the system-level auto-zero is that it also compensates for any variations in delay that might otherwise cause column FPN. As a result, a relatively large delay of 12 clock periods (600ns) is tolerated in this design. This not only reduces the power consumption of the comparator, but also reduces the noise, because of the lower bandwidth of the circuit. However, a disadvantage of the system-level auto-zero is the added digital overhead: additional digital memory is needed in each column to store the result of the second A/D conversion. Furthermore, digital circuitry is required to subtract the result of the second A/D conversion from the first. However, the continuing downscaling of feature sizes in CMOS technology will reduce the required chip area and power consumption for these digital circuits, which validates this 'digitally assisted analog' approach.

### 5.2.4 Preamp Design

The preamp is designed with a minimalist approach: it should provide some gain to reduce the input-referred offset of the regenerative latch, and it should act as a buffer to protect the input capacitors and ramp generator from the latch's charge kick-back. The minimum approach for a differential gain stage is a differential pair with a load. While this load could theoretically be implemented with resistors, these would have to be quite large to be able to operate the circuit at low power, and as a consequence, they would require too much chip area to be implemented inside a column. Therefore, the only practical solution is to use transistors as a load. These can either be connected as diode loads, or as current sources. While the simplest solution is to use diode loads, it severely limits the amount of voltage gain the stage can produce, as this gain is determined by the ratio of the transconductance of the input transistor to the transconductance of the load.
As both operate at the same bias current, the gain is limited, again, by area considerations, to about 3-4x. An alternative is to use current sources as a load. This approach was used here, leading to the circuit depicted in Figure 5-5a. In this diagram, transistors M2 and M3 form the differential pair, biased with tail current source M1 and loaded with current sources M4 and M5. A drawback of the current source loads is that the common mode at the output is not well defined. Therefore, transistors M6 and M7 are added to regulate the common mode of the output [5.13]. These transistors operate in triode region, and sense the output common mode. As a result they effectively degenerate the current source loads, and this adjusts the common-mode to a reference voltage $V_{cm}$, which is input into the centrally implemented biasing circuitry of Figure 5-5b.

Figure 5-5: a) Circuit diagram of the preamp b) Centrally implemented bias circuit

As mentioned, the relatively slow speed of the comparator has a positive effect on the noise performance, as the circuit has a low noise bandwidth. Simulation results show a total amount of noise of $30\mu V$ rms over the (noise) bandwidth of the preamp of about 350kHz. Since the preamp has a voltage gain of about 30x, the input-referred noise of the regenerative latch can be neglected, and therefore, the total amount of noise at the ADC input will be dominated by kT/C noise of the sampling capacitors (C1 and C2 in Figure 5-4a), which should be $73\mu V$. This is actually much better than is required for this ADC, as the LSB voltage is $600\mu V$. Therefore, the power consumption in the preamp is determined by its required delay and not its noise.

The preamp gain is determined by the transconductance of the input pair combined with the output resistance of both the input transistors and the current sources. As these parameters spread considerably over process, the gain can be expected to spread by 30 to 50%.
This is not a problem in itself, since the gain of a comparator does not need to be accurately defined; however, the gain should stay between a lower and an upper limit. The lower limit is determined by the offset of the regenerative latch, which is estimated to be 40mV based on Monte-Carlo simulations. The preamp should provide enough gain to reduce this latch offset sufficiently when referred to the comparator input. For this design, a gain of 20x was considered minimum, as it leads to 2mV of offset at the comparator input. The upper gain limit is formed by the circuit auto-zero scheme introduced in the previous section. Because of the telescopic design of the preamp, the output voltage swing is limited to about 500mV, while based on matching models, the predicted input-referred offset of the preamp is about 9mV. Since the offset of the preamp is stored in capacitors at its outputs, the sampled offset is amplified by the preamp itself. Therefore, the preamp gain should be low enough to prevent the output from clipping during offset storage, which means that the gain should remain below 55x. Based on the lower and upper limit, a typical comparator gain of 33x was chosen by adjusting the output resistance of the stage. This output resistance is determined by the output resistance of the input transistors M2 and M3 and current sources M4 and M5. Since the latter have to have a high output resistance in order to ensure a good current matching, only the length of the input transistors was adjusted in order to realize the required gain. Simulations were performed to verify that the gain stays within limits over process and temperature corners. These corner simulations showed that the gain varies between 26x and 40x. The preamp operates at 1.8V and a tail bias current of only 500nA, thus consuming only 0.9μW. #### 5.2.5 Regenerative Latch Design There are many different regenerative latch circuits known in literature. 
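The lower and upper limits on the preamp gain derived in sub-section 5.2.4 amount to two simple ratios, which the following sketch reproduces from the numbers given in the text. The variable and function names are illustrative only, and the check at the end simply tests the simulated corner gains against the resulting window.

```python
LATCH_OFFSET_MV = 40.0       # simulated regenerative latch offset
MAX_INPUT_OFFSET_MV = 2.0    # tolerated input-referred latch offset
OUTPUT_SWING_MV = 500.0      # available preamp output swing
PREAMP_OFFSET_MV = 9.0       # predicted input-referred preamp offset

# Lower limit: enough gain to shrink the latch offset to 2mV at the input.
gain_min = LATCH_OFFSET_MV / MAX_INPUT_OFFSET_MV

# Upper limit: the amplified preamp offset must fit in the output swing
# during offset storage.
gain_max = OUTPUT_SWING_MV / PREAMP_OFFSET_MV

def gain_within_window(gain):
    """True if the gain satisfies both the offset and the clipping limit."""
    return gain_min <= gain <= gain_max

print(f"gain window: {gain_min:.0f}x to {gain_max:.1f}x")
for gain in (26, 33, 40):   # simulated corner and typical gains from the text
    print(f"gain {gain}x within window: {gain_within_window(gain)}")

# Static power of the preamp: supply voltage times tail bias current.
power_w = 1.8 * 500e-9
print(f"preamp power: {power_w * 1e6:.1f} uW")
```

The upper limit scales inversely with the preamp offset, so a larger-than-predicted offset narrows the window from above.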
In this design, the regenerative latch was based on a circuit published in [5.14], as it provides a good separation between analog and digital supply voltage. Such a separation is of great importance in a column-level ADC, as the multitude of column circuits might influence each other via current spikes in the supply voltage. The regenerative latch circuit is depicted in Figure 5-6. As can be seen in the figure, the latch consists of two stages, of which the first is connected to the analog supply *Vdda*, while the second is connected to the digital supply *Vddd*. The first stage is biased with transistor *M1* that operates as a current source. This prevents the injection of current spikes in the analog supply. The input voltage is fed to differential input pair M2-M3. Transistors M4-M5 function as cascodes to the differential pair and reduce the amount of charge kicked back into the input nodes. Transistors M7-M8 are cross-connected to provide regenerative gain, and are reset on the clock signal using transistor M6.

Figure 5-6: Circuit diagram of the regenerative latch

Figure 5-7: a) Simulation of the output of the regenerative latch b) analog and digital power consumption at the switching instant

Since the current through the first stage is limited by the tail current source, the slew rate at the output of the stage is too low to directly drive a digital circuit. Therefore, a second stage is added that is connected to the digital power supply and is not current-limited. The output levels of this stage are compatible with standard CMOS logic circuits. Transistors M9-M10 of this stage mirror the current flowing through transistors M7-M8 in the first stage, and feed this signal to cross-coupled transistor pair M11-M12. Since this cross-coupled pair is not reset by a clock, it will permanently settle to one of the two possible states.
Only if the differential input changes polarity, the current mirrored into transistors M9-M10 will pull the cross-coupled pair out of one stable state and into the other. To this end, the current mirror consisting of transistors M7-M10 has to be scaled properly: the quiescent current through the second stage should be kept as low as possible, while ensuring that transistor M9-M10 can drive enough current to change the state of the second stage. As a result, there will only be a peak in the digital power consumption when the output level changes, while the analog power consumption stays constant. This is illustrated in Figure 5-7, where a simulation result of the comparator output voltage is depicted along with the analog and digital power consumption. # 5.3 Dynamic Column Switching Circuitry The comparator design presented in the previous section includes an offset compensation scheme, which should provide adequate offset cancellation to prevent any visible column FPN. To provide an extra means of column FPN reduction, the dynamic column switching technique introduced in chapter 4 is added in this design to test the feasibility of this concept in silicon. Initially, the DCS scheme was conceived as a method to reduce the visibility of residual non-uniformities caused by the system-level auto-zero, as it leaves a residual non-uniformity of up to 1LSB. Based on promising simulation results, as shown in section 4.4.2, the DCS technique might also be usable to eliminate the need for a system-level offset compensation altogether. As discussed in section 4.4.2, simulation results show that two interleaved unit switching cells with 3 inputs and 3 outputs is the simplest switching scheme that yields acceptable results. This combination of two 3x3 unit cells effectively results in a 6x6 structure that is repeated throughout the columns to form a switching matrix. 
As indicated in Figure 5-1, the resulting switching matrix is inserted in front of the column ADC and front-end biasing circuitry. By doing so, all non-uniformities behind the switches are reduced by DCS. Moreover, the switching cells effectively become part of the sample-and-hold switches (S1 and S2 in Figure 5-4). As a result, any mismatch in their on-resistance should not cause any artefacts, provided that there is enough time for the in-pixel source follower to settle. This requirement can easily be met without large bias currents or large switches.

In Figure 5-8, the unit switching cell that was used in the prototype is depicted. Since the switching cell has 3 inputs and outputs, there are 6 distinct ways to connect the inputs to the outputs. In each column, 5 transistors are required for the switch, and their gates are connected to control lines that are identical for all unit switching cells in the column. Three transistors are used to connect each input from the pixel array to one of the 3 intermediate nodes (n1, n2, n3). This is done by control lines sel1 through sel3, of which only one is enabled at any time. Thus, there are 3 different ways of connecting the inputs to the intermediate nodes. Furthermore, each column contains another 2 transistors that connect the intermediate nodes (n1, n2, n3) to the switch outputs, by means of control lines sel4 and sel5. Again, only one of these control lines is enabled at any time. Thus, there are 2 different ways of connecting the intermediate nodes to the outputs, and as a result there are 6 different ways of connecting inputs to outputs. As can be seen in Figure 5-8, the middle column contains two transistors marked with an asterisk (\*) that do not have an actual switching function, but are rather used as dummy switches to maintain layout uniformity.

Figure 5-8: 3 x 3 unit switching cell used to implement the DCS switching matrix
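The counting argument above (3 first-stage settings times 2 second-stage settings gives all 6 input-to-output connections) can be checked with a short enumeration. The exact wiring of Figure 5-8 is not reproduced here: the sketch assumes that sel1 through sel3 select the three cyclic shifts and that sel4 and sel5 select a straight or a mirrored connection, which is one possible assignment that yields six distinct routings.

```python
from itertools import permutations

# ASSUMED first stage: entry i is the intermediate node reached by input i.
FIRST_STAGE = {
    "sel1": (0, 1, 2),
    "sel2": (1, 2, 0),
    "sel3": (2, 0, 1),
}
# ASSUMED second stage: entry n is the output reached by intermediate node n.
SECOND_STAGE = {
    "sel4": (0, 1, 2),   # straight connection
    "sel5": (2, 1, 0),   # mirrored connection
}

def routing(first, second):
    """Tuple r where r[i] is the output column reached by input column i."""
    return tuple(SECOND_STAGE[second][FIRST_STAGE[first][i]] for i in range(3))

def restore_order(adc_outputs, route):
    """Undo the routing: put each ADC result back at its input position."""
    return [adc_outputs[route[i]] for i in range(3)]

configs = {(a, b): routing(a, b) for a in FIRST_STAGE for b in SECOND_STAGE}
for lines, route in sorted(configs.items()):
    print(lines, "->", route)

# All six control-line combinations give different routings, and together
# they cover every permutation of three columns.
assert set(configs.values()) == set(permutations(range(3)))
```

Because the active routing is known for each line, the column order of the digital output can be put back with the `restore_order` helper, which simply applies the inverse mapping.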
To ensure an acceptable on-resistance of the switches for all signal levels, control line voltages of 3.3V are used in combination with thick oxide transistors capable of handling 3.3V that have a W/L of $1.2\mu m/0.6\mu m$ . As described in section 4.4.2, two 3 x 3 unit switching cells are interleaved with one another. Using the unit switching cell described above, this can be easily accomplished by adding a second set of 3 intermediate nodes in each column. This second set of nodes interconnects the unit switching cell connected to columns n, n+2 and n+4, while the first set of nodes interconnects the unit switching cell connected to columns n+1, n+3 and n+5. The switch select lines *sel1* through *sel5* are controlled by a digital pseudo-random number generator that changes the state of the switches at the beginning of every line time. To prevent charge injection from the DCS switching matrix from degrading the imaging signals, this state change is performed before the pixels are connected to the column. For the prototype, the pseudo-random number generator was implemented as a 10b maximum-length linear-feedback shift register on an off-chip FPGA for flexibility that was part of the measurement setup. The same FPGA was used to restore the order of the digital output. #### 5.4 Measurements #### **5.4.1** Comparator Measurements In order to evaluate the performance of the comparator in detail, a separate measurement IC was made, containing only 4 comparators and some support circuitry. This enabled the performance of both the preamp and regenerative latch to be measured separately. Figure 5-9 depicts a chip micrograph of this measurement IC. Measurements with this IC showed an average gain of 46x, which is slightly outside the anticipated range of 26-40x. The standard deviation of the offset spread of the preamp was 2.2mV; however, this is based on measurements of only 8 comparators. 
In order to get offset measurements that are more statistically relevant, the imager IC itself can also be used to test the comparators. To this end, the prototype imager was equipped with a test input that can be used to directly input test voltages into the column ADCs. Each ADC therefore has exactly the same input voltage. Using this test input, a synthetic image can easily be made by feeding a test input voltage that varies each line time. An example of such a test image is depicted in Figure 5-10. Since the test input voltage is slightly increased each line time, the image goes from black at the top to white at the bottom. Note here that although only half of the pixel array can be read out, each column is repeated once in this test image to preserve the normal width/height proportions, as will be done with each measured image in this section. As the circuit auto-zero is not applied in this test image, column FPN is clearly observable throughout the image as vertical stripes.

Figure 5-9: Chip micrograph of the measurement IC used to evaluate the performance of the comparator

Figure 5-10: Image acquired using a test input voltage increasing each line time, without circuit auto-zero

Figure 5-11: Image acquired as in Figure 5-10, but with the application of a circuit auto-zero

Figure 5-12: Offset histograms of the column ADCs a) without circuit auto-zero b) with circuit auto-zero

In Figure 5-11, a similar image is depicted, but here, a circuit auto-zero is applied. As can be seen from the figure, the circuit auto-zero reduces the column FPN. However, the column FPN reduction is less than expected. This becomes clear when quantitative measurements of the offset variation are performed, which can easily be done by performing a similar measurement with a constant test input voltage. By averaging the columns in the resulting image, and comparing these averages, a measure for offset variation is acquired, which results in the histograms depicted in Figure 5-12.
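The column-averaging procedure described above can be sketched as follows. The image in this example is synthetic, not measured data: the offset spread of 4.0 DN, the temporal noise level, the mid-scale code and the image size are arbitrary illustrative values, and the codes are left unquantized for simplicity.

```python
import random
import statistics

LSB_MV = 0.6  # LSB voltage of the ADC in millivolts, from the text

def column_means(image):
    """Average every column of a row-major image (a list of rows)."""
    n_rows = len(image)
    n_cols = len(image[0])
    return [sum(row[c] for row in image) / n_rows for c in range(n_cols)]

def offset_sigma_dn(image):
    """Standard deviation of the column averages, in digital numbers (DN)."""
    return statistics.pstdev(column_means(image))

# Synthetic test image: constant input, a fixed offset per column, and
# temporal noise that differs in every sample (all values hypothetical).
random.seed(1)
N_ROWS, N_COLS = 200, 340
true_offsets = [random.gauss(0.0, 4.0) for _ in range(N_COLS)]
image = [[512 + true_offsets[c] + random.gauss(0.0, 0.25) for c in range(N_COLS)]
         for _ in range(N_ROWS)]

sigma_dn = offset_sigma_dn(image)
print(f"column offset spread: {sigma_dn:.2f} DN = {sigma_dn * LSB_MV:.2f} mV")
```

Averaging over the rows suppresses the temporal noise, so that the spread of the column averages is dominated by the fixed column offsets; multiplying by the LSB voltage refers the result to the comparator input.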
The histogram of Figure 5-12a depicts the offset distribution without the application of the circuit auto-zero. As can be expected, the histogram shows a Gaussian-like distribution. The standard deviation of this offset is 4.1DN (Digital Number), which corresponds to 2.5mV at the comparator input. This agrees reasonably well with the expected $4\sigma$ offset of 9mV. Figure 5-12b shows an offset variation histogram of a measurement where the analog circuit auto-zero is applied. The standard deviation of this plot is 2.7DN, which is equivalent to 1.6mV at the comparator input. In theory, nearly all preamp offset should be removed by the auto-zero, leaving only the input-referred regenerative latch offset. If this were the case, the $4\sigma$ offset of the regenerative latch (assuming a preamp gain of 45x) would be 289mV instead of the 40mV that is expected based on simulation. Since such a large deviation from simulations is unlikely, it can be concluded that the circuit auto-zero is not functioning properly. This might well be caused by the fact that the preamp gain is higher than anticipated, resulting in the clipping of the preamp output during the auto-zero phase due to its own offset.

In order to solve this problem, the preamp would have to be designed with a lower gain. However, in the current preamp circuit, the gain is determined by the length of the input transistors, which already have a minimum length. The only other way to reduce the gain is to reduce the output resistance of the current sources that load the output. However, this would increase the offset of the preamp, thereby exacerbating the problem of clipped preamp outputs during offset storage. Therefore, a better solution would be to use a different preamp circuit, for instance a differential pair loaded with transistors in triode region, as was done in [5.15].

Using the same test input, the ADC readout noise was measured, and was found to be about $150\mu V$ rms.
While this noise is higher than anticipated, it is still sufficient for the 10b resolution of the ADC. Also, the INL can be measured by post-processing the image of Figure 5-10. As the line-by-line increase in input voltage can be assumed perfectly linear, any deviation from a linear increase in this test image is caused by non-linearity of the ADC. Therefore, an averaged INL plot can be made by computing the deviation of each column ADC response from a straight line, and averaging the results over all columns of the image, which is shown in Figure 5-13. As can be seen from the figure, the total non-linearity is about 4.5 LSBs, or 0.45%. Since the external ramp generator has at least a 12-bit linearity, the probable cause of INL is the existence of parasitic capacitance from the input to the output of the preamp, which adds a signal-dependent charge to the input sampling capacitor. However, this non-linearity is not a problem, as it is well below the non-linearity of the photodiodes themselves, which is expected to be at least 1% (see sub-section 3.1.2).

Figure 5-13: Averaged INL of the column ADC

#### **5.4.2 DCS Measurements**

Despite the problems with the circuit auto-zero, the realized prototype can very well be used to evaluate the proposed DCS technique. For such measurements, both circuit and system-level auto-zero are switched off. Furthermore, an image of a white piece of paper is captured to make the FPN as conspicuous as possible. As a result, the column FPN will be too high to yield an acceptable image quality, even when applying DCS, but it makes the perceptual effect of the technique clearly visible. Since the prototype uses 3T pixels, which results in a relatively high readout noise, all captured images shown in this section are averaged 20 times to reduce the readout noise. In Figure 5-14, a raw captured image is depicted, which is acquired without using DCS. Column FPN is clearly noticeable throughout the image.
Apart from this, 4 horizontal bands are visible, which are caused by the fact that different pixel layouts were used in the imaging array. Figure 5-15 depicts an image taken using the same parameters, but this time with DCS. It clearly shows that dynamic column switching strongly reduces the visibility of column FPN, making it nearly invisible in this image.

Figure 5-14: Raw image captured without using DCS

Figure 5-15: Raw image captured while using DCS

Figure 5-16: a) Contrast enhanced image region without DCS. b) Average column output, showing a column FPN of 0.67% (std. dev.)

Figure 5-17: a) Contrast enhanced image region with DCS. b) Average column output, showing a column FPN of 0.41% (std. dev.)

In order to quantify the observed column FPN reduction, the output of several rows (with the same pixel layout) was measured under uniform light conditions, as depicted in Figure 5-16a and 5-17a. To increase the visibility of the column FPN, the contrast in both images is enhanced 15 times. This contrast enhancement does reveal some residual column FPN in Figure 5-17a. In particular, a pattern of three lighter columns, interleaved with 3 darker columns, becomes visible. The explanation for this is simple: the switching matrix used for DCS is based on 3x3 unit switching cells that are interleaved with one another. Each 3x3 unit cell 'averages' the offset to the human eye, but these averages are not the same for each 3x3 unit cell. Therefore, if there is a large difference between two interleaved 3x3 switching cells, a pattern becomes visible.

Using these images, graphs of the averaged column outputs were made (Figure 5-16b and 5-17b). The average initial column FPN is 0.67% (standard deviation); by using dynamic column switching, this is reduced to 0.41%. The initial peak FPN is 2.7%, and this is reduced to 1.1%.
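The averaging explanation given above can be turned into a small statistical sketch. It is an idealised model, not the measurement or the simulation of chapter 4: it assumes Gaussian, uncorrelated ADC offsets, assumes that each image column visits the three ADCs of its unit cell equally often, and treats the perceived offset as the plain mean over those three ADCs. The offset spread of 0.67 and the number of columns are illustrative inputs.

```python
import random
import statistics

def dcs_time_averaged(offsets):
    """Time-averaged offset seen by each image column under ideal DCS.

    Within every group of 6 columns, columns n, n+2, n+4 share one 3x3
    cell and columns n+1, n+3, n+5 share the other.
    """
    averaged = list(offsets)
    for base in range(0, len(offsets) - 5, 6):
        for start in (base, base + 1):
            members = (start, start + 2, start + 4)
            mean = sum(offsets[m] for m in members) / 3
            for m in members:
                averaged[m] = mean
    return averaged

random.seed(7)
adc_offsets = [random.gauss(0.0, 0.67) for _ in range(6000)]  # hypothetical
residual = dcs_time_averaged(adc_offsets)

sigma_before = statistics.pstdev(adc_offsets)
sigma_after = statistics.pstdev(residual)
print(f"column FPN without DCS: {sigma_before:.2f}")
print(f"column FPN with DCS:    {sigma_after:.2f}")
print(f"ratio:                  {sigma_after / sigma_before:.2f}")
```

In this model the residual spread approaches $1/\sqrt{3}$ of the original, because each cell averages three independent offsets. The residual is the difference between cell averages, which corresponds to the interleaved lighter and darker column pattern described above.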
While the reduction in column FPN is not large enough to yield an image with acceptable quality, it is clear on the other hand that DCS offers a significant reduction in column FPN. This reduction relaxes the requirements on the column circuit, which can result in a reduction in required chip area, faster speed, or lower power consumption. For instance, in the prototype presented in this chapter, the residual column FPN after the circuit auto-zeroing of the preamp was expected to be 2mV maximum, or 0.2%. This residual offset could be further reduced using DCS, leading to column FPN levels that are acceptable for many applications. Therefore, the additional system-level auto-zero as proposed in sub-section 5.2.3 would not be needed, which increases the effective speed of the column ADC. Finally, an overview of the measurement results is given in Table 5-2.

| Parameter                                  | Value                   |
|--------------------------------------------|-------------------------|
| Technology                                 | 1P4M 0.18μm CMOS        |
| Supply voltage                             | 1.8V/3.3V               |
| Number of pixels (on layout)               | 680 x 512               |
| Number of pixels (read out)                | 340 x 512               |
| Column ADC pitch                           | 11.2µm                  |
| ADC resolution                             | 10b                     |
| ADC temporal noise                         | 150μV                   |
| Column FPN (without DCS)                   | 0.67% of full-scale (σ) |
| Column FPN (with DCS)                      | 0.41% of full-scale (σ) |
| Comparator power consumption               | 3.2µW                   |
| Preamp DC gain                             | 46x                     |
| Comparator offset (no auto-zero)           | 2.5mV (σ)               |
| Comparator offset (with circuit auto-zero) | 1.6mV (σ)               |

Table 5-2. *Prototype measurement results summary*

## 5.5 References

- [5.1] M. F. Snoeij, A. Theuwissen, K. Makinwa, and J. H. Huijsing, "A CMOS imager with column-level ADC using dynamic column FPN reduction," *IEEE International Solid-State Circuits Conference*, vol. XLIX, pp. 498-499, February 2006.
- [5.2] M. F. Snoeij, A. J. P. Theuwissen, K. A. A. Makinwa, and J. H. Huijsing, "A CMOS imager with column-level ADC using dynamic column fixed-pattern noise reduction," *IEEE Journal of Solid-State Circuits*, vol. 41, pp. 3007-3015, December 2006.
- [5.3] M. F. Snoeij, A. J. P. Theuwissen, and J. H. Huijsing, "A low-power Column-Parallel 12-bit ADC for CMOS Imagers," *IEEE Workshop on CCDs and Advanced Image sensors 2005*, pp. 169-172, Karuizawa, Japan, June 2005.
- [5.4] K. Findlater et al., "SXGA pinned photodiode CMOS image sensor in 0.35μm technology," *IEEE International Solid-State Circuits Conference*, vol. XLVI, pp. 218-219, February 2003.
- [5.5] D. A. Johns and K. Martin, *Analog Integrated Circuit Design*, (chapter 7), New York: John Wiley & Sons Inc., 1997.
- [5.6] B. Wicht, T. Nirschl, and D. Schmitt-Landsiedel, "Yield and speed optimization of a latch-type voltage sense amplifier," *IEEE Journal of Solid-State Circuits*, vol. 39, pp. 1148-1158, July 2004.
- [5.7] K. J. Wong and C. K. Yang, "Offset compensation in comparators with minimum input-referred supply noise," *IEEE Journal of Solid-State Circuits*, vol. 39, pp. 837-840, May 2004.
- [5.8] D. Schinkel, E. Mensink, E. Klumperink, E. van Tuijl, B. Nauta, "A double-tail latch-type voltage sense amplifier with 18ps setup+hold time," *IEEE International Solid-State Circuits Conference*, vol. L, pp. 314-315, February 2007.
- [5.9] D. Sacket, "CMOS Pixel Device Physics," 2005 IEEE ISSCC Circuit Design Forum: Characterization of Solid-State Image Sensors, February 2005.
- [5.10] C. Enz and G. Temes, "Circuit techniques for reducing the effects of op-amp imperfections: autozeroing, correlated double sampling, and chopper stabilization," *Proceedings of the IEEE*, vol. 84, pp. 1584-1614, November 1996.
- [5.11] W. Yang, O-B. Kwon, J-I. Lee, G-T. Hwang and S-J. Lee, "An integrated 800 x 600 CMOS imaging system," *IEEE International Solid-State Circuits Conference*, vol. XLII, pp. 304-305, February 1999.
- [5.12] B. Razavi and B. A. Wooley, "Design techniques for high-speed, high-resolution comparators," *IEEE Journal of Solid-State Circuits*, vol. 27, pp. 1916-1926, December 1992.
- [5.13] T. C. Choi, R. T. Kaneshiro, R. W. Brodersen, P. R. Gray, W. B. Jett, and M. Wilcox, "High-frequency CMOS switched-capacitor filters for communications application," *IEEE Journal of Solid-State Circuits*, vol. 18, pp. 652-664, December 1983.
- [5.14] I. Mehr and D. Dalton, "A 500-MSample/s, 6-bit Nyquist-rate ADC for disk-drive read-channel applications," *IEEE Journal of Solid-State Circuits*, vol. 34, no. 7, July 1999.
- [5.15] Q. Huang, C. Menolfi, "A 200nV offset 6.5nV/√Hz noise PSD 5.6kHz chopper instrumentation amplifier in 1μm digital CMOS," *IEEE International Solid-State Circuits Conference*, vol. XLIV, pp. 362-363, February 2001.

# A CMOS Imager with a Multiple-Ramp Single-Slope ADC

In this chapter, a prototype CMOS image sensor is presented that uses the column-level Multiple-Ramp Single-Slope (MRSS) architecture proposed in sub-section 4.2.3. It is the world's first imager to use such an ADC. As will be shown in this chapter, measurements on the realized prototype demonstrate the potential of the MRSS concept to lower power consumption and/or increase A/D conversion speed. The imager is implemented in a 0.25 $\mu$m CMOS process and has a resolution of 400x330 pixels. In section 6.1, a system-level overview of the sensor is provided, followed by a description of the column-level circuitry in section 6.2. The multiple ramp generator is a critical part of the ADC design, which will be discussed in section 6.3. Finally, measurement results are given in section 6.4.
## 6.1 Sensor Overview

## 6.1.1 Design Goals

In sub-section 4.2.3, a new column-level ADC architecture was proposed that can lead to a significantly lower power consumption and/or higher speed of the A/D converter. Like the classical single-slope ADC, this Multiple-Ramp Single-Slope (MRSS) architecture has the benefit of requiring only a comparator inside each column as the sole analog circuit. This simple column-level circuit reduces the required chip area and makes it relatively easy to ensure column-to-column uniformity. The potential advantage of the MRSS architecture is that it offers a higher A/D conversion speed for the same comparator speed. This speed advantage can subsequently be translated into reduced power consumption by using slower, lower-power comparators. The main goal of the prototype imager presented in this chapter is therefore to prove that the MRSS architecture has a better power/speed ratio than the popular single-slope architecture, even though it still uses a simple column-level circuit.

Another important technique for ADCs, which was introduced in section 4.3, is the use of a companding quantization scheme. This can further reduce the power consumption of an ADC by exploiting the presence of photon shot noise in imager signals. As was shown in sub-section 4.3.3, the MRSS ADC architecture is well suited to realizing a companding quantization scheme. The imager presented in this chapter is the first to apply companding in an MRSS ADC, thereby showing a further potential for power reduction. The combination of the MRSS ADC and a companding quantization scheme will be referred to as Multiple-Ramp Multiple-Slope (MRMS), as the implementation requires the use of ramps with different slopes.

To facilitate both the design of this prototype and the comparison between the single-slope and MRSS ADC architectures, the prototype is based on an existing imager made by DALSA Corporation that uses a column-level single-slope ADC.
Several circuit blocks, such as the column comparator, are re-used from the existing imager. Moreover, the imager ADC presented in this chapter is designed such that it can operate in single-slope, MRSS, and MRMS mode. This allows for an easy comparison between the ADC architectures.

## **6.1.2** System-Level Overview

In Figure 6-1, a block diagram of the sensor is depicted. The imager has a pixel array of 400 x 330 pixels. It uses standard 3T pixels with n-well photodiodes that have a pitch of 7.4µm. Unlike the imager presented in the previous chapter, the original DALSA column ADC design features a separate CDS amplifier at the input of each column circuit. In the original design, the gain of this amplifier was adjustable, which increases the effective dynamic range of the column-level ADC. However, to save control pins, the gain of the amplifier was fixed to 1x in the prototype described in this chapter. The remainder of the column contains the comparators, digital control and memory, and the column decoder. Finally, a central multiple-ramp generator provides the required ramp signals to the columns.

Figure 6-1: *Block diagram of the realized sensor*

Figure 6-2: *Chip micrograph of the prototype imager*

A large number of circuit blocks could be re-used from the original single-slope design without modification. For instance, the imaging array, row decoder and column-level CDS amplifier did not require any modifications. The column comparator itself was not modified; however, some extra analog switching and digital control logic was added around the comparator. This will be described in detail in section 6.2. The largest circuit block that had to be added was the multiple-ramp generator. This block needs to output several ramp voltages that are well matched to one another. For the MRSS mode, the ramps should have exactly the same slope, but with a well defined voltage offset with respect to one another.
In addition, the MRMS mode requires not only ramps with a well defined voltage offset, but also slopes whose ratio is exactly equal to an integer. The design of the ramp generator will be discussed in detail in section 6.3.

The imager is implemented in a 0.25µm single-poly triple-metal CMOS process from TSMC. The chip size is 5mm x 5mm. The prototype uses a 2.5V supply voltage for the analog and digital circuitry, and a 3.3V supply voltage for chip I/O and analog switches. As in the previous prototype, flexibility is ensured by implementing most of the digital control functions off-chip inside an FPGA. Table 6-1 summarizes the specifications of the prototype, and a micrograph of the die is depicted in Figure 6-2.

| Parameter           | Value            |
|---------------------|------------------|
| Technology          | 1P3M 0.25μm CMOS |
| Die size            | 5mm x 5mm        |
| Supply voltage      | 2.5V/3.3V        |
| Pixel pitch         | 7.4µm            |
| Pixel type          | 3T               |
| Number of pixels    | 400 x 330        |
| Column ADC pitch    | 7.4µm            |
| Number of ADC ramps | 8                |
| ADC resolution      | 12b              |
| ADC clock frequency | 20MHz            |
| ADC input range     | 1V               |

Table 6-1. *Prototype target specifications*

## **6.1.3** System-Level ADC Design Considerations

The main ADC requirements are summarized in Table 6-1. The target resolution is 12 bits and the ADC input range is 1V. The clock frequency for the system is 20MHz; this frequency is determined by the column comparators that are re-used from an existing column-level ADC design, as well as by some practical board-level restrictions. There is no specification for A/D conversion speed for this design; instead, the approach is to reduce the conversion time as much as possible in both single-slope and MRSS mode, and thus demonstrate the performance difference between the architectures.

The main system-level design issue in the implementation of the MRSS ADC is the choice of the number of ramps to be used. As illustrated by Eq.
(4-3) in sub-section 4.2.3, for a 12-bit resolution, 64 ramps would be a theoretical optimum, as this reduces the number of column comparator decisions to a minimum. However, this does not take other important factors into consideration. The actual choice of the number of ramps revolves around two other trade-offs: firstly, the expected speed/power ratio, and secondly, the allowable chip area.

The speed-to-power ratio of the MRSS ADC depends on several factors. Firstly, it is obvious that a higher number of ramps increases the speed of the ADC at the cost of a higher power consumption in the multiple ramp generator. This increase in power consumption is mainly caused by the output amplifiers that supply the ramp signals to the column comparators. Each of these amplifiers should be able to drive the maximum possible capacitive load, which arises when all the comparators are connected to a single ramp. If the number of ramps is chosen too high, the power consumption of the amplifiers will outweigh the advantage of a faster A/D conversion. The exact trade-off between ramp buffer power and conversion speed is difficult to estimate based on system-level calculations; moreover, in this design, an existing column comparator was used that was not optimized for power (it consumes 75µW/column, whereas the power-optimized design of the previous chapter used only 3.2µW). In comparison, the initial power consumption estimate for the ramp generator was about 1mW per ramp. As a result, the power consumption of the ramp generator would probably only become dominant for more than 32 ramps.

Apart from the impact of ramp buffer power consumption on the speed-to-power ratio, another factor is the necessity to implement overlap between the ramps. The amount of overlap does not depend on the number of ramps chosen, but on the expected noise and other errors during the coarse quantization phase, since such errors can lead to a comparator being connected to the wrong ramp in the fine phase.
An overlap of 10mV at both the lower and the upper end of each ramp is a reasonable estimate to guarantee a dead-band-free conversion. Moreover, column-level single-slope ADCs typically require a certain start-up period, to ensure that the comparator reaches a steady state, and in particular a steady delay, before the comparator can trigger on the lowest possible input voltage. This start-up period can account for another 10mV of input range, making the total overhead 30mV. If the number of ramps is chosen to be 8, each ramp should span 125mV, and thus the overhead is 24% of the total ramp voltage span. If 16 ramps were chosen, each ramp would span 62.5mV and the overhead would be 48%. Therefore, the required ramp overhead significantly reduces the advantage of increasing the number of ramps to more than about 8.

Finally, apart from speed/power ratio considerations, the amount of required chip area is another factor in the choice of the number of ramps. It is clear that a higher number of ramps requires more chip area and thus increases the cost of the imager. Apart from the increase in circuitry, the wiring of the ramp signals itself is of particular concern. Each ramp signal is connected to all comparators. The capacitive load of the comparators combined with the resistance of the wiring forms a delay line. If this delay is too high, it will introduce an offset gradient over the columns. Based on the resistivity of the interconnect in the process used here, it was estimated that each ramp wire should be at least 4.2µm wide, in order to keep the total resistance of the ramp wire low enough to avoid a significant delay between the columns. For larger image arrays, this width would increase, and could therefore require a considerable chip area increase. Based on all issues mentioned, 8 ramps were selected for this design.

# **6.2** Column-Level Circuitry

In Figure 6-3, a simplified block diagram of the column-level circuitry is depicted.
An input amplifier reads out the pixel output voltages and performs the required DDS operation. This input amplifier is re-used from the DALSA column ADC design and was originally designed with an adjustable gain; however, for this design the gain was fixed to unity to simplify the digital control of the sensor. The column comparator is auto-zeroed using capacitor C1 and switch S2. During the auto-zero phase, the output of the column-level CDS amplifier is also sampled on C1. Next, the comparator can be connected to a ramp voltage via S3. In the original single-slope design, a single ramp voltage was connected to switch S3. In this MRSS design, 8 ramp voltages *ramp1* through *ramp8* can be connected to the comparator via a 3-to-8 decoder. The output of the comparator is connected to a digital memory. While the figure depicts a single memory for clarity, two memory banks were implemented in each column, as is usual in a column-level ADC. This allows for simultaneous A/D conversion and digital readout of the column circuitry.

Figure 6-3: Simplified block diagram of the column-level circuitry

As explained in sub-section 4.2.3, the MRSS architecture operates with a coarse and a fine A/D conversion phase. The coarse A/D conversion is performed by connecting each comparator to a coarse ramp voltage and performing a normal single-slope A/D conversion. While a separate coarse ramp generator is theoretically required, in this design the ramps are all generated with fully programmable DACs. Therefore, the coarse ramp voltage is generated by *ramp1*, which is connected to all column circuits. This is done by making the *force ramp1* signal high, which feeds address 0 into the 3-to-8 decoder. The results of the coarse A/D conversion are stored in the column memory, and are subsequently used to connect each comparator to the correct ramp, i.e. the ramp whose range contains the input signal. This can be done by making the *force ramp1* signal low.
As a result, the output of the digital memory is connected to the 3-to-8 decoder, which connects the correct ramp voltage to the comparator. Since there are 8 ramp voltages in this design, 3 bits of digital memory are required to store the result of the coarse conversion. Next, the fine A/D conversion is performed, during which all 8 ramps are operated concurrently. The results of this conversion are also stored in the digital memory. While the fine A/D conversion theoretically only yields 9 bits of resolution, an extra bit is required to encode the overlap between the ramps that is required for robustness. As a result, 10 bits of digital memory are used for the fine conversion phase. Some simple digital hardware is required to reconstruct a 12b integral digital code from the overlapping 3+8 bit raw digital output. This can easily be done in an off-chip FPGA in this design.

As can be seen from Figure 6-3, compared to the classical single-slope ADC, the only additional column-level circuitry required to implement the MRSS architecture consists of 8 analog switches, a 3-to-8 decoder and 3 NOR gates. This underlines the advantage of the MRSS architecture, as it offers a significantly higher conversion speed for a simple column circuit. Finally, it is still easily possible to operate the column as a single-slope ADC for comparative purposes. By making *force ramp1* high, a single-slope conversion via *ramp1* can be performed.

# 6.3 Multiple Ramp Generator Design

## **6.3.1** Ramp Generator Concept

The most important requirement for the multiple ramp generator is that all ramp outputs are well matched to one another. Firstly, their offsets should be well defined. Secondly, they should have the same slopes when the ADC is operated in MRSS mode, or slopes with an exact integer ratio when MRMS mode is used. Furthermore, for this research prototype, flexibility in the ramp voltage generation was also considered of importance.
Therefore, the multiple ramp generator was implemented as a set of 8 matched DACs. This enables ramp outputs that are fully programmable, both to implement exact offsets between ramps and to realize ramp slopes that have an exact integer ratio. These DACs should have a resolution of 12 bits and should be able to operate at the column ADC clock frequency of 20MHz.

The DAC architecture used for the multiple ramp generator is based on a resistor ladder DAC first published in [6.2]. Figure 6-4a depicts a simplified block diagram of this ladder DAC. As can be seen in the figure, two resistor ladders are used. The DAC reference voltage $V_{ref}$ is first divided by a coarse resistor ladder. A fine resistor ladder is connected across one of the resistors of the coarse ladder via switches and two buffer amplifiers. A second set of switches connects one of the nodes of the fine resistor ladder to the output of the DAC. Although this DAC concept requires more resistors, and thus more chip area, than the well-known R-2R ladder network, it has the advantage of being monotonic by design. As already discussed in sub-section 4.2.1, monotonicity is of importance in imager ADCs.

Figure 6-4: a) Conceptual diagram of the ladder DAC published in [6.2] b) Principle diagram of the multiple ramp generator used in this design

Another advantage of the coarse-fine ladder DAC is that it can easily be adapted to realize multiple matched DACs, which is illustrated in Figure 6-4b. As is shown in the figure, the approach is to connect eight fine resistor ladders to a single coarse resistor ladder. As a result, all ramp voltages are derived from a common reference, being the coarse resistors. This ensures a high degree of matching, since the voltage accuracy of each output is mainly determined by the resistor matching in the *coarse* ladder. As a result, even if there is mismatch between the resistors of the coarse ladder, this mismatch will be identical for all ramp outputs.
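The two properties claimed above, monotonicity by design and a coarse-ladder error that is common to all outputs, can be illustrated with a small behavioral model. The sketch below is only an idealized illustration: it ignores loading of the coarse ladder and switch resistance, and the 5% resistor mismatch, the random seed, the ladder sizes and all function names are assumptions made for the example, not values from the design.

```python
import random

def tap_fractions(resistors):
    """Node positions along a resistor string, as fractions 0..1 of the voltage across it."""
    total = sum(resistors)
    fractions, acc = [0.0], 0.0
    for r in resistors:
        acc += r
        fractions.append(acc / total)
    return fractions

def ladder_dac(code, coarse_taps, fine_taps):
    """Idealized (unloaded) coarse-fine ladder DAC output for an integer input code."""
    m = len(fine_taps) - 1                     # number of fine resistors
    c, f = divmod(code, m)                     # coarse segment and fine position
    v_low, v_high = coarse_taps[c], coarse_taps[c + 1]
    return v_low + (v_high - v_low) * fine_taps[f]

rng = random.Random(1)                         # fixed seed, arbitrary choice
n, m = 128, 32                                 # assumed ladder sizes (12 bits in total)

# One shared coarse ladder and eight fine ladders, all with assumed 5% mismatch.
coarse_taps = tap_fractions([1.0 + rng.uniform(-0.05, 0.05) for _ in range(n)])
fine_ladders = [tap_fractions([1.0 + rng.uniform(-0.05, 0.05) for _ in range(m)])
                for _ in range(8)]

outputs = [[ladder_dac(code, coarse_taps, fine) for code in range(n * m)]
           for fine in fine_ladders]

# Each output rises strictly with the input code, despite the mismatch.
monotonic = all(b > a for out in outputs for a, b in zip(out, out[1:]))

# At every coarse tap, all eight outputs coincide with the shared coarse ladder voltage.
shared_taps_equal = all(out[c * m] == coarse_taps[c]
                        for out in outputs for c in range(n))
```

In this model, mismatch in a fine ladder only moves codes around inside one coarse segment, so it can never reverse the order of two codes, while any error in the coarse ladder appears identically in all eight outputs.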
While the original ladder DAC design of [6.2] used buffer amplifiers between the coarse and fine ladders, these were omitted from the multiple-ramp generator design for two reasons. Firstly, adding buffer amplifiers would greatly increase power consumption and chip area. Secondly, the original design employed bipolar amplifiers which were trimmed to an offset of less than 100µV. In the CMOS process used for this design, it is not possible to use offset trimming; instead, some form of dynamic offset cancellation would have to be used. This would further increase the power consumption and chip area.

In order to achieve the required 12-bit resolution using a passive resistor structure as shown in Figure 6-4b, two effects must be taken into consideration. Firstly, the fine resistor ladder will load the coarse ladder nodes, which causes a voltage error. Secondly, the resistance of the switches that connect the fine ladder to the coarse ladder will also cause a voltage error.

Figure 6-5: Simplified circuit used to calculate the effect of the fine ladder loading the coarse ladder

Figure 6-5 shows a simplified circuit with a single coarse and fine ladder, with some annotations that will be used to calculate the voltage error resulting from the direct connection of the fine resistor ladder to the coarse ladder. The coarse ladder consists of $n$ resistors, each having a resistance $R_c$. For simplicity, the reference voltage $V_{ref}$ across the coarse ladder is assumed to be 1V. The fine ladder, consisting of $m$ resistors with a unit resistance of $R_f$, is connected to the coarse ladder around coarse resistor $i$. Note that if the fine ladder did not load the coarse ladder, the voltage $V_A$ could be written as:

$$V_A = \frac{i-1}{n} \tag{6-1}$$

and $V_B$ could be written as:

$$V_B = \frac{i}{n} \tag{6-2}$$

However, the finite resistance of the fine ladder will cause an error on $V_A$ and $V_B$.
In order to keep this voltage error sufficiently low, the total resistance of the fine ladder should be a factor $k$ higher than the unit resistance of the coarse ladder. Therefore, the relation between $R_c$ and $R_f$ can be expressed as follows:

$$m \cdot R_f = k \cdot R_c \tag{6-3}$$

Using this relation, the voltages $V_A$ and $V_B$ can be expressed as follows:

$$V_A = \frac{(i-1)}{(n-1) + \frac{k}{1+k}} \tag{6-4}$$

$$V_B = \frac{(i-1) + \frac{k}{1+k}}{(n-1) + \frac{k}{1+k}} \tag{6-5}$$

By subtracting these voltages from the ideal voltages of Eq. (6-1) and (6-2), the voltage error on $V_A$ and $V_B$ can be expressed in $k$, $i$, and $n$. While it should be possible to evaluate the resulting expressions analytically, the resulting arithmetic is quite cumbersome. Moreover, the expression has to be evaluated for all $i$ to obtain the error at all nodes of the coarse ladder. Therefore, the expressions were numerically evaluated using Matlab, for some realistic values of $k$ and $n$. It was found that in order to keep the voltage error below 0.5LSB for a 12-bit resolution, the unit resistance in the fine ladder should be at least twice as large as the unit resistance in the coarse ladder (i.e. $k/m \geq 2$).

While a further increase beyond the minimum factor of two in the fine unit resistance has the benefit of further reducing the voltage error due to loading, there are obviously limits to such an increase. The most stringent requirement is the noise of the ladder. The resistor noise produced by the ladder is bandwidth limited by the buffer amplifier that will be connected to the output of the fine ladder. Based on an estimated noise bandwidth of 31MHz for the buffer, a maximum equivalent noise density for the ladder resistance can be calculated. Since the noise of the coarse and fine ladder is added, a maximum total resistance for both ladders can be derived, which was found to be 60kΩ for this design.
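The numerical evaluation of the loading error described above can be reproduced along the following lines. This is a minimal sketch in Python rather than the Matlab evaluation used for the design; the function names are made up for the example, and the ladder sizes n = 128 and m = 32 are taken from the values chosen later in this section, so they are an assumption about what was actually evaluated.

```python
def loaded_tap_voltages(i, n, k):
    """V_A and V_B according to Eqs. (6-4) and (6-5), with V_ref = 1 V."""
    p = k / (1.0 + k)                 # R_c in parallel with k*R_c, in units of R_c
    denom = (n - 1) + p
    return (i - 1) / denom, ((i - 1) + p) / denom

def worst_case_error_lsb(n, m, ratio, bits=12):
    """Largest |ideal - loaded| tap voltage over all coarse positions i, in LSB.

    ratio is R_f / R_c, which equals k/m according to Eq. (6-3).
    """
    k = ratio * m
    lsb = 1.0 / 2 ** bits
    worst = 0.0
    for i in range(1, n + 1):
        v_a, v_b = loaded_tap_voltages(i, n, k)
        err_a = abs((i - 1) / n - v_a)        # deviation from Eq. (6-1)
        err_b = abs(i / n - v_b)              # deviation from Eq. (6-2)
        worst = max(worst, err_a, err_b)
    return worst / lsb

error_ratio_2 = worst_case_error_lsb(n=128, m=32, ratio=2)   # about 0.49 LSB
error_ratio_1 = worst_case_error_lsb(n=128, m=32, ratio=1)   # about 0.96 LSB
```

Under these assumptions, a fine unit resistance of twice the coarse unit resistance keeps the worst-case error just under 0.5LSB, whereas equal unit resistances roughly double the error. The worst case occurs when the fine ladder sits at either end of the coarse ladder.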
This limitation posed by noise considerations is also of great importance for the other problem of the passive connection of the fine ladder to the coarse ladder, namely the resistance of the switches. This error is worst for the output voltage taken from the edge of the fine ladder, where the switch resistance adds directly to the first unit resistance. Therefore, the resistance of the switch should be less than half of the unit resistance of the fine ladder, in order to keep the error below 0.5LSB. While it is possible to reduce the switch resistance sufficiently by increasing the size of the (NMOS) switch, the increased switch size also increases the amount of charge injection. Initial simulation results showed that scaling the switches to a low on-resistance would lead to too much charge injection at the required speed of 20MHz.

In this design, a different solution is used. Instead of trying to reduce the switch resistance to negligible levels, the switching transistor itself is used as one of the unit resistors of the fine ladder. Therefore, to ensure matching, the fine ladder should be made entirely of MOS transistors. This results in the ladder circuit depicted in Figure 6-6. The fine ladder consists of transistors $T_1$ through $T_{32}$. Since transistors $T_1$ and $T_{32}$ are used both as resistors and as switches, an extra set of switches is necessary to directly output the voltages present at the resistor nodes of the coarse ladder. Therefore, the switches between the coarse and fine ladder are divided into a force and a sense bus as depicted.

Based on all mentioned design considerations, as well as preliminary chip area estimations, a coarse ladder of 128 resistors was chosen, along with 8 fine resistor ladders of 32 resistors. This division reduces the size of the fine ladder, which is preferable as it is implemented 8 times.
Furthermore, the smaller number of resistors in the fine ladder allows each unit resistance to be higher here, while still complying with the noise requirements. Finally, the larger number of resistors in the coarse ladder reduces the voltage drop across the fine ladder. Since the resistance of the NMOS transistors that constitute the fine ladder is voltage dependent, this is of importance to ensure sufficient linearity of the multiple ramp generator.

Figure 6-6: Simplified circuit diagram of the multiple ramp generator.

## 6.3.2 Resistor Ladder Switching Logic

While the logic circuitry that controls the switches in the ladder DAC might seem entirely straightforward, one refinement is possible that considerably reduces the amount of charge injected by the switches between the coarse and fine ladder. This is illustrated in Figure 6-7. If a conventional logic decoder is used to drive the switches between the coarse and fine ladder, connecting the fine ladder to an adjacent coarse resistor involves re-connecting both ends of the fine ladder, as illustrated in Figure 6-7a. However, an alternative scheme is proposed in [6.2], which is depicted in Figure 6-7b. To connect the fine ladder to an adjacent coarse resistor, only one of the ladder's ends is disconnected and then re-connected as shown, thus 'folding over' the fine ladder. In [6.2], this was done to ensure monotonicity of the DAC output even in the presence of offset between the coarse and fine ladder caused by the buffer amplifiers used in that design. While such buffers are not used in the design presented here, it is still advantageous to use this 'folding' scheme, as it reduces the amount of switching required, and thus reduces the amount of charge injection.

Figure 6-7: a) Conventional decoder logic switching scheme b) Alternative scheme as proposed in [6.2]

To implement the folding decoder scheme, some extra logic is required. In [6.2], a decoding scheme using series-connected MOS switches was employed.
However, this is not possible in this design, since the fine ladder is implemented with NMOS transistors that need to match the (force) switches. Therefore, the decoding is fully realized with standard CMOS logic, as shown in Figure 6-8. For clarity, only a single transistor per coarse resistor tap is depicted in the figure, instead of the two transistors required in reality to implement the force and sense bus. These are connected to the output buses *cse\_out1* and *cse\_out2*, to which the fine ladder is connected.

Figure 6-8: Circuit diagram of the decoder logic used for the switches connecting coarse and fine ladder

As mentioned in the previous section, the coarse ladder contains 128 resistors. Figure 6-8 depicts a 4-resistor section of the coarse ladder, which is thus repeated 32 times to realize the full ladder. The ladder section is activated with the *dec\_out(x)* node. This node is connected to an ordinary binary 5-to-32 decoder (not shown in the figure), which decodes the upper 5 bits of the 7-bit digital input of the ladder. The lower two bits, *dig\_in[1]* and *dig\_in[0]*, are connected to the 4 digital gates depicted at the bottom of the figure. The outputs of these gates are connected to NOR gates in the ladder section itself that drive the actual switching transistors. As a result, the left part of the ladder section is an ordinary binary decoder, which only decodes the upper 6 bits of the digital input. Therefore, the left switches are each active for two successive digital inputs, as marked in the figure ('00/01' and '10/11' respectively). The right side of the ladder section contains three instead of two transistors to implement the folding scheme. The outer two switches are connected to the outside nodes of the 4-resistor ladder section, and are only active for a single digital input word ('00' and '11').
However, the adjacent 4-resistor sections will also have transistors connected to these outer nodes, and thus each resistor node will be connected to the fine ladder for two different digital input codes. This implements the required folding scheme.

## 6.3.3 Output Amplifier Offset Auto-Calibration

As depicted in Figure 6-6, the ramp generator outputs need buffer amplifiers to drive the column comparators. These buffers should be able to drive a high capacitive load formed by the large number of comparator inputs. In this design, ordinary folded-cascode opamps with PMOS input transistors are used for this purpose. They are used as non-inverting unity-gain buffers, and consume about 1mW each. While the multiple-ramp generator outputs themselves match very well due to the common resistor ladder approach, the output amplifiers can add offset to these ramp generator outputs, which can be expected to be several mV in this design. If such offsets were not compensated for, they would be translated directly into 'jumps' in the ADC's transfer function from one ramp to another, leading to poor linearity. Therefore, the amplifier offset should be reduced to less than 1LSB, or 250µV, in order to yield ramp output signals that are sufficiently matched.

Such an offset compensation could be realized using a dynamic offset cancellation technique in the output amplifier. However, this might significantly increase power consumption and chip area. Therefore, in this design, a digital auto-calibration algorithm was used to reduce the amplifier offset. It is important to note that, like column-level comparator offset and delay, the absolute offset of the output amplifiers is not of importance, just their offset *variation*, since the human visual system is mainly sensitive to relative light variations within an image.
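The effect of ramp buffer offsets on the transfer function can be made concrete with a deliberately simplified model. The sketch below assumes an ideal coarse decision, no ramp overlap, no noise, a 1V input range divided over 8 ramps at 12-bit resolution, and arbitrary example offsets; none of the names or offset values come from the design itself.

```python
import math

LSB = 1.0 / 4096            # 12-bit resolution over a 1 V input range
N_RAMPS = 8
SPAN = 1.0 / N_RAMPS        # 125 mV per ramp, overlap ignored in this model

def adc_code(v_in, offsets):
    """Output code when the selected ramp buffer adds offsets[r] to its ramp."""
    r = min(int(v_in / SPAN), N_RAMPS - 1)       # ideal coarse decision
    return math.floor((v_in - offsets[r]) / LSB)

def largest_boundary_jump(offsets):
    """Largest deviation (in LSB) from the ideal 1-LSB step across a ramp boundary."""
    worst = 0
    for r in range(1, N_RAMPS):
        below = adc_code(r * SPAN - LSB / 2, offsets)
        above = adc_code(r * SPAN + LSB / 2, offsets)
        worst = max(worst, abs((above - below) - 1))
    return worst

no_offset = [0.0] * N_RAMPS
common_offset = [3e-3] * N_RAMPS                 # same 3 mV on every buffer
varying_offset = [0.0, 2e-3, -1e-3, 1e-3, 0.0, -2e-3, 1e-3, 0.0]

jump_none = largest_boundary_jump(no_offset)
jump_common = largest_boundary_jump(common_offset)
jump_varying = largest_boundary_jump(varying_offset)
```

In this model, an offset that is common to all buffers shifts every code by the same amount and leaves the steps at the ramp boundaries intact, whereas offsets that differ by a few mV between buffers produce jumps of many LSBs at the boundaries.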
The auto-calibration algorithm presented here removes the offset variation by comparing all ramp voltages to the first ramp voltage and subsequently adjusting the other seven ramp signals accordingly. For this comparison, one of the column-level ADCs is disconnected from the input and is instead used to perform the comparison between ramp1 and the other ramp, which is illustrated in Figure 6-9a. In this test column circuit, the comparator is disconnected from the CDS amplifier, and is instead connected to ramp1 via switch S1. Apart from this test column, the only addition to the analog circuitry required for the auto-calibration algorithm is that ramp1 should output one of seven test voltages $V_{m2}$ through $V_{m8}$, as depicted in Figure 6-9b. These test voltages correspond to the middle of each of the ramp voltages ramp2 through ramp8. Since the ramp generators are implemented as fully programmable DACs, such a test voltage can easily be added.

The auto-calibration algorithm operates as follows. While the other column comparators sample the output of the CDS amplifier onto capacitor C1 via switch S1, the depicted test comparator samples a test voltage $V_{mi}$, where $i$ can be between 2 and 8, as depicted in Figure 6-9b. After this sampling operation, the test column performs a normal A/D conversion along with the other column circuits. Since its input $V_{mi}$ corresponds to the middle of ramp $i$, it is certain that the test comparator will select ramp $i$ during the coarse A/D conversion phase. Therefore, the result of the subsequent fine conversion phase is in fact a comparison between the test input voltage $V_{mi}$, generated with ramp1, and ramp $i$. If ramp $i$ does not have mismatch compared to ramp1, the digital output of the test column will correspond exactly to the middle of ramp $i$.
Conversely, if there is an offset between ramp1 and ramp $i$, the difference between the digital output of the test column and the middle of ramp $i$ will be a measure of the offset of ramp $i$.

In order to calibrate all ramps, the above-described procedure has to be repeated for each of the ramp voltages (except *ramp1*). Moreover, since there is noise present on the digital output signal of the test column, the offset measurement of each ramp is repeated 32 times and is subsequently averaged in the digital domain. As a result, a full ramp generator auto-calibration takes 7 x 32 = 224 A/D conversions, or 224 line times. Since the implemented imager has 330 rows, a single ramp offset measurement is performed each frame time. The results of such a measurement can easily be used to correct the offset in the digital domain, by changing the digital codes assigned to the initial voltage of each of the ramps. In the prototype, this digital processing is done in an off-chip FPGA to allow for flexibility.

Figure 6-9: a) Simplified circuit diagram of the test column b) Timing diagram of ramp1 including the test signals required for the auto-calibration algorithm

### **6.4** Measurement Results

## **6.4.1** Single-Slope Mode Measurements

In order to test whether the prototype is functional, the first measurements were performed with the ADC in single-slope mode, since the digital control is the simplest in this mode. The line timing for the ADC is controlled via an off-chip FPGA and is therefore flexible. In Figure 6-10, a line timing diagram of the ADC in single-slope mode is depicted. In the column-level ADC, two operations are performed concurrently during each line time. Firstly, the column amplifiers sample the output voltages of a row of pixels, and an A/D conversion is performed on the result of this readout. At the same time, the results of the A/D conversion performed during the previous line time are read out from the digital column memory.
The timing of the analog column circuitry is depicted in Figure 6-10a. During the first 2µs of each line time, the column amplifiers sample the pixel voltages of a row of pixels. Next, the column comparator samples the amplifier output while simultaneously performing an auto-zero. While it would be possible to perform this operation immediately after the front-end sampling, initial measurement results showed that some settling time is required for the column amplifier. As a result, the total time required for front-end sampling and comparator sampling is 5µs. Next, a 10-bit single-slope A/D conversion is performed. This requires 1023 clock periods plus some overhead to account for comparator delay. At the system clock frequency of 20MHz, the A/D conversion therefore takes 51.9µs. Finally, an additional 1.4µs of overhead is required for digital control. The resulting total line time of 58.3µs, combined with the fact that the imager has 330 rows, allows for a maximum frame rate of 50 frames/second.

Figure 6-10: Line timing diagram of the ADC in single-slope mode: a) analog column circuitry timing b) digital column memory readout timing

Concurrently with the analog column circuit operation, the digital column memory is read out during each line time, as indicated in Figure 6-10b. Since 400 columns have to be read out at a system clock speed of 20MHz, the digital readout takes 20µs. As will be shown in the next sub-section, the time required for memory readout becomes an important constraint for ADC operation in MRSS and MRMS mode.

In order to be able to test the ADC itself, the prototype is equipped with a separate test input with which a voltage can be directly fed into all column inputs. By applying a voltage to this test input that increases each line time, a synthetic test image was acquired, which is depicted in Figure 6-11.
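The single-slope line-time budget quoted earlier in this sub-section can be tallied in a few lines. The sketch below only re-does the arithmetic from the text; the variable names are arbitrary, the conversion overhead in clock cycles is inferred from the quoted 51.9µs rather than stated in the text, and one column word per clock cycle is assumed for the memory readout.

```python
F_CLK = 20e6                 # system clock frequency [Hz], Table 6-1
ROWS, COLUMNS = 330, 400

t_sampling = 5e-6            # front-end sampling plus comparator sampling/auto-zero [s]
t_conversion = 51.9e-6       # 10-bit single-slope conversion, incl. delay overhead [s]
t_control = 1.4e-6           # digital control overhead [s]

line_time = t_sampling + t_conversion + t_control
frame_rate = 1.0 / (ROWS * line_time)

# Overhead on top of the 1023 clock periods of a 10-bit ramp (inferred, not stated).
overhead_cycles = round(t_conversion * F_CLK) - (2 ** 10 - 1)

# Digital readout of the previous line, assuming one column word per clock cycle.
readout_time = COLUMNS / F_CLK
readout_fits = readout_time < line_time
```

With these numbers the line time comes out at 58.3µs and the frame rate at just under 52 frames/second, consistent with the 50 frames/second quoted above, while the 20µs memory readout fits comfortably within one single-slope line time.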
By comparing this figure to the image of Figure 5-11, it can easily be seen that the column ADC features a better FPN performance than the imager presented in chapter 5. Moreover, the INL of the imager can be measured in similar fashion as was done for Figure 5-13, by fitting a straight line to each column output in the image of Figure 6-11 and averaging the results over all columns. The resulting averaged INL graph is plotted in Figure 6-12. Since practical problems made it impossible to average the column ADC output over time to decrease dynamic errors, this INL measurement was performed at 1MHz to reduce such errors as much as possible. As can be seen from the figure, the INL is within ±0.5LSB.

Figure 6-11: Synthetic test image acquired with the ADC in single-slope mode

Figure 6-12: Averaged INL measurements in single-slope mode at a clock frequency of 1MHz

Similar measurements with a constant input voltage enable measurements of temporal noise and column FPN of the ADC. These show a random FPN of about 0.13% of full scale and a low-frequency gradient of about 0.5% from the left to the right column. This is well within the expected values, as the column circuits were originally designed as part of an imager for machine vision applications, where FPN is less critical. The low-frequency gradient is probably caused by supply voltage variations between the column circuits, since the current consumption of the comparators causes a voltage drop in the power wiring. Using the same test images, it is possible to measure the random noise of the column comparators, which was measured to be 1.4mV rms. This is somewhat disappointing, as it limits the resolution of the ADC to 10 bits, while the target resolution was 12 bits.

Using the ADC in single-slope mode with the timing as described above, a 10-bit image was captured at 50 frames/second.
This is depicted in Figure 6-15, on page 168, along with an image captured in MRSS mode for direct comparison, as will be described in the next sub-section. The power consumption of the prototype was measured to be 38mW for the analog circuitry, of which 8mW was used for output buffers of the ramp generator. Since only one out of the 8 ramps is used in single-slope mode, the effective analog power consumption is 31mW. The digital power consumption is 5mW, and the digital I/O circuitry consumes 9mW.

#### **6.4.2 MRSS Mode Measurements**

In Figure 6-13, the timing diagram of the column circuitry in 10-bit MRSS mode is depicted. It is divided into the timing of the analog column circuitry (Figure 6-13a) and column memory readout (Figure 6-13b), like the timing diagram of the single-slope mode (Figure 6-10). The operation of the analog column circuitry starts with the sampling of the front end, followed by comparator sampling and auto-zero. Next, the coarse A/D conversion is performed. While this conversion should theoretically only take 7 clock cycles, 76 clock cycles or 3.8µs are needed in the prototype. The main reason for this longer time is the fact that one of the ramps has to output a step-wise ramp voltage spanning the entire input range of the ADC, as depicted in Figure 6-9b. Due to limitations in the slew rate of the ramp generator, the settling of the ramp generator at each voltage level requires 10 clock periods.

After the coarse A/D conversion, each analog column circuit is connected to the correct ramp voltage. It was originally intended to start the fine A/D phase immediately after the comparators are connected to the correct ramp. However, initial measurements showed that the switching of the ramps at the comparator input unfortunately causes a large distortion on the sampled input signal. The probable cause for this distortion is capacitive cross talk from the switches to the sampling capacitors.
To solve this problem, the column comparators re-sample the input signal that is still output by the column amplifiers, as indicated in the timing diagram.

Figure 6-13: Line timing diagram of the ADC in MRSS mode: a) analog column circuitry timing b) digital column memory readout timing

The fine A/D conversion phase takes at least 128 clock cycles or 6.4μs. On top of this theoretical minimum, some extra time for overlap of the ramp voltages is needed, in order to provide robustness against errors in the coarse A/D conversion. In sub-section 6.1.3, it was already estimated that an overlap of 30mV is required. During initial measurements, it was found that a larger overlap was needed. This is partly because of the perceptual effect of conversion errors: even if only a few pixels in the array are not correctly processed by the ADC, it is very visible to the human eye. Therefore, the overlap was extended to 72mV. This results in a total fine A/D conversion time of 10.2μs, or 204 clock periods. The total A/D conversion time is now 15.5μs, compared to 51.9μs in single-slope mode, which is an improvement of 3.3x.

Figure 6-13b illustrates the timing of the digital column memory readout. While the analog column circuitry operates much faster in MRSS mode, it still takes 400 clock periods or 20μs to read out the column memory. As a result, the memory readout now takes nearly the entire line time, which shows a practical limitation in this prototype. While it might be possible to further optimize the A/D conversion time, for instance by reducing the time required for the coarse A/D conversion, this would not lead to a higher frame rate of the sensor.

Figure 6-14: Averaged INL measurements in MRSS mode at a clock frequency of 1MHz: a) without auto-calibration b) with auto-calibration

Figure 6-15: Captured image with the ADC in single-slope mode, at 50 frames/second

Figure 6-16: Captured image with the ADC in MRSS mode, at 142 frames/second
The total line time in MRSS mode is 20.5µs, which allows for a frame-rate of 142 frames/second. One of the important design problems of the MRSS ADC is to prevent the occurrence of any discontinuities in the ADC transfer function due to mismatch between the ramps. In order to test whether the matching of the multiple ramp generator is sufficient, INL measurements were performed using the same method as the INL measurements in single-slope mode (Figure 6-12), of which the results are depicted in Figure 6-14. In Figure 6-14a, the results of an INL measurement are shown where the auto-calibration algorithm described in sub-section 6.3.3 is not used. As can be seen in the figure, some clear discontinuities exist, which is due to offsets of the ramp generator's output buffers. The magnitude of these jumps corresponds well to the separate measurements of the buffer opamps, where a maximum offset variation of about 2.5mV, or 2.5LSBs, was found. Figure 6-14b shows an INL measurement where the auto-calibration algorithm is applied. It is clear from this measurement that the algorithm is effective in reducing static errors to less than 0.5LSB. In Figure 6-16, an image is depicted that was captured in MRSS mode at 142frames/second. Apart from the different ADC mode, the parameters used, such as aperture of the lens, captured scene, and integration time were kept exactly equal to those used to acquire the image in single-slope mode depicted in Figure 6-15. As can be seen in the figures, the ADC operation in MRSS mode does not introduce any artefacts into the image, while achieving a 2.8x higher frame rate. Apart from more complex circuitry, the only additional power required in MRSS mode is that consumed by the output buffers of the multiple ramp generator. These consume an extra 7mW, which is 24% of the analog power consumption, or 16% of the total power consumption of the prototype. 
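The timing figures quoted for the single-slope and MRSS modes can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the prototype's control software: all inputs are numbers reported in the text, while the names `REPORTED`, `memory_readout_time` and `effective_pixel_rate` are hypothetical and chosen only for this illustration. It assumes that one column is read from the memory per clock period, which is how the 20µs readout time follows from 400 columns at 20MHz.

```python
# Cross-check of the reported timing figures; every input below is quoted in the text.
F_CLK_HZ = 20e6     # system clock frequency
N_COLUMNS = 400     # columns read out per line

# Reported A/D conversion time and minimum line time per ADC mode, in seconds.
REPORTED = {
    "single-slope": {"t_conv": 51.9e-6, "t_line": 58.3e-6},
    "MRSS": {"t_conv": 15.5e-6, "t_line": 20.5e-6},
}


def memory_readout_time(n_columns, f_clk_hz):
    """Column memory readout time, assuming one column is read per clock period."""
    return n_columns / f_clk_hz


def effective_pixel_rate(n_columns, t_line):
    """Pixels delivered per second while rows are read out back to back."""
    return n_columns / t_line


t_mem = memory_readout_time(N_COLUMNS, F_CLK_HZ)
speedup = REPORTED["single-slope"]["t_conv"] / REPORTED["MRSS"]["t_conv"]
readout_share = t_mem / REPORTED["MRSS"]["t_line"]

print(f"memory readout time: {t_mem * 1e6:.1f} us")
print(f"conversion speed-up: {speedup:.2f}x")
print(f"memory readout share of the MRSS line time: {readout_share:.0%}")
for mode, t in REPORTED.items():
    rate = effective_pixel_rate(N_COLUMNS, t["t_line"])
    print(f"{mode}: {rate / 1e6:.1f} Mpixel/s")
```

With the reported numbers, the memory readout accounts for roughly 98% of the MRSS line time, which illustrates why a further reduction of the conversion time would not raise the frame rate of this prototype.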
#### **6.4.3 MRMS Mode Measurements**

In order to operate the ADC in MRMS mode, a companding quantization scheme, such as the one shown in Table 4-3, needs to be made that is optimized for the sensor's characteristics. For this prototype, the maximum resolution of 10 bits somewhat limits the applicability of a companding quantization scheme, as the initial quantization noise will be relatively large compared to the photon shot noise. Moreover, the prototype uses 3T pixels that have a relatively lower amount of photon shot noise, since their saturation charge is higher (see Figure 4-11, on page 97). Based on the pixel layout, it was estimated that the pixel saturation charge is 60,000 electrons. This parameter, together with the ADC resolution of 10 bits, was used in the calculation method of appendix A to find a suitable companding quantization scheme. Since there are no stringent requirements for the quality factor $n_{margin}$, it was chosen such that the resulting number of quantization steps is half that of normal uniform quantization, in order to demonstrate the advantage of companding. This results in the theoretical quantization scheme of Table 6-2, where $n_{margin} = 0.34$:

Table 6-2. Calculated number of quantization steps with $n_{margin}=0.34$

| step size (LSBs): | 1 | 2 | 4 | total: |
|-------------------|-----|-----|----|--------|
| number of steps (binary step) | 169 | 254 | 87 | 510 |

As can be seen in the table, a scheme with a binary increasing quantization step is used. This simplifies the digital reconstruction of the ADC output to a linear code; moreover, for the given resolution, the difference with an integer quantization step increase is small.

The theoretically calculated scheme of Table 6-2 needs to be mapped to the prototype ADC. Therefore, for each of the 8 ramps, a slope has to be chosen. Furthermore, the ramps should have an equal length.
Since the total number of quantization steps is halved compared to linear quantization, the length of the ramps (excluding overlap) is reduced from 128 to 64 clock periods. Based on this length, the slopes of the ramps are chosen to match the theoretical scheme of Table 6-2 as closely as possible. The results of this mapping are given in Table 6-3.

Table 6-3. Companding quantization scheme implemented in the prototype

| ramp no: | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | total: |
|-----------------|----|----|-----|-----|-----|-----|-----|-----|--------|
| slope: | 1x | 1x | 2x | 2x | 2x | 2x | 2x | 4x | |
| number of LSBs: | 64 | 64 | 128 | 128 | 128 | 128 | 128 | 256 | 1024 |

As can be seen from the table, some adjustments compared to the theoretical calculation had to be made for the mapping. Only the input range corresponding to the lowest 128 LSBs is converted at the full resolution, instead of the theoretical 169. On the other hand, the 2x slope segment is slightly larger than is theoretically required.

Figure 6-17: Line timing diagram of the ADC in MRMS mode: a) analog column circuitry timing b) digital column memory readout timing

Based on the companding quantization scheme of Table 6-3, the off-chip digital control for the prototype was re-programmed to operate the ADC in MRMS mode. Since the ramp voltages span different voltage ranges in MRMS mode, the ramp voltage output by *ramp1* during the coarse A/D conversion had to be changed. Furthermore, the auto-calibration algorithm had to be adjusted for the different voltage ranges. The timing diagram of the MRMS mode is depicted in Figure 6-17. It is nearly identical to the timing diagram of the MRSS mode, with the exception of the fine A/D conversion phase, which is 3.2µs or 21% shorter than in MRSS mode. However, this shorter A/D conversion time cannot be used to increase the frame rate of the imager, as it is limited by the time required for readout of the column memory (Figure 6-17b).
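The theoretical scheme of Table 6-2 and its mapping onto the eight ramps of Table 6-3 can be sketched in a few lines of Python, using Eq. (A-6) of appendix A. This is a minimal illustration under stated assumptions rather than the Matlab code used for the thesis: it assumes that $n_{margin}$ plays the role of the quality parameter $r$ in Eq. (A-6), and that transition points and step counts are rounded up (the thesis only states that the results were rounded). The function names `transition_lsb` and `binary_companding_scheme` are hypothetical.

```python
import math


def transition_lsb(k, n_bits, n_sat, r):
    """Input level, in initial LSBs, from which step size k is allowed (Eq. A-6)."""
    e_lsb = n_sat / 2 ** n_bits                                    # Eq. (A-3)
    n_sig = (n_sat * k / (2 ** n_bits * math.sqrt(12) * r)) ** 2   # Eq. (A-6), electrons
    return n_sig / e_lsb


def binary_companding_scheme(n_bits, n_sat, r):
    """Return {step size in LSBs: number of steps} for successive step doubling.

    Assumption: transition points and step counts are rounded up.
    """
    full_scale = 2 ** n_bits
    scheme = {}
    start, k = 0, 1
    while start < full_scale:
        # The segment with step size k ends where step size 2k becomes allowed.
        end = min(full_scale, math.ceil(transition_lsb(2 * k, n_bits, n_sat, r)))
        scheme[k] = -(-(end - start) // k)   # ceiling division
        start, k = end, 2 * k
    return scheme


# Theoretical scheme for the prototype: 10 bits, 60,000 electrons, quality factor 0.34.
theoretical = binary_companding_scheme(n_bits=10, n_sat=60_000, r=0.34)
print(theoretical, "total:", sum(theoretical.values()))

# Mapping of Table 6-3: eight ramps of 64 clock periods each, with the listed slopes.
RAMP_SLOPES = [1, 1, 2, 2, 2, 2, 2, 4]
CLOCKS_PER_RAMP = 64
lsbs_per_ramp = [slope * CLOCKS_PER_RAMP for slope in RAMP_SLOPES]
print(lsbs_per_ramp, "total:", sum(lsbs_per_ramp))
```

Under these assumptions the sketch yields 169, 254 and 87 steps for step sizes of 1, 2 and 4 LSBs, matching Table 6-2, and the eight ramps of Table 6-3 together cover the full 1024 LSBs in 8 × 64 = 512 clock periods.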
As was the case in MRSS mode, an important question is whether the ramp generator has sufficient matching to prevent discontinuities in the ADC transfer function. To answer this question, INL measurements similar to those in Figure 6-12 and Figure 6-14 were performed, which are depicted in Figure 6-18. In Figure 6-18a, an INL measurement result is depicted that is acquired without the application of auto-calibration. The plotted graph bears great resemblance to the similar INL measurement in MRSS mode that is depicted in Figure 6-14a, and shows the effect of offset in the ramp generator output buffers. Figure 6-18b depicts the result of an INL measurement where ramp auto-calibration is applied. The result shows that the auto-calibration is effective in eliminating ramp offsets. Moreover, it is clear that operation in MRMS mode does not introduce additional non-linearity. It can therefore be concluded that the multiple ramp generator features a good matching, even when the ramps are operated at different slopes.

In Figure 6-20, an image is depicted that is captured in MRMS mode at 142 frames/second. As a reference, Figure 6-19 depicts an image in MRSS mode that was captured using the same parameters. As is clear from the figures, the operation in MRMS mode does not introduce any visible artefacts. Compared to the MRSS mode, it shortens the A/D conversion time by 3.2μs or 20%. While this is not a very significant advantage, the measurement results prove the feasibility of applying companding in the MRSS ADC architecture. Based on the companding quantization schemes detailed in appendix A, a much larger reduction in conversion time would be achievable with a higher resolution MRSS ADC, even if a better quality factor is required than the $n_{margin}=0.34$ that was used here.

Finally, the main measurement results presented in this section are summarized in Table 6-4.
Figure 6-18: Averaged INL measurements in MRMS mode at a clock frequency of 1MHz: a) without auto-calibration b) with auto-calibration

Figure 6-19: Captured image with the ADC in MRSS mode at 142 frames/second, as a reference for Figure 6-20

Figure 6-20: Captured image with the ADC in MRMS mode at 142 frames/second

Table 6-4. Prototype Measurements

| | Single-slope | MRSS | MRMS |
|---------------------------|--------------|------|------|
| Technology | 1P3M 0.25μm CMOS | 1P3M 0.25μm CMOS | 1P3M 0.25μm CMOS |
| Supply voltage | 2.5V/3.3V | 2.5V/3.3V | 2.5V/3.3V |
| Number of pixels | 400 x 330 | 400 x 330 | 400 x 330 |
| Column ADC pitch | 7.4µm | 7.4µm | 7.4µm |
| ADC resolution | 10b | 10b | 10b |
| ADC temporal noise | 1.4mV rms | 1.4mV rms | 1.4mV rms |
| Random column FPN | 0.13% of full-scale (σ) | 0.13% of full-scale (σ) | 0.13% of full-scale (σ) |
| Total power | 45mW | 52mW | 52mW |
| Analog power | 31mW | 38mW | 38mW |
| Digital & I/O power | 14mW | 14mW | 14mW |
| A/D conversion time | 51.9µs | 15.5μs | 12.3μs |
| Min. line time | 58.3µs | 20.5μs | 20.5μs |
| Max. frame rate | 50fps | 142fps | 142fps |
| Max. effective pixel freq. | 6.8MHz | 19.5MHz | 19.5MHz |

#### 6.5 References

- [6.1] M.F. Snoeij, P. Donegan, A.J.P. Theuwissen, K.A.A. Makinwa, and J.H. Huijsing, "A CMOS Image Sensor with a Column-Level Multiple-Ramp Single-Slope ADC", *IEEE International Solid-State Circuits Conference*, vol. L, pp. 506-507, Feb. 2007
- [6.2] P. Holloway, "A trimless 16b digital potentiometer", *ISSCC Dig. Tech. Papers*, pp. 66-67, Feb. 1984

# Conclusions

In this thesis, improvements to the analog on-chip interface circuitry of CMOS image sensors have been investigated. These improvements have focused on two key aspects: the noise performance of the interface circuit, and the power efficiency of the circuit. In this final chapter, the main findings of this investigation are summarized. Furthermore, an overview of possible future work will be provided.
#### 7.1 Main Findings

#### **Regarding Noise Performance:**

- The performance-limiting noise source in the analog readout circuit of a CMOS imager is the low-frequency (LF) noise (commonly known as 1/f noise) of the in-pixel source follower. Most of the known circuit techniques to reduce LF noise cannot be effectively applied here. In particular, while the application of correlated-double sampling (CDS) is essential in imagers to suppress offsets and reset noise, CDS is not fully effective in suppressing 1/f noise, as the sampling frequency is limited by the required charge transfer time in a pinned photodiode pixel. (Chapter 3)
- Apart from CDS, another circuit technique that could reduce LF noise in the front-end is switched-biasing or large-signal excitation (LSE). However, measurement results show that the application of LSE in a CMOS imager front-end does not lead to a decrease in LF noise of the front-end. The reason for this is the fact that LSE has to be combined with CDS in the front-end, as CDS is necessary to correct for offset and reset noise. As LSE can only be effectively applied to one of the two CDS samples, this leads to an unequal 'bias-history' of the two CDS samples. While this asymmetry could be resolved by applying LSE to the second CDS sample as well, the required switching might well corrupt the signal. (Chapter 3)
- Even if the problem of LF noise increase due to the concurrent application of LSE and CDS can be solved, the LF noise measurements without application of CDS show only a modest improvement of 1.4dB on average. Based on this result, LF noise would remain the dominant noise source even if LSE can be successfully applied. (Chapter 3)
- The use of a near minimum size in-pixel source follower leads to a large variation in LF noise from pixel to pixel. This can cause visible artifacts in an image, and might be the cause of some unexplained spatial noise phenomena observed in CMOS imagers. Moreover, the application of LSE can, in some instances, lead to an increased amount of LF noise, thus potentially increasing the variation in LF noise even further. (Chapter 3)

#### **Regarding Power Efficiency:**

- Most of the power that is required by the analog interface circuit is consumed by the A/D conversion; therefore, efforts to increase the power efficiency of the analog interface circuit should be focused on improving the ADC used in a CMOS imager. (Chapter 1 & 3)
- For high-resolution imagers (> 3Mpixel) with a moderate frame rate, as used in mainstream applications, the column-level ADC architecture should provide the best power efficiency. In such an architecture, the best trade-off exists between the speed of the individual ADC channels and the total number of ADC channels that operate in parallel. (Chapter 4)
- The massively-parallel single-slope ADC architecture allows for a column-level ADC with a robust and very simple column circuit, thereby minimizing required chip area and non-uniformity issues. However, its slow conversion speed is a bottleneck in imagers with a high pixel count (>3 megapixel) requiring a high (>10 bit) ADC resolution. (Chapter 4)
- The multiple-ramp single-slope (MRSS) ADC can solve the main problem of the column-level single-slope ADC, i.e. its slow speed, while maintaining its key advantage of a simple column circuit. A prototype imager demonstrates a 3.3x faster conversion speed in comparison to a single-slope ADC. (Chapter 4 & 6)
- The presence of photon shot noise in imager output signals can be exploited in the ADC of the imager to increase the power efficiency. While this exploitation can theoretically be used with any ADC, its effectiveness depends strongly on the ADC architecture. The MRSS ADC can be very well combined with the concept of photon shot noise exploitation, leading to the multiple-ramp multiple-slope (MRMS) ADC architecture. (Chapter 4 & 6)
- Since the presence of column-to-column non-uniformities strongly reduces the perceptual image quality of an image sensor, mismatch between column-level ADC channels has to remain at least below 0.1%. This puts a severe design constraint on the column circuit, which may lead to higher power consumption and more chip area, as well as a lower yield. The dynamic column switching (DCS) technique relaxes these design constraints, by reducing the visibility of column-to-column non-uniformities, which can lead to reduced chip area and power consumption. The main limitation of this technique is that it translates column FPN into pixel FPN. (Chapter 4 & 5)

#### 7.2 Future Work

#### 7.2.1 1/f Noise Reduction in CMOS Image Sensors

In chapter 3 of this thesis, it was concluded that it is difficult to reduce 1/f noise in a CMOS image sensor using circuit techniques, as none of the known circuit techniques is effective in the readout circuit front-end. However, it might well be possible to reduce 1/f noise by improvements in processing technology, as 1/f noise is highly technology-dependent. Recently, an improved front-end circuit with lower 1/f noise was published that uses an in-pixel depletion-mode transistor as source follower [7.1]. The reason for this lower 1/f noise is the fact that the current flows through the bulk of the silicon, where the density of lattice defects that cause trapping and de-trapping of charge carriers is lower than at the gate oxide interface. A further development of this concept is of the highest interest, as it would reduce the largest noise source in the readout circuit that is currently limiting performance.

#### 7.2.2 Improvements to the MRSS/MRMS prototype

While the imager with MRSS/MRMS ADC of chapter 6 is a proof-of-concept for these new ADC architectures, it has several typical shortcomings of a first silicon prototype. As a result, the full potential of the new ADC architectures has not yet been demonstrated.
There are two main problems: the ADC resolution is limited to 10 bits, and the power consumption of the column-level comparators is too high.

The limited resolution has a significant impact on both the MRSS and MRMS mode. For both architectures, it lowers the relative improvement compared to the classical single-slope ADC, as this improvement becomes increasingly larger for higher resolutions. This becomes clear if the results of chapter 6 are extrapolated to a resolution of 12 bits. In an MRSS ADC, a higher resolution impacts the length of the fine conversion phase only. For a 12-bit resolution, the fine conversion phase itself would take 511 clock periods instead of 127, plus an overlap of 308 instead of 77 clock periods (assuming here that to keep the overlap in voltage equal, the number of clocks multiplies by four because of the higher resolution). As a result, the total A/D conversion would take 925 clock periods, compared to 4095 clocks for a single-slope conversion. This is an improvement of 4.4x over a single-slope ADC, while the improvement is only 3.3x at 10 bits.

In MRMS mode, the 10-bit resolution also impacts the effectiveness of applying a companding quantization scheme. As was illustrated in Figure 4-12, on page 97, companding is mainly effective when the initial resolution is higher than 10 bits. To further demonstrate the potential of the MRMS architecture, another extrapolation for an ADC design with 12 bits of resolution will be made. To this end, a companding quantization scheme with an initial resolution of 12 bits and approximately 1000 quantization steps is computed, in a similar fashion as was done in sub-section 6.4.3. This results in the scheme of Table 7-1.

Table 7-1. Calculated number of quantization steps with r=0.12, $e_{sat}=25000$, and initial resolution 12 bits:

| step size (LSBs): | 1 | 2 | 4 | 8 | total: |
|-------------------|-----|-----|-----|-----|--------|
| number of steps (binary step) | 142 | 212 | 424 | 230 | 1008 |

Here, a typical saturation charge for the sensor of 25,000 electrons is assumed. The quality factor r for this scheme is 0.12, which is much better than for the scheme used for the MRMS prototype. Moreover, it requires only 1/4 of the number of quantization steps compared to linear quantization in an MRSS ADC. Therefore, when this scheme is implemented, it would take approximately the same conversion time as the 10-bit MRSS ADC, as the number of quantization steps is the same. This means that a 12-bit MRMS A/D conversion would only require about 310 clock periods, which is an improvement over a single-slope ADC with linear quantization of more than 12x, instead of the 4.1x achieved with the prototype of chapter 6.

The relatively high power consumption of the column-level comparators leads to a relatively poor power efficiency of the ADC. However, these comparators were re-used from an existing ADC design, which suggests that the comparator requirements for an MRSS/MRMS ADC are similar to those in a single-slope ADC. Therefore, it should be possible to use the more power-efficient comparators found in other column-level single-slope ADCs to create a more power-efficient MRSS/MRMS ADC. By comparing the measurement data of chapter 6 with published figures for power consumption of existing column-level single-slope ADCs, an estimate for the potential power consumption can be made. In [7.2], a CMOS imager with a column-level single-slope ADC is presented that has a total power consumption of 580mW, of which about 1/3 is used to drive the LVDS circuitry for digital I/O [7.3].
If the assumption is made that most of the remaining power is used for the analog column circuits, this yields a power consumption of about 400mW for all columns, or about 200µW per column, at a clock speed of 300MHz<sup>1</sup>. These results can be used to calculate a figure-of-merit (FOM) for the ADC, as is done in Table 7-2.

Table 7-2. Power efficiency figure-of-merit for the prototypes presented in this thesis, along with literature reference and possible future work

| | Chpt. 5 | Chpt. 6 MRSS | Chpt. 6 MRMS | Nitta [7.2] | future work |
|-------------------|---------|--------------|--------------|-------------|-------------|
| power/col. (μW) | 5.7 | 95 | 95 | 200 | 500 |
| clock speed (MHz) | 20 | 20 | 20 | 300 | 300 |
| resolution (bits) | 10 | 10 | 10 | 12 | 12 |
| clocks/conversion | 1023 | 310 | 246 | 4096 | 310 |
| FOM (fJ/conv) | 285 | 1440 | 1140 | 660 | 125 |

In the table, the same definition for the figure of merit was used as in section 4.1:

$$FOM = \frac{power}{f_s \cdot 2^{ENOB}} \tag{7-1}$$

Here, $f_s$ is the sampling frequency of the ADC and ENOB is the effective number of bits. In order to make a comparison between individual column ADCs, the total power consumption of the ramp generators is divided by the number of columns in the particular imager, and the result is added to the actual column power consumption. Since the prototype of chapter 5 does not have an on-chip ramp generator, measurement results from the multiple ramp generator of chapter 6 are used to estimate the required power here. As a single ramp of this generator consumes about 1mW and drives 400 columns, the additional power consumption can be estimated to be 2.5μW, which leads to a FOM of 285fJ/conversion. Although this is a good result, it should be mentioned that the prototype measurement showed too much column FPN, and more comparator power might be required to reduce this.
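The FOM row of Table 7-2 can be reproduced from the other rows of the table with Eq. (7-1). The sketch below is only an illustration under two assumptions that are interpretations of the table rather than statements from the text: the sampling frequency $f_s$ is taken as the clock speed divided by the number of clock periods per conversion, and ENOB is taken equal to the nominal resolution. The function name `fom_fj_per_conversion` and the dictionary layout are hypothetical.

```python
def fom_fj_per_conversion(power_uw, f_clk_mhz, clocks_per_conversion, bits):
    """Figure of merit of Eq. (7-1), in fJ per conversion step.

    Assumptions: f_s = f_clk / clocks_per_conversion, and ENOB = nominal resolution.
    """
    f_s_hz = f_clk_mhz * 1e6 / clocks_per_conversion
    fom_joule = power_uw * 1e-6 / (f_s_hz * 2 ** bits)
    return fom_joule * 1e15


# Rows of Table 7-2: (power/col. in uW, clock speed in MHz, clocks/conversion, bits).
TABLE_7_2 = {
    "Chpt. 5": (5.7, 20, 1023, 10),
    "Chpt. 6 MRSS": (95, 20, 310, 10),
    "Chpt. 6 MRMS": (95, 20, 246, 10),
    "Nitta [7.2]": (200, 300, 4096, 12),
    "future work": (500, 300, 310, 12),
}

fom = {name: fom_fj_per_conversion(*row) for name, row in TABLE_7_2.items()}
for name, value in fom.items():
    print(f"{name}: {value:.0f} fJ/conversion")
```

Under these assumptions, the computed values agree with the tabulated FOM entries to within about 1%, the remaining differences being consistent with rounding in the table.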
In the last column of the table, the power efficiency of a future ADC is estimated. This future ADC is assumed to be a combination of the column comparator used in the single-slope ADC design by Nitta [7.2] and the MRMS ADC architecture. To this end, the estimated power consumption of the multiple ramp generator has to be added to the power consumption of the comparator itself (neglecting the power consumption of the single ramp generator in [7.2]). This power consumption is linearly extrapolated from the measurement results of chapter 6. These measurement results showed that the multiple ramp generator requires 8mW for 400 columns at 20MHz. This is equivalent to 20μW/column at 20MHz, or 300μW/column at 300MHz. As a result, the total power consumption is 500µW/column. If this figure is combined with the MRMS architecture at 12-bit resolution, a 125fJ/conversion FOM should be achievable, which is more than 5x better than the published state-of-the-art. A further improvement can be achieved if the power consumption of the ramp generator can be reduced, for instance by a better ramp driver of which the power consumption depends on the capacitive load (rather than a design with fixed power consumption for worst-case capacitive load, as was done in the prototype).

<sup>1.</sup> While this approximation could be rather inaccurate, unfortunately no more detailed power consumption data exist for this publication, as is the case for nearly all publications of ADCs in CMOS imagers, to the best of the author's knowledge.

#### 7.2.3 Perceptual Effects of using a Companding ADC

While the application of imager ADCs with a companding quantization scheme, i.e. the exploitation of photon shot noise output by imagers, has been known for several years, the perceptual effect on the human visual system has apparently not been studied in detail.
Although measurement results make clear that a companding quantization scheme can be used to reduce power consumption, it is not clear how far the number of quantization steps can be reduced without introducing visible artifacts. In the calculation model of sub-section 4.3.2, this perceptual uncertainty is quantified by the quality parameter r, which stands for the ratio of quantization noise to photon shot noise. Therefore, research into the perceptual effects of a companding quantization scheme should yield information that can be used to establish acceptable values for r.

#### 7.2.4 Perceptual Effects of Dynamic Column Switching

A research question similar to the one posed in the last section exists for the application of dynamic column switching (DCS). Again, measurement results shown in this thesis demonstrate that DCS can be effective to reduce the visibility of column FPN. Although it is possible to use simple statistics to quantify the FPN reduction caused by DCS, as was done in chapter 5, the measured images of this prototype seem to indicate that the perceptual reduction is larger than the calculated reduction. Therefore, two research questions can be asked. Firstly, what is the acceptable amount of residual column FPN if DCS is applied? Secondly, how does the design and complexity of the switching matrix affect the visibility of the residual column FPN? An obvious problem of a study into the perceptual effects of DCS is the fact that these perceptual effects seem to strongly depend on the image itself. Therefore, different scenes and lighting conditions have to be considered in order to derive a better criterion for the amount of residual column FPN that is acceptable, and to find a switching matrix design that is an optimal trade-off between complexity and visibility of residual column FPN.

#### 7.3 References

- [7.1] Xinyang Wang, Padmakumar R. Rao and Albert J.P. Theuwissen, "Characterization of Buried Channel n-MOST Source Followers in CMOS Image Sensors", accepted for the International Image Sensor Workshop (IISW), June 2007
- [7.2] Y. Nitta et al., "High-Speed Digital Double Sampling with Analog CDS on Column Parallel ADC Architecture for Low-Noise Active Pixel Sensor", IEEE International Solid-State Circuits Conference, vol. XLIX, pp. 500-501, Feb. 2006
- [7.3] Y. Nitta, private communication, Feb. 2006

# Companding Quantization Calculation Method

In sub-section 4.3.1, the companding quantization scheme was introduced to exploit the presence of photon shot noise in imager signals to reduce the number of quantization steps required in the ADC. As was mentioned in the sub-section, the following factors determine the number of required quantization steps:

- The saturation charge of the sensor $N_{sat}$
- The initial resolution of the ADC for small input signals *n*
- The allowable ratio of quantization noise and photon shot noise, determined by the quality parameter *r*.

As mentioned in sub-section 4.3.1, the quantization noise should be such that the following relation holds:

$$r = \frac{\overline{e}_{qns}(k)}{\overline{e}_{phs}(N_{sig})} \tag{A-1}$$

where $\overline{e}_{qns}(k)$ is the amount of quantization noise, depending on the integer step size k, and $\overline{e}_{phs}(N_{sig})$ is the amount of photon shot noise, depending on the number of signal electrons $N_{sig}$. For convenience, both the quantization noise and the photon shot noise will be expressed as a number of electrons at the sensor input node in this appendix. The quantization noise of an ADC can be written as:

$$\overline{e}_{qns}(k) = k \cdot \frac{e_{lsb}}{\sqrt{12}} \tag{A-2}$$

where $e_{lsb}$ is the initial quantization step of the ADC, expressed in electrons at the sensor input.
For a proper ADC design, the input range of the ADC should be matched to the saturation charge $N_{sat}$ of the imager, thus:

$$N_{sat} = 2^n \cdot e_{lsb} \tag{A-3}$$

Finally, as is well known, the photon shot noise equals the square root of the signal charge:

$$\overline{e}_{phs}(N_{sig}) = \sqrt{N_{sig}} \tag{A-4}$$

In order to enable a linear digital output, the quantization noise will be increased step-wise along with the input signal, while the photon shot noise increases continuously with the input signal. As a result, the ratio of quantization noise to photon shot noise will be maximal at the input signals for which the quantization step is increased. For such input signals, Eq. (A-1) should still hold. Therefore, by combining this equation with Eqs. (A-2) and (A-4), a required condition for the step increase of the quantization noise is derived:

$$k \cdot \frac{e_{lsb}}{\sqrt{12}} = r \cdot \sqrt{N_{sig}} \tag{A-5}$$

By rewriting the above equation and applying Eq. (A-3), the following expression can be derived for the signal level $N_{sig}$ that is required to allow for a quantization step increase to k:

$$N_{sig} = \left(\frac{N_{sat} \cdot k}{2^n \cdot \sqrt{12} \cdot r}\right)^2 \tag{A-6}$$

In order to get all transition points, this equation should be evaluated for increasing values of k until it yields an $N_{sig}$ exceeding $N_{sat}$. As discussed in sub-section 4.3.1, there are two ways to increase the quantization noise step-wise. In the case of a successive doubling of the quantization step, k should equal a power of two, i.e. k = 1, 2, 4, 8, 16, etc. For the slightly more complicated integer-wise increase, k should simply equal an integer number. For Table 4-3 and Figure 4-11 through Figure 4-13 in chapter 4, Matlab was used to compute the outcome of Eq. (A-6) and round the outcome.

The maximum number for k, and thus the maximum quantization step, can also be calculated directly by entering $N_{sig} = N_{sat}$ into Eq.
(A-6) and rewriting for k, which yields the following expression:

$$k = \frac{2^n \cdot \sqrt{12} \cdot r}{\sqrt{N_{sat}}} \tag{A-7}$$

The maximum value for k that is computed with this equation should be rounded, either to the nearest power of two, or to the nearest integer, depending on whether a binary or integer quantization step scheme is used.

# Summary

This thesis describes the development of low-noise power-efficient analog interface circuitry for CMOS image sensors. It focuses on improving two aspects of the interface circuitry: firstly, lowering the noise in the front-end readout circuit, and secondly, realizing more power-efficient analog-to-digital converters (ADCs) that are capable of reading out high-resolution imaging arrays.

Chapter 1 provides an introduction to the thesis, and starts with a short historical overview of solid-state image sensors. The first solid-state image sensors were developed in the early 1960s. The charge-coupled device (CCD) was invented in 1970, and this led, in the 1980s, to the first commercially available solid-state imagers. This success was mainly due to the fact that the CCD is a relatively simple device, making it relatively easy to produce. In the early 1990s, research efforts were made to realize an imager in CMOS technology with the objective of realizing both a sensor and readout circuitry on a single chip. These efforts led to the modern CMOS image sensor as we know it today.

The challenge of designing CMOS imagers is that of optimizing three main parameters. First, the signal-to-noise ratio of each pixel output should be as high as possible. Second, the number of pixels should be as large as possible, partly because this is a very strong marketing argument in the consumer world. Third, the power consumption should be as low as possible.
These performance parameters contradict each other, since an increase in the signal-to-noise ratio or the number of pixels of an imager generally leads to an increase in the power consumption of its analog readout circuitry. Therefore, the objective of this thesis is to provide system-level improvements of the readout circuit, which enable the simultaneous improvement of all three mentioned parameters.

In chapter 2, an overview of analog signal processing in CMOS image sensors is given. A typical CMOS imager consists of a pixel array with in-pixel readout circuits, a set of column circuits next to the array, and a central chip-level circuit. The imaging array is read out in two steps: first, a row of pixels is read out and the results are stored in the column circuits; second, the column circuits are read out one-by-one by the chip-level circuit. In this thesis, the circuitry that performs the first operation is called the *front-end readout circuit*, while the circuitry that performs the second operation is called the *back-end readout circuit*.

There are three photosensitive elements that can be used in an imager. The photodiode is the simplest, but suffers from kT/C or reset noise, which can only be corrected for by using a frame memory. To solve this problem, the photogate or the pinned photodiode can be used. The latter is most popular, since it typically has greater light sensitivity and lower dark current.

The front-end readout circuit typically consists of an in-pixel source follower, combined with sample-and-hold circuitry and biasing in the column circuit. The back-end readout circuitry consists of a readout amplifier that reads the sampled voltages inside the column circuits, and an A/D converter.

Section 2.5 discusses four classes of advanced signal-processing techniques that have been developed in recent years.
First, the sharing of in-pixel circuitry between two or more pixels allows for smaller pixel sizes that still have an acceptable fill factor and sensitivity. Second, the reset noise of a photodiode can be reduced through the use of soft and active reset methods. Third, the dynamic range of a pixel can be increased by using several techniques. Finally, apart from a single chip-level ADC, parallelized column-level and even pixel-level ADCs have been developed. This development is further discussed in chapter 4.

Chapter 3 discusses the front-end readout circuitry in detail. It shows that the noise of the front-end readout circuit is the performance-limiting parameter. There are four significant noise sources in the front-end: photon shot noise, reset noise, thermal noise, and 1/f noise. Of these four sources, photon shot noise is the largest, but, since it is signal dependent, it is only dominant at large input signals. Reset noise is dominant in photodiode-based front-ends, but can be adequately suppressed through use of a pinned photodiode or photogate. Thermal noise can be well controlled, usually by adjusting the size of the sampling capacitors. Finally, when applying pinned photodiodes or photogates, the 1/f noise of the in-pixel source follower is typically dominant.

Since conventional techniques for 1/f noise reduction are not applicable to an imager front-end, the possibility of using a relatively unknown technique, called Large-Signal Excitation (LSE), is investigated. This technique reduces 1/f noise by periodically switching a transistor 'off', by varying either its gate or source voltage, which in some semiconductor processes leads to a noise reduction of 8 dB. Although a model explaining the 1/f noise reduction was recently introduced, it cannot predict the noise decrease without knowledge of the statistical properties of the so-called 'traps' inside the gate-oxide interface of the transistor.
Therefore, a test chip was realized in order to evaluate the effectiveness of LSE in the small transistors typically used in a CMOS imager front-end. Unfortunately, measurements showed a 1/f noise reduction of only 1.4 dB when LSE was applied; moreover, a noise increase was measured when a combination of LSE and correlated double sampling (CDS) was used. Since the use of CDS is essential in imagers to remove offset and reset noise, it was concluded that LSE is unlikely to significantly reduce 1/f noise in the front-end.

Chapter 4 discusses the application of column-level A/D converters in CMOS image sensors. First, chip-level, column-level and pixel-level ADC architectures are compared. It is shown that pixel-level ADCs are unsuitable for most applications because of the required chip area, while chip-level ADCs have to run at too high a speed to be power-efficient. Therefore, the column-level ADC is likely to be the best choice for high-resolution moderate frame-rate imagers.

The most popular column-level ADC architecture, the single-slope ADC, is discussed in detail. Its main advantage is the small column circuit, but it has the disadvantage of a very slow conversion speed. To solve this problem, a new architecture is introduced: the multiple-ramp single-slope (MRSS) ADC. This architecture can have a significantly higher conversion speed than a single-slope ADC, while still having a very simple and small column circuit.

All imager output signals contain photon shot noise, which is signal dependent. As a result, any A/D converter with linear quantization will have a higher performance than necessary for high input signals. Therefore, it is possible to exploit the presence of photon shot noise in order to lower the power consumption or increase the conversion speed of the ADC, since the number of quantization levels can be reduced without decreasing perceived image quality.
The newly introduced MRSS ADC is well suited to do this, leading to a multiple-ramp multiple-slope (MRMS) ADC.

An important problem of all column-level ADCs is the fact that mismatch between columns is highly visible as column fixed-pattern noise (FPN). In order to relax the uniformity requirements of column ADC circuits, a new column FPN reduction technique, called dynamic column switching (DCS), is introduced. This technique reduces the perceptual effect of column FPN by dynamically connecting each column ADC to several (adjacent) columns of the pixel array. As a result, a column FPN as large as 1% of full scale can be rendered invisible in simulation.

Chapter 5 discusses the implementation of a CMOS image sensor with a low-power column-level single-slope ADC. It is the first imager that demonstrates the use of DCS to reduce the visibility of column-level non-uniformities. The imager is realized in a 0.18 µm CMOS process and has 340 column-level ADCs with a resolution of 10 bits. The column-level comparator, which forms the core of a single-slope ADC, consists of two stages: a linear gain stage that is used as preamp, followed by a regenerative latch. The cancellation of comparator offset is done in two ways. First, an analog circuit auto-zero cancels the offset of the preamp. Second, a system-level auto-zero, which is essentially a second A/D conversion, compensates for the residual offset. The resulting comparator only consumes 3.2 µW. Unfortunately, measurement results showed that the residual offset after analog auto-zero had a standard deviation of 1.6 mV, which is much higher than anticipated. This is most likely caused by the higher than anticipated preamp gain.

Measurements were performed to evaluate the proposed DCS technique. The measured column FPN without applying DCS was 0.69% of full scale ($\sigma$); applying DCS reduced this to 0.41%. This result confirms the effectiveness of the proposed DCS technique.

In chapter 6, an implementation of a CMOS image sensor with a column-level MRSS/MRMS ADC is presented. It is the world's first imager to use these ADC architectures. The image sensor is realized in a 0.25 µm CMOS process and has 400 column-level ADCs with a resolution of 10 bits. The MRSS ADC uses 8 ramp voltages. The analog column circuit was re-used from an existing imager design; to this circuit, only 8 analog switches and some simple digital circuitry had to be added in order to implement the MRSS/MRMS architecture. A second critical part of an MRSS/MRMS ADC is the multiple-ramp generator. In order to ensure matching between the ramps, a resistor ladder structure was used that consists of a single coarse ladder, to which 8 fine ladders were connected. An auto-calibration algorithm was used to compensate for the offsets of the various output buffers of the multiple-ramp generator.

Because of its flexible digital control, the ADC can operate in single-slope, MRSS, and MRMS mode, allowing for a direct comparison between the various modes. In single-slope mode, the A/D conversion time is 51.9 μs, which results in a maximum frame rate of 50 frames/second. In MRSS mode, an A/D conversion only takes 15.5 μs, allowing for a frame rate of 142 frames/second, while consuming only 16% more power. This is a 3.3x improvement in conversion time compared to single-slope mode, and thus underlines the potential of the MRSS ADC to increase the speed, and thus the power efficiency, of a column-level ADC. Finally, measurements in MRMS mode showed an A/D conversion time of 12.3 μs, which is a reduction of 21% compared to MRSS mode. While this might seem less significant, it should be noted that the reduction will be larger in ADCs with more than 10-bit resolution.

# Samenvatting (Summary)

This thesis describes the development of analog readout circuits for CMOS image sensors that combine low noise with efficient power consumption.
It concentrates on improving two parameters of this circuitry: first, reducing the noise of the input circuit, and second, developing analog-to-digital (A/D) converters with improved power efficiency that are suitable for reading out high-resolution pixel arrays.

Chapter 1 gives an introduction to the thesis and starts with a short historical overview of solid-state image sensors. The first semiconductor image sensors were developed in the early 1960s. The invention of the charge-coupled device (CCD) in 1970 led to the first commercially applied semiconductor image sensors in the 1980s. This success was mainly due to the fact that the CCD is a relatively simple component, which simplifies its fabrication. In the early 1990s, research efforts were made to realize an image sensor in CMOS technology. These efforts have led to the modern CMOS image sensor of today.

The challenge in designing image sensors is the optimization of three important parameters. First, the signal-to-noise ratio of each pixel output should be as high as possible. Second, the number of pixels should be increased as much as possible, partly because this is a marketing argument in consumer electronics. Third, the power consumption should be as low as possible. These three parameters conflict with one another, since an increase in the signal-to-noise ratio or in the number of pixels generally leads to increased power consumption of the analog readout circuit. For this reason, the aim of this thesis is to find systematic improvements of the analog readout circuit that enable a simultaneous improvement of all three parameters.

Chapter 2 gives an overview of analog signal processing in CMOS image sensors.
A typical CMOS image sensor has a pixel array with readout circuits in every pixel, a set of column circuits next to the array, and a chip-level readout circuit. The pixel array is read out in two steps: first, a row of pixels is read out and the results are stored in the column circuits; next, the column circuits are read out one by one by the central circuit. In this thesis, the circuits that perform the first operation are called the *front-end readout circuit*, and the circuits that perform the second operation are called the *back-end readout circuit*.

There are three photosensitive components that can be used in an image sensor. The photodiode is the simplest, but suffers from kT/C noise, or reset noise. This problem can be solved by using the photogate or the pinned photodiode. The latter is the most popular, since it usually combines a higher light sensitivity with a lower dark current.

The front-end readout circuit usually consists of an in-pixel source follower, combined with sampling and biasing circuits in the column circuit. The back-end readout circuit consists of a readout amplifier, which reads the sampled voltages in the column circuits, and an A/D converter.

Section 2.5 describes four advanced signal-processing techniques that have been developed in recent years. First, sharing readout circuits between two or more pixels enables a smaller pixel that still has acceptable sensitivity. Second, the reset noise of the photodiode can be reduced with so-called soft and active reset techniques. Third, several techniques have been developed to increase the dynamic range of the pixel. Finally, in addition to the chip-level A/D converter, column-level and even pixel-level A/D converters have been developed. This development is treated in detail in chapter 4.

Chapter 3 describes the front-end readout circuit in detail. It shows that the performance of the front-end readout circuit is limited by its noise. There are four significant noise sources in the front-end: photon shot noise, reset noise, thermal noise, and 1/f noise. Of these four, photon shot noise is the largest, but since it is signal dependent, it dominates only at large input signals. Reset noise is dominant when photodiodes are used; however, this noise source is effectively suppressed by using pinned photodiodes or photogates. Thermal noise can be well controlled, usually by adjusting the size of the sampling capacitors. 1/f noise is usually the dominant noise source in front-ends based on photogates or pinned photodiodes.

Since it is not possible to reduce this noise source with conventional techniques, a relatively unknown technique, namely the application of large-signal bias conditions, was investigated. This technique reduces 1/f noise by periodically switching a transistor off, by either lowering the gate voltage or raising the source voltage. In some semiconductor processes, this leads to a noise reduction of 8 dB. Although a model that can explain this noise reduction was recently introduced, it cannot quantify the reduction without data on the statistical properties of the so-called 'traps' at the gate-oxide interface of a transistor. Therefore, an integrated test circuit was realized to evaluate the effectiveness of large-signal bias conditions in small transistors, such as those used in the front-end readout circuit. Unfortunately, measurement results showed that the noise reduction obtained by applying large-signal bias conditions is only 1.4 dB; moreover, an increase in noise was measured when large-signal bias conditions were combined with correlated double sampling (CDS).
Since the use of CDS is essential in image sensors to remove offset and reset noise, it was concluded that large-signal bias conditions are unlikely to significantly lower the 1/f noise in the front-end.

Chapter 4 describes the application of column-level A/D converters in CMOS image sensors. First, chip-level, column-level, and pixel-level A/D converters are compared. It is shown that pixel-level ADCs are unsuitable for most applications because of the amount of chip area they require, while chip-level ADCs have to operate at too high a speed to be power efficient. Therefore, the column-level ADC is in all likelihood the best choice for image sensors with a high resolution and a moderate frame rate.

The most popular column-level ADC architecture, the single-slope ADC, is described in detail. Although this architecture has the advantage of a simple column circuit, it also has the disadvantage of a low conversion speed. To solve this problem, a new architecture is introduced: the multiple-ramp single-slope (MRSS) ADC. This architecture can have a significantly shorter conversion time, while still having a simple and small column circuit.

All image sensor output signals contain photon shot noise, which is signal dependent. This means that an ADC with linear quantization delivers higher performance than necessary for large input signals. It is therefore possible to use the presence of photon shot noise to realize an ADC with lower power consumption or higher conversion speed, since the number of quantization steps can be reduced without reducing the perceived image quality. The MRSS ADC introduced in this thesis is very well suited to this, which leads to a multiple-ramp multiple-slope (MRMS) ADC.

An important problem of all column-level ADCs is that non-uniformities between columns are highly visible as column fixed-pattern noise (FPN). To relax the uniformity requirements placed on column circuits, a new camouflage technique, namely dynamic column switching, is introduced. This technique reduces the perceptual effect of column FPN by dynamically connecting each column ADC to several columns of the pixel array. Simulation results show that this can make a column FPN of 1% (of full scale) invisible.

Chapter 5 discusses an implementation of a CMOS image sensor with a low-power column-level single-slope ADC. It is the first image sensor that uses dynamic column switching to reduce the visibility of column-level non-uniformities. The image sensor was realized in a 0.18 µm CMOS process and has 340 column-level ADCs with a resolution of 10 bits. The column-level comparator, which forms the core of a single-slope ADC, consists of two stages: a linear gain stage that serves as a preamplifier, followed by a regenerative latch. The comparator offset is compensated in two ways. First, an analog auto-zero compensates for the offset of the preamplifier. Next, a system-level auto-zero, which is in fact a second A/D conversion, is applied to remove the remaining offset. The resulting comparator consumes only 3.2 µW. Unfortunately, measurement results showed that the residual offset after applying the analog auto-zero had a standard deviation of 1.6 mV, which was much more than expected. In all likelihood, this is caused by the gain of the preamplifier, which turned out to be higher than anticipated. Measurements were also performed to evaluate the effectiveness of dynamic column switching.
The measured column FPN without dynamic column switching was 0.69% of full scale; applying dynamic column switching reduces this to 0.41%. This result confirms the effectiveness of dynamic column switching.

In chapter 6, an image sensor with a column-level MRSS/MRMS ADC is presented. It is the world's first image sensor to use these ADC architectures. The image sensor is realized in a 0.25 µm CMOS process and has 400 column-level ADCs with a resolution of 10 bits. The MRSS ADC uses 8 ramp voltages. The analog column circuit was re-used from an existing design; only 8 analog switches and some digital circuits were added to this circuit to implement the MRSS/MRMS architecture. A second part of an MRSS/MRMS ADC that determines its performance is the multiple-ramp generator. To guarantee uniformity between the ramp voltages, a resistor ladder structure was used, consisting of a coarse ladder to which 8 fine ladders were connected. An auto-calibration algorithm was used to compensate for the offsets of the output buffers.

Because of its flexible digital control, the ADC can operate as a single-slope, MRSS, or MRMS ADC, which makes a direct comparison possible. In single-slope mode, the conversion time of the ADC is 51.9 μs, which results in a maximum frame rate of 50 frames/second. In MRSS mode, the conversion time is only 15.5 μs, which results in a frame rate of 142 frames/second, while the increase in power consumption is only 16%. This is an improvement of 3.3x with respect to single-slope mode, and thereby underlines the potential of the MRSS ADC to increase the speed, and thus the power efficiency, of a column-level ADC. Finally, measurements in MRMS mode showed a conversion time of 12.3 μs, which is a reduction of 21% compared with MRSS mode.
Although this result may seem less significant, it should be noted that the reduction in conversion time can be larger for ADCs with a resolution higher than 10 bits.

# Acknowledgments

Being a PhD student is often considered to be a long and lonely journey. While it might have been long in my case, it was not lonely, since I had the good fortune of having an extraordinary number of people who 'travelled' along with me, and thereby in one way or another contributed to this thesis. Therefore, I am indebted to many people, whom I would like to acknowledge in this section.

Unlike most other PhD students, I was fortunate to have not one, but two promotors. I am very grateful to Albert Theuwissen, who taught me most things I know about (CMOS) imaging. His enthusiasm, optimism, and, above all, patience helped a great deal to make this thesis a reality. I am equally grateful to Han Huijsing, who brought me into the Electronic Instrumentation Laboratory, introduced me to the field of analog interface electronics and inspired me to do a PhD study.

My special thanks go to my fellow 'Crocodiles' and former roommates Michiel and Kofi. First of all, thanks for all the good advice over the years and for critical proofreading ('crocking') of my papers and this thesis. More importantly, thanks for being such great friends, thanks for bringing so much fun to the work place, and thanks for joining me on great trips to far-off conferences, crocodile farms and other weird places.

I'd like to thank all the members of the Electronic Instrumentation Laboratory, who together create a very friendly and stimulating work atmosphere. Especially during times that my own family was going through a difficult period, you all have been like a second family to me. In particular, I want to thank Frerik & Paulo for being great friends, and particularly for all the fun we had on trips in the USA. Special thanks to Xinyang and Ning for proof-reading parts of my thesis.
Thanks to Inge, Evelyn, Trudie, Helly and Willem for the administrative support that keeps the lab going, and Maureen, Jeroen, Piet, Ger, Harry and Jef for all the technical support.

I want to thank Arnoud van der Wel and Erik Klumperink from the University of Twente for their cooperation in this Ph.D. project on LF noise. Their expertise in general, and Arnoud's Ph.D. thesis in particular, greatly helped me in understanding LF noise phenomena that are of importance in imagers. Moreover, thanks to Arnoud for performing all the noise measurements mentioned in chapter 3 of this thesis.

I'd like to thank the (former) CMOS imaging department of Philips Semiconductors in Eindhoven for their financial and practical support of this project. I'd like to thank all its members, in particular Adri Mierop, Jan van Geloven, Laurens Korthout, and Edwin Roks, for much good advice and many helpful tips over the years. The possibility of re-using a Philips imager design for the prototype described in chapter 5 was essential for this project to realize many ideas in silicon. My special thanks go to Adri Mierop, who served as a link between Philips and Delft University for many years, and who helped me with many practical design and measurement issues.

I am very grateful to Paul Donegan, Matthias Sonder, Binqiao Li, Martin Kiik, Feng-Hua Feng, Shujuan Xie and Eric Fox of DALSA Corporation in Waterloo, Canada, for allowing me to re-use their imager design in order to create the prototype of chapter 6, and for giving me the opportunity to work on this design in Waterloo. Without your very well-documented reference design, it would never have been possible to realize the prototype MRSS ADC in such a short time.

I would like to thank my family for their love and support throughout the years. I particularly thank my parents for stimulating me from an early age to think independently - this is essential to be able to do scientific work.
Finally, I want to thank Olga for all her love, patience, understanding and support, especially during the busy months of writing this thesis. *Milujemt'a, ma Mila!*

Martijn Snoeij

Nürnberg, July 2007

# List of publications

#### **Journal Papers**

- M. F. Snoeij, A. J. P. Theuwissen, K. A. A. Makinwa, and J. H. Huijsing, "A CMOS imager with column-level ADC using dynamic column fixed-pattern noise reduction," *IEEE Journal of Solid-State Circuits*, vol. 41, pp. 3007-3015, December 2006.
- A.P. van der Wel, E. Klumperink, J. Kolhatkar, E. Hoekstra, M.F. Snoeij, C. Salm, H. Wallinga and B. Nauta, "Low Frequency Noise Phenomena in Switched MOSFETs", *IEEE Journal of Solid-State Circuits*, vol. 42, no. 3, pp. 540-550, March 2007

#### **Conference Papers**

- M.F. Snoeij, A.J.P. Theuwissen, J.H. Huijsing, and K.A.A. Makinwa, "Power and Area Efficient Column-Parallel ADC Architectures for CMOS Image Sensors", to be presented at IEEE Sensors 2007, Oct. 2007 (invited paper)
- M.F. Snoeij, P. Donegan, A.J.P. Theuwissen, K.A.A. Makinwa, and J.H. Huijsing, "A CMOS Image Sensor with a Column-Level Multiple-Ramp Single-Slope ADC", *IEEE International Solid-State Circuits Conference*, vol. L, pp. 506-507, Feb. 2007
- M.F. Snoeij, A.J.P. Theuwissen, K.A.A. Makinwa, and J.H. Huijsing, "Column-Parallel Single-Slope ADCs for CMOS Image Sensors", *Proc. Eurosensors XX*, vol. II, pp. 284-287, Sept. 2006 (invited paper)
- M. F. Snoeij, A. Theuwissen, K. Makinwa, and J. H. Huijsing, "A CMOS imager with column-level ADC using dynamic column FPN reduction," *IEEE International Solid-State Circuits Conference*, vol. XLIX, pp. 498-499, Feb. 2006.
- M.F. Snoeij, A.J.P. Theuwissen, and J.H. Huijsing, "A low-power Column-Parallel 12-bit ADC for CMOS Imagers", *IEEE Workshop on CCDs and Advanced Image Sensors 2005*, pp. 169-172, Karuizawa, Japan, June 2005
- M.F. Snoeij, A.P. van der Wel, A.J.P. Theuwissen and J.H.
Huijsing, "The Effect of Switched Biasing on *1/f* Noise in CMOS Imager Front-Ends", *IEEE Workshop on CCDs and Advanced Image Sensors*, pp. 68-71, Karuizawa, Japan, June 2005
- M.F. Snoeij, A.J.P. Theuwissen, J.H. Huijsing, "A 1.8V 3.2µW Comparator for Use in a CMOS Imager Column-Level Single-Slope ADC", *Proc. IEEE ISCAS*, pp. 6162-6265, May 2005
- M.F. Snoeij, A. Theuwissen, and J.H. Huijsing, "Read-Out Circuits for Fixed-Pattern Noise Reduction in a CMOS Active Pixel Sensor", *Proceedings of SeSens*, Nov. 2002

### About the Author

Martijn Snoeij was born in Zaandam, The Netherlands in 1977. From 1995, he studied Electrical Engineering at Delft University of Technology, where he received his M.Sc. degree (cum laude) in 2001. The title of his M.Sc. thesis was "A Low Power Sigma-Delta ADC for a Digital Microphone". In 2001, he started working towards a PhD degree at the Electronic Instrumentation Laboratory of Delft University. The subject of his research was the analog interface circuitry for CMOS image sensors, which resulted in this thesis. From August to December 2000, he did an internship at National Semiconductor, Santa Clara, California, where he worked on precision comparators and amplifiers. In 2006, he was a co-recipient of the ISSCC Jan van Vessem award for outstanding European paper. In March 2007, he moved to Erlangen, Germany, where he is currently working as an analog circuit design engineer at Texas Instruments. His professional interests include analog and mixed-signal electronics and sensors.