



# Design of Front-End Receiver Electronics for 3D Trans-Esophageal Echocardiography

by

Anirban Saha

born in Agartala, India

Supervisors: Dr. M.A.P. Pertijs, Dr. Z. Yu

A Thesis submitted to the Office of Graduate Studies of the Delft University of Technology in partial fulfilment of the requirements for the degree of

MASTER OF SCIENCE

in

Faculty of Electrical Engineering, Mathematics and Computer Science (Department of Microelectronics)



## **Committee members**

Dr. M.A.P. Pertijs (TU Delft: Electronic Instrumentation Laboratory)

Dr. A. Bossche (TU Delft: Electronic Instrumentation Laboratory)

Dr. J.G. Bosch (Erasmus MC: Thoraxcenter)

Dr. Z. Yu (TU Delft: Electronic Instrumentation Laboratory)

Dedicated to the loving memory of my beloved father Nepal Saha (1950–1998)

## Abstract

The motivation behind this thesis is that cardiovascular diseases claim the highest number of lives each year globally. To enhance the accuracy of diagnosis, 3D images of the heart are required. From these images, precise information can be obtained regarding the 3D anatomy of the heart and its functioning. Trans-Esophageal Echocardiography (TEE) is a promising technique to achieve this kind of precision in diagnosis.

The objective of my thesis is to perform optimization at the system and circuit level in order to improve the power-efficiency and area-efficiency of the front-end receiver electronics. This electronic circuitry is integrated at the tip of a miniature TEE probe, which is inserted via a gastroscopic tube through the esophagus, close to the heart of the patient, for diagnosis. The signal processing chain of the receiver electronics consists of a Low Noise Amplifier (LNA), a Micro-Beamformer that performs a delay-and-sum operation, and a Time Gain Compensation (TGC) amplifier.

A novel low-power high-dynamic-range micro-beamformer is designed in TSMC 0.18 µm CMOS technology. The dynamic range is enhanced substantially compared to a previous implementation. This has been achieved by employing an Offset Calibration Loop (OCL). The total power consumption of the proposed design is almost a factor of 5 lower than that of the state-of-the-art design. The increase in thermal noise level after incorporating the OCL is marginal (6.5%).

## Acknowledgements

First and foremost, I would like to express my deepest gratitude to my thesis advisor Dr. Michiel Pertijs, for his guidance and support throughout the duration of the thesis project. He was always available for the discussion of new ideas and these discussions with him have definitely molded my ability to solve problems in a structured manner. It was really helpful for me that he would provide me that extra push whenever I slacked. I am thankful to him for his counseling on personal matters and I cannot thank him enough for going out of his way for me, time and again.

I would also like to thank Dr. Zili Yu for sharing all her knowledge and experience with me during our technical discussions. Her wealth of knowledge in this field is really commendable and something I have always strived to achieve. Among the senior Ph.D students in the research group, I am thankful to Ruimin and Zhichao for their helpful insights during group meetings and design reviews.

I would like to express my sincere gratitude to Dr. Andre Bossche and Dr. Hans Bosch for serving on my thesis defense committee.

I am thankful to Oldelft Ultrasound B.V., for providing the financial support for this thesis project.

I was really fortunate to have a good circle of friends at my workplace and my home. I would like to thank Ajit, Chao, Nandish and Lokesh in my lab for their assistance from time to time. A special thanks to Nandish for his helpful insights during our technical discussions. I would like to express my gratitude towards my room-mates Dhariyash, Vishwas, Gaurav and Nishant, who always created a very pleasant and relaxing atmosphere at home. I doubt I could have successfully completed my thesis without all their help and support, especially during times of high pressure.

I cannot thank enough the love of my life, my girlfriend, Piyali Dey, who in spite of being busy pursuing her own Ph.D degree in the US, always provided me with mental support. She always inspired me and showed confidence in my ability. Thank you Piyali, for always being there for me when I needed you. I hope to make it up to you. Most importantly, I would like to thank you for keeping me sane during the tough times and filling my life with so much joy.

Lastly and most importantly, I would like to thank my family, especially my mother. Whatever I have achieved in my life, I owe it to her. She never made me aware of all the sacrifices she made and the hardships she faced for me. I would also like to thank my elder sister for her guidance from time to time.

## **Table of Contents**

- Abstract
- Acknowledgements
- Table of Contents
- List of Figures & Flowcharts
- List of Tables
- 1. Introduction
  - 1.1 Motivation
  - 1.2 Basic Principles
    - 1.2.1 Trans-Esophageal Echocardiography
    - 1.2.2 Properties of Ultrasound Signals
  - 1.3 Design Challenges
  - 1.4 Objectives
  - 1.5 Organization of the Thesis
  - References
- 2. Overview of Existing Design
  - 2.1 Overview of Signal Processing Chain of Front-End Receiver Electronics
  - 2.2 Low Noise Amplifier (LNA)
  - 2.3 Time Gain Compensation Amplifier
  - 2.4 Micro-Beamformer
  - 2.5 Evaluation of Power Consumption
  - 2.6 Conclusions
  - References
- 3. Architecture-Level Design
  - 3.1 Limitations of Existing Micro-Beamformer
    - 3.1.1 Precision Considerations in Existing Differential Implementation
    - 3.1.2 Limited Output Drive Capability
    - 3.1.3 Sensitivity to Parasitic Capacitances
  - 3.2 Micro-Beamformer Based on Active Charge Mode Summation
    - 3.2.1 Stray-Insensitive Delay Line Based on Bottom-Plate Sampling
    - 3.2.2 Signal Summation Using Charge Amplifier
    - 3.2.3 Possibility of Converting Charge Amplifier into TGC Amplifier
    - 3.2.4 Dynamic Range Limitations in the Proposed Single-Ended Implementation
  - 3.3 Enhancement of Dynamic Range Using an Offset Calibration Loop
    - 3.3.1 Clocking Scheme
    - 3.3.2 Noise Analysis
  - 3.4 Conclusions
  - References
- 4. Transistor-Level Design
  - 4.1 Overview of Individual Blocks
  - 4.2 DTMOS Based LNA
  - 4.3 Implementation of Stray-Insensitive Delay Line
    - 4.3.1 Timing Considerations
    - 4.3.2 Implementation of Switches
  - 4.4 Single-Ended Main Gm Stage
    - 4.4.1 Choice of Single-Ended Gm Stage Over Differential Configuration
    - 4.4.2 Auto-Zeroing Loop
    - 4.4.3 Trade-off Between Output Swing and Input-Referred Noise
  - 4.5 Offset Calibration Loop
    - 4.5.1 Transistor Level Realization of Gm Stages
    - 4.5.2 Implementation of Switches and Sizing of Capacitors
    - 4.5.3 Addition of Buffer in Each Branch of Delay Line in OCL
  - 4.6 Overall Clocking Scheme
  - 4.7 Conclusions
  - References
- 5. Simulation Results
  - 5.1 Transient Analysis
    - 5.1.1 Calibration Phase and Normal Operation
    - 5.1.2 Offset at Output With and Without OCL
    - 5.1.3 Illustration of High Dynamic Range of Micro-Beamformer
  - 5.2 PSS and PNoise Analysis
  - 5.3 Power Consumption
  - 5.4 Conclusions
  - References
- 6. Conclusions
  - 6.1 Summary
  - 6.2 Main Contributions
  - 6.3 Scope for Future Work
    - 6.3.1 Guidelines for Layout
  - References
- Appendix A: Derivation of Unity-Gain Bandwidth of AZ Loop
- Appendix B: Simulation Results – AC and Stability Analysis
  - B1. AC Analysis
  - B2. Stability Analysis
- Appendix C: Effect of OCL on Dynamic Range – A Case Study

## **List of Figures & Flowcharts**

- Fig. 1.1. Volumetric dataset acquired using a matrix transducer: (a) front-view of a matrix transducer (courtesy of Oldelft Ultrasound B.V.), (b) a volume can be imaged by a matrix transducer [1.3]
- Fig. 1.2 (a) A TEE probe of the type 171Z- [1.4] (courtesy of Oldelft Ultrasound B.V.), (b) Insertion of TEE probe through the esophagus of patient during TEE Imaging
- Fig. 1.3 Working principle of an ultrasound transducer: (a) transmit mode, (b) receive mode [1.3]
- Fig. 1.4 A TEE Imaging system [1.3]
- Fig. 1.5 Calculation of depth of tissue based on pulse-echo principle [1.6]
- Fig. 1.6. Illustration of dynamic range of the signal in receive mode [1.3]
- Fig. 2.1 Block diagram of a conventional receive-signal processing flow for ultrasound array transducers [2.1]
- Fig. 2.2 Receive signal processing architecture for 3D TEE (N=9 and M=225) [2.1]
- Fig. 2.3 Common-source amplifier with a load resistor: (a) NMOS implementation, and (b) PMOS implementation [2.1]
- Fig. 2.4 Dynamic range of the ultrasound receiver system: (a) dynamic range at the input of the TGC amplifiers, (b) ideal TGC scheme, (c) output dynamic range after ideal TGC, (d) four-step TGC scheme, and (e) output dynamic range after four-step compensation [2.1]
- Fig. 2.5 Simplified schematic of the TGC amplifier [2.1]
- Fig. 2.6 Implementation of Digital beamforming [2.1]
- Fig. 2.7. Pipeline-operated S/H delay line [2.1]
- Fig. 2.8 A pipeline operated S/H delay line with charge mode summation [2.1]
- Fig. 2.9. Architecture of the proposed receive signal processing chain
- Fig. 3.1. Charge injection error in (a) single-ended S/H circuit, and (b) differential S/H circuit
- Fig. 3.2. Clock feed-through error in (a) single-ended S/H circuit, and (b) differential S/H circuit
- Fig. 3.3. Front-end receive signal processing chain
- Fig. 3.4. Addition of a buffer stage after delay line in each branch in the signal processing chain
- Fig. 3.5. Schematic of the delay line showing all the stray capacitances
- Fig. 3.6. Stray-insensitive delay line (a) Schematic showing all the stray capacitors and (b) Clock signals
- Fig. 3.7. Schematic depicting active charge mode summation at virtual ground of OTA
- … micro-beamformer a DC signal is applied
- Flowchart 3.1. Need for an Offset Calibration Loop
- Flowchart 3.2. Working principle of the Offset Calibration Loop
- Fig. 3.10. Front-end receive signal processing chain including the Offset Calibration Loop
- Fig. 3.11. Operation of the Offset Calibration Loop: Phase 1. Calibration, Sub-phase 1: Reset
- Fig. 3.12. Operation of the Offset Calibration Loop: Phase 1. Calibration, Sub-phase 2: Read-out
- Fig. 3.13. Operation of the Offset Calibration Loop: Phase 2. Normal Operation, Sub-phase 1: Reset
- Fig. 3.14. Operation of the Offset Calibration Loop: Phase 2. Normal Operation, Sub-phase 2: Read-out
- Fig. 3.15. Waveforms of the clock signals. The calibration window is 400 ns
- Fig. 3.16. (a) Simplified schematic of the switched-capacitor charge amplifier, and (b) Clock Signals
- Fig. 3.17. Schematic of the charge amplifier during sampling phase. The noise sources are shown
- Fig. 3.18. Schematic of the charge amplifier during read-out phase. The noise sources are shown.
- Fig. 3.19. A simplified implementation of a single-ended OTA
- Fig. 4.1. Block level diagram of the front-end receive signal processing chain
- Fig. 4.2 DTMOS transistor (a) Cross-section and (b) Equivalent transistor level symbol
- Fig. 4.3. LNA implemented using DTMOS transistor
- Fig. 4.4. DTMOS based LNA with load
- Fig. 4.5. Stray-Insensitive Delay Line using Ideal switches
- Fig. 4.6. Transistor level implementation of the Stray-Insensitive Delay Line
- Fig. 4.7. Waveforms of the Clock signals
- Fig. 4.8. Diagram illustrating reduction of clock feed-through using CMOS switch
- Fig. 4.9. Single-ended $G_m$ stage
- Fig. 4.10. (a) A single-ended charge amplifier, and (b) Realization of its $g_m$ block
- Fig. 4.11. (a) A fully differential charge amplifier, and (b) Realization of its $g_m$ block
- Fig. 4.12. Switched-capacitor charge amplifier along with the AZ loop. The loop is shown in a dashed line
- Fig. 4.13. Operation of AZ loop, Phase I: Auto-zeroing
- Fig. 4.14. Operation of AZ loop, Phase II: Read-out from sampling capacitor of delay line
- Fig. 4.15. Capacitive attenuation at gate of $M_1$
- Fig. 4.16. Schematic of OTA1_AZL used in AZ loop
- Fig. 4.17. Expanded version of Fig. 4.9. showing the PMOS current mirror transistors
- Fig. 4.18. Front-end receive signal processing chain including the OCL and AZ loop
- Fig. 4.19. Addition of PMOS buffer in each branch of delay line in OCL
- Fig. 4.20. Overall clocking scheme. The calibration window is 400 ns
- Fig. 5.1. Calibration phase and Normal phase operation. (a) Clock signal and input and output waveforms (b) Zoomed-in version showing the type of signals at the output of main OTA and at the output of 2-phase sample-and-hold block, (c) Zoomed-in version showing one cycle of the input signal. For this simulation, there is 20% mismatch both in the input delay line and the delay line in OCL
- Fig. 5.2. Output of Main OTA as function of time for 3 different cases. (a) The time window is 11 µs, which includes the calibration period of 10 µs, and (b) The time window is 1 µs, in which the initial settling behavior is depicted
- Fig. 5.3. Output of main OTA (zoomed-in) as function of time for three different cases
- Fig. 5.4 Dynamic Range plot of the Micro-beamformer. The ripple levels and the noise floor are indicated in figure
- Fig. 5.5. Results from PSS and PNoise analysis showing the total input-referred noise and the top contributors of noise. There is no OCL in this case
- Fig. 5.6. Results from PSS and PNoise analysis showing the total input-referred noise and the top contributors of noise. In this case, an OCL is employed which increases the noise level marginally
- Fig. A1. Operation of AZ loop, Phase I: Auto-zeroing. The loop is broken at node A which is the input of OTA1_AZL
- Fig. A2. The schematic of AZ loop after the loop is broken at node A and the OTAs are replaced with their small-signal equivalent models
- Fig. B1. Open-loop gain of main OTA – Magnitude and Phase plots
- Fig. B2. Closed-loop gain of switched-capacitor charge amplifier – Magnitude and Phase plots
- Fig. B3. Loop gain during charge transfer phase – Magnitude and Phase plots
- Fig. B4. Loop gain of AZ loop – Magnitude and Phase plots
- Fig. B5. Loop gain of OCL – Magnitude and Phase plots
- Fig. B6. Loop gain of OCL – Magnitude and Phase plots for different values of $g_{m1}$

## List of Tables

- Table 2.1 Break-down of power consumption of individual blocks of existing design
- Table 2.2 Break-down of power consumption of individual blocks of proposed design
- Table 3.1 Variation of peak-to-peak ripple value with $g_{m,\mathrm{OTA1\_OCL}}$
- Table 3.2. Summary of the design parameters
- Table 4.1. Summary of design parameters of DTMOS based LNA
- Table 4.2. Summary of design parameters of AZ loop
- Table 5.1. Power consumption of individual blocks in the proposed design of receive front-end signal processing chain
- Table C1. Peak-to-peak ripple values for different scenarios

### **Chapter 1**

### Introduction

In this introductory chapter, the motivation behind this thesis is described first. The basic principles of ultrasonic imaging and the properties of ultrasound signals are then explained. Next, the design challenges involved in meeting the target specifications of this work are described, followed by a discussion of the objectives of the thesis project. Finally, the structure of the thesis is presented.

#### **1.1 Motivation**

Cardiovascular diseases (CVDs) are a class of disorders of the heart and blood vessels. According to a survey by the World Health Organization (WHO) in 2011, CVDs claim the highest number of lives every year globally [1.1]. In 2008, CVDs alone were responsible for the deaths of an estimated 17.3 million people, representing 30% of all global deaths. This number is expected to rise to 23.6 million people by the year 2030. Therefore, accuracy in diagnosis is of utmost importance. To obtain visual images of the inner structures of the heart, many imaging techniques have been developed, such as magnetic resonance imaging (MRI), computed tomography (CT), nuclear imaging and echocardiography. In recent times, echocardiography has emerged as the most popular technique because of its low cost, non-invasive nature and high-resolution imaging capability [1.2].

There are two types of echocardiography: Trans-Thoracic Echocardiography (TTE) and Trans-Esophageal Echocardiography (TEE). In TEE, the esophagus is used as the imaging window to the heart [1.3]. Because of the proximity of the heart to the wall of the esophagus, it is possible to obtain images without the strong attenuation from the ribs or the lungs that is the downside of TTE.

In 3D TEE, a 2D matrix array is used, in which ultrasonic beams are emitted in multiple directions by all the individual transducer elements to enclose a pyramidal volume (as depicted in Fig. 1.1). This facilitates acquisition of a 3D dataset, thereby obviating mechanical transposition of the array, which is the main problem when a 1D array is used [1.3]. To facilitate 3D TEE, research groups from Oldelft Ultrasound B.V., Erasmus MC and TU Delft have been collaborating to develop a miniaturized ultrasound probe comprising a matrix piezoelectric transducer with more than 2000 elements.



Fig. 1.1. Volumetric dataset acquired using a matrix transducer: (a) front-view of a matrix transducer (courtesy of Oldelft Ultrasound B.V.), (b) a volume can be imaged by a matrix transducer [1.3]

At the tip of the 3D TEE probe, smart signal processing is required to reduce the number of cables (connecting transducer elements to an external imaging system) to a number that can be accommodated inside the gastroscopic tube. The transmit transducer is fully wired to the transmit electronic circuit. The receive transducer and its accompanying receive electronic circuitry are integrated at the tip of the probe. The design of the front-end receive ASIC is the focus of this thesis project. The emphasis is on making the design power- and area-efficient.

#### **1.2 Basic Principles**

In order to arrive at the target specifications for the signal-processing circuitry of an ultrasound receive ASIC, a thorough understanding of the basic principles of TEE is required. These are explained in sub-section 1.2.1. Subsequently, the properties of ultrasound signals are discussed in sub-section 1.2.2.



#### 1.2.1 Trans-Esophageal Echocardiography

Fig. 1.2 (a) A TEE probe of the type 171Z- [1.4] (courtesy of Oldelft Ultrasound B.V.), (b) Insertion of TEE probe through the esophagus of patient during TEE Imaging

Trans-Esophageal Echocardiography (TEE) is a cardiac imaging technique in which a small probe is inserted into the esophagus of the patient (as shown in Fig. 1.2 (b)). A matrix transducer can be located at the tip of the probe, as indicated in Fig. 1.2 (a). A gastroscopic tube provides the connection between the ultrasound transducer and the external imaging system. Using this technique, it is possible to obtain high-resolution images of the heart because attenuation from the ribs, lungs and subcutaneous tissue is avoided, which is the main problem in transthoracic echocardiography (TTE) [1.5].

#### **Piezoelectric Ultrasound Transducers**

A piezoelectric ultrasound transducer is a device which can transform electrical energy into acoustic energy and vice versa. When an external voltage is applied across the two electrodes of the transducer, it starts oscillating at high frequencies, thereby generating ultrasonic sound waves (as depicted in Fig. 1.3 (a)). On the other hand, when an ultrasound echo signal impinges on the surface of the transducer, it exerts pressure resulting in displacement of electrical charges and creation of dipoles. As a result, a potential difference is generated across the electrodes of the transducer (shown in Fig. 1.3 (b)). Therefore, an ultrasound transducer is capable of playing the dual role of a transmitter and a receiver, as required in ultrasound imaging systems. These transducers can operate in a wide range of frequencies from kHz to MHz. Lead-zirconate-titanate (PZT) ceramic is the most widely used material for ultrasound transducers [1.3].



Fig. 1.3 Working principle of an ultrasound transducer: (a) transmit mode, (b) receive mode [1.3]

#### **TEE Imaging Systems**



Fig. 1.4 A TEE Imaging system [1.3]

A TEE Imaging system, as shown in Fig. 1.4, is based on the pulse-echo principle. According to this principle, the distance between the probe and the structure that resulted in the echo can be computed by measuring the time elapsed between transmission of a pulse and arrival of a given echo [1.6]. The relation between the depth of the reflector and the arrival time of an echo is given by:

$$d = (c \times t)/2 \tag{1.1}$$

where d is the depth of the structure resulting in the echo, c is the speed of propagation of sound in human tissues, and t is the arrival time of the echo.
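As a minimal sketch of Eq. (1.1), the snippet below converts an echo arrival time into a reflector depth and back. The speed of sound of 1540 m/s is an assumed average value for soft tissue and does not come from this text; the function names are illustrative only.

```python
# Assumed average speed of sound in soft tissue (not specified in the text).
SPEED_OF_SOUND_M_PER_S = 1540.0


def echo_depth_m(arrival_time_s, speed_m_per_s=SPEED_OF_SOUND_M_PER_S):
    """Depth d of the reflector from the round-trip arrival time t, Eq. (1.1)."""
    return speed_m_per_s * arrival_time_s / 2.0


def round_trip_time_s(depth_m, speed_m_per_s=SPEED_OF_SOUND_M_PER_S):
    """Inverse of Eq. (1.1): arrival time of an echo from a given depth."""
    return 2.0 * depth_m / speed_m_per_s


# Example: an echo arriving 130 microseconds after the transmit pulse.
depth = echo_depth_m(130e-6)
print(f"depth = {depth * 100:.2f} cm")
```

The factor of 2 accounts for the pulse travelling to the reflector and back before it is detected.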



Fig. 1.5 Calculation of depth of tissue based on pulse-echo principle [1.6]

The components of a TEE Imaging system are transmit (Tx) electronics, receive (Rx) electronics, one or more ultrasound transducers consisting of an array of elements, a control and signal-processing module and a display module. The high-voltage pulses generated by the Tx electronics provide excitation to the transducer elements. This electrical energy is transformed into mechanical energy of the ultrasound waves. A certain percentage of these acoustic waves will be reflected back depending on the acoustic impedance of the materials they encounter in the path of their propagation. When these reflected signals or echoes impinge upon the surface of the transducer elements, their mechanical energy is converted into electrical signals. The typical voltage levels of these signals are in the range of tens of microvolts to hundreds of millivolts. Subsequently, these signals are processed by the Rx electronics, which perform amplification, time-gain compensation and a coherent delay-and-sum operation [1.3]. The 3D anatomy of the heart can be determined depending on the signal strength. Ultimately, the image is rendered by the display module.

#### **1.2.2** Properties of Ultrasound Signals

#### **Propagation Attenuation**

Ultrasonic waves propagating through tissues inside the human body lose energy. Primarily, there are two physical phenomena which result in this energy loss: absorption and scattering. Absorption is the process in which acoustic energy of the travelling ultrasound waves is transformed into heat energy when they are absorbed by the tissues. Scattering refers to the deviation of the acoustic waves from their main path when they impinge upon a surface. These scattered waves cannot be detected by the transducer. Of these two physical phenomena, absorption leads to more attenuation than scattering [1.8].

A metric for the loss of energy as the acoustic waves travel along the propagation path is the attenuation coefficient. It is expressed in dB/cm/MHz. For instance, if the attenuation coefficient of a tissue $(c_{att})$ is 1 dB/cm/MHz, and an ultrasound signal of frequency $(f)$ 10 MHz propagates inside the tissue to a depth $(d_t)$ of 5 cm, then its attenuation is given by:

$$\text{Attenuation} = c_{att} \times d_t \times f \tag{1.2}$$

$$\Rightarrow \text{Attenuation} = 1\,\text{dB/cm/MHz} \times 5\,\text{cm} \times 10\,\text{MHz} = 50\,\text{dB}$$
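The worked example of Eq. (1.2) can be reproduced with a few lines of Python. This is only a sketch: the function names are illustrative, and the conversion from dB to an amplitude ratio assumes the usual 20·log10 convention for voltage-like quantities, which the text does not state explicitly.

```python
def attenuation_db(coeff_db_per_cm_mhz, depth_cm, freq_mhz):
    """Propagation attenuation in dB according to Eq. (1.2)."""
    return coeff_db_per_cm_mhz * depth_cm * freq_mhz


def db_to_amplitude_ratio(loss_db):
    """Remaining amplitude fraction after a loss in dB (20*log10 convention, assumed)."""
    return 10.0 ** (-loss_db / 20.0)


# Values from the example in the text: 1 dB/cm/MHz, 5 cm, 10 MHz.
loss = attenuation_db(1.0, 5.0, 10.0)
print(f"attenuation = {loss:.0f} dB")
print(f"remaining amplitude fraction = {db_to_amplitude_ratio(loss):.5f}")
```

Because the attenuation grows linearly in dB with both depth and frequency, deeper reflectors and higher imaging frequencies both enlarge the dynamic range that the receive electronics must handle.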

#### **Dynamic Range of the Received Signal**

Noise is generated by the read-out electronics, by the transducer itself, and by acoustic sources. The required dynamic range of the system is ultimately determined by the required imaging quality. Assuming that the noise contributed by the transducer and the acoustics lies below the lower limit of this required dynamic range, the noise produced by the front-end receive electronics limits the dynamic range of the system. As a result, the overall system dynamic range is reduced relative to the intrinsic dynamic range of the transducer. The emphasis is therefore on minimizing the noise of the read-out electronics through careful design, while ensuring that the design remains power-efficient.

The relation between strength of the received signal and the axial depth is depicted in Fig. 1.6. The overall dynamic range of the received signal consists of two components – instantaneous dynamic range and dynamic range due to attenuation in propagation.



Fig. 1.6. Illustration of dynamic range of the signal in receive mode [1.3]

Instantaneous dynamic range is due to the difference in strength of the echoes received from different kinds of tissue at the same depth. These different tissues have different acoustic impedances. Therefore, the strengths of the acoustic waves which are reflected from these tissues vary. The instantaneous dynamic range has a constant value for a given frequency and penetration depth (40 dB in our case). With increasing imaging depth, the strength of the received signal reduces, and ultimately it coincides with the noise of the read-out electronics (noise floor shown in Fig. 1.6) [1.3].

#### **1.3 Design Challenges**

Developing a miniature TEE probe that comprises a matrix piezoelectric transducer for 3D imaging of the heart poses several challenges. In order to enhance the resolution of 3D images, two conditions need to be satisfied. Firstly, the pitch of the individual transducer elements must be small and secondly, the total aperture should be large. Consequently, a matrix transducer comprising several thousand elements is needed. In this project, more than 2000 elements are required. However, the esophageal cavity imposes limitations on the size of the gastroscopic tube that will be inserted into the esophagus of the patient during diagnosis. Therefore, it would not be possible to connect each element of the transducer to an external imaging system with a separate cable, considering the limited space available inside the tube. Besides, the tube should also remain flexible for ease of navigation inside the esophagus of the patient during diagnosis. Therefore, a major challenge is to reduce the number of cables locally, while maintaining a sufficient signal-to-noise ratio. This can be accomplished by using front-end receiver electronics bonded to the transducers that provide appropriate signal conditioning at the tip of the probe.

The primary emphasis of this thesis is on the optimization of the receiver electronics, at both the system and circuit level. There are two limitations for the receive ASIC integrated at the tip of the TEE probe. The first one is with respect to the power budget. The electronic circuitry integrated at the tip of the probe will consume power. Because of this power dissipation, the temperature at the site of diagnosis of the patient, which is close to the heart, will rise. This rise in temperature can lead to burning or scarring of the esophageal tissue, which is highly undesirable. Therefore, the power consumption of the receiver electronics needs to be minimized. Besides, the temperature sensor in the TEE probe turns off when a certain temperature ($42^{\circ}$C to $44^{\circ}$C) is reached at the site of diagnosis [1.9]. This kind of interruption in diagnosis is certainly unacceptable. For commercial TEE probes, the power dissipation in transmit mode is around 1-2 W. In order to prevent overheating of the tissue at the site of diagnosis, the power budget of the receiver electronics is set at 1 W, which in turn translates to a power consumption limit of less than 0.5 mW per transducer element. The second limitation for the receive ASIC is with respect to available space. The space at the probe tip has a size of $2\,\text{cm} \times 1\,\text{cm} \times 1\,\text{cm}$ (length $\times$ width $\times$ height). Therefore, the circuit design needs to be compact [1.3].
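The per-element limit follows directly from the overall budget. As a rough check (a sketch that assumes the 1 W budget is shared equally among the elements; the symbols $P_{element}$, $P_{budget}$ and $N$ are introduced here for illustration only), with $N > 2000$ elements:

$$P_{element} = \frac{P_{budget}}{N} < \frac{1\,\text{W}}{2000} = 0.5\,\text{mW}$$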

The overall power consumption in the existing design [1.3] was almost 500  $\mu$ W per transducer element. In this thesis project, the main goal is to reduce it further by optimizing the design at the system and circuit-level. Besides, the design must be area-efficient.

#### **1.4 Objectives**

The primary objective of this thesis is to tackle the design challenges as described in the previous section (Section 1.3). There are two main challenges:

- 1. Optimizing the design to improve its power-efficiency at:
  - a. System-level
  - b. Circuit-level

- 2. Area-efficient design

In order to improve the power efficiency at the system-level, the existing signal processing chain (as described in chapter 2) needs to be re-arranged. This is discussed in section 3.2.2. At the circuit-level, various design strategies have been used, which are presented in chapter 4.

Finally, in order to make the design compact, the sizing of the capacitors is done carefully (sections 3.3.2 and 4.4.2). In addition, re-organizing the blocks in the signal processing chain (as discussed in section 3.2.2) reduces the number of circuit blocks, thereby leading to an area-efficient design.

#### **1.5 Organization of the Thesis**

The remainder of this thesis comprises five chapters. In order to optimize the design of the front-end receiver electronics, a thorough understanding of the components in the signal processing chain is required. Chapter 2 provides this overview of the different components of the system. At the end of that chapter, the distribution of power consumption among the individual blocks of the existing design is given. In addition, an approximate calculation of the allocation of the power budget to the individual blocks in our proposed design is performed. In chapter 3, the shortcomings of the existing micro-beamformer design are discussed first. Subsequently, the system-level design of a novel high-dynamic-range micro-beamformer based on active charge-mode summation is presented. Chapter 4 provides the transistor-level design of the individual blocks of the system. In chapter 5, simulation results obtained from different analyses, such as Transient, AC and Noise, are discussed. Finally, in chapter 6, conclusions are drawn and the proposed low-power design is compared with the existing state-of-the-art design. A special section is dedicated to guidelines regarding the layout of the proposed design.

#### References

[1.1] [Online] http://www.who.int/mediacentre/factsheets/fs317/en/ (2011)

[1.2] [Online] http://www.uchospitals.edu/online-library/content=P00208

[1.3] Zili Yu, *Low-Power Receive-Electronics for a Miniature 3D Ultrasound Probe*, PhD thesis, Delft University of Technology, 2012.

[1.4] Oldelft Ultrasound, "Oldelft MicroMulti TE Probe Type number 171Z-," User Manual, February 2010, [Online] Available at: http://www.oldelft.nl/upload/documenten/gebruiksaanwijzingen/250m065-00a-manualmicromulti.pdf

[1.5] E.A. Fisher, J.A. Stahl, J.H. Budd, and M.E. Goldman, "Transesophageal echocardiography: procedures and clinical application," *Journal of the American College of Cardiology*, vol. 18, issue 5, pp. 1333-1348, Nov. 1991

[1.6] [Online] http://ultrasoundbook.net/images/ultrasound\_book\_optimised-5.pdf

[1.7] A. Fenster, D. B. Downey, and H. N. Cardinal, "Three-dimensional ultrasound imaging," *Physics in Medicine and Biology*, vol. 46, pp. 67-99, 2001.

[1.8] S.C. Cobbold, *Foundations of Biomedical Ultrasound*, Oxford University Press, 2007.

[1.9] J.N. Hilberath, D.A. Oakes, S.K. Shernan, B.E. Bulwer, M.N.D'Ambra, and H.K. Eltzschig, "Safety of Transesophageal Echocardiography," [Online] Available at: http://www.asecho.org/files/JASE2010Nov.pdf

### **Chapter 2**

## **Overview of Existing Design**

In this chapter, a summary of the existing design is presented. An overview of the signal processing chain is first given in section 2.1. In the following sections, the design of the individual blocks – low-noise amplifiers (LNAs), time-gain-compensation (TGC) amplifiers and micro-beamformer is discussed. All these blocks were designed in On Semiconductor  $0.35\mu$ m CMOS process. Subsequently, power consumption of individual blocks of the existing design is given in section 2.5. In addition, the basic idea behind the proposed design is introduced along with the approximate power budget of the constituent blocks. Based on these power consumption values, a comparison is made between the existing and proposed designs. Finally, conclusions are drawn.

#### 2.1 Overview of Signal Processing Chain of Front-End Receiver Electronics



Fig. 2.1 Block diagram of a conventional receive-signal processing flow for ultrasound array transducers [2.1]

The receive signal processing chain (as shown in Fig. 2.1) for an ultrasound transducer array comprises five modules: LNAs, TGC amplifiers, a receive (Rx) beamformer, an image processing module and a display module. Firstly, LNAs amplify the electrical signals from the transducer elements (corresponding to the echoes) with a certain gain in order to increase the signal level above the noise level of the subsequent circuitry. The echo signals returning from deep tissues suffer greater attenuation than those from nearby tissues, and they also take more time to arrive at the probe tip. Therefore, TGC amplifiers amplify the signals with a gain that rises exponentially with time (linear in dB). This serves two purposes: (a) the dynamic range requirement of the subsequent blocks in the signal processing chain is relaxed and (b) the image uniformity is maintained. Subsequently, the Rx beamformer delays the signals with respect to each other in such a manner that waves from the point of focus arrive at the same time and hence can be summed coherently. Finally, this summed signal is processed by the image processing module and the image is rendered by the display module [2.1].



Fig. 2.2 Receive signal processing architecture for 3D TEE (N=9 and M=225) [2.1].

The receive signal processing flow of our design can be divided into a front-end and a back-end (as depicted in Fig. 2.2). The front-end signal processing is implemented as an ASIC at the tip of TEE probe, whereas the back-end processing takes place in an external imaging system.

The Rx transducer array consists of 2025 transducer elements (45 × 45). Because of the limited space available inside the gastroscopic tube, the channel count must be reduced at the probe tip. Therefore, beamforming is required in the front-end signal processing flow. A sub-array beamforming architecture [2.2] is selected to perform this function. The beamforming activity is split into "pre-beamforming" or "micro-beamforming", which is implemented in the front-end ASIC at the probe tip, and "post-beamforming", which is carried out in the back-end (external imaging system). In micro-beamforming, the matrix transducer is divided into sub-groups, all of which share the same fine delay pattern. The delayed signals within each sub-group are then summed, thereby reducing the channel count. Our design comprises 225 such sub-groups, each of which integrates 9 transducer elements (N=9) and comprises 9 LNAs, 9 TGC amplifiers, and a micro-beamformer. The output signals of all micro-beamformers propagate through micro-coaxial cables to the external imaging system, which is the site of post-beamforming. In post-beamforming, the signals from each sub-group are given a coarse delay, resulting in their alignment in time and final summation [2.1].
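The channel-count reduction and the delay-and-sum operation within one sub-group can be illustrated with a short behavioural sketch. It works on sampled signals with integer sample delays, which is an assumption made for illustration; the function names and the impulse test signals are hypothetical and are not taken from the design in [2.1].

```python
def channel_count(num_elements, group_size):
    """Number of output channels when elements are combined in fixed-size sub-groups."""
    if num_elements % group_size != 0:
        raise ValueError("elements must divide evenly into sub-groups")
    return num_elements // group_size


def delay_and_sum(signals, delays):
    """Delay each signal by an integer number of samples, then sum them.

    signals: list of equal-length lists of samples (one list per element)
    delays:  list of non-negative integer delays, one per element
    """
    length = len(signals[0])
    out = [0.0] * length
    for sig, d in zip(signals, delays):
        for n in range(d, length):
            out[n] += sig[n - d]
    return out


# 2025 elements in sub-groups of 9 give 225 cables, as stated in the text.
print(channel_count(2025, 9))

# Three elements see the same echo at different times (samples 5, 4 and 3).
echoes = [[0.0] * 8 for _ in range(3)]
echoes[0][5] = echoes[1][4] = echoes[2][3] = 1.0
# Fine delays of 0, 1 and 2 samples align the echoes so they add coherently.
print(delay_and_sum(echoes, [0, 1, 2]))
```

With the delays matched to the arrival-time differences, the three unit echoes line up at the same sample and add to a single peak of amplitude 3, which is the coherent summation described above.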

#### 2.2 Low Noise Amplifier (LNA)

The Rx transducer in our design is modeled as a voltage source with a source impedance of around 2.5 k $\Omega$  [2.1]. The noise voltage at the output of this source in the bandwidth of interest (4.5 MHz – 7.5 MHz) is about 15  $\mu V_{rms}$ . As a design choice, the input-referred noise of the LNA has been chosen to be lower than this value. In addition, the LNA must have a low power consumption. Considering these two constraints, a simple open-loop single-ended topology is required for the implementation of the LNA. A common-source amplifier with a load resistance is a good candidate for such an implementation (as shown in Fig. 2.3).



Fig. 2.3 Common-source amplifier with a load resistor: (a) NMOS implementation, and (b) PMOS implementation [2.1]

The voltage gain is given by:

$$A_{OL} = g_{m,M1} \times R_L \tag{2.1}$$

where  $g_{m,M1}$  is the transconductance of the transistor M1. The input-referred noise power spectral density (PSD) is:

$$\overline{v_n^2} = \frac{8}{3}kT \cdot \frac{1}{g_{m,M1}} + 4kT \cdot \frac{1}{g_{m,M1}^2 \times R_L} \tag{2.2}$$

The noise power can be reduced by choosing a large value of  $g_{m,M1}$ . In this one-transistor LNA topology, the full biasing current determines the transconductance of M1. In addition, the noise PSD of the load resistance is attenuated by the square of the open-loop voltage gain ( $A_{OL}$ ) of M1 when referred to the LNA input. Finally, this LNA topology is highly compact, which results in an area-efficient implementation, as required for the 3D TEE application.
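To make the trade-off in Eqs. (2.1) and (2.2) concrete, the sketch below evaluates the gain and the input-referred thermal noise for one operating point. The transconductance, load resistance and temperature are hypothetical values chosen only for illustration and are not the values used in [2.1]; the integration assumes a flat (thermal-only) PSD over the 4.5 MHz to 7.5 MHz band.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23


def lna_gain(gm_s, rl_ohm):
    """Magnitude of the open-loop voltage gain, Eq. (2.1)."""
    return gm_s * rl_ohm


def lna_noise_psd(gm_s, rl_ohm, temp_k=300.0):
    """Input-referred thermal noise PSD in V^2/Hz, Eq. (2.2)."""
    kt = BOLTZMANN_J_PER_K * temp_k
    transistor_term = (8.0 / 3.0) * kt / gm_s
    load_term = 4.0 * kt / (gm_s ** 2 * rl_ohm)
    return transistor_term + load_term


def lna_noise_rms(gm_s, rl_ohm, f_low_hz, f_high_hz, temp_k=300.0):
    """RMS input-referred noise over a band, assuming a flat PSD."""
    bandwidth_hz = f_high_hz - f_low_hz
    return math.sqrt(lna_noise_psd(gm_s, rl_ohm, temp_k) * bandwidth_hz)


# Hypothetical operating point (illustrative only, not taken from [2.1]).
GM_S = 1e-3      # transconductance of M1 in siemens
RL_OHM = 20e3    # load resistance in ohms

rms = lna_noise_rms(GM_S, RL_OHM, 4.5e6, 7.5e6)
print(f"gain = {lna_gain(GM_S, RL_OHM):.1f} V/V")
print(f"input-referred noise = {rms * 1e6:.2f} uVrms")
```

Since the load term equals the transistor term multiplied by $1.5/A_{OL}$, the transistor dominates the noise once the gain is well above unity, which is why increasing $g_{m,M1}$ is the main lever.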

In the LNA implementation, a P-type MOSFET is chosen as the transistor M1, which is biased in the moderate inversion region. The reason behind this choice is that a PMOS transistor is located in an N-well, which minimizes substrate noise coupling [2.3].

The input-referred noise voltage of the LNA is  $11 \,\mu V_{rms}$ , which satisfies the noise constraint. The power consumption is  $107 \,\mu W$  (including the bias network) [2.1].

#### 2.3 Time Gain Compensation Amplifier

At the input of the TGC amplifier, the signal dynamic range consists of two parts: an instantaneous dynamic range of 40 dB and a dynamic range due to propagation attenuation of up to 40 dB. As discussed in sub-section 1.2.2 (and depicted in Fig. 1.6), an ultrasound signal exhibits a linear (in dB) propagation attenuation. If the TGC amplifier provides a gain that increases exponentially with time (linearly in dB) at a rate that matches the propagation attenuation, only the instantaneous dynamic range remains after compensation (as shown in Fig. 2.4 (c)).



Fig. 2.4 Dynamic range of the ultrasound receiver system: (a) dynamic range at the input of the TGC amplifiers, (b) ideal TGC scheme, (c) output dynamic range after ideal TGC, (d) four-step TGC scheme, and (e) output dynamic range after four-step compensation [2.1].

Approximating this ideal gain profile with fine discrete gain steps would require a highly complex amplifier design, which would increase the power consumption. Therefore, a simplified topology is selected which provides four discrete gain settings to cover the 40 dB gain range. From Fig. 2.4(e), it can be seen that after compensation, the dynamic range requirement at the input of the micro-beamformer is relaxed [2.1].

In order to achieve programmable discrete gain settings, various circuit topologies have been proposed in the literature [2.4] [2.5] [2.6]. In the existing design [2.1], a power-efficient amplifier topology [2.7] is implemented which employs local feedback to achieve a high bandwidth. It comprises a voltage-to-current (V/I) converter, a current-to-voltage (I/V) converter and a source-follower buffer. The simplified schematic of the TGC amplifier is shown in Fig. 2.5.



Fig. 2.5 Simplified schematic of the TGC amplifier [2.1].

A differential pair comprising transistors  $M_{1A}$  and  $M_{1B}$ , in combination with a source-degeneration resistor  $R_S$ , achieves the V/I conversion. In order to enhance the linearity of the V/I conversion, a cascoded flipped voltage follower (CASFVF) topology [2.8] is employed. In addition, Kelvin connections are used to cancel the errors caused by the on-resistance of the switches. The load resistors  $R_L$  provide the I/V conversion. Therefore, the overall voltage gain of the TGC amplifier is determined by a ratio of resistors (=  $2R_L/R_S$ ). The discrete gain settings are obtained by switching between different degeneration resistors  $R_{S1} - R_{S4}$ . In this design, the four gain settings chosen are 0 dB, 12 dB, 26 dB and 40 dB. Finally, the source followers  $M_{2A}$  and  $M_{2B}$  buffer the output differential voltages to drive  $C_L = 250\,fF$  loads. The power consumption of the TGC amplifier is 130 µW [2.1].
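Because the gain is set by the ratio $2R_L/R_S$, the four degeneration resistors follow directly from the chosen gain settings once $R_L$ is fixed. The sketch below illustrates this relation; the value of $R_L$ is hypothetical and is not taken from [2.1], and the function names are illustrative.

```python
import math


def tgc_gain_db(rl_ohm, rs_ohm):
    """Voltage gain 2*RL/RS of the TGC amplifier, expressed in dB."""
    return 20.0 * math.log10(2.0 * rl_ohm / rs_ohm)


def degeneration_resistor_ohm(rl_ohm, gain_db):
    """Degeneration resistance RS needed for a target gain in dB."""
    return 2.0 * rl_ohm / (10.0 ** (gain_db / 20.0))


RL_OHM = 10e3                                # hypothetical load resistance
GAIN_SETTINGS_DB = [0.0, 12.0, 26.0, 40.0]   # gain settings quoted in the text

for gain in GAIN_SETTINGS_DB:
    rs = degeneration_resistor_ohm(RL_OHM, gain)
    print(f"{gain:4.0f} dB -> RS = {rs:8.1f} ohm")
```

The 40 dB gain range corresponds to a 100:1 spread between the largest and the smallest degeneration resistor, regardless of the value chosen for $R_L$.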

#### 2.4 Micro-Beamformer

In theory, it is possible to implement a micro-beamformer in either the analog or the digital domain. The main advantage of digital beamforming lies in its high accuracy. The primary concern is the requirement of an analog-to-digital converter (ADC) for each transducer element, as depicted in Fig. 2.6. Further along the signal processing chain, memory elements and digital adders are also needed to reduce the channel count. The power consumption of commercial ultrasound front-end receive ICs is in the order of 100 mW per channel [2.9][2.10][2.11]. Even when a low-power design approach is applied, digital beamforming results in a power dissipation of around 450 mW for 16 channels at a sampling frequency of 40 MHz [2.12]. This is clearly not affordable for 3D TEE applications, in which there are more than 2000 transducer elements and the power consumption must be limited to 0.5 mW per channel. Therefore, analog beamforming is implemented in our design [2.1].



Fig. 2.6 Implementation of Digital beamforming [2.1]

In the analog domain, different circuit design techniques can be used to implement delays, like cascade of all-pass filter cells, charge-coupled devices and bucket-brigade devices. However, there are certain problems associated with all of them [2.1]. In our design, the time-interleaved sample and hold (S/H) approach is used. It is also referred to as pipeline-operated sample and hold delay line.



Fig. 2.7. Pipeline-operated S/H delay line [2.1]

The operating mechanism of a pipeline-operated S/H delay line is shown in Fig. 2.7 for the simplified case of a single-ended topology. Two groups of clock signals, group A ( $\Phi_1$ - $\Phi_N$ ) and group B ( $\Phi_{1d}$ - $\Phi_{Nd}$ ), control the switches in a cyclic pattern. The value of N is set to 8 in this case. The width of each clock pulse is  $\Delta t$ . The relative time delay (between the clock signals of group A and group B) is related to the pulse width by:

$$\tau = k \times \Delta t \tag{2.3}$$

where k is an integer that can vary from 1 to 7. In this case, k is 3.

N samples of the input voltage signal are sequentially sampled by clock signals of group A on capacitors  $C_1$  to  $C_N$ , respectively. These samples are then held for a brief time interval, before being read out by the clock signals of group B. This procedure is then repeated for the subsequent group of N samples. For proper working of this delay line, the following criteria need to be fulfilled:

- 1. The sampling frequency must satisfy the Nyquist criterion, so that the input signal can be reconstructed from its samples.

- 2. The input signal to the delay line is driven with a certain output impedance (including that of the sampling switch) onto the sampling capacitor, which limits the bandwidth. Therefore, the size of the capacitor and the driving impedance should be chosen in such a way that this bandwidth is sufficiently high.
- 3. In order to ensure that the voltage that was sampled on a capacitor is read out before a new value of the input voltage signal is sampled on it, the following relation must be satisfied:

$$\tau_{max} = k_{max} \times \Delta t \tag{2.4}$$

where  $\tau_{max}$  is the maximum value of the relative time delay between the two groups of clock signals and  $k_{max}$  depends on the number of capacitors N in the delay line according to this equation:

$$k_{max} = N - 1 \tag{2.5}$$

The main advantages of a pipeline-operated S/H delay line, which make it suitable for the 3D TEE application, are its low power consumption, very precise timing (determined by digital clock signals) and high flexibility [2.1].
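The cyclic write and read pattern of the delay line, and the limit $k_{max} = N - 1$ of Eq. (2.5), can be illustrated with a small behavioural model. This is a sketch under simplifying assumptions: ideal sampling, one sample per clock pulse, and the write of a clock period taking effect before the read of the same period. The function names are illustrative and do not correspond to any circuit block in [2.1].

```python
def delay_s(k, pulse_width_s):
    """Relative delay between the group A and group B clocks, Eq. (2.3)."""
    return k * pulse_width_s


def sh_delay_line(samples, num_caps, k):
    """Behavioural model of a pipeline-operated S/H delay line.

    In clock period n, a group A clock stores the input on capacitor
    n mod N, and a group B clock reads the capacitor written k periods
    earlier. Entries are None while the capacitor being read is still empty.
    """
    caps = [None] * num_caps
    out = []
    for n, x in enumerate(samples):
        caps[n % num_caps] = x                 # sampling phase (group A)
        out.append(caps[(n - k) % num_caps])   # read-out phase (group B)
    return out


samples = list(range(20))

# N = 8 capacitors and k = 3, as in the example in the text.
print(sh_delay_line(samples, num_caps=8, k=3))

# k = N violates Eq. (2.5): each capacitor is overwritten before it is read.
print(sh_delay_line(samples, num_caps=8, k=8))
```

For $k \le N - 1$ the output is the input delayed by exactly k clock periods. For $k = N$ the model returns the undelayed input, because the capacitor being read has just been overwritten with a new sample.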

In the final phase of beamforming, the summation of delayed signals from the transducer elements can be performed in the voltage domain [2.13], the current domain [2.14] or the charge domain [2.15]. Voltage-domain summation requires an operational amplifier. For current-mode summation, additional circuitry is required for V/I conversion, since the output signal of a conventional piezoelectric transducer employed in an ultrasound system is in the voltage domain. The primary advantage of current-mode summation lies in its simplicity, as the outputs of the V/I converters can be tied together. The charge-mode summation technique (as shown in Fig. 2.8 with a single delay line), in conjunction with the pipeline-operated S/H topology, results in high power-efficiency and low complexity [2.1].



Fig. 2.8 A pipeline operated S/H delay line with charge mode summation [2.1]

For a realistic implementation of this structure, certain design norms are followed:

- 1. In order to minimize the effects of charge-injection and clock feed-through from switches, a differential configuration is used.
- 2. The sampling capacitors are sized carefully to reduce the kT/C noise.
- 3. In order to ensure matching of the sampling capacitors and to minimize the parasitic capacitance on the summation node, special attention is given during layout.
- 4. A global reset switch is used to remove the residual charge on the parasitic capacitance (at the output node) prior to signal summation, thereby minimizing errors in output voltage.
- 5. The clocks are made strictly non-overlapping in order to prevent charge sharing and to guarantee accuracy in the sampling of signals and their subsequent summation.
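The influence of capacitor matching, parasitic capacitance and the reset switch (items 3 and 4 above) can be seen in a simple charge-conservation model. The sketch below assumes passive charge redistribution, in which the sampling capacitors are connected together on a summation node with a parasitic capacitance; this model and all numerical values are assumptions made for illustration and are not taken from [2.1].

```python
def charge_mode_sum(voltages, caps_f, parasitic_f=0.0, parasitic_v=0.0):
    """Voltage on the summation node after the capacitors share their charge.

    voltages:    sampled voltage on each sampling capacitor
    caps_f:      capacitance of each sampling capacitor in farads
    parasitic_f: parasitic capacitance on the summation node
    parasitic_v: residual voltage on the parasitic before summation
                 (zero if the node has been reset)
    """
    total_charge = sum(c * v for c, v in zip(caps_f, voltages))
    total_charge += parasitic_f * parasitic_v
    total_cap = sum(caps_f) + parasitic_f
    return total_charge / total_cap


# Hypothetical example: three matched 100 fF capacitors.
caps = [100e-15] * 3
sampled = [1.0, 2.0, 3.0]

print(charge_mode_sum(sampled, caps))                   # ideal case: the average
print(charge_mode_sum(sampled, caps, 100e-15))          # parasitic, node reset
print(charge_mode_sum(sampled, caps, 100e-15, 1.0))     # parasitic, residual charge
```

In this model, matched capacitors without parasitics produce the average of the sampled voltages. A parasitic capacitance attenuates the result, and residual charge on it adds an error that depends on the previous output, which is what the global reset switch removes.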

The power consumption of this micro-beamformer is 267  $\mu$ W per transducer element. This is the dynamic power consumption of the digital logic that generates the clocks required to drive the switches in the delay lines [2.1].

#### 2.5 Evaluation of Power Consumption

The power consumption of each block in the existing front-end receive signal processing chain [2.1] is given in Table 2.1. The total power consumption is around 0.5 mW per transducer element.

| Block      | Power consumption per transducer element (µW) |
|------------|-----------------------------------------------|
| LNA        | 107                                           |
| TGC        | 130                                           |
| Delay Line | 267                                           |
| Total      | 504                                           |

Table 2.1 Break-down of power consumption of individual blocks of existing design

It is possible to reduce the overall power dissipation by re-arranging the existing signal processing chain. One option is to move the TGC functionality to the output of the micro-beamformer, so that a single TGC amplifier serves 9 transducer elements (as shown in Fig. 2.9). This would not alter the way in which the signal is processed, since within each sub-group (in which every element retains its own LNA and delay line), the signal received from every transducer element would still be amplified with the same TGC gain as before.


Fig. 2.9. Architecture of the proposed receive signal processing chain

The power-efficiency and area-efficiency of the receiver electronics will improve. In order to illustrate this point further, an approximate calculation of the proposed power budget is performed in Table 2.2, which is based on the following simplification:

• No additional circuitry will be required to deal with the increase in dynamic range at the input of the micro-beamformer that results from shifting the TGC functionality to the end of the signal processing chain.

The factor of 16 reduction in the power consumption of the delay line (depicted in Table 2.2) can be derived as follows. If we migrate to a new denser CMOS technology node (from existing  $0.35\mu m$  to  $0.18\mu m$ ), the dynamic power consumption of the delay lines will scale down according to this equation [2.16]:

$$P_{dyn} = \kappa \times f \times C_{Load} \times V_{DD}^{2} \tag{2.6}$$

where  $\kappa$  is the probability of a power consuming transition  $(0 \rightarrow 1)$  or switching activity factor, f is the clock frequency,  $C_{Load}$  is the load capacitance and  $V_{DD}$  is the supply voltage.

If we design the circuitry in the denser  $0.18\mu m$  CMOS process,  $V_{DD}$  will scale down by almost a factor of 2 (from 3.3 V to 1.8 V), so the  $V_{DD}^{2}$  term in Eq. (2.6) reduces by almost a factor of 4. Besides,  $C_{Load}$  will also scale down by approximately a factor of 4, because the gate area, and hence the gate capacitance, of a minimum-sized switch (driven by the clock signals in each delay line) reduces by about the same factor. For instance, in the On Semiconductor  $0.35\mu m$  CMOS process, the aspect ratio of a minimum-sized switch is  $0.5\,\mu m/0.35\,\mu m$ , whereas in the TSMC  $0.18\mu m$  CMOS process, it is  $0.22\,\mu m/0.18\,\mu m$ . Because of these two factors (reduction in  $V_{DD}$  and  $C_{Load}$ ),  $P_{dyn}$  will reduce by a factor of approximately 16.
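The scaling estimate can be checked numerically from Eq. (2.6). The sketch below assumes that the switching activity and clock frequency are unchanged, and that the load capacitance scales with the gate area (width times length) of a minimum-sized switch at equal capacitance per unit area; the last point is a simplifying assumption that ignores the difference in gate-oxide thickness between the two processes.

```python
# Supply voltages of the two processes, as quoted in the text.
VDD_OLD_V = 3.3
VDD_NEW_V = 1.8

# Minimum-sized switch dimensions (width, length) in micrometres, from the text.
SWITCH_OLD_UM = (0.5, 0.35)
SWITCH_NEW_UM = (0.22, 0.18)


def gate_area_um2(width_length_um):
    """Gate area of a switch, used here as a proxy for its gate capacitance."""
    width, length = width_length_um
    return width * length


vdd_factor = (VDD_OLD_V / VDD_NEW_V) ** 2
cap_factor = gate_area_um2(SWITCH_OLD_UM) / gate_area_um2(SWITCH_NEW_UM)
power_factor = vdd_factor * cap_factor

print(f"reduction from supply voltage: {vdd_factor:.2f}x")
print(f"reduction from gate area:      {cap_factor:.2f}x")
print(f"combined reduction:            {power_factor:.1f}x")
```

Under these assumptions the individual factors come out somewhat below and somewhat above 4, and their product is slightly below the rounded factor of 16 used in the text, so the figure of 16 should be read as an approximation.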

Based on Table 2.2, the overall reduction in power consumption  $(r_{PC})$  becomes:

$$r_{PC} = \frac{504 - 138.1}{504} \times 100 = 72.6 \%$$

It can be seen that almost 73 % reduction in power consumption can be achieved using this new architecture. In addition, an area corresponding to that of 8 TGC amplifiers can be saved, which results in a compact design.

| Block      | Power contribution per transducer element (µW) |
|------------|------------------------------------------------|
| LNA        | 107                                            |
| TGC        | 130/9 = 14.4                                   |
| Delay Line | 267/16 = 16.7                                  |
| Total      | 138.1                                          |

Table 2.2 Break-down of power consumption of individual blocks of proposed design
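As a cross-check of Table 2.2 and of $r_{PC}$, the budget can be recomputed in a short script. This is a sketch under one assumption: the per-element figures of the existing design are taken to be 107 µW, 130 µW and 267 µW, i.e. the numerators appearing in Table 2.2, which sum to the 504 µW used in the expression for $r_{PC}$. The dictionary names are illustrative.

```python
# Per-element power of the existing chain (uW); assumed from the numerators in Table 2.2.
existing_uW = {"LNA": 107.0, "TGC": 130.0, "Delay line": 267.0}

# Proposed chain: one TGC amplifier shared by 9 channels, delay line scaled by 1/16.
proposed_uW = {
    "LNA": existing_uW["LNA"],
    "TGC": existing_uW["TGC"] / 9,
    "Delay line": existing_uW["Delay line"] / 16,
}

total_existing = sum(existing_uW.values())
total_proposed = sum(proposed_uW.values())
reduction_pct = (total_existing - total_proposed) / total_existing * 100

print(f"existing total: {total_existing:.1f} uW")
print(f"proposed total: {total_proposed:.1f} uW")
print(f"reduction:      {reduction_pct:.1f} %")
```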

# 2.6 Conclusions

In this chapter, the design choices, operating principles and transistor-level implementations of the individual blocks of the existing receive signal processing chain [2.1] are discussed. A closer look at the existing signal processing chain reveals that it is possible to improve the area-efficiency and, more significantly, the power-efficiency by re-arranging the blocks. An approximate analysis of the allocation of the power budget to the individual blocks supports this point. In the next chapter, the optimization of power consumption at the system level will be explored in detail.

# References

[2.1] Zili Yu, *Low-Power Receive-Electronics for a Miniature 3D Ultrasound Probe*, PhD thesis, Delft University of Technology, 2012.

[2.2] B. Savord and R. Solomon, "Fully sampled matrix transducer for real time 3D ultrasonic imaging," *Proceedings IEEE Ultrasonics Symposium*, 2003, pp. 945-953.

[2.3] A. Helmy and M. Ismail, Substrate Noise Coupling in RFICs, New York: Springer, 2008.

[2.4] E. Brunner, "An ultra-low noise linear-in-dB variable gain amplifier for medical ultrasound applications," Conference Record. *Microelectronics Communications Technology Producing Quality Products Mobile and Portable Power Emerging Technologies*, pp.650-655, 1995.

[2.5] D. Ma, C. Zhang, H. Chao, and M. Koen, "Integrated low-power CMBF free variable-gain amplifier for ultrasound diagnostic applications," *Analog Integrated Circuits and Signal Processing*, vol. 61, pp. 171-179, 2009.

[2.6] M. Sawan, R. Chebli, and K. Kassem, "Integrated front-end receiver for a portable ultrasonic system," *Analog Integrated Circuits and Signal Processing*, vol. 36, pp. 56-67, 2003.

[2.7] J. Yao, Z. Yu, M. A. P. Pertijs, G. C. M. Meijer, C. T. Lancée, J. G. Bosch, and N. de Jong, "Design of a low power time-gain-compensation amplifier for a 2D piezoelectric ultrasound transducer," *Proceedings IEEE Ultrasonics Symposium*, 2010, pp. 841-844.

[2.8] J. Ramirez-Angulo, S. G. I. Padilla, R. G. Carvajal, A. Torralba, M. Jiménez, and F. Muñoz, "Comparison of conventional and new flipped voltage structure with increased input/output swing and current sourcing/sinking capabilities," *IEEE International Midwest Symposium on Circuits and Systems*, 2005, pp. 1276-1291.

[2.9] "AD9278 Low-power, octal receiver with CW I/Q demodulator for portable ultrasound," Analog Devices Datasheet, [Online] Available at: http://www.analog.com/static/imported-files/data\_sheets/AD9278.pdf (January 2012)

[2.10] "LM96511 Ultrasound receive analog front end," National Semiconductor Datasheet, [Online] Available at: http://www.national.com/pf/LM/LM96511.html#Overview (January 2012)

[2.11] "AFE5808 Fully Integrated 8-Channel Ultrasound Analog Front End with Passive CW Mixer for Ultrasound," Texas Instruments Datasheet, [Online] Available at: http://www.ti.com/product/afe5808 (January 2012)

[2.12] V. S. Gierenz, R. Schwann, and T. G. Noll, "A low power digital beamformer for handheld ultrasound systems," *Proceedings of the 27th European Solid-State Circuits Conference*, Villach, Austria, 2001, pp. 261-264.

[2.13] T. K. Song and J. F. Greenleaf, "Novel ultrasonic dynamic focusing system with reduced hardware complexity," *Proceedings SPIE*, vol. 1733, pp. 141-153, 1992.

[2.14] B. Stefanelli, I. O'Connor, L. Quiquerez, A. Kaiser, and D. Billet, "An analog beamforming circuit for ultrasound imaging using switched-current delay lines," *IEEE Journal of Solid-State Circuits*, vol. 35, no. 2, pp. 202-211, 2000.

[2.15] Z. Yu, M. A. P. Pertijs, and G. C. M. Meijer, "Ultrasound beamformer using pipeline-operated S/H delay stages and charge-mode summation," *Electronics Letters*, vol. 47, no. 18, pp. 1011-1012, 2011.

[2.16] A. P. Chandrakasan, M. Potkonjak, R. Mehra, J. Rabaey, and R. W. Brodersen, "Optimizing power using transformations," *IEEE Trans. Computer-Aided Design*, vol. 14, pp. 12–51, Jan. 1995.

# **Chapter 3**

# **Architecture-Level Design**

In this chapter, a system-level design of our proposed work is presented. In order to come up with a new design, the shortcomings of the existing design need to be investigated. The present design of the micro-beamformer [3.1] has certain limitations, which are discussed in section 3.1. Subsequently, the architecture-level design of a new high-dynamic-range micro-beamformer is presented in section 3.2. All the features of this new design are discussed and the factor limiting its dynamic range is investigated. Once the underlying reason is found, a solution is proposed in the form of an Offset Calibration Loop (OCL) in section 3.3. The operation of the OCL is analyzed in detail. A noise analysis is carried out in order to arrive at an optimum power budget for the main $G_m$-stage and to determine the on-resistance of the switches and the size of the sampling capacitors in the delay line. Finally, conclusions are drawn in section 3.4.

## 3.1 Limitations of Existing Micro-Beamformer

The micro-beamformer used in the existing design [3.1] has certain shortcomings. In order to improve the power efficiency and area efficiency of the design, these shortcomings need to be investigated. The limitations are in the form of limited dynamic range, limited capability to drive loads and sensitivity to parasitic capacitances. All these limitations are discussed in the following sub-sections.

#### 3.1.1 Precision Considerations in Existing Differential Implementation

In the pipeline-operated sample and hold (S/H) delay line shown in Fig. 2.8, errors are caused by non-idealities of the switches. These error sources are charge injection and clock feed-through [3.2]. Using a differential topology, it is possible to circumvent these errors to first order. What remains after first order compensation of these errors is the residual charge injection error and the residual clock feed-through error [3.1].

#### **Residual Charge Injection Error**

The charge injection error in a single-ended S/H circuit is depicted in Fig. 3.1(a).



Fig. 3.1. Charge injection error in (a) single-ended S/H circuit, and (b) differential S/H circuit

The expression for the total charge formed in the channel of a switch (when it is ON) is given by [3.2]

$$Q_{ch} = W_1 L_1 C_{ox} \left( V_{DD} - V_{in} - V_{TH0} - \Upsilon \sqrt{2\Phi_B + V_{in}} + \Upsilon \sqrt{2\Phi_B} \right)$$
(3.1)

where  $W_1$  and  $L_1$  are the width and length of the MOS transistor  $M_1$ , respectively,  $V_{TH0}$  is the zero-bias threshold voltage,  $\Upsilon$  is the bulk effect factor,  $\Phi_B$  is the Fermi potential and  $V_{DD}$  is the supply voltage.

Considering worst case condition, in which all the channel charge of switch  $M_1$  is injected onto the sampling capacitor  $C_H$ , when it turns OFF, the error in the output voltage is given by

$$\Delta V_{ch} = \frac{Q_{ch}}{C_H} = \frac{W_1 L_1 C_{ox} \left( V_{DD} - V_{in} - V_{TH0} - \Upsilon \sqrt{2\Phi_B + V_{in}} + \Upsilon \sqrt{2\Phi_B} \right)}{C_H}$$
(3.2)

In the On Semiconductor 0.35 µm CMOS process, the values of the transistor model parameters are: $V_{TH0} = 0.6\ V$, $\Upsilon = 0.55\ V^{0.5}$, $C_{ox} = 4.9 \times 10^{-3}\ F/m^2$ and $\Phi_B = 0.43\ V$. A minimum-sized NMOS switch is used such that $W_1 = 0.5\ \mu m$ and $L_1 = 0.35\ \mu m$. Besides, $C_H = 250\ fF$, $V_{DD} = 3.3\ V$ and $V_{in} = 0.8\ V$ (the minimum level of $V_{in}$ was assumed). Using these values in the above equation, the error voltage comes out to be approximately 6 mV, which is too high [3.1].
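The worst-case figure can be reproduced numerically from Eq. (3.1) and Eq. (3.2) using the parameter values quoted above. The following is a minimal sketch; the function name `channel_charge` and the variable names are illustrative, and the calculation keeps the same worst-case assumption that all of the channel charge ends up on $C_H$.

```python
import math

def channel_charge(w, l, cox, vdd, vin, vth0, body_factor, phi_b):
    """Channel charge of an ON NMOS switch, following Eq. (3.1) (SI units)."""
    effective_overdrive = (
        vdd - vin - vth0
        - body_factor * math.sqrt(2 * phi_b + vin)
        + body_factor * math.sqrt(2 * phi_b)
    )
    return w * l * cox * effective_overdrive

# Parameter values quoted in the text for the 0.35 um process.
q_ch = channel_charge(
    w=0.5e-6, l=0.35e-6, cox=4.9e-3,
    vdd=3.3, vin=0.8, vth0=0.6, body_factor=0.55, phi_b=0.43,
)
c_h = 250e-15

# Worst case: the full channel charge is injected onto the sampling capacitor.
delta_v = q_ch / c_h
print(f"Q_ch = {q_ch * 1e15:.2f} fC, error voltage = {delta_v * 1e3:.2f} mV")
```

The result is a little under 6 mV, in line with the rounded value stated in the text.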

For the differential S/H circuit shown in Fig. 3.1(b), the residual error voltage can be calculated as

$$\Delta V_{residual} = \Delta V_1 - \Delta V_2$$

If we assume that widths and lengths of transistors  $M_2$  and  $M_3$  are matched, so that  $W_2L_2 = W_3L_3 = WL$ , the above equation becomes

$$\Delta V_{residual} = \frac{WLC_{ox}(V_{DD} - V_{in1} - V_{TH0} - \Upsilon\sqrt{2\Phi_B + V_{in1}} + \Upsilon\sqrt{2\Phi_B})}{C_H} - \frac{WLC_{ox}(V_{DD} - V_{in2} - (V_{TH0} + \Delta V_{TH}) - \Upsilon\sqrt{2\Phi_B + V_{in2}} + \Upsilon\sqrt{2\Phi_B})}{C_H + \Delta C_H}$$
(3.3)

where $\Delta V_{TH}$ is the mismatch in the zero-body-bias threshold voltage of NMOS switch $M_3$ relative to $M_2$, and $\Delta C_H$ is the mismatch of the sampling capacitor in the branch containing $M_3$ relative to the sampling capacitor in the branch containing $M_2$.

The right hand side of the above equation can be arranged into four error terms as follows:

- 1. Mismatch in overdrive voltage of the switches
- 2. Mismatch in zero-body-bias threshold voltages,  $\Delta V_{TH}$
- 3. Non-linearity due to body effect
- 4. Mismatch in sampling capacitors,  $\Delta C_H$

The two dominant contributors to the residual charge injection error among these four error terms are the second term ($\Delta V_{TH}$) and the third term (non-linearity due to body effect). Based on the calculations, the error due to residual charge injection comes out to be 1-2 mV [3.1].

#### **Residual Clock Feed-through Error**

Clock feed-through error is caused by coupling of the transitions in the clock signals via the overlap capacitances (gate-drain or gate-source) of a MOS transistor to the sampling capacitor.



Fig. 3.2. Clock feed-through error in (a) single-ended S/H circuit, and (b) differential S/H circuit

For a single-ended S/H circuit shown in Fig. 3.2(a), the error in output voltage is

$$\Delta V_{ch} = V_{DD} \times \frac{C_{ov}}{C_{ov} + C_H}$$
(3.4)

where  $C_{ov}$  is the overlap capacitance.

For the differential case in Fig. 3.2(b), the residual error in output voltage is

$$\Delta V_{residual} = V_{DD} \left( \frac{C_{ov}}{C_{ov} + C_H} - \frac{C_{ov}}{C_{ov} + C_H + \Delta C_H} \right)$$
(3.5)

This is the case when the overlap capacitors are considered constant and mismatch in sampling capacitors,  $\Delta C_H$  is taken into account.  $C_{ov}$  can be calculated using this equation:

$$C_{ov} = C_{gdo} \times W$$

where $C_{gdo}$ is the gate-drain overlap capacitance per unit width (in F/m) and $W$ is the width of the MOS transistor. In the chosen process, $C_{gdo} = 1.68 \times 10^{-10}\ F/m$. For $W = 0.5\ \mu m$, $C_{ov} = 8.4 \times 10^{-17}\ F$ (using the above equation). If $C_H = 250\ fF$, $\Delta C_H/C_H = \pm 0.3\%$ and $V_{DD} = 3.3\ V$, then for the single-ended circuit shown in Fig. 3.2(a), the error voltage is $\Delta V_{ch} \approx 1\ mV$. For the differential circuit in Fig. 3.2(b), the residual error voltage is $\Delta V_{residual} = \pm 3.3\ \mu V$.
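These clock feed-through figures follow directly from Eq. (3.4) and Eq. (3.5). The short sketch below recomputes them with the values quoted above; the helper name `feedthrough_error` is illustrative, and only the positive sign of the capacitor mismatch is evaluated.

```python
def feedthrough_error(vdd, c_ov, c_h):
    """Clock feed-through error of a single S/H branch, following Eq. (3.4)."""
    return vdd * c_ov / (c_ov + c_h)

VDD = 3.3            # supply voltage (V)
C_GDO = 1.68e-10     # gate-drain overlap capacitance per unit width (F/m)
W = 0.5e-6           # switch width (m)
C_H = 250e-15        # sampling capacitor (F)
MISMATCH = 0.003     # relative sampling-capacitor mismatch (0.3 %)

c_ov = C_GDO * W
single_ended = feedthrough_error(VDD, c_ov, C_H)

# Differential case, Eq. (3.5): only the sampling capacitors are mismatched.
residual = single_ended - feedthrough_error(VDD, c_ov, C_H * (1 + MISMATCH))

print(f"C_ov         = {c_ov:.2e} F")
print(f"single-ended = {single_ended * 1e3:.2f} mV")
print(f"residual     = {residual * 1e6:.2f} uV")
```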

When the sampling capacitors are kept constant ( $C_H$ ) and mismatch in overlap capacitors,  $\Delta C_{ov}$  is considered, the error voltage becomes

$$\Delta V_{residual} = V_{DD} \left( \frac{C_{ov}}{C_{ov} + C_H} - \frac{C_{ov} + \Delta C_{ov}}{C_{ov} + \Delta C_{ov} + C_H} \right)$$
(3.6)

In this case, the error voltage comes out to be $\pm 60\ \mu V$ [3.1].

Besides residual charge injection and residual clock feed-through, kT/C noise contributes a noise voltage of 200 $\mu V_{rms}$. Among all the error sources, residual charge injection was found to be the dominant one. Since the error voltage due to residual charge injection is 1-2 mV and the maximum signal level at the input of the micro-beamformer is $1\ V_{pp}$, the dynamic range is limited to about 60 dB.
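The dynamic-range figure is simply the ratio of the maximum signal level to the dominant error voltage, expressed in decibels. A minimal sketch is given below; the function name `dynamic_range_db` is illustrative, and the ratio is taken between the peak-to-peak signal and the error voltage, which is the convention that reproduces the 60 dB quoted in the text.

```python
import math

def dynamic_range_db(v_signal_pp, v_error):
    """Dynamic range in dB as the ratio of maximum signal to error voltage."""
    return 20 * math.log10(v_signal_pp / v_error)

# 1 Vpp maximum input, residual charge-injection error of 1 mV and 2 mV.
dr_best = dynamic_range_db(1.0, 1e-3)
dr_worst = dynamic_range_db(1.0, 2e-3)

print(f"error = 1 mV: {dr_best:.1f} dB")
print(f"error = 2 mV: {dr_worst:.1f} dB")
```

The 60 dB figure corresponds to the lower end of the 1-2 mV error range; at 2 mV the same ratio gives roughly 54 dB.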

#### **3.1.2 Limited Output Drive Capability**

The drive capability of a circuit is primarily determined by its output stage [3.3]. The output stage in the front-end receive signal processing chain (as shown in Fig. 3.3) is the delay line, which comprises only passive elements (capacitors and switches). Therefore, the overall drive capability of the system is limited.



Fig. 3.3. Front-end receive signal processing chain

In order to enhance the drive strength, a buffer or an active $G_m$-stage is required after the micro-beamformer in the signal processing chain (as shown in Fig. 3.4). However, adding either of these stages in each channel will further add to the power consumption per channel, which is undesirable. In addition, the area efficiency of the system will degrade.



Fig. 3.4. Addition of a buffer stage after delay line in each branch in the signal processing chain

#### 3.1.3 Sensitivity to Parasitic Capacitances

The delay line (for 1 channel) with all its parasitic capacitances at various nodes is shown in Fig. 3.5. The output stage of the TGC amplifier (preceding the delay line) is a buffer. Therefore, it behaves like a voltage source when it is used to drive the next stage, i.e., the delay line. Since the stray capacitance  $C_{px}$  is driven from a voltage source, it has a minimal effect on the circuit [3.4]. At the charge summation node, a reset switch is used to circumvent the errors caused due to residual charge on the parasitic capacitance  $C_{py}$  [3.1]. The parasitic capacitances  $C_{p1a}$  to  $C_{p8a}$  ( $C_{p1a}, C_{p2a}, \dots, C_{p8a}$ ) and  $C_{p1b}$  to  $C_{p8b}$  ( $C_{p1b}, C_{p2b}, \dots, C_{p8b}$ ) affect the accuracy of charge transfer from input to output of the delay line. Depending on the extent of mismatch between the sampling capacitors ( $C_{s1}$  to  $C_{s8}$ ), these parasitic capacitances also exhibit variation. Therefore, the effective sampling capacitance deviates from its ideal value for each delay setting, thereby leading to inaccuracy in charge transfer. The effect of these parasitic capacitances can be minimized by careful layout techniques in order to ensure capacitance matching.



Fig. 3.5. Schematic of the delay line showing all the stray capacitances

# **3.2 Micro-Beamformer Based on Active Charge Mode Summation**

In our proposed design, a novel high-dynamic-range micro-beamformer based on active charge mode summation is implemented. The functionalities of the TGC amplifier and delay line have been merged together into a switched-capacitor charge-amplifier topology. A stray-insensitive delay line is implemented using bottom-plate sampling approach. In our design, each delay line has 1 delay setting (2 parallel branches) in contrast to 7 delay settings (8 parallel branches) in the previous work [3.1]. This has been done in order to simplify the design and provide a proof of concept for our proposed idea of making the design power-efficient by moving the TGC amplifier to the end of the signal processing chain. All these features of the new design are discussed in the upcoming sub-sections. Finally, the factor limiting the dynamic range in the proposed single-ended implementation of the delay line is investigated.

#### 3.2.1 Stray-Insensitive Delay Line Based on Bottom-Plate Sampling

The schematic of the stray-insensitive delay line is depicted in Fig. 3.6(a). The corresponding clock signals driving the switches are shown in Fig. 3.6(b). For simplicity, only 1 delay line is shown.



Fig. 3.6. Stray-insensitive delay line (a) Schematic showing all the stray capacitors and (b) Clock signals

The parasitic capacitor  $C_{px}$  is driven from a voltage source. Hence, it can be ignored.  $C_{py}$  is at the virtual ground node of OTA. Therefore, its effect on the circuit gets divided by the open-loop gain of the OTA, making it negligible [3.5].

When the clock signals $clk1\_in$ and $clk1\_in\_del$ are active (high), the sampling switches $S_1$ and $S_2$ are ON. In this phase, the input signal is sampled on the capacitor $C_{s1}$ and the integration capacitor $C_I$ is reset. The stray capacitance $C_{p1b}$ is shorted by the on-resistance of $S_2$. $C_{p1a}$ is connected to the input voltage source by means of $S_1$. Therefore, the effect of the parasitic capacitances $C_{p1a}$ and $C_{p1b}$ is negligible during this phase. At the end of this phase, clock $clk1\_in$ goes from high to low first, thereby turning $S_2$ off. At this moment, the voltage on the bottom plate of $C_{s1}$ is zero. If the switch $S_2$ is implemented with an NMOS transistor, its drain and source will be at a constant potential. Therefore, the charge injection caused by $S_2$, when it turns off, is signal independent. A time interval $t_{bps}$ separates the falling edges of $clk1\_in$ and $clk1\_in\_del$; after this interval, $clk1\_in\_del$ goes from high to low, which turns $S_1$ off. Though the charge injected by $S_1$ onto $C_{s1}$ is signal dependent, the voltage across $C_{s1}$ will not change, since its bottom plate is floating once $S_2$ is off [3.6]. Therefore, during this phase there will be a fixed charge injection error and a fixed clock feed-through error.

During the read-out phase, the clock signals  $clk1\_out$  and  $clk1\_out\_del$  are active (high), thereby turning switches  $S_3$  and  $S_4$  on. The parasitic capacitance  $C_{p1a}$  gets shorted by the on-resistance of  $S_3$  and  $C_{p1b}$  is connected to virtual ground node by means of  $S_4$ .

Therefore, the effect of these stray capacitances can be neglected during this phase also. Similarly, during the sampling and read-out of capacitor  $C_{s2}$  in the second branch, these stray capacitances are either shorted or connected to virtual ground node.

#### Relation Between *clk\_rst* and *clk1\_out* (or *clk2\_out*)

There are two important aspects of the relationship between the clocks *clk\_rst* and *clk1\_out* (or *clk2\_out*). They are:

- 1. Overlapping of *clk\_rst* and *clk1\_out* (or *clk2\_out*)
- 2. Delay in Rising Edges of *clk\_rst* and *clk1\_out* (or *clk2\_out*)

#### 1. Overlapping of clk\_rst and clk1\_out (or clk2\_out)

From Fig. 3.6(b), it can be seen that the read-out clocks $clk1\_out$ and $clk2\_out$ driving the switches connected to the virtual ground node ($S_4$ and $S_8$, respectively) overlap with the reset clock $clk\_rst$. At the rising edge of $clk1\_out$ (or $clk2\_out$), a spike occurs at the virtual ground node due to clock feed-through. Because $clk\_rst$ is active (high) at that moment, the OTA is in unity-gain mode and the spike does not result in an error in the voltage at its output node. The spike at the virtual ground node therefore decays very fast and does not cause a permanent change in the output voltage of the OTA.

#### 2. Delay in Rising Edges of clk\_rst and clk1\_out (or clk2\_out)

There exists a time delay between the rising edges of  $clk\_rst$  and  $clk1\_out$  (or  $clk2\_out$ ), which is denoted by  $t_{cft}$  in the figure. This delay is needed to ensure that the voltage at the virtual ground node recovers from initial transients, by the time  $clk1\_out$  (or  $clk2\_out$ ) is applied. These transients occur at the virtual ground node because of two factors:

- 1. Charge injection caused due to simultaneous opening of read-out switches  $S_7$  and  $S_8$  at the falling edge of the read-out clocks *clk2\_out* and *clk2\_out\_del*.
- 2. Clock feed-through at the rising edge of *clk\_rst*.

#### 3.2.2 Signal Summation Using Charge Amplifier




Fig. 3.7. Schematic depicting active charge mode summation at virtual ground of OTA

The feedback network of the OTA consists of the integration capacitor,  $C_I$  and the total sampling capacitance,  $9C_S$ . The OTA along with its feedback network acts like a charge amplifier (CA). The gain from input of delay line  $(v_{in})$  to output of main OTA  $(v_{out})$  is given by

$$Gain_{CA} = \frac{9C_S}{C_I} \tag{3.7}$$

The summation of output signals of the 9 delay lines (in charge domain) occurs at the virtual ground node (indicated by 'vg' in Fig. 3.7) of the main OTA during the read-out phase. Since the charge-merge from the outputs of the delay lines takes place at the virtual ground node of an active- $G_m$  stage (main OTA), it is referred to as active charge mode summation.

#### 3.2.3 Possibility of Converting Charge Amplifier into TGC Amplifier

The TGC functionality can be incorporated in the charge amplifier. In the previous work [3.1], the gain of the TGC amplifier was determined by the ratio of resistors. In our proposed charge amplifier implementation, different gain settings can be achieved by using control signals ($en_{C_{I1}}$, $en_{C_{I2}}$ and $en_{C_{I3}}$) to enable different integration capacitors, $C_{I1}$, $C_{I2}$ or $C_{I3}$ (or their combinations), in the feedback network.



Fig. 3.8. Schematic illustrating how the charge amplifier can be used as a TGC amplifier

This is illustrated in Fig. 3.8. For instance, if the $en_{C_{I1}}$ control signal is enabled, the gain setting becomes

$$Gain_{TGC} = \frac{9C_S}{C_{I1}} \tag{3.8}$$

Therefore, in the present design, the TGC functionality is shifted to the output of the micro-beamformer in the signal processing chain. Subsequently, the micro-beamformer and TGC amplifiers have been merged into a charge amplifier topology. This improves the area-efficiency and power-efficiency of the design (to be illustrated in section 5.3).

In our design, the charge amplifier is implemented with a fixed gain setting. It is possible to incorporate TGC functionality in the main-frame in the back-end of the receive signal processing chain.
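To make the gain-setting idea of Eq. (3.8) concrete, the sketch below evaluates the charge-amplifier gain for a few feedback configurations. All capacitor values are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the design. It is also assumed that integration capacitors enabled together are connected in parallel, so that their values add. The function name `charge_amp_gain` is illustrative.

```python
import math

def charge_amp_gain(c_s, enabled_c_i, n_channels=9):
    """Gain = n_channels * C_S / C_I, with C_I the sum of the enabled feedback capacitors."""
    c_i_total = sum(enabled_c_i)  # assumption: enabled capacitors sit in parallel
    return n_channels * c_s / c_i_total

# Hypothetical values, for illustration only.
C_S = 100e-15
C_I1, C_I2, C_I3 = 900e-15, 450e-15, 225e-15

settings = {
    "C_I1": [C_I1],
    "C_I2": [C_I2],
    "C_I3": [C_I3],
    "C_I1 + C_I2": [C_I1, C_I2],
}
for name, caps in settings.items():
    gain = charge_amp_gain(C_S, caps)
    print(f"{name}: gain = {gain:.2f} ({20 * math.log10(gain):.1f} dB)")
```

With binary-weighted capacitors such as these, each single-capacitor setting changes the gain by about 6 dB, which is one possible way to obtain discrete TGC steps.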

## 3.2.4 Dynamic Range Limitations in the Proposed Single-Ended Implementation

In our proposed single-ended implementation of the delay line, the dynamic range at the input of the delay line is limited by the mismatch in switches between the two out-of-phase branches, as discussed in sub-section 3.1.1. In one phase, the total offset at the output of the main OTA due to non-idealities in the delay line (like charge injection and clock feed-through) is different from that in the other phase. This difference in offset between the two phases is caused by mismatch in the sizes of the switches in the two out-of-phase branches. The underlying assumption is that the mismatch in sizes of switches and capacitors in the in-phase branches of the delay lines is small enough. Consequently, a periodic ripple pattern appears at the output of the main OTA.

If no signal (DC) is applied at the input of the micro-beamformer, a ripple pattern can be observed at the output of the main OTA, as shown in Fig. 3.9. In this figure, a peak-to-peak ripple value of 3 mV is observed, due to mismatch in the micro-beamformer. There is 20% mismatch in size of switches in the out-of-phase parallel branches of each delay line. Since the maximum signal level at the output of main OTA is  $1.4 V_{pp}$ , the dynamic range is limited to almost 54 dB.
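For reference, the dynamic-range figure quoted above follows from the ratio of the maximum output swing to the peak-to-peak ripple, using only the two values stated in the text:

$$DR = 20 \log_{10}\left(\frac{1.4\ V}{3\ mV}\right) \approx 53.4\ dB$$

This is slightly below 54 dB, consistent with the rounded figure.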



Fig. 3.9. Waveforms showing the clock signals and the output of main OTA. At the input of the micro-beamformer a DC signal is applied.

#### **3.3 Enhancement of Dynamic Range Using an Offset Calibration Loop**

The need for an Offset Calibration Loop (OCL) is illustrated in Flowchart 3.1. The primary objective of our research is to reduce the power consumption of the front-end receive electronics, as mentioned in section 1.4. To achieve that goal, our proposed strategy is to shift the TGC functionality to the end of the signal processing chain, after the micro-beamformer. Therefore, instead of having 9 TGC amplifiers for 9 channels, there will be only 1 TGC amplifier for 9 channels after the micro-beamformer. This will improve the power efficiency of the design; the approximate calculation in section 2.5 supports this hypothesis. However, the downside of moving the TGC functionality after the micro-beamformer is that the dynamic range required at the input of the micro-beamformer increases to 80 dB (from 60 dB). In order to address this problem, a circuit-level solution is required, which targets the factor limiting the dynamic range (as explained in section 3.2.4) and circumvents it. This solution takes the form of an OCL, as depicted in Fig. 3.10.





The working of the OCL is depicted in Flowchart 3.2. The OCL is a feedback loop which senses the ripple pattern at the output of the main OTA and stores the offset values corresponding to the two phases in memory elements (capacitors $C_m$). In the next step, a current is injected at the output of the main OTA, based on the offset values stored on the memory capacitors. This current creates an effective overdrive voltage at the input of the OTA. This overdrive voltage appears at the output of the OTA (after being amplified by the gain of the charge amplifier) in such a way that it neutralizes the ripple pattern.



Flowchart 3.2. Working principle of the Offset Calibration Loop





The OCL (shown in Fig. 3.10) consists of a 2-phase sample-and-hold block, 2 OTAs (OTA1\_OCL and OTA2\_OCL), and another block which is referred to as the delay line in OCL (because of its structural analogy to the delay line in the existing design in [3.1]). The operation of the OCL can be explained in two phases – a Calibration Phase and a Normal Operation Phase. Each of these phases consists of two sub-phases – Reset and Read-out, both of which are repeated iteratively several times. The two sub-phases are associated with the two phases of a non-overlapping clock.

#### **Calibration Phase**

After the Transmit (Tx) beam is sent, there exists a latency period before meaningful data can be received from the Receive (Rx) beam. This latency period, called the calibration window, can be utilized for calibrating the OCL. Calculations have shown that the calibration window is 10 $\mu$s. During this phase, the input signal is at DC level. The calibration phase consists of several clock cycles, which are required for the calibration loop to settle to a final value of the ripple voltage at the output of the main OTA.

#### Sub-phase 1: Reset

The conditions of the overall circuit are shown in Fig. 3.11. In this sub-phase, the following events occur:

- 1. The input signal is sampled on capacitor  $C_{s1}$  (per channel), as the sampling switches driven by clock signals *clk1\_in* and *clk1\_in\_del* are ON in the input delay line.
- 2. The read-out switch connected to the virtual ground of OTA1 and driven by *clk2\_out* signal is ON.
- 3. The integration capacitor  $C_I$  is reset.
- 4. The OCL is connected at the output of the main OTA (OTA1), but neither the 2-phase Sample & Hold nor the Delay line in OCL is active during this sub-phase.

### Sub-phase 2: Read-out

The circuit conditions during this sub-phase are illustrated in Fig. 3.12. These are the events during this sub-phase:

- 1. The sampling process continues on capacitor $C_{s1}$, as the clock signals *clk1\_in* and *clk1\_in\_del* are still active (high) in the input delay line.
- 2. Both the read-out switches driven by clocks $clk2\_out$ and $clk2\_out\_del$ are turned ON, and the charge stored on capacitor $C_{s2}$ is transferred onto $C_I$.
- 3. The voltage at the output node of the main OTA is sampled onto one of the capacitors, $C_{sh\_ocl3}$, of the 2-phase sample & hold in the OCL. Simultaneously, the voltage stored on capacitor $C_{sh\_ocl4}$ is read out, as *clk2\_in\_sh\_ocl* is active (high) during this sub-phase.
- 4. The voltage read out from $C_{sh\_ocl4}$ is converted into an equivalent current by OTA1\_OCL. This current charges memory capacitor $C_{m2}$, as both the sampling clock $(clk2\_in\_dl\_ocl)$ and read-out clock $(clk2\_out\_dl\_ocl)$ are active (high) in the delay line in OCL.

The voltage on the memory capacitors in the delay line in OCL settles to the correct value, because of negative feedback in the OCL. The negative feedback action of the OCL works properly in spite of the presence of the 2-phase sample & hold block in the loop. This is due to the fact that the unity gain bandwidth of the loop is much less than the frequency of operation of the 2-phase sample & hold block (as discussed in subsection 3.3.2 and appendix B).

- 5. The voltage sampled on $C_{m2}$ is converted into a current $(i_{inj})$ by OTA2\_OCL, which is injected at the output node of OTA1. This compensating current creates a small overdrive voltage $(v_{ov})$ at the input of OTA1. If the trans-conductance of OTA1 is $g_m$, then $v_{ov}$ is given by

$$v_{ov} = \frac{i_{inj}}{g_m} \tag{3.9}$$

- 6. $v_{ov}$ gets amplified by the open-loop gain of OTA1 and appears at its output such that it can cancel out the offset due to mismatch in the switches of the branch containing $C_{s2}$ (relative to the switches in the branch containing $C_{s1}$) in the input delay line.

#### **Normal Operation Phase**

During this phase, the input signal is the electrical signal obtained from the conversion of the received echo signals (from heart) by the piezo-electric transducer.

## Sub-phase 1: Reset

The diagram for this sub-phase is shown in Fig. 3.13. The events occurring during this sub-phase are similar to the Reset sub-phase during Calibration Phase, with one exception. In this sub-phase, the calibration loop is no longer connected at the output of OTA1.

## Sub-phase 2: Read-out

The events that occur during this sub-phase are illustrated in Fig. 3.14.

- 1. The sampling process continues on capacitor  $C_{s1}$ , as the clock signals *clk1\_in* and *clk1\_in\_del* are still active, in the input delay line.
- 2. Both the read-out switches driven by clocks  $clk2\_out$  and  $clk2\_out\_del$  are ON, and the charge stored on capacitor  $C_{s2}$  is transferred onto  $C_{I}$ .
- 3. The 2-phase sample & hold is no longer active in the OCL.
- 4. The voltage sampled onto the memory capacitor  $C_{m2}$  during the Calibration Phase acts like a constant voltage source. It is converted into a compensating current by OTA2\_OCL. This current is injected at the output node of OTA1.



Fig. 3.11. Operation of the Offset Calibration Loop: Phase 1. Calibration, Sub-phase 1: Reset











# 3.3.1 Clocking Scheme

The waveforms of all the clock signals (including that of the OCL) are shown in Fig. 3.15. The calibration window is shown as 400 ns, for the sake of simplicity. All the clock signals in the OCL, except two, are inactive (low) during Normal Operation Phase. These two clock signals are the read-out clocks of the delay line in OCL, namely, *clk1\_out\_dl\_ocl* and *clk2\_out\_dl\_ocl*. They are essential to ensure that the ripple pattern at the output node of main OTA is cancelled out during Normal Operation Phase, as explained in the previous section.



Fig. 3.15. Waveforms of the clock signals. The calibration window is 400 ns.

#### 3.3.2 Noise Analysis



Fig. 3.16. (a) Simplified schematic of the switched-capacitor charge amplifier, and (b) Clock Signals

A simplified diagram of a switched-capacitor charge amplifier is shown in Fig. 3.16(a). The clock signals are given in Fig. 3.16(b). Only 1 branch of the input delay line for 9 channels is considered for a simplified analysis. Therefore, the capacitance $C_S$ shown in Fig. 3.16(a) is the total capacitance for 9 channels. The noise contributed by the switches $S_1$ through $S_4$ and the main OTA is considered for analysis. The dominant source of noise introduced by the switches is thermal noise. The effect of the 1/f noise can be ignored in this case, because the flow of current in these switches is intermittent and occurs at a frequency equal to the clock frequency [3.7]. The noise analysis is done in two phases: the sampling phase and the read-out phase.

#### **Sampling Phase**

During the sampling phase, the switches $S_1$ and $S_2$ are ON, since the clocks *clk1\_in* and *clk1\_in\_del* are both active (high). The circuit comprising $C_S$ can be depicted by the branch shown in Fig. 3.17(a). The switches $S_1$ and $S_2$ are substituted with their noise voltages and on-resistances. These noise voltages and resistors can be combined in pairs as depicted in Fig. 3.17(b). The total switch resistance per channel during the sampling phase becomes $R_{on\_samp}$. Also, since $v_{n1\_samp}$ and $v_{n2\_samp}$ are uncorrelated, the power spectral density (PSD) of the noise voltage $v_n$ is given by

$$S_{\nu_n} = 4kTR_{on\_samp} \tag{3.10}$$



Fig. 3.17. Schematic of the charge amplifier during sampling phase. The noise sources are shown.

The relation between the PSD of the noise voltage  $v_{cs}$  across  $C_s$  and  $S_{v_n}$  is given by

$$S_{cs}(f) = \frac{S_{\nu_n}}{1 + (2\pi f \tau_0)^2} = \frac{4kTR_{on\_samp}}{1 + (2\pi f \tau_0)^2} \tag{3.11}$$

where $\tau_0$ is the time constant of the branch containing $C_S$ during the sampling phase, such that $\tau_0 = R_{on\_samp} \times C_S$.

The total power of  $v_{cs}(t)$  can be found by integrating  $S_{cs}(f)$  in the signal band of interest (4.5MHz - 7.5MHz). This yields

$$\overline{v_{cs}}^2 = \frac{4kTR_{on\_samp}}{4\tau_0} \times \frac{1}{OSR} = \frac{kT}{C_S} \times \frac{1}{OSR} \tag{3.12}$$

where OSR is the oversampling ratio, $f_S$ is the sampling frequency and $f_B$ is the signal bandwidth, such that $OSR = f_S/(2f_B)$, $f_S = 25MHz$ and $f_B = 3MHz$.
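As a hedged aside, the factor $1/(4\tau_0)$ in equation (3.12) can be reproduced by integrating equation (3.11) over all frequencies. The $1/OSR$ factor is read here as the fraction of the sampled noise that falls inside the signal band, assuming the aliased noise is approximately white up to $f_S/2$. This interpretation is in the spirit of [3.7] and is not spelled out in the text above.

$$\int_0^{\infty} \frac{4kTR_{on\_samp}}{1 + (2\pi f \tau_0)^2}\,df = \frac{4kTR_{on\_samp}}{2\pi\tau_0} \times \frac{\pi}{2} = \frac{4kTR_{on\_samp}}{4\tau_0} = \frac{kT}{C_S}, \qquad \frac{1}{OSR} = \frac{f_B}{f_S/2} = \frac{2 \times 3MHz}{25MHz} = 0.24$$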

#### **Read-out Phase**

When both the read-out clocks, *clk1\_out* and *clk1\_out\_del*, are high (active), the resulting circuit is as depicted in Fig. 3.18(a). A single-stage OTA is considered for analysis. Replacing the OTA with its equivalent small-signal model and merging the on-resistances and noise voltages of the two switches $S_3$ and $S_4$ gives the equivalent circuit shown in Fig. 3.18(b).



Fig. 3.18. Schematic of the charge amplifier during read-out phase. The noise sources are shown.

The loop gain of this stage during the read-out phase is given by $\beta g_m R_L$, where $\beta$ is the feedback factor, such that $\beta = C_I/(C_I + C_S)$, and $g_m$ is the trans-conductance of the OTA. If the loop gain is large enough such that $\beta g_m R_L \gg 1$, the effect of $R_L$ can be ignored during the analysis.

The time constant during read-out phase is given by

$$\tau = (R_{on\_read\_out} + 1/g_m) \times C_S \tag{3.13}$$

where  $R_{on\_read\_out}$  is the total on-resistance of read-out switches for 9 channels

The noise power in  $C_S$  due to the switch noise  $v_n$  is given by

$$\overline{v_{cs,sw}}^{2} = \frac{4kTR_{on\_read\_out}}{4\tau} \times \frac{1}{OSR}$$

$$\Rightarrow \overline{v_{cs,sw}}^{2} = \frac{4kTR_{on\_read\_out}}{4(R_{on\_read\_out} + 1/g_{m}) \times C_{S}} \times \frac{1}{OSR}$$

$$\Rightarrow \overline{v_{cs,sw}}^{2} = \frac{kT/C_{S}}{1 + 1/x} \times \frac{1}{OSR} \tag{3.14}$$

where

$$x = R_{on\_read\_out} \times g_m \tag{3.15}$$

is a parameter combining $R_{on\_read\_out}$ and $g_m$ [3.7].

Considering a single-ended implementation of the OTA (as shown in Fig. 3.19), the PSD of its noise  $v_{no}$  is [3.2]

$$S_{no} = \overline{v_{no}^2} = \frac{8kT}{3g_m} \left( 1 + \frac{g_{mp}}{g_m} \right) \tag{3.16}$$

where  $g_{mp}$  is the trans-conductance of the PMOS current source transistor



Fig. 3.19. A simplified implementation of a single-ended OTA

If  $g_{mp} \ll g_m$ , the above equation reduces to

$$S_{no}(f) = \overline{v_{no}^2} = \frac{8kT}{3g_m} \tag{3.17}$$

Therefore, noise power in  $C_S$  due to OTA noise becomes

$$\overline{v_{cs,OTA}}^2 = \frac{S_{no}(f)}{4\tau} \times \frac{1}{OSR} = \frac{\left(\frac{8}{3}\right)kT/g_m}{4(R_{on\_read\_out} + 1/g_m) \times C_S} \times \frac{1}{OSR}$$

$$\Rightarrow \overline{v_{cs,OTA}}^2 = \left(\frac{2}{3}\right) \frac{kT/C_S}{1+x} \times \frac{1}{OSR} \tag{3.18}$$

The three noise voltages $v_{cs}$, $v_{cs,sw}$ and $v_{cs,OTA}$ are uncorrelated. Therefore, their noise powers can be added to get the total noise power. It is obtained by adding equations (3.12), (3.14) and (3.18).

$$\overline{v_{cs,tot}}^2 = \overline{v_{cs}}^2 + \overline{v_{cs,sw}}^2 + \overline{v_{cs,OTA}}^2$$

$$\Rightarrow \overline{v_{cs,tot}}^2 = \frac{kT}{C_S} \times \frac{1}{OSR} \left(1 + \frac{x}{1+x} + \frac{2/3}{1+x}\right)$$

$$\Rightarrow \overline{v_{cs,tot}}^2 = \frac{kT}{C_S} \times \frac{1}{OSR} \left(\frac{\frac{5}{3} + 2x}{1+x}\right) \tag{3.19}$$

$$\Rightarrow \overline{v_{cs,tot}}^2 = \frac{2kT}{C_S} \times \frac{1}{OSR} \left( 1 - \frac{\frac{1}{6}}{1+x} \right) \tag{3.20}$$

The relation between the parameter x and time constant  $\tau$  can be derived from equations (3.13) and (3.15) as

$$\tau = \frac{(1 + g_m R_{on\_read\_out})}{g_m} \times C_S = \frac{1 + x}{g_m} \times C_S$$

$$\Rightarrow \tau \times g_m = (1 + x) \times C_S \tag{3.21}$$

From equations (3.19) and (3.21), an equation can be found for  $g_m$  as follows

$$g_m = \frac{kT}{\tau \times \overline{v_{cs,tot}}^2} \times \frac{1}{OSR} \left(\frac{5}{3} + 2x\right) \tag{3.22}$$

This equation yields an optimum value of  $g_m$  that is aimed at reducing the power consumption taking into account the constraints on the noise power and the settling time.

The integrated noise voltage at the input of the delay line for 1 channel is  $100 \mu V_{rms}$  [3.1]. Therefore, for 9 channels, the integrated noise power can be calculated as

$$\overline{v_{cs,tot}}^2 = \frac{(100 \ \mu V_{rms})^2}{9} \tag{3.23}$$

In order to ensure that the voltage at the output node of OTA gets settled well within the time window available for read-out,

$$7\tau = 1/(2f_S)$$

$$\Rightarrow \tau = \frac{1}{14f_S} = \frac{1}{14 \times 25MHz}$$

$$\Rightarrow \tau = 2.86ns \tag{3.24}$$

Using equations (3.23) and (3.24) in equation (3.22) for $g_m$, we get

$$g_m = 0.313 \text{ mS} \times (1.67 + 2x) \tag{3.25}$$
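The numerical coefficient in the expression for $g_m$ can be checked with a short script. This is a minimal sketch: the function name `gm_coefficient` is illustrative, and the temperature of 300 K is an assumption, since the text does not state the value of $T$ used.

```python
K_B = 1.380649e-23  # Boltzmann constant in J/K


def gm_coefficient(temp_k=300.0, f_s=25e6, f_b=3e6, v_n_1ch=100e-6, n_ch=9):
    """Return (tau, coeff) such that g_m = coeff * (5/3 + 2x), as in eq. (3.22).

    temp_k = 300 K is an assumption; the text does not state the temperature.
    """
    osr = f_s / (2 * f_b)            # oversampling ratio, OSR = f_S / (2 f_B)
    v_n_sq = v_n_1ch ** 2 / n_ch     # noise power budget for 9 channels, eq. (3.23)
    tau = 1 / (14 * f_s)             # from 7 * tau = 1 / (2 f_S), eq. (3.24)
    coeff = K_B * temp_k / (tau * v_n_sq * osr)
    return tau, coeff


tau, coeff = gm_coefficient()
print(f"tau   = {tau * 1e9:.2f} ns")
print(f"g_m   = {coeff * 1e3:.3f} mS x (5/3 + 2x)")
```

Under the 300 K assumption, the script gives a time constant of about 2.86 ns and a coefficient of about 0.313 mS, in agreement with the values quoted above.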

From equations (3.15) and (3.25), an equation can be derived relating x and  $R_{on\_read\_out}$  as follows

$$\frac{x}{R_{on\_read\_out}} = 0.313 \, mS \times (1.67 + 2x)$$

$$\Rightarrow x = \frac{0.84 \times R_{on\_read\_out}}{1.6 \, k\Omega - R_{on\_read\_out}} \tag{3.26}$$

If  $R_{on\_read\_out\_1ch}$  is the total on-resistance of read-out switches for 1 channel,

$$R_{on\_read\_out\_1ch} = 9 \times R_{on\_read\_out} \tag{3.27}$$

Using this relation in equation (3.26) yields,

$$x = \frac{0.84 \times R_{on\_read\_out\_1ch}}{14.4 \ k\Omega - R_{on\_read\_out\_1ch}}$$

$$\Rightarrow x = \frac{0.84}{(14.4 \ k\Omega / R_{on\_read\_out\_1ch}) - 1} \tag{3.28}$$

From equations (3.25) and (3.28),

$$R_{on\_read\_out\_1ch} \downarrow \Longrightarrow x \downarrow \Longrightarrow g_m \downarrow \Longrightarrow I_B \downarrow$$

In words, as the total on-resistance of the read-out switches for 1 channel ($R_{on\_read\_out\_1ch}$) decreases, the value of the parameter x reduces. This in turn decreases the required trans-conductance of the main OTA ($g_m$), and hence its bias current and power consumption. Therefore, $R_{on\_read\_out\_1ch}$ should be kept as small as possible in order to obtain the most power-efficient solution. However, a lower on-resistance requires a larger NMOS transistor, and the charge injection and clock feed-through caused by the switch increase with its size, which is highly undesirable in switched-capacitor circuits. An optimum value of $R_{on\_read\_out\_1ch}$ therefore has to be selected as a compromise between keeping the switch non-idealities small (high $R_{on\_read\_out\_1ch}$, small switch) and keeping the power consumption of the OTA low (low $R_{on\_read\_out\_1ch}$, large switch).
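The trend described above can be illustrated numerically by solving equations (3.15) and (3.25) together for a few values of the per-channel on-resistance. This is a sketch only: the function names are illustrative, the swept resistance values are arbitrary examples, and the relation $g_m/I_B = 20\ V^{-1}$ for weak inversion is taken from the bias-current estimate used later in this section.

```python
def solve_x(r_on_1ch, gm_coeff=0.313e-3, n_ch=9):
    """Solve x = g_m * R_on with g_m = gm_coeff * (5/3 + 2x), eqs. (3.15), (3.25)."""
    r_on = r_on_1ch / n_ch          # 9 channels in parallel, eq. (3.27)
    a = gm_coeff * r_on
    if a >= 0.5:
        raise ValueError("on-resistance too large: no positive solution for x")
    return (5 / 3) * a / (1 - 2 * a)


def gm_from_x(x, gm_coeff=0.313e-3):
    """Required trans-conductance of the main OTA, eq. (3.25)."""
    return gm_coeff * (5 / 3 + 2 * x)


# Example sweep (arbitrary values, including the 6.9 kOhm design point)
for r_on_1ch in (2e3, 4e3, 6.9e3, 10e3):
    x = solve_x(r_on_1ch)
    gm = gm_from_x(x)
    i_b = gm / 20                   # weak inversion: g_m / I_B = 20 per volt
    print(f"R_on_1ch = {r_on_1ch / 1e3:4.1f} kOhm  x = {x:.2f}  "
          f"g_m = {gm * 1e3:.2f} mS  I_B = {i_b * 1e6:.0f} uA")
```

Both x and the required $g_m$ grow monotonically with the on-resistance, which is the behaviour the argument relies on.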

The on-resistance of a minimum-sized NMOS switch  $(W/L = 0.22\mu m/0.18\mu m)$  in TSMC 0.18µm CMOS process is  $R_{on\_min\_siz\_sw\_1ch} = 6.5k\Omega$  (when its source terminal is at a DC level of  $V_{DD}/2 = 0.9V$ ).

Therefore, in the proposed solution, the sizing of the read-out switches  $S_3$  and  $S_4$  (Fig. 3.16(a)) is done as follows:

1. Switch $S_4$ (driven by clock signal *clk1\_out\_del*) is of minimum size, to keep the charge injection and clock feed-through due to this switch to a minimum.

$$R_{on,S_4} = \frac{R_{on\_\min\_siz\_sw\_1ch}}{9} \tag{3.29}$$

2. Switch $S_3$ (driven by clock signal *clk1\_out*) connected to virtual ground is made large (small on-resistance), to improve the power efficiency of the OTA. Making this switch large has a minimal effect on the shift in voltage at the output node of the OTA due to switch non-idealities (that scale with switch size). This is because, when it is turned ON, the reset clock *clk\_rst* is also active, and hence, any spikes on the virtual ground node of the OTA decay very quickly, so that there is no permanent change in voltage at the output node (as discussed in sub-section 3.2.1).

The total on-resistance is set to $R_{on\_read\_out\_1ch} = 6.9k\Omega$ ($\approx R_{on\_min\_siz\_sw\_1ch}$), such that $R_{on\_small\_sw} = R_{on\_min\_siz\_sw\_1ch} = 6.5k\Omega$ and $R_{on\_large\_sw} = 400\Omega$, where $R_{on\_small\_sw}$ is the on-resistance of the small read-out switch and $R_{on\_large\_sw}$ is the on-resistance of the large read-out switch.

Therefore, *x* becomes

$$x = \frac{0.84}{(14.4 \ k\Omega/R_{on\_read\_out\_1ch}) - 1} = \frac{0.84}{(14.4 \ k\Omega/6.9 \ k\Omega) - 1} \approx 0.8$$

So, *x* = 0.8

Using this value of x (= 0.8) in equation (3.25),  $g_m = 1 mS$ 

Let us suppose  $I_B$  is the bias current of OTA.

If the input transistor of OTA is biased in weak-inversion region (to maximize its transconductance per unit bias current),

$$\frac{g_m}{I_B} = 20 \Longrightarrow I_B = 50 \ \mu A$$

The value of sampling capacitance per channel,  $C_{S,1ch}$  can be obtained from equation (3.20) as

$$C_{S,1ch} = \frac{C_S}{9} = 198.72 \, fF \times \left(1 - \frac{0.167}{1+x}\right) \tag{3.30}$$

Putting x = 0.8 in this equation,  $C_{S,1ch} = 180 fF$ 

Total sampling capacitance for 9 channels is given by

$$C_S = 9 \times C_{S,1ch} = 1.62 \ pF$$
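The capacitor sizing follows directly from equation (3.20) and can be reproduced as below. This is a sketch under the assumption $T = 300\ K$ (not stated in the text); the function name is illustrative.

```python
K_B = 1.380649e-23  # Boltzmann constant in J/K


def sampling_cap_per_channel(x, temp_k=300.0, v_n_1ch=100e-6, n_ch=9,
                             f_s=25e6, f_b=3e6):
    """Per-channel sampling capacitance from eq. (3.20), solved for C_S.

    temp_k = 300 K is an assumption; the text does not state the temperature.
    """
    osr = f_s / (2 * f_b)
    v_n_sq = v_n_1ch ** 2 / n_ch                      # eq. (3.23)
    c_s_total = (2 * K_B * temp_k / (v_n_sq * osr)) * (1 - (1 / 6) / (1 + x))
    return c_s_total / n_ch


c_1ch = sampling_cap_per_channel(0.8)
print(f"C_S,1ch = {c_1ch * 1e15:.0f} fF, C_S = {9 * c_1ch * 1e12:.2f} pF")
```

With x = 0.8 this gives a per-channel capacitance of about 180 fF and a total of about 1.62 pF, matching the values above to within rounding of the physical constants.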

#### **Offset Calibration Loop**

The trans-conductance values of OTA1\_OCL and OTA2\_OCL and the sizing of memory capacitors ( $C_m$ ) in the delay line in OCL need to be ascertained.

#### OTA1\_OCL and Memory Capacitor in Delay Line in OCL

The settling time of the OCL in the calibration phase is determined by the trans-conductance of OTA1\_OCL ($g_{m\_OTA1\_OCL}$) and $C_m$. In order to minimize charge injection and clock feed-through in the delay line in the OCL, a large value of $C_m$ is desirable. However, it cannot be made too large, keeping in mind the goal of achieving an area-efficient design. Therefore, its value can be set equal to the largest capacitor in the system, i.e., the total sampling capacitance for 9 channels, $C_S$. Therefore,

$$C_m = C_S = 1.62pF \tag{3.31}$$

The unity-gain bandwidth of the OCL should be well below the lower -3 dB frequency of the ultrasound signal band (4.5 MHz), in order to ensure that the noise generated in the OCL does not increase the total input-referred noise at the input of the delay line.

$$f_{UGB\_OCL} \ll 4.5 MHz$$

$$\Rightarrow \frac{g_{m\_OTA1\_OCL}}{2\pi C_m} \ll 4.5 MHz \tag{3.32}$$

Substituting the value of  $C_m$  obtained in equation (3.31), we get

$$g_{m\_OTA1\_OCL} \ll 46 \ \mu S$$

Therefore, a value of 20 $\mu$S was chosen for $g_{m\_OTA1\_OCL}$. Choosing this value ensures sufficient settling speed within the calibration window. Besides, the settling error, which is the value of the peak-to-peak ripple ($\Delta V_{phase}$) at the output of the main OTA at the end of the calibration period, is also small (50 $\mu$V) when this value is chosen, as evident from Table 3.1. Ideally, $\Delta V_{phase}$ should be smaller than the thermal noise floor level when an OCL is employed, in order to ensure that the dynamic range is limited by noise rather than ripple. This is explained in detail in sub-section 5.1.3.

| $g_{m\_OTA1\_OCL}$ (in $\mu$S) | $\Delta V_{phase}$ (in $\mu$V) |
|-------------------------------|----------------------------------|
| 1                             | 1000                             |
| 10                            | 100                              |
| 20                            | 50                               |
| 50                            | 20                               |
| 100                           | 10                               |

Table 3.1. Variation of peak-to-peak ripple value with $g_{m\_OTA1\_OCL}$
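The bandwidth constraint of equation (3.32) and the chosen trans-conductance can be checked numerically. The sketch below uses illustrative function names and the single-pole approximation $f_{UGB} = g_m/(2\pi C_m)$ that equation (3.32) itself assumes.

```python
import math


def gm_upper_bound(c_m, f_low):
    """Trans-conductance at which the OCL unity-gain bandwidth equals f_low."""
    return 2 * math.pi * c_m * f_low


def unity_gain_bandwidth(gm, c_m):
    """Unity-gain bandwidth of the OCL for a given g_m and memory capacitor."""
    return gm / (2 * math.pi * c_m)


C_M = 1.62e-12      # memory capacitor, eq. (3.31)
F_LOW = 4.5e6       # lower edge of the ultrasound signal band

print(f"g_m bound      = {gm_upper_bound(C_M, F_LOW) * 1e6:.1f} uS")
print(f"f_UGB at 20 uS = {unity_gain_bandwidth(20e-6, C_M) / 1e6:.2f} MHz")
```

For the chosen 20 µS the unity-gain bandwidth comes out at roughly 2 MHz, which is below the 4.5 MHz band edge, though by a factor of about two rather than by an order of magnitude.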

#### OTA2\_OCL

If there is any mismatch in the switches in the delay line in the OCL, it will also generate a ripple pattern at the output node of the main OTA (OTA1), which will add to the original ripple pattern caused by mismatch in the input delay line. In order to minimize its effect on the dynamic range of the input delay line, this ripple pattern needs to be attenuated. This can be done by setting the trans-conductance of OTA2\_OCL ($g_{m\_OTA2\_OCL}$) to a certain factor (< 1) of the trans-conductance of the main OTA ($g_m$). Since it is desirable to increase the dynamic range by a factor of 10 (i.e., by 20 dB), $g_{m\_OTA2\_OCL}$ can be set as

$$g_{m\_OTA2\_OCL} = g_m / 10$$

$$\Rightarrow g_{m\_OTA2\_OCL} = \frac{1 mS}{10} = 100 \ \mu S \tag{3.33}$$

| Parameter | Symbol | Value |
|-----------|--------|-------|
| Trans-conductance of main OTA (OTA1) | $g_m$ | 1 mS |
| Bias current of OTA1 | $I_B$ | 50 μA |
| Total on-resistance of read-out switches for 1 channel in the input delay line | $R_{on\_read\_out\_1ch}$ | 6.9 kΩ |
| Sampling capacitance per channel in the input delay line | $C_{S,1ch}$ | 180 fF |
| Trans-conductance of OTA1\_OCL | $g_{m\_OTA1\_OCL}$ | 20 μS |
| Trans-conductance of OTA2\_OCL | $g_{m\_OTA2\_OCL}$ | 100 μS |
| Memory capacitor in delay line in OCL | $C_m$ | 1.62 pF |

Table 3.2. Summary of the design parameters

# **3.4 Conclusions**

In this chapter, the system-level design of a novel high-dynamic-range micro-beamformer based on active charge-mode summation is presented. The new topology of the switched-capacitor-based charge amplifier can be used to integrate the TGC functionality with the delay and coherent summation operation. A stray-insensitive topology is presented for the delay line. The factor limiting the dynamic range in the proposed design is analyzed and the solution is presented in the form of the OCL. Finally, noise analysis is performed to determine the design parameters, as summarised in Table 3.2. The next chapter will describe the transistor-level design of the proposed high-dynamic-range micro-beamformer.

# References

[3.1] Zili Yu, Low-Power Receive-Electronics for a Miniature 3D Ultrasound Probe, PhD thesis, Delft University of Technology, 2012.

[3.2] B. Razavi, Design of Analog CMOS Integrated Circuits, Boston: McGraw-Hill, 2001.

[3.3] G. Palmisano, G. Palumbo, and R. Salerno, "A 1.5-V High Drive Capability CMOS Op-Amp," *IEEE Journal of Solid-State Circuits*, Vol. 34, No. 2, February 1999

[3.4] R. Brodersen, P. Gray, and D. Hodges, "MOS switched-capacitor filters," *Proc. IEEE*, vol. 67, no. 1, pp. 61–75, Jan. 1979.

[3.5] Paul Hasler, "Switched Capacitor Circuits II," [Online] Available: http://users.ece.gatech.edu/phasler/Courses/ECE6414/Unit4/SwitchedCapCircuitsII.pdf (November 2012)

[3.6] J. Wong, "CMOS Sample-and-Hold Circuits," [Online] Available: http://www.eecg.toronto.edu/~kphang/papers/2001/jwong\_SH.pdf (November 2012)

[3.7] R. Schreier, J. Silva, J. Steensgaard, and G. C. Temes, "Design-oriented estimation of thermal noise in switched-capacitor circuits," *IEEE Trans. Circuits Syst.-I*, vol. 52, no. 11, pp. 2358–2367, Nov. 2005.

# **Chapter 4**

# **Transistor-Level Design**

In this chapter, a transistor-level design of the individual blocks in the system is presented. Throughout the chapter, the emphasis is on achieving a power-efficient and area-efficient design, which are the primary goals of this thesis project. A brief summary of the different blocks in the front-end receive signal processing chain is first given in section 4.1. In the subsequent sections, a transistor level implementation of each of these blocks is discussed. A power-efficient LNA based on dynamic threshold voltage MOS transistor (DTMOS) is presented in section 4.2. The next section (section 4.3) deals with the implementation of the switches in the stray-insensitive delay line with MOS transistors. Subsequently, a single-ended topology of the main OTA is discussed in section 4.4, and the reasons behind selecting such an architecture are described. Additional circuitry in the form of an auto-zeroing (AZ) loop is required for the main OTA, which is also explained in this section. In the next section, the transistor level implementation of the individual blocks in the OCL is presented. The overall clocking scheme, including the clock signals used in the OCL and the AZ loop is discussed in section 4.6. Finally, conclusions are drawn in section 4.7.

## 4.1 Overview of Individual Blocks

The front-end receive signal processing chain (shown in Fig. 4.1) consists of an LNA, a charge amplifier which combines the delay line and the main OTA (OTA1) with its feedback capacitor ( $C_I$ ), and the OCL at the output of OTA1. The output signal of a transducer element is amplified by the gain of the LNA (20 *dB*).



Fig. 4.1. Block level diagram of the front-end receive signal processing chain

The gain of the LNA helps in relaxing the noise requirements of the subsequent stages. The amplified signal at the output of the LNA is then processed by the switched-capacitor charge amplifier, which provides a fixed gain (20 dB), corresponding to the ratio of the total sampling capacitance for 9 channels of the delay line ($C_S$) and $C_I$. The output signal of the charge amplifier is a sampled-and-held return-to-zero type of signal. The OCL helps in reducing the periodic ripple pattern at the output of OTA1 that is caused by mismatch in the input delay line. Thus, it ensures that the signal at the output of OTA1 is free from any offset ripple pattern during normal operation.

## 4.2 DTMOS Based LNA

The cross section of a DTMOS transistor and its equivalent transistor symbol are shown in Fig. 4.2. In a DTMOS device, the gate terminal is connected to its substrate. It can be considered either as a lateral bipolar p-n-p transistor or as a PMOS transistor whose threshold voltage can be controlled dynamically [4.1].



Fig. 4.2 DTMOS transistor (a) Cross-section and (b) Equivalent transistor level symbol

Therefore, in a DTMOS transistor, the substrate voltage varies along with the input gate voltage. This results in change in threshold voltage,  $V_{th}$  of the device as per the following equation:

$$\left|V_{th,p}\right| = V_{th0,p} + \gamma_p \left(\sqrt{\left|2\varphi_F\right| + V_{BS}} - \sqrt{\left|2\varphi_F\right|}\right) \tag{4.1}$$

where  $V_{th0,p}$  is the zero bias threshold voltage,  $\gamma_p$  is the bulk effect factor,  $\varphi_F$  is the Fermi potential and  $V_{BS}$  is the bulk-to-source junction voltage.

It is possible to implement only p-type DTMOS since their n-well can be controlled in standard digital technology.

When the input voltage at the gate of a DTMOS device is high, it is in the off-state and has the same threshold voltage, off-current and sub-threshold slope as a normal PMOS transistor. When the input gate voltage is decreased below the source voltage ($V_{DD}$), the device turns ON. The bulk-source junction voltage ($V_{BS}$) reduces and hence the source-substrate junction gets forward biased, thereby reducing the threshold voltage of the device. Consequently, a high input range can be obtained with a DTMOS transistor. Since the threshold voltage reduces, the source-to-drain current ($I_{SD}$) of a DTMOS device is higher than that of a typical PMOS transistor, and therefore, it is a good choice for operation in the sub-threshold region, obviating additional area [4.2] [4.3]. This is in line with the goal of our research of achieving an area-efficient design. Moreover, the features of a DTMOS device operating in the sub-threshold region are identical to those of a lateral bipolar p-n-p transistor, without needing the large base current. In addition, in this region of operation, its flicker noise is less than that of a typical MOS transistor [4.3].

A DTMOS transistor (operating in the sub-threshold region) can be used to implement a power-efficient version of the LNA in our design, as shown in Fig. 4.3. The load for the LNA is the sampling capacitance per channel, $C_{s,1-ch}$ (= 180 fF), of the delay line. In order to make the scenario more realistic, the total on-resistance of the sampling switches per channel ($R_{on\_samp,1-ch}$) is also taken into consideration. If implemented with minimum-sized NMOS transistors,

$$R_{on\_samp,1-ch} = 2 \times R_{on\_min\_siz\_sw\_1ch}$$

$$\implies R_{on\_samp,1-ch} = 2 \times 6.5 \ k\Omega = 13 \ k\Omega \tag{4.2}$$

DTMOS based LNA along with its load is shown in Fig. 4.4.



Fig. 4.3. LNA implemented using DTMOS transistor



Fig. 4.4. DTMOS based LNA with load

The time constant during sampling phase becomes

$$\tau = (R_{LOAD} + R_{on\_samp,1-ch}) \times C_{s,1-ch} \tag{4.3}$$

In order to ensure 0.1% settling accuracy,

$$7\tau = 1/f_s \tag{4.4}$$

where  $f_s$  is the sampling frequency, such that  $f_s = 25MHz$ 

Using equations (4.2) and (4.3) in (4.4), we get

$$7 \times (R_{LOAD} + R_{on\_samp,1-ch}) \times C_{s,1-ch} = 1/25MHz$$
$$\implies R_{LOAD} = \frac{1}{7 \times C_{s,1-ch} \times 25MHz} - R_{on\_samp,1-ch}$$
$$\implies R_{LOAD} = 18.7k\Omega$$

Gain of the LNA is 10 (or 20dB). If  $g_{m,LNA}$  is the trans-conductance of the DTMOS transistor,

$$g_{m,LNA} \times R_{LOAD} = 10$$

(The underlying assumption is that the output resistance of the transistor is large compared to  $R_{LOAD}$ .)

$$\Rightarrow g_{m,LNA} = 10/R_{LOAD} \approx 535 \,\mu S$$

Since the DTMOS device is operated in the sub-threshold region,

$$\frac{g_{m,LNA}}{I_{B,LNA}} = 20 \Longrightarrow I_{B,LNA} = \frac{g_{m,LNA}}{20} \approx 27 \ \mu A$$

where  $I_{B,LNA}$  is the bias current of LNA.

Therefore, power consumption of the LNA is

$$P_{LNA} = V_{DD} \times I_{B,LNA} = 1.8 V \times 27 \ \mu A \approx 49 \ \mu W$$

Power dissipation of the LNA designed in the previous work [4.4] was  $107 \mu W$ .

Therefore, percentage reduction in power consumption of LNA when implemented using DTMOS transistor is

$$\frac{107 - 49}{107} \times 100 \approx 54\%$$
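The LNA sizing steps above can be collected into a short script as a cross-check. This is a minimal sketch with an illustrative function name; it carries unrounded intermediate values, so the results differ slightly from the rounded figures in the text (for example, a power of about 48 µW rather than 49 µW).

```python
import math


def lna_design(c_s_1ch=180e-15, r_on_samp=13e3, f_s=25e6, n_tau=7,
               gain=10.0, gm_over_ib=20.0, v_dd=1.8):
    """Return (R_LOAD, g_m, I_B, P) for the DTMOS LNA from eqs. (4.2)-(4.4)."""
    r_load = 1 / (n_tau * c_s_1ch * f_s) - r_on_samp   # n_tau * tau = 1 / f_s
    gm = gain / r_load                                 # g_m * R_LOAD = gain
    i_b = gm / gm_over_ib                              # sub-threshold: g_m / I_B = 20
    return r_load, gm, i_b, v_dd * i_b


r_load, gm, i_b, p = lna_design()
print(f"R_LOAD = {r_load / 1e3:.1f} kOhm")
print(f"g_m    = {gm * 1e6:.0f} uS")
print(f"I_B    = {i_b * 1e6:.1f} uA")
print(f"P_LNA  = {p * 1e6:.1f} uW")
print(f"saving = {100 * (107e-6 - p) / 107e-6:.0f} % relative to 107 uW")
print(f"residual settling error after 7 tau = {100 * math.exp(-7):.2f} %")
```

The last line confirms that seven time constants correspond to a residual error just under 0.1%, which is the settling accuracy targeted by equation (4.4).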

The input-referred noise of the LNA is $\sim 6 \mu V_{rms}$ in the signal bandwidth of interest, which is less than the output-referred noise of each transducer element (15 $\mu V_{rms}$), as discussed in section 2.2. Therefore, the noise requirement is also met. The design parameters for the DTMOS based LNA are summarised in Table 4.1.

Table 4.1. Summary of design parameters of DTMOS based LNA

| Parameter                                 | Symbol             | Value          |
|-------------------------------------------|--------------------|----------------|
| Trans-conductance of the DTMOS transistor | $g_{m,LNA}$        | 535 μ <i>S</i> |
| Bias current of LNA                       | I <sub>B,LNA</sub> | 27 μΑ          |
| Load Resistor                             | R <sub>LOAD</sub>  | $18.7k\Omega$  |

## 4.3 Implementation of Stray-Insensitive Delay Line

The stray-insensitive delay line introduced in section 3.2.1 is shown in Fig. 4.5, and its corresponding transistor-level implementation is depicted in Fig. 4.6. Only 1 channel of the delay line is shown here for simplicity.



Fig. 4.5. Stray-Insensitive Delay Line using Ideal switches



Fig. 4.6. Transistor level implementation of the Stray-Insensitive Delay Line

#### 4.3.1 Timing Considerations

The clock signals driving the MOS switches (in Fig. 4.6) are shown in Fig. 4.7. The two new clock signals added (compared to Fig. 3.6(b)) are the inverse clocks  $clk1\_out\_inv$  and  $clk2\_out\_inv$ , which drive the PMOS transistors ( $M_5$  and  $M_{10}$ , respectively) in the CMOS switches ( $S_4$  and  $S_8$ ) connected to virtual ground node of main OTA.



Fig. 4.7. Waveforms of the Clock signals

#### 4.3.2 Implementation of Switches

The sampling switches in each branch ($S_1, S_2$ and $S_5, S_6$) are realized with minimum-sized NMOS transistors ($W/L = 0.22\mu m/0.18\mu m$), namely $M_1, M_2$ and $M_6, M_7$. This is done in order to minimize the charge injection caused by these switches. The sizing of the read-out switches was explained in section 3.3.2. The read-out switches $S_3$ and $S_7$, driven by clock signals $clk1\_out\_del$ and $clk2\_out\_del$, are implemented with minimum-sized NMOS transistors $M_3$ and $M_8$, respectively, in order to minimize their clock feed-through. Switches $S_4$ and $S_8$, connected to the virtual ground node, need to be large, as explained in section 3.3.2. Therefore, the magnitude of the voltage spike caused by clock feed-through at the virtual ground node (and hence at the output node of the main OTA), when they are just turned on, is also large, and so is the time required for this spike to decay to its steady-state value. In order to reduce the magnitude of this spike, these read-out switches are realized as CMOS switches, which provide a first-order cancellation of clock feed-through.

This is illustrated in Fig. 4.8, where  $C_{ov,M_4}$  and  $C_{ov,M_5}$  are the overlap capacitances of transistors  $M_4$  and  $M_5$ , respectively, and  $C_{vg-gnd}$  is the capacitance from the virtual ground node of the main OTA to ground. If  $W_{M_4}$ ,  $W_{M_5}$  are the widths of transistors  $M_4$  and  $M_5$ , respectively, the total change in voltage at virtual ground node,  $\Delta V_{vg}$  due to clock feed-through is given by

$$\Delta V_{vg} = V_{DD} \times \frac{W_{M_4} \times C_{ov,M_4}}{W_{M_4} \times C_{ov,M_4} + C_{vg-gnd}} - V_{DD} \times \frac{W_{M_5} \times C_{ov,M_5}}{W_{M_5} \times C_{ov,M_5} + C_{vg-gnd}} \tag{4.5}$$

In order to satisfy the criterion:  $\Delta V_{vg} = 0$ , equation (4.5) reduces to

$$\frac{W_{M_4} \times C_{ov,M_4}}{W_{M_4} \times C_{ov,M_4} + C_{vg-gnd}} = \frac{W_{M_5} \times C_{ov,M_5}}{W_{M_5} \times C_{ov,M_5} + C_{vg-gnd}} \tag{4.6}$$

Therefore, from the above equation, it is clear that the following condition must be fulfilled to make  $\Delta V_{vg} = 0$ :

$$W_{M_4} \times C_{ov,M_4} = W_{M_5} \times C_{ov,M_5} \tag{4.7}$$

To put it in words, the product of width and the overlap capacitance of the transistors  $M_4$  and  $M_5$  must be equal in order to cancel clock feed-through. In our design, a first order cancellation of clock feed-through was achieved by making  $W_{M_4} = W_{M_5}$ . From DC operating point analysis in Cadence, it was observed that the values of overlap capacitances  $C_{ov,M_4} \& C_{ov,M_5}$  are close, though not exactly equal.
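Equation (4.5) can be evaluated directly to see how well the CMOS switch cancels clock feed-through when the overlap capacitances are only approximately matched. In the sketch below, the function name and all numerical values (width, overlap capacitance per unit width, virtual-ground capacitance) are hypothetical placeholders chosen for illustration; they are not taken from the design.

```python
def delta_v_vg(w_n, c_ov_n, w_p, c_ov_p, c_vg_gnd, v_dd=1.8):
    """Net voltage step at the virtual ground node due to clock feed-through, eq. (4.5).

    w_n, w_p       : widths of the NMOS (M4) and PMOS (M5) devices in metres
    c_ov_n, c_ov_p : overlap capacitance per unit width in F/m
    c_vg_gnd       : capacitance from virtual ground to ground in farads
    """
    c_n = w_n * c_ov_n
    c_p = w_p * c_ov_p
    return v_dd * c_n / (c_n + c_vg_gnd) - v_dd * c_p / (c_p + c_vg_gnd)


# Hypothetical example values, for illustration only
W = 10e-6          # 10 um wide devices
C_OV = 0.3e-9      # 0.3 fF/um overlap capacitance
C_VG = 2e-12       # 2 pF at the virtual ground node

matched = delta_v_vg(W, C_OV, W, C_OV, C_VG)
mismatched = delta_v_vg(W, C_OV, W, 1.1 * C_OV, C_VG)   # 10 % larger PMOS overlap
nmos_only = delta_v_vg(W, C_OV, 0.0, C_OV, C_VG)        # no PMOS device

print(f"matched    : {matched * 1e6:8.1f} uV")
print(f"10% mismatch: {mismatched * 1e6:8.1f} uV")
print(f"NMOS only  : {nmos_only * 1e6:8.1f} uV")
```

With equal widths and equal overlap capacitances the two terms cancel exactly, and a moderate mismatch still leaves a residual step well below that of a single NMOS switch, which is the sense in which the cancellation is first-order.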



Fig. 4.8. Diagram illustrating reduction of clock feed-through using CMOS switch

## 4.4 Single-Ended Main Gm Stage



Fig. 4.9. Single-ended  $G_m$  stage

The main OTA in the front-end receive signal processing chain is single-ended (single-ended input and single-ended output), as shown in Fig. 4.9. A cascode transistor  $M_2$  is added in order to increase the open-loop gain of the OTA. In this section, the choice of a single-ended topology for the main OTA is explained first. An AZ loop is required for the main OTA, which is discussed next. Finally, a trade-off between output swing and noise is presented.

#### 4.4.1 Choice of Single-Ended Gm Stage Over Differential Configuration

#### **Single-Ended Nature of Transducer Output Signal**

The output signal from a transducer element is inherently single-ended. Therefore, making the signal processing chain differential at any point (in order to use a fully differential OTA) would require additional circuitry. Consequently, the power efficiency and the area efficiency of the system will become worse.

#### **Power Consumption**

A single-ended charge amplifier is shown in Fig. 4.10(a). The capacitors $C_S$ and $C_I$ are the sampling and integration capacitors, respectively. The transistor-level implementation of the OTA is given in Fig. 4.10(b). In order to obtain maximum trans-conductance efficiency, $M_1$ needs to be biased in the sub-threshold region. The relation between the trans-conductance of the OTA, $g_m$, and the required bias current, $I_B$, is given by

$$\frac{g_m}{I_B} = 20 \tag{4.8}$$



Fig. 4.10. (a) A single-ended charge amplifier, and (b) Realization of its  $g_m$  block

A fully differential version of the charge amplifier is depicted in Fig. 4.11(a) and the transistor-level implementation of its OTA is shown in Fig. 4.11(b). It can be seen from Fig. 4.11(b) that, in order to obtain the same trans-conductance $g_m$ for the OTA as in the single-ended case (and hence the same open-loop gain), the bias current required is $2I_B$. Since the signal processing chain is kept single-ended at every node (because of the inherently single-ended nature of the output signal of each transducer element), a fully differential signal source is not available at the input of the main OTA. Therefore, for a given value of $g_m$, the power dissipation in a fully differential implementation of the charge amplifier is twice that of the corresponding single-ended realization.



Fig. 4.11. (a) A fully differential charge amplifier, and (b) Realization of its  $g_m$  block

#### **Noise Analysis**

From equation (3.20) in chapter 3, the total input-referred noise power when a single-ended OTA is used in the switched-capacitor charge amplifier, was given by

$$\overline{v_{n,se}}^2 = \frac{2kT}{C_S} \times \frac{1}{OSR} \left( 1 - \frac{\frac{1}{6}}{1+x} \right) \tag{3.20}$$

If a fully differential version of the OTA is used, as it has been done in the noise analysis in [3.7], the total input-referred noise power becomes

$$\overline{v_{n,diff}}^2 = 2 \times \frac{2kT}{C_S} \times \frac{1}{OSR} \left( 1 - \frac{\frac{1}{6}}{1+x} \right) \tag{4.9}$$

The ratio of the noise powers can be obtained from equations (4.9) and (3.20) as

$$\frac{\overline{v_{n,diff}}^2}{\overline{v_{n,se}}^2} = 2 \tag{4.10}$$

Therefore, the input-referred noise power when a fully differential OTA is used (in the switched-capacitor charge amplifier) is twice that of the case in which a single-ended version of the OTA is used. As a result, the corresponding input-referred noise voltage for the fully differential case is $\sqrt{2}$ times that of the single-ended case. The full-scale signal swing of the fully differential OTA at its output is twice that of the single-ended OTA, for a given value of $g_m$. Therefore, the output dynamic range of the charge amplifier employing the fully differential OTA is $\sqrt{2}$ times that of the one using the single-ended OTA.
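The $\sqrt{2}$ factor can be written out explicitly. In the line below, $V_{FS,se}$ and $V_{FS,diff}$ are shorthand introduced here for the full-scale swings of the single-ended and fully differential implementations, and $DR$ denotes the ratio of full-scale swing to rms noise voltage; these symbols are not used elsewhere in the text.

$$\frac{DR_{diff}}{DR_{se}} = \frac{V_{FS,diff}/v_{n,diff}}{V_{FS,se}/v_{n,se}} = \frac{2V_{FS,se}}{\sqrt{2}\,v_{n,se}} \times \frac{v_{n,se}}{V_{FS,se}} = \sqrt{2} \approx 3\,dB$$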

In order to keep the noise power the same as that of the single-ended case, the size of the sampling capacitor will need to be doubled. Also, in order to keep the gain of the charge amplifier constant, the size of the integration capacitor will need to be doubled. This will significantly add to the area overhead of the fully differential implementation, which was already high because of using two half circuits (and hence twice the number of components). Therefore, from the perspective of an area-efficient design, it is also not a good choice. A point to be noted here is that in both cases, the overall trans-conductance of the OTA is still the same ($g_m$). Therefore, the power consumption of the fully differential implementation of the charge amplifier is also twice that of the single-ended case.

#### 4.4.2 Auto-Zeroing Loop

#### Motivation

In the single-ended OTA shown in Fig. 4.9, the input transistor $M_1$ needs to be biased in the sub-threshold region, as explained in the previous sub-section. Therefore, the DC bias level at the gate of $M_1$ ($V_{DC,in}$) should be close to the threshold voltage of the device ($\approx 0.5V$). Also, in order to maximize the voltage swing at the output node of the OTA, the DC level at its output ($V_{DC,out}$) should be half of the supply voltage ($V_{DD}/2$). During the reset phase of the charge amplifier (shown in Fig. 4.6), the OTA is operating in unity gain mode. Its input and output terminals are shorted and hence, the DC levels at these two terminals must be equal. Therefore, a DC level shift is required between the input and output terminals of the OTA. This can be accomplished by means of an AZ loop.

#### **Working Mechanism**

The AZ loop required for the main OTA is shown in Fig. 4.12. Only one channel of the delay line is considered for simplicity. The offset calibration loop (OCL) is not shown here to keep the focus on the AZ loop. The AZ loop is a negative feedback loop consisting of the main OTA (OTA1), an auxiliary OTA (OTA1\_AZL), capacitor  $C_{AZ}$  and switches  $S_9$ ,  $S_{10}$  and  $S_{11}$ .



Fig. 4.12. Switched-capacitor charge amplifier along with the AZ loop. The loop is shown in a dashed line.

The operation of the AZ loop can be explained in 2 phases:

- 1. Phase I: Auto-Zeroing
- 2. Phase II: Read-out from Sampling Capacitor of Delay Line

#### Phase I: Auto-Zeroing

The circuit conditions during this phase are depicted in Fig. 4.13. The following events occur in this phase:

1. The input signal is sampled on capacitor $C_{s1}$, as the sampling switches $S_1$ and $S_2$, driven by clock signals *clk1\_in* and *clk1\_in\_del*, respectively, are ON in the input delay line.
2. The read-out switch $S_8$, connected to the virtual ground of OTA1 and driven by the *clk2\_out* signal, is ON.
3. Switches $S_9$, $S_{10}$ and $S_{11}$, driven by the reset clock signals (*clk\_rst* and *clk\_rst\_adv*), are ON.
4. The DC level at the output of OTA1, $V_{DC,out}$, is compared with the reference voltage $V_{ref}$ ($= V_{DD}/2$) by OTA1\_AZL. Depending upon the difference between $V_{DC,out}$ and $V_{ref}$, OTA1\_AZL injects a current $I_{AZ}$ that starts charging $C_{AZ}$ towards $V_{AZ}$.

 $V_{AZ}$  is given by

$$V_{AZ} = V_{DC,out} - V_{DC,in} = \frac{V_{DD}}{2} - V_{DC,in}$$
(4.11)

where  $V_{DC,in}$  is the DC level at the input of OTA1.



Fig. 4.13. Operation of AZ loop, Phase I: Auto-zeroing

5. At the end of this phase, the clock signal *clk\_rst\_adv* goes from high to low first. Subsequently, after a time delay, *clk\_rst* makes the transition from $V_{DD}$ to 0. This clocking scheme is needed for the following reason. Due to the difference in DC levels at the input and output of OTA1, the overdrive voltage of switch $S_{11}$ (driven by *clk\_rst\_adv*) is higher than that of switches $S_9$ and $S_{10}$ (both driven by *clk\_rst*). Therefore, if the falling edges of *clk\_rst\_adv* and *clk\_rst* coincided, $S_9$ and $S_{10}$ would turn OFF at the end of this phase, but $S_{11}$ would remain connected to the gate of the input transistor ($M_1$) of OTA1 and to one end of $C_{AZ}$ for some more time (until its gate-source voltage drops below its threshold voltage). During this time, it could inject additional charge onto $C_{AZ}$ (due to charge injection), thereby altering the voltage at the gate of $M_1$. Since OTA1 no longer operates in unity-gain mode once $S_9$ turns OFF, this would result in a permanent change in the DC level at the gate of $M_1$, which is undesirable. Therefore, the falling edge of *clk\_rst\_adv* is advanced relative to that of *clk\_rst*.

#### Phase II: Read-out from Sampling Capacitor of Delay Line

The operation of the AZ loop during phase II is depicted in Fig. 4.14. Following events take place during this phase:

- 1. Sampling of the input signal continues on $C_{s1}$, as *clk1\_in* and *clk1\_in\_del* are still active (high).
- 2. The AZ loop is disconnected from the input and output of OTA1. The voltage $V_{AZ}$, corresponding to the difference in common-mode levels at the input and output of the main OTA, is established across $C_{AZ}$.
- 3. Both the read-out switches driven by clocks  $clk2\_out$  and  $clk2\_out\_del$  ( $S_8$  and  $S_7$ , respectively) are ON, and the charge stored on capacitor  $C_{s2}$  is transferred onto  $C_I$ .



Fig. 4.14. Operation of AZ loop, Phase II: Read-out from sampling capacitor of delay line

#### **Determination of Design Parameters**

A simple differential topology is chosen for the auxiliary OTA (OTA1\_AZL).

Let $g_{m,OTA1\_AZL}$ be the trans-conductance of each transistor of the input pair of OTA1\_AZL, and $I_{B,OTA1\_AZL}$ the bias (tail) current of OTA1\_AZL.

The unity gain bandwidth of the AZ loop,  $f_{UGB,AZ\_loop}$  is given by

$$f_{UGB,AZ\_loop} = \frac{g_{m,OTA1\_AZL}}{2\pi C_{AZ}}$$
(4.12)

The derivation of this equation can be found in Appendix A.

In order to ensure that noise sampled on $C_{AZ}$ during the auto-zeroing operation does not fold into the signal band of interest (4.5 MHz – 7.5 MHz), the AZ loop must be a slow-settling loop. In terms of an equation,

$$f_{UGB,AZ\_loop} \ll 4.5MHz$$

$$\Rightarrow \frac{g_{m,OTA1\_AZL}}{2\pi C_{AZ}} \ll 4.5MHz$$

$$\Rightarrow C_{AZ} \gg \frac{g_{m,OTA1\_AZL}}{2\pi \times 4.5MHz}$$

Taking a factor of 4,

$$C_{AZ} = 4 \times \frac{g_{m,OTA1\_AZL}}{2\pi \times 4.5MHz}$$
(4.13)

There are two factors which need to be taken into account for determining the value of  $C_{AZ}$ :

1. The capacitive attenuation of the signal at the gate terminal of the input transistor of the main OTA  $(M_1)$  is shown in Fig. 4.15.





The relation between  $v_{gate}$  and  $v_{signal}$  is given by

$$v_{gate} = v_{signal} \times \frac{C_{AZ}}{C_{AZ} + C_{gs,M_1}}$$
(4.14)

where  $C_{gs,M_1}$  is the gate to source capacitance of  $M_1$ 

Therefore,  $C_{AZ}$  must be significantly larger ( $\approx 100X$ ) than  $C_{gs,M_1}$ , in order to prevent attenuation of the signal at the gate terminal of  $M_1$ .

From DC operating point analysis in Cadence,  $C_{gs,M_1} \approx 20 fF$ 

Therefore, an initial estimate of  $C_{AZ}$  would be

$$C_{AZ,init\_est} \approx 100 \times C_{gs,M_1} = 2 \, pF \tag{4.15}$$

2. In order to achieve an area efficient design, the maximum value of  $C_{AZ}$  is limited by the largest capacitor in the design, i.e., the total sampling capacitance for 9 channels,  $C_S$ .

Hence,

$$C_{AZ,max} = C_S \approx 1.6 \, pF \tag{4.16}$$

Equation (4.15) suggests about 2 pF, while equation (4.16) limits $C_{AZ}$ to 1.6 pF. As a compromise, the value of $C_{AZ}$ is chosen as 1.4 pF, i.e., 70 times $C_{gs,M_1}$.
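To see how much signal is lost at the gate of $M_1$ for a given capacitor choice, equation (4.14) can be evaluated directly. The sketch below does this for the 2 pF initial estimate and the 1.4 pF value chosen above, using $C_{gs,M_1} \approx 20\,fF$ from the text; the helper name `gate_attenuation` is an assumption made for illustration.

```python
import math

def gate_attenuation(c_az: float, c_gs: float) -> float:
    """Capacitive divider of Eq. (4.14): v_gate / v_signal."""
    return c_az / (c_az + c_gs)

C_GS_M1 = 20e-15  # gate-source capacitance of M1 (F), from the DC operating point

for c_az in (2.0e-12, 1.4e-12):
    ratio = gate_attenuation(c_az, C_GS_M1)
    loss_db = 20 * math.log10(ratio)
    print(f"C_AZ = {c_az * 1e12:.1f} pF: v_gate/v_signal = {ratio:.4f} ({loss_db:.3f} dB)")
```

Under these numbers, both capacitor values keep the attenuation below about 1.5%, so reducing $C_{AZ}$ from 2 pF to 1.4 pF for area reasons costs very little signal.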

Therefore, equation (4.13) becomes

$$4 \times \frac{g_{m,OTA1\_AZL}}{2\pi \times 4.5 MHz} = 1.4 pF$$
$$\implies g_{m,OTA1\_AZL} = 10 \ \mu S$$

The transistors of the input pair of OTA1\_AZL are biased in weak inversion region to maximize their trans-conductance efficiency.

$$\frac{g_{m,OTA1\_AZL}}{I_{B,OTA1\_AZL}/2} = 20 \Longrightarrow \frac{10 \ \mu S}{I_{B,OTA1\_AZL}/2} = 20$$
$$\implies I_{B,OTA1\_AZL} = 1 \ \mu A$$
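The chain of calculations from equations (4.12) and (4.13) to the tail current can be reproduced as follows. This is a minimal sketch using only the values quoted in the text (the factor of 4, $C_{AZ} = 1.4\,pF$ and a trans-conductance efficiency of 20 $V^{-1}$); the constant names are chosen here for illustration.

```python
import math

F_SIGNAL_MIN = 4.5e6  # lower edge of the signal band (Hz)
MARGIN = 4            # factor by which f_UGB is kept below F_SIGNAL_MIN
C_AZ = 1.4e-12        # chosen auto-zeroing capacitor (F)
GM_OVER_ID = 20       # weak-inversion trans-conductance efficiency (1/V)

# Eq. (4.13) solved for g_m: C_AZ = MARGIN * g_m / (2*pi*F_SIGNAL_MIN)
gm = C_AZ * 2 * math.pi * F_SIGNAL_MIN / MARGIN

# Eq. (4.12): resulting unity-gain bandwidth of the AZ loop
f_ugb = gm / (2 * math.pi * C_AZ)

# Each transistor of the input pair carries half of the tail current.
i_tail = 2 * gm / GM_OVER_ID

print(f"g_m    = {gm * 1e6:.2f} uS")
print(f"f_UGB  = {f_ugb / 1e6:.3f} MHz")
print(f"I_tail = {i_tail * 1e6:.2f} uA")
```

The unrounded results (about 9.9 µS and 0.99 µA) agree with the rounded values of 10 µS and 1 µA used in the design, and the loop bandwidth comes out at 1.125 MHz, a factor of 4 below the signal band.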

The design parameters for the AZ loop are summarised in table 4.2.

| Parameter                                                    | Symbol            | Value      |
|--------------------------------------------------------------|-------------------|------------|
| Trans-conductance of transistors of input pair of OTA1\_AZL  | $g_{m,OTA1\_AZL}$ | 10 $\mu S$ |
| Bias (tail) current of OTA1\_AZL                             | $I_{B,OTA1\_AZL}$ | 1 $\mu A$  |
| Size of AZ capacitor                                         | $C_{AZ}$          | 1.4 $pF$   |

Table 4.2. Summary of design parameters of AZ loop

#### Implementation of OTA1\_AZL and Switches

OTA1\_AZL is implemented as a differential amplifier. The transistors of the input pair ($M_1$ and $M_2$) are biased in the sub-threshold region. In order to minimize the input-referred offset voltage caused by transistor mismatch, the gate areas (width × length) of $M_1$ and $M_2$ were increased. In addition, the gate areas of the NMOS load transistors $M_3$ and $M_4$ were also increased [4.5].



Fig. 4.16. Schematic of OTA1\_AZL used in AZ loop

The switches  $S_9$ ,  $S_{10}$  and  $S_{11}$  shown in Fig. 4.12 are implemented with minimum-sized NMOS transistors. This is done in order to minimize charge injection and clock feed-through caused by these switches.

#### 4.4.3 Trade-off Between Output Swing and Input-Referred Noise

An expanded version of Fig. 4.9, showing the PMOS current mirror transistors that implement the bias current source, is given in Fig. 4.17. In order to achieve a large output swing, the overdrive voltages of the PMOS current source transistors $M_3$ and $M_4$ should be small, since the positive excursions of the signal at the output node of the main OTA are limited by these overdrive voltages.



Fig. 4.17. Expanded version of Fig. 4.9. showing the PMOS current mirror transistors

From equation (3.16) in noise analysis of chapter 3, the PSD of the input-referred noise of the OTA is given by

$$\overline{v_{no}}^2 = \frac{8kT}{3g_m} \left( 1 + \frac{g_{mp}}{g_m} \right) \tag{3.16}$$

where  $g_{mp}$  is the trans-conductance of the PMOS current source transistor

In this case, $g_{mp} = g_{m,M_4}$ and $g_m = g_{m,M_1}$ (since the noise contributed by the cascode transistors is assumed to be negligible [3.2]).

Therefore, equation (3.16) becomes

$$\overline{v_{no}^{2}} = \frac{8kT}{3g_{m,M_{1}}} \left( 1 + \frac{g_{m,M_{4}}}{g_{m,M_{1}}} \right) \tag{4.17}$$

This OTA noise,  $\overline{v_{no}}^2$  will be sampled on capacitor  $C_s$  during the read-out phase and eventually contribute to the input-referred noise of the delay line.

Therefore, in order to minimize the input-referred noise, the trans-conductance of the PMOS current source transistor  $M_4$  ( $g_{m, M_4}$ ) should be low. Since  $M_4$  is biased in strong inversion region, the need for a low value of  $g_{m, M_4}$  would imply requirement of a high overdrive voltage for  $M_4$  ( $V_{OV, M_4}$ ). However, this requirement of high value of  $V_{OV, M_4}$  for reducing the input-referred noise is in direct contradiction with the need for low  $V_{OV, M_4}$ , in order to achieve a high output swing. Therefore, there exists a compromise between input-referred noise and output swing.

In our design, the emphasis was on maximizing the output swing (in order to get a high dynamic range) by minimizing the overdrive voltage for  $M_4$  (or making  $g_{m,M_4}$  high). The ratio of  $g_{m,M_4}$  and  $g_{m,M_1}$  in our design was ~0.5. As a result, there is a penalty in terms of marginal increase in input-referred noise.
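The size of this noise penalty follows from the bracketed term of equation (4.17). The sketch below evaluates it for the trans-conductance ratio of about 0.5 quoted above, relative to a hypothetical noiseless current source (ratio of 0); the function name is introduced here for illustration only.

```python
import math

def noise_excess_factor(gm_ratio: float) -> float:
    """Bracketed term of Eq. (4.17): 1 + g_m,M4 / g_m,M1 (noise power factor)."""
    return 1.0 + gm_ratio

GM_RATIO = 0.5  # g_m,M4 / g_m,M1 in this design (approximate)

power_factor = noise_excess_factor(GM_RATIO)
voltage_factor = math.sqrt(power_factor)
penalty_db = 10 * math.log10(power_factor)

print(f"Noise power factor:   {power_factor:.2f}")
print(f"Noise voltage factor: {voltage_factor:.3f}")
print(f"Penalty:              {penalty_db:.2f} dB")
```

Under this simplified model, the current source raises the input-referred noise power of the OTA by a factor of 1.5 compared with the input transistor alone.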

#### 4.5 Offset Calibration Loop

The OCL along with the AZ loop is shown in the signal processing chain in Fig. 4.18. The transistor level implementation of the OTAs and the switches in the OCL will be discussed in this section. Finally, the need for a buffer in each branch of the delay line in OCL and its implementation is presented.

#### 4.5.1 Transistor Level Realization of Gm Stages

OTA1\_OCL in OCL can be implemented as a simple differential amplifier, similar to OTA1\_AZL (of AZ loop) as depicted in Fig. 4.16.

From equation (3.32) in section 3.3.2, the trans-conductance of OTA2\_OCL, $g_{m,OTA2\_OCL}$, is set equal to $1/10^{\text{th}}$ of the trans-conductance of the main OTA, $g_m$. Therefore, its topology can be identical to that of OTA1, i.e., a power-efficient single-ended implementation, as shown in Fig. 4.17.

#### 4.5.2 Implementation of Switches and Sizing of Capacitors

All the switches in the OCL can be designed with minimum-sized NMOS transistors. The capacitors in the 2-phase sample-and-hold block ($C_{sh\_ocl}$) can be sized considering the time constant formed with the on-resistance of the switches.

Let $R_{on,sh\_ocl}$ be the on-resistance of a sampling/read-out switch in the 2-phase sample-and-hold block when implemented with a minimum-sized NMOS transistor:

$$R_{on,sh\_ocl} = R_{on\_\min\_siz\_sw\_1ch} = 6.5 \ k\Omega \tag{4.18}$$

The time constant formed with $C_{sh\_ocl}$ is given by

$$\tau_{sh\_ocl} = R_{on,sh\_ocl} \times C_{sh\_ocl} \tag{4.19}$$

In order to achieve 0.1% settling accuracy during sampling/read-out

$$7 \times \tau_{sh\_ocl} = \frac{1}{2f_s} \tag{4.20}$$

where $f_s$ is the sampling frequency, which has a value of 25 MHz.

Therefore, the above equation becomes

$$7 \times R_{on,sh\_ocl} \times C_{sh\_ocl} = 20 \ ns \tag{4.21}$$

Using equation (4.18) in equation (4.21), we get

$$C_{sh\_ocl} \approx 440 \, fF \tag{4.22}$$
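The sizing in equations (4.18) to (4.22) can be reproduced numerically. The following is a minimal sketch that also shows where the factor of 7 comes from, assuming single-pole (RC) settling; the constant names are chosen here for illustration.

```python
import math

R_ON = 6.5e3           # on-resistance of a minimum-sized NMOS switch (ohm), Eq. (4.18)
F_S = 25e6             # sampling frequency (Hz)
SETTLING_ERROR = 1e-3  # 0.1% settling accuracy

# For single-pole settling, the residual error after n time constants is exp(-n).
n_tau_exact = math.log(1 / SETTLING_ERROR)
N_TAU = 7              # rounded up, as in Eq. (4.20)

t_available = 1 / (2 * F_S)           # half a sampling period: 20 ns
c_sh = t_available / (N_TAU * R_ON)   # Eq. (4.21) solved for C_sh_ocl

print(f"Time constants needed: {n_tau_exact:.2f} (rounded to {N_TAU})")
print(f"Available time:        {t_available * 1e9:.0f} ns")
print(f"C_sh_ocl:              {c_sh * 1e15:.0f} fF")
```

The result of about 440 fF is an upper bound for the given switch: a larger capacitor would not settle to 0.1% within half a sampling period.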



Fig. 4.18. Front-end receive signal processing chain including the OCL and AZ loop

#### 4.5.3 Addition of Buffer in Each Branch of Delay Line in OCL

A PMOS source follower is used as a buffer in each branch of the delay line in the OCL. It is placed between the memory capacitor $C_m$ and the read-out NMOS switch $M_2$ (or $M_5$). During the calibration phase, the offset voltages due to mismatch in the input delay line are stored on $C_m$. In the normal operation phase, when the read-out clock *clk1\_out\_ocl* (or *clk2\_out\_ocl*) goes from high to low, the voltage across $C_m$ would be altered by the charge injected when $M_2$ (or $M_5$) turns OFF, if the switch were connected directly to $C_m$. Therefore, a buffer is used in order to prevent any change in the voltage across $C_m$ during the normal operation phase. In order to achieve maximum trans-conductance efficiency, the PMOS buffer transistors ($M_3$ and $M_6$) are biased in the weak inversion region. In our design, the bias current of each PMOS buffer transistor is 1 $\mu A$.



Fig. 4.19. Addition of PMOS buffer in each branch of delay line in OCL

## 4.6 Overall Clocking Scheme

The clock signals are shown in Fig. 4.20. The delay between the falling edges of the clock signals $clk\_rst\_adv$ and $clk\_rst$, as explained in section 4.4.2, is denoted by $t_{rst}$ in the figure. The explanation of the overall operation of the system is identical to that given in section 3.3 for the OCL, with two additional points in order to include the AZ loop. These points are:

- 1. In each reset sub-phase, the auto-zeroing action takes place in which the auxiliary OTA drives a current in the AZ loop to charge capacitor  $C_{AZ}$  towards voltage  $V_{AZ}$ .
- 2. In each read-out sub-phase, the AZ loop is disconnected from the input and output of main OTA.





# 4.7 Conclusions

The main feature of this chapter is the power-efficient and area-efficient implementation of the single-ended topology for the main OTA (discussed in section 4.4). A slow-settling AZ loop is required to provide a level shift between the input and output DC levels of the OTA. The values of the design parameters of the AZ loop were determined, and the auxiliary OTA and the switches were implemented at transistor level. In addition, the stray-insensitive delay line is realized using MOS transistors, and a power-efficient implementation of the LNA using DTMOS devices is achieved. In the next chapter, simulation results obtained from different analyses will be presented.

# References

[4.1] H. F. Achigui, C. J. B. Fayomi, and M. Sawan, "1-V DTMOS-based Class-AB Operational Amplifier: Implementation and Experimental Results," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 11, pp. 2440–2448, Nov. 2006.

[4.2] H. F. Achigui, C. J. B. Fayomi, and M. Sawan, "A DTMOS-based 1 V opamp," *Proc. Int. IEEE ICECS Conf.*, Dec. 2003, vol. 1, pp. 252–255.

[4.3] H. F. Achigui, C. J. B. Fayomi, and M. Sawan, "A 1 V low power, low noise DTMOS based class AB opamp," *Proc. Int. IEEE NEWCAS Conf.*, Jun. 2005, pp. 84–87.

[4.4] Z. Yu, *Low-Power Receive-Electronics for a Miniature 3D Ultrasound Probe*, PhD thesis, Delft University of Technology, 2012.

[4.5] P. R. Kinget, "Device mismatch and tradeoffs in the design of analog circuits," *IEEE J. Solid-State Circuits*, vol. 40, no. 6, pp. 1212–1224, Jun. 2005.

# **Chapter 5**

# **Simulation Results**

This chapter presents the simulation results obtained from various analyses performed in Cadence Spectre. In the first section, the results of transient analysis are discussed. The problem of the ripple pattern appearing at the output of main OTA, how it limits the dynamic range, and finally how it gets reduced when an OCL is used, is discussed in this section. Subsequently, results from noise analysis are discussed, both with and without the OCL. In the next section, the power consumption breakdown of the individual blocks is presented and finally, conclusions are drawn.

### **5.1 Transient Analysis**

In this section, the results obtained from transient analysis of the overall circuit design are presented. Firstly, the output waveforms are explained during the calibration phase and the normal operation phase. Subsequently, the waveforms at the output of main OTA are analyzed with and without OCL, when no AC signal is applied to the input of delay line. A ripple pattern appears at the output of main OTA when there is mismatch in the input delay line. This ripple pattern limits the dynamic range. Finally, it is shown that this problem is solved by employing an OCL, resulting in a high dynamic range of 80 dB.

## **5.1.1 Calibration Phase and Normal Operation**

The two phases of operation of the OCL are depicted in Fig. 5.1(a). Initially, there is a calibration period of 10 $\mu$s, as indicated by the pulse width of the clock signal *clk\_cal* in Fig. 5.1(a). During this period, the input (*vin*) to the delay line is DC. After the calibration period, the input signal is applied. In this figure, the input signal is a 6 MHz sinusoidal signal with a peak value of 50 mV. The simulation time is 15 $\mu$s. The output of the main OTA (*vout\_main\_ota*) is a sampled-and-held return-to-zero waveform, as shown in Fig. 5.1(a). A 2-phase sample-and-hold block is placed at the output of the main OTA; *vout\_ideal\_sh* is the signal at the output of this sample-and-hold stage. This signal is of the sampled-and-held non-return-to-zero type. A 20% mismatch (in the size of the switches in the out-of-phase parallel branches) was considered both in the input delay line and in the delay line in the OCL for this simulation.






Fig. 5.1. Calibration phase and Normal phase operation. (a) Clock signal and input and output waveforms (b) Zoomed-in version showing the type of signals at the output of main OTA and at the output of 2-phase sample-and-hold block, (c) Zoomed-in version showing one cycle of the input signal. For this simulation, there is 20% mismatch both in the input delay line and the delay line in OCL.

The signals *vout\_main\_ota* and *vout\_ideal\_sh* can be seen clearly in Fig. 5.1(b). This figure is obtained by zooming into a window of 500 ns from 12  $\mu$ s – 12.5  $\mu$ s in the normal operation phase.

In Fig. 5.1(c), it can be seen that the peak value of the input signal which is sampled at around 12.04  $\mu$ s onto the sampling capacitor of the input delay line, appears at the output of main OTA, being amplified by the gain of the charge amplifier (~10) at around 12.08  $\mu$ s. After a further delay of 20 ns (which corresponds to the delay between the sampling and read-out clocks of each branch of the 2-phase sample-and-hold block), this signal shows up at the output of the 2-phase sample-and-hold block. The value of the signal *vout\_ideal\_sh* remains constant at this peak value of the input signal for the next 40 ns till a new value of the signal is read-out from the output of main OTA.

### 5.1.2 Offset at Output With and Without OCL

The signal at the output of main OTA is shown for three different cases in Fig. 5.2. In all the three cases, the input to the delay line is DC for the entire length of the simulation time window. The three scenarios are:

- Case I: No mismatch in input delay line and No OCL
- Case II: 20% mismatch in input delay line, but No OCL
- Case III: 20% mismatch in input delay line, and with an OCL (No mismatch in the delay line in OCL).

An important point to note is that in all the scenarios, 'mismatch' refers to mismatch in size of switches in the out-of-phase parallel branches. The waveforms for the three scenarios are shown as *vout*1, *vout*2 and *vout*3 in Fig. 5.2(a). A zoomed-in version of Fig. 5.2(a) is shown in Fig. 5.2(b). In this figure, the initial settling behavior for the three different cases can be observed. The zoom-in window is chosen between 0 to 1  $\mu$ s.




Fig. 5.2. Output of Main OTA as function of time for 3 different cases. (a) The time window is 11  $\mu$ s, which includes the calibration period of 10  $\mu$ s, and (b) The time window is 1  $\mu$ s, in which the initial settling behavior is depicted.


In Fig. 5.3, a zoom-in window of 140 ns (from 9.86 $\mu$s to 10 $\mu$s) is selected at the end of the calibration period. From this figure, it can be observed that in the second scenario, in which there is mismatch in the input delay line, a ripple pattern occurs at the output of the main OTA. The peak-to-peak value of this ripple is 3.3 mV. This ripple pattern, which is caused by mismatch in the switches of the input delay line (as explained in section 3.2.4), limits the dynamic range at the input of the delay line. When an OCL is used (with an ideal delay line), this value is reduced to 50 $\mu$V, thereby enhancing the dynamic range. If there is mismatch in the delay line in the OCL, the peak-to-peak value of the ripple reduces by a factor of approximately 5, as compared to case I (depicted in appendix C).

#### 5.1.3 Illustration of High Dynamic Range of Micro-Beamformer

The dynamic range plot of the micro-beamformer is shown in Fig. 5.4. The input voltage (on X-axis) is the peak-to-peak value of the input signal of the delay line and the output voltage (on Y-axis) is the peak-to-peak value at the output of the 2-phase sample-and-hold block. The input and output voltages are shown on a log scale. The peak values of the output voltages were obtained by taking a 16-point FFT of the waveforms at the output of the 2-phase sample-and-hold block. The FFT time window was carefully selected in the normal operation phase.

Without an OCL, the thermal noise floor is at 370  $\mu V_{rms}$  level. The peak-to-peak value of output ripple is located at 3.3 mV level. Therefore, when there is no OCL, the dynamic range is limited by the ripple. The input-referred ripple (at the input of delay line) is  $(3.3 mV/10 =) 330 \mu V$ . The minimum signal level at the output of LNA is 100  $\mu$ V. Therefore, the smallest signal level is drowned in the ripple.

When an OCL (with an ideal delay line) is employed, the peak-to-peak value of the ripple gets reduced by almost 2 orders of magnitude (factor of 66). The thermal noise floor level increases marginally (compared to that without OCL) to 390  $\mu V_{rms}$ . Therefore, thermal noise becomes the fundamental factor limiting the dynamic range when an OCL is employed. The input-referred thermal noise is (390  $\mu V/10 =$ ) 39  $\mu V$ . Since the smallest signal level (100  $\mu V$ ) is above the noise level (at the output of LNA), it can be detected.

Each transducer element produces signal levels from 10  $\mu$ V to 100 mV. If the signals are referred to the input of LNA, it is possible to detect the minimum signal level of (100  $\mu$ V/10 =) 10  $\mu$ V.

In addition, it is possible to detect the maximum signal level by bypassing the LNA (for higher signal levels) as explained in [5.2]. Therefore, all transducer output signal levels from 10  $\mu$ V to 100 mV can be detected when an OCL (with an ideal delay line) is employed. In conclusion, the dynamic range of the entire signal processing chain is extended to 80 dB, when an OCL with an ideal delay line is used.
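The arithmetic behind this dynamic-range argument can be collected in one place. The sketch below uses only the values quoted in this section; the gain of 10 for the LNA is inferred from the step "100 µV/10 = 10 µV" in the text, and the variable names are chosen here for illustration. It follows the text in comparing peak-to-peak ripple and rms noise levels directly.

```python
import math

GAIN_CHARGE_AMP = 10.0  # approximate gain of the charge amplifier
GAIN_LNA = 10.0         # LNA gain implied by 100 uV / 10 = 10 uV

ripple_out_no_ocl = 3.3e-3      # V peak-to-peak, 20% mismatch, no OCL
ripple_out_ocl = 50e-6          # V peak-to-peak, OCL with ideal delay line
noise_out_ocl = 390e-6          # V rms, thermal noise floor with OCL
min_signal_lna_out = 100e-6     # V, smallest signal at the LNA output
max_transducer_signal = 100e-3  # V, largest transducer signal

ripple_reduction = ripple_out_no_ocl / ripple_out_ocl
ripple_in_no_ocl = ripple_out_no_ocl / GAIN_CHARGE_AMP
noise_in_ocl = noise_out_ocl / GAIN_CHARGE_AMP
min_transducer_signal = min_signal_lna_out / GAIN_LNA
dynamic_range_db = 20 * math.log10(max_transducer_signal / min_transducer_signal)

print(f"Ripple reduction with OCL:       x{ripple_reduction:.0f}")
print(f"Input-referred ripple (no OCL):  {ripple_in_no_ocl * 1e6:.0f} uV")
print(f"Input-referred noise (with OCL): {noise_in_ocl * 1e6:.0f} uV")
print(f"Dynamic range:                   {dynamic_range_db:.0f} dB")
```

Without the OCL, the 330 µV input-referred ripple exceeds the 100 µV minimum signal; with the OCL, the 39 µV noise floor lies below it, which is what allows the full 10 µV to 100 mV range (80 dB) to be covered.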



Fig. 5.4 Dynamic Range plot of the Micro-beamformer. The ripple levels and the noise floor are indicated in figure.

## 5.2 PSS and PNoise Analysis

Since the front-end signal processing chain consists of blocks with switched-capacitor circuits, the noise levels were determined using PSS and PNoise analysis [5.1]. The 3dB-bandwidth of the output signal of a transducer element (input signal to the delay line) is chosen as the noise bandwidth.

An ideal 2-phase sample-and-hold block was used at the output of the main OTA, so that its own noise contribution is negligible compared to that of the overall circuit. The input-referred noise at the input of the delay line was found to be 40.2 $\mu$V, as shown in Fig. 5.5. The main noise contributor is the PMOS load transistor in the main OTA (shown as transistor $M_4$ in Fig. 4.17), as was discussed in section 4.4.3. The name of its instance in the design schematic in Cadence Spectre is /I30/M3, as indicated in Fig. 5.5. The next major contributor of noise is the input NMOS transistor of the main OTA (shown as transistor $M_1$ in Fig. 4.17), which has an instance name of /I30/M0 in the design schematic.

When an OCL is employed, the input-referred noise level increases to 42.8 $\mu$V, as shown in Fig. 5.6. The top three contributors of noise remain unaltered from the previous case, but the next two change: with an OCL, the read-out NMOS transistor switches in the delay line in the OCL (which are driven by the PMOS buffer transistors) take the fourth and fifth places among the top five contributors of noise.

| Device     | Param | Noise Contribution | % Of Total |
|------------|-------|--------------------|------------|
| /I30/M3    | id    | 0.000126831        | 11.81      |
| /I30/M0    | id    | 0.0001243          | 11.34      |
| /I30/M0    | fn    | 0.000120336        | 10.63      |
| /I45/I0/M2 | id    | 6.71397e-05        | 3.31       |
| /M20       | id    | 6.60967e-05        | 3.21       |
| /I45/I0/M4 | id    | 6.59752e-05        | 3.20       |
| /I24/M66   | id    | 6.41063e-05        | 3.02       |
| /I45/I0/M6 | id    | 6.39294e-05        | 3.00       |
| /I24/M64   | id    | 6.14018e-05        | 2.77       |
| /I24/M65   | id    | 6.1109e-05         | 2.74       |
| /I45/I0/M7 | id    | 5.67802e-05        | 2.37       |
| /I45/I0/M6 | fn    | 3.83481e-05        | 1.08       |
| /I16<2>/M6 | id    | 3.31966e-05        | 0.81       |
| /I16<3>/M6 | id    | 3.31966e-05        | 0.81       |
| /I16<4>/M6 | id    | 3.31966e-05        | 0.81       |
| /I16<6>/M6 | id    | 3.31966e-05        | 0.81       |
| /I16<5>/M6 | id    | 3.31966e-05        | 0.81       |
| /I16<8>/M6 | id    | 3.31966e-05        | 0.81       |
| /I16<7>/M6 | id    | 3.31966e-05        | 0.81       |
| /I16<0>/M6 | id    | 3.31966e-05        | 0.81       |

Integrated Noise Summary (in V) Sorted By Noise Contributors

Total Summarized Noise = 0.00036909

Total Input Referred Noise = 4.01661e-05

Fig. 5.5. Results from PSS and PNoise analysis showing the total input-referred noise and the top contributors of noise. There is no OCL in this case.

The increase in the input-referred noise due to the OTAs and PMOS buffer transistors in the OCL is minimized by setting the unity-gain bandwidth (UGB) of the OCL much lower than the lower bound of the frequency range of the ultrasound signal, i.e., 4.5 MHz (as discussed in section 3.3.2).

| Device     | Param | Noise Contribution | % Of Total |
|------------|-------|--------------------|------------|
| /I30/M3    | id    | 0.000124282        | 9.97       |
| /I30/M0    | id    | 0.000124022        | 9.93       |
| /I30/M0    | fn    | 0.000119668        | 9.25       |
| /I96/M3    | id    | 7.70784e-05        | 3.84       |
| /I96/M2    | id    | 7.47963e-05        | 3.61       |
| /I45/I0/M2 | id    | 6.70043e-05        | 2.90       |
| /M20       | id    | 6.63335e-05        | 2.84       |
| /I45/I0/M4 | id    | 6.63076e-05        | 2.84       |
| /I24/M66   | id    | 6.47592e-05        | 2.71       |
| /I45/I0/M6 | id    | 6.3999e-05         | 2.64       |
| /I24/M64   | id    | 6.20027e-05        | 2.48       |
| /I24/M65   | id    | 6.17604e-05        | 2.46       |
| /I45/I0/M7 | id    | 5.67983e-05        | 2.08       |
| /M2        | id    | 5.11891e-05        | 1.69       |
| /I96/M5    | id    | 4.94271e-05        | 1.58       |
| /I96/M4    | id    | 4.81684e-05        | 1.50       |
| /I45/I0/M6 | fn    | 3.83491e-05        | 0.95       |
| /I16<8>/M6 | id    | 3.32228e-05        | 0.71       |
| /I16<7>/M6 | id    | 3.32228e-05        | 0.71       |
| /I16<5>/M6 | id    | 3.32228e-05        | 0.71       |

Integrated Noise Summary (in V) Sorted By Noise Contributors

Total Summarized Noise = 0.000393518

Total Input Referred Noise = 4.27937e-05

Fig. 5.6. Results from PSS and PNoise analysis showing the total input-referred noise and the top contributors of noise. In this case, an OCL is employed which increases the noise level marginally.
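The percentage column of the noise summary can be cross-checked with a short script. The sketch below assumes that each percentage is the ratio of squared noise voltages (noise powers add, voltages do not); this is inferred from the listed numbers rather than stated in the text. The names `percent_of_total` and `noise_gain` are illustrative only.

```python
# Values transcribed from the noise summary in Fig. 5.6 (all in V).
TOTAL_OUTPUT_NOISE = 0.000393518
TOTAL_INPUT_REFERRED_NOISE = 4.27937e-05

# (device, noise type, integrated noise contribution in V, listed % of total)
TOP_CONTRIBUTORS = [
    ("/I30/M3", "id", 0.000124282, 9.97),
    ("/I30/M0", "id", 0.000124022, 9.93),
    ("/I30/M0", "fn", 0.000119668, 9.25),
    ("/I96/M3", "id", 7.70784e-05, 3.84),
]


def percent_of_total(contribution_v, total_v=TOTAL_OUTPUT_NOISE):
    """Share of the total noise power in percent (assumes powers add)."""
    return 100.0 * (contribution_v / total_v) ** 2


def noise_gain(total_out=TOTAL_OUTPUT_NOISE,
               total_in=TOTAL_INPUT_REFERRED_NOISE):
    """Ratio of total output noise to total input-referred noise."""
    return total_out / total_in


if __name__ == "__main__":
    for device, kind, value, listed in TOP_CONTRIBUTORS:
        computed = percent_of_total(value)
        print(f"{device:10s} {kind}  computed {computed:5.2f} %  listed {listed:5.2f} %")
    print(f"output / input-referred noise ratio: {noise_gain():.2f}")
```

The ratio returned by `noise_gain` is the effective gain by which the simulator refers the output noise back to the input.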

## 5.3 Power Consumption

The break-down of the power consumption of individual blocks is shown in Table 5.1. The analog blocks of the charge amplifier are the main OTA, bias block, and the OTAs in the AZ loop and OCL. The digital logic block generates all the clocks, which are required for driving MOS switches in the input delay line, sample-and-hold blocks in OCL and at the output of main OTA, and the delay line in OCL.

The total power consumption per transducer element is 91  $\mu$ W. In order to make a valid comparison with the total power consumption in previous work [5.2], we must take into account the fact that each delay line in our proposed design will need to be expanded from 2 parallel branches (1 delay setting) to 8 parallel branches (7 delay settings). The dynamic power consumption according to equation (2.6) is given by

$$P_{dyn} = \kappa \times f \times C_{Load} \times V_{DD}^{2}$$

where  $\kappa$  is the probability of a power consuming transition  $(0 \rightarrow 1)$  or switching activity factor, f is the clock frequency,  $C_{Load}$  is the load capacitance and  $V_{DD}$  is the supply voltage.

Both  $\kappa$  and  $C_{Load}$  will remain almost the same when the delay line in our proposed design is expanded. Therefore, the dynamic power consumption, and hence the overall power consumption, will remain approximately the same as shown in Table 5.1. This dynamic power consumption is quite close to the value predicted in section 2.5.

| Block                            | Power consumption of the circuit for one transducer element (in $\mu$W) |
|----------------------------------|--------------------------------------------------------------------------|
| LNA                              | 53.3                                                                     |
| Charge Amplifier – Analog Blocks | 202.3/9 = 22.5                                                           |
| Charge Amplifier – Digital Logic | 137.2/9 = 15.2                                                           |
| Total                            | 91.0                                                                     |

Table 5.1. Power consumption of individual blocks in the proposed design of receive front-end signal processing chain
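The per-element figures in Table 5.1 can be reproduced with a few lines of arithmetic. The sketch below assumes that the charge amplifier power is shared equally among the 9 delay lines (transducer elements) it serves, as the division by 9 in the table suggests; the function name is illustrative.

```python
# Block-level power figures from Table 5.1 (in microwatts).
LNA_UW = 53.3                   # one LNA per transducer element
CHARGE_AMP_ANALOG_UW = 202.3    # shared analog blocks of the charge amplifier
CHARGE_AMP_DIGITAL_UW = 137.2   # shared digital logic of the charge amplifier
ELEMENTS_PER_CHARGE_AMP = 9     # assumed: one charge amplifier per 9 elements


def power_per_element_uw(n_elements=ELEMENTS_PER_CHARGE_AMP):
    """Return (analog share, digital share, total) per element in microwatts."""
    analog = CHARGE_AMP_ANALOG_UW / n_elements
    digital = CHARGE_AMP_DIGITAL_UW / n_elements
    return analog, digital, LNA_UW + analog + digital


if __name__ == "__main__":
    analog, digital, total = power_per_element_uw()
    print(f"analog {analog:.1f} uW, digital {digital:.1f} uW, total {total:.1f} uW")
```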

## **5.4 Conclusions**

The highlight of this chapter is the demonstration of the high dynamic range of the micro-beamformer achieved by using an OCL. This enhancement in dynamic range is obtained with almost a factor of 5 reduction in total power consumption compared to the previous work [5.2]. The increase in the input-referred noise at the input of the delay line due to the addition of the OCL is marginal (6.5%).

# References

[5.1] K. Kundert, *Simulating switched-capacitor filters with Spectre RF*, [Online] Available: http://www.designers-guide.com/Analysis/sc-filters.pdf

[5.2] Zili Yu, *Low-Power Receive-Electronics for a Miniature 3D Ultrasound Probe*, PhD thesis, Delft University of Technology, 2012.

# **Chapter 6**

# Conclusions

## 6.1 Summary

In this thesis, the theory and implementation of a novel high-dynamic-range micro-beamformer in the receive front-end signal processing chain for 3D TEE imaging have been presented. The micro-beamformer is based on active charge-mode summation (the summation of the output signals of the 9 delay lines occurs at the virtual ground node of the main OTA). The functionalities of the TGC and the delay line have been merged into a switched-capacitor-based charge amplifier, resulting in a compact and area-efficient design. The problem limiting the dynamic range in a previous micro-beamformer implementation [6.1] has been analyzed, and a solution has been presented in the form of an offset-calibration loop (OCL). A single-ended topology is used to implement the main OTA, which makes it both power- and area-efficient. A slow-settling AZ loop is used to ensure maximum signal swing at the output of the main OTA.

The front-end receive signal processing chain was implemented in a TSMC 0.18 µm high-voltage CMOS process. Unfortunately, because of health-related troubles and certain administrative issues in getting access to standard logic and I/O cell libraries, the design could not be laid out, and hence the performance of the corresponding chip could not be measured. However, the functionality and merits of the proposed design have been demonstrated through extensive system-level and transistor-level analysis and simulations. These illustrate the lower power consumption and higher dynamic range compared to the previous state-of-the-art design in [6.1].

The proposed design offers significant advantages over the existing design and provides a simpler and more efficient solution for 3D TEE imaging application.

## 6.2 Main Contributions

The main features of the proposed design of the high-dynamic-range micro-beamformer based front-end receive signal processing chain are the following:

- A power-efficient design has been implemented, in which the power consumption is almost a factor of 5 lower compared to the previous design [6.1].
- The dynamic range at the input of the micro-beamformer is enhanced substantially. This has been achieved by means of a simple and efficient solution of incorporating an OCL.
- All the individual blocks in the front-end signal processing chain (LNA, delay line and main OTA) have been implemented with a single-ended topology, which is in sync with the inherent single-ended nature of the output signal from a transducer element. Consequently, the need for making the signal-processing chain differential (or pseudo-differential) at any point is obviated.
- The delay line is implemented using a stray-insensitive topology.
- The LNA is implemented using DTMOS transistors, which makes it power-efficient (for the same gain and a lower input-referred noise level compared to the LNA designed in [6.1]).

## 6.3 Scope for Future Work

In the proposed design, each delay line (comprising 2 parallel branches) has 1 delay setting. In order to obtain 7 delay settings, as was the case in the previous design [6.1], each delay line needs to be expanded to 8 parallel branches, which would require the generation of additional clocks to drive the MOS switches in each branch. Secondly, the OTAs in the OCL need to be implemented at transistor level, based on the guidelines given in section 4.5.1. Finally, design techniques are required to improve the power supply rejection ratio of the single-ended main OTA.

Though a power-efficient and area-efficient design has been implemented and validated using simulation results, the design still needs to be laid out, post-layout simulations should be carried out, and finally, the fabricated chip must be tested once it returns from the foundry. The guidelines for the layout of the proposed design are given in section 6.3.1.

### 6.3.1 Guidelines for Layout

The rules which need to be followed for the layout of the proposed design are:

- 1. The analog blocks (biasing block, main OTA, OTAs in AZ loop and OCL) should be isolated from the digital logic by using guard rings, in order to prevent any coupling of signals between analog and digital blocks.
- 2. The power supply and ground lines should be separated for analog and digital signals.
- 3. Special attention must be given to ensure matching of sampling capacitors in out-of-phase branches in the delay line. Otherwise, a phase-dependent gain error will occur. These rules should be followed for capacitor matching [6.2]:
  - Identical geometries should be used.
  - A large unit capacitance should be used in order to minimize fringe effects.
  - Large capacitors should be made multiples of unit capacitors.
  - A common-centroid arrangement should be used.
- 4. Since many clock signals are involved in the micro-beamformer, on-chip shielding of clock signals is required in order to minimize clock-to-clock and clock-to-signal coupling [6.1].
- 5. The layout of the differential OTAs (indicated as OTA1\_AZL and OTA1\_OCL in Fig. 4.18) should be done in such a way that their input-referred offset values are minimum (as explained in section 4.4.2). Attention should also be given to the matching of the input pair transistors. A common-centroid technique (ABBA) can be used for the layout of these transistors.
- 6. In order to filter supply noise, decoupling capacitors can be distributed over the chip.

## References

[6.1] Zili Yu, *Low-Power Receive-Electronics for a Miniature 3D Ultrasound Probe*, PhD thesis, Delft University of Technology, 2012.

[6.2] F. Maloberti, *Layout of Analog CMOS Integrated Circuit – Passive components: Resistors, Capacitors*, [Online] Available: http://ims.unipv.it/Microelettronica/Layout03.pdf (December 2012)

# Appendix A: Derivation of Unity-Gain Bandwidth of AZ Loop

In order to derive the unity-gain bandwidth (UGB) of AZ loop, which is shown in Fig. A1, the loop is broken at the input of OTA1\_AZL (node A) which is a high-impedance node. After replacing the OTAs with their equivalent small-signal models, the resulting schematic is shown in Fig. A2.



Fig. A1. Operation of AZ loop, Phase I: Auto-zeroing. The loop is broken at node A which is the input of OTA1\_AZL.

After the loop is broken,  $v_A$  and  $v_B$  become the input and output voltages of the loop, respectively.  $g_m$  and  $g_{m,OTA1\_AZL}$  are the trans-conductances of the main and auxiliary OTAs, respectively, and  $C_{AZ}$  is the auto-zeroing capacitor.

Applying Kirchhoff's current law (KCL) at the nodes to the left and right of  $C_{AZ}$  (with node voltages  $v_1$  and  $v_B$ , respectively), we get two equations:

$$g_{m,OTA1\_AZL} \times v_A - \frac{v_1}{r_{o,OTA1\_AZL}} - (v_1 - v_B) \times sC_{AZ} = 0$$

$$\Rightarrow v_1 = \frac{g_{m,OTA1\_AZL}}{1/r_{o,OTA1\_AZL} + sC_{AZ}} \times v_A + \frac{sC_{AZ}}{1/r_{o,OTA1\_AZL} + sC_{AZ}} \times v_B \tag{A1}$$

$$g_m \times v_1 + \frac{v_B}{r_o} + (v_B - v_1) \times sC_{AZ} = 0$$

$$\Rightarrow v_1 = -v_B \times \frac{1/r_o + sC_{AZ}}{g_m - sC_{AZ}} \tag{A2}$$



Fig. A2. The schematic of AZ loop after the loop is broken at node A and the OTAs are replaced with their small-signal equivalent models.

From equations (A1) and (A2),

$$-v_B \times \left[\frac{1/r_o + sC_{AZ}}{g_m - sC_{AZ}} + \frac{sC_{AZ}}{1/r_{o,OTA1\_AZL} + sC_{AZ}}\right] = v_A \times \frac{g_{m,OTA1\_AZL}}{1/r_{o,OTA1\_AZL} + sC_{AZ}}$$

$$\Rightarrow \frac{v_B}{v_A} = -\frac{g_{m,OTA1\_AZL} \times (g_m - sC_{AZ})}{\frac{1}{r_o \times r_{o,OTA1\_AZL}} + sC_{AZ} \times \left(\frac{1}{r_o} + \frac{1}{r_{o,OTA1\_AZL}} + g_m\right)}$$

$$\Rightarrow \frac{v_B}{v_A} = -(g_m r_o) \cdot (g_{m,OTA1\_AZL} \cdot r_{o,OTA1\_AZL}) \times \frac{1 - sC_{AZ}/g_m}{1 + sC_{AZ} \cdot (r_o + r_{o,OTA1\_AZL} + g_m r_o r_{o,OTA1\_AZL})}$$

$$\Rightarrow \frac{v_B}{v_A} \approx -(g_m r_o) \cdot (g_{m,OTA1\_AZL} \cdot r_{o,OTA1\_AZL}) \times \frac{1 - sC_{AZ}/g_m}{1 + sC_{AZ} \times g_m r_o r_{o,OTA1\_AZL}}$$

(since  $r_o, r_{o,OTA1\_AZL} \ll g_m r_o r_{o,OTA1\_AZL}$ ).

The DC value of the loop gain is  $(g_m r_o) \cdot (g_{m,OTA1\_AZL} \cdot r_{o,OTA1\_AZL})$ .

At unity gain frequency  $f_{UGB,AZ\_loop}$ ,

$$\left|\frac{v_B}{v_A}\right| = 1$$

$$\Rightarrow \left| -g_m r_o \cdot g_{m,OTA1\_AZL} \cdot r_{o,OTA1\_AZL} \times \frac{1 - sC_{AZ}/g_m}{1 + sC_{AZ} \times g_m r_o \cdot r_{o,OTA1\_AZL}} \right| = 1$$

$$\Rightarrow \left(g_m r_o \cdot g_{m,OTA1\_AZL} \cdot r_{o,OTA1\_AZL}\right)^2 \times \frac{\left[1 + \left(\omega_{UGB,AZ\_loop} C_{AZ}/g_m\right)^2\right]}{\left[1 + \left(\omega_{UGB,AZ\_loop} C_{AZ} \times g_m r_o \cdot r_{o,OTA1\_AZL}\right)^2\right]} = 1$$

Now,  $\omega_{UGB,AZ\_loop} \gg 1/(g_m r_o \cdot r_{o,OTA1\_AZL} \times C_{AZ})$ . Therefore, the above equation reduces to

$$1 + \left(\omega_{UGB,AZ\_loop} C_{AZ}/g_m\right)^2 = \left(\omega_{UGB,AZ\_loop}C_{AZ}/g_{m,OTA1\_AZL}\right)^2$$

$$\Rightarrow \left(\omega_{UGB,AZ\_loop}C_{AZ}\right)^2 \times \left(\frac{1}{g_{m,OTA1\_AZL}^2} - \frac{1}{g_m^2}\right) = 1$$

Since  $g_m \gg g_{m,OTA1\_AZL}$ , it follows that  $1/g_m^2 \ll 1/g_{m,OTA1\_AZL}^2$ , and the above equation becomes

$$\left(\omega_{UGB,AZ\_loop}C_{AZ}\right)^{2} \times \left(\frac{1}{g_{m,OTA1\_AZL}^{2}}\right) = 1$$
$$\Rightarrow \omega_{UGB,AZ\_loop} = \frac{g_{m,OTA1\_AZL}}{C_{AZ}}$$
$$\Rightarrow f_{UGB,AZ\_loop} = \frac{g_{m,OTA1\_AZL}}{2\pi C_{AZ}}$$

This result is given in equation (4.12).
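As a numerical sanity check of this approximation, the exact expression for $v_B/v_A$ can be evaluated and its unity-gain frequency compared with $g_{m,OTA1\_AZL}/(2\pi C_{AZ})$. The sketch below uses the transconductances and $C_{AZ}$ quoted in Appendix B. The output resistances are not given, so they are back-calculated here from the quoted DC gains (about 65 dB for the main OTA and 103 dB for the AZ loop); this is an assumption made only for illustration.

```python
import math

GM = 1.6e-3      # main OTA transconductance (S), from Appendix B
GM1 = 10e-6      # OTA1_AZL transconductance (S), from Appendix B
C_AZ = 1.4e-12   # auto-zeroing capacitor (F), from Appendix B

# Assumed output resistances, back-calculated from the quoted DC gains.
RO = 10 ** (65 / 20) / GM            # main OTA: gm * ro = 65 dB
RO1 = 10 ** ((103 - 65) / 20) / GM1  # OTA1_AZL: remaining 38 dB of loop gain


def loop_gain(f_hz, gm=GM, ro=RO, gm1=GM1, ro1=RO1, c_az=C_AZ):
    """Exact v_B / v_A of the broken AZ loop, before any approximation."""
    s = 2j * math.pi * f_hz
    numerator = gm1 * (gm - s * c_az)
    denominator = 1 / (ro * ro1) + s * c_az * (1 / ro + 1 / ro1 + gm)
    return -numerator / denominator


def unity_gain_frequency(f_lo=1.0, f_hi=1e9, iterations=200):
    """Bisection on a log frequency axis for |v_B / v_A| = 1."""
    for _ in range(iterations):
        f_mid = math.sqrt(f_lo * f_hi)
        if abs(loop_gain(f_mid)) > 1.0:
            f_lo = f_mid
        else:
            f_hi = f_mid
    return math.sqrt(f_lo * f_hi)


F_UGB_APPROX = GM1 / (2 * math.pi * C_AZ)  # closed-form result derived above

if __name__ == "__main__":
    print(f"exact : {unity_gain_frequency() / 1e6:.3f} MHz")
    print(f"approx: {F_UGB_APPROX / 1e6:.3f} MHz")
```

With these assumed values, the exact and approximate unity-gain frequencies agree to well within 1%, both close to 1.14 MHz.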

# **Appendix B: Simulation Results – AC and Stability Analysis**

## **B1. AC Analysis**

### **Open-Loop Gain of Main OTA**

The magnitude and phase plots of the open-loop gain of the main OTA are shown in Fig. B1. The OTA is loaded by its own parasitic capacitances and the series combination of the feedback capacitor and the total sampling capacitance of the input delay line.

If  $g_m$  is the trans-conductance of the main OTA,  $C_{par}$  is the total parasitic capacitance at its output,  $C_{fb}$  is the feedback capacitance, and  $C_s$  is the total sampling capacitance of the input delay line, then its UGB is given by

$$f_{UGB} = \frac{g_m}{2\pi \left( C_{par} + \frac{C_{fb}C_s}{C_{fb} + C_s} \right)} \tag{B1}$$

Since  $C_{fb} = C_s/10$  (corresponding to the fixed gain of the charge amplifier), the series combination of  $C_{fb}$  and  $C_s$  equals  $(10/11)\,C_{fb} \approx C_{fb}$ . With  $C_{fb} \gg C_{par}$ , the above equation reduces to

$$f_{UGB} \approx \frac{g_m}{2\pi C_{fb}} \tag{B2}$$



Fig. B1. Open-loop gain of main OTA - Magnitude and Phase plots

In the proposed design,  $g_m = 1.6\ \text{mS}$  and  $C_{fb} = 162\ \text{fF}$ . Therefore, the UGB is given by

$$f_{UGB} = \frac{1.6 \text{ mS}}{2\pi \times 162 \text{ fF}} \approx 1.6 \text{ GHz}$$

From Fig. B1, it can be seen that the DC value of the open-loop gain is almost 65 dB and the UGB is 1.5 GHz. This value of UGB is close to that calculated using equation (B2).
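The two UGB expressions can be compared numerically. The following sketch evaluates equations (B1) and (B2) for the quoted $g_m$ and $C_{fb}$, with $C_s = 10\,C_{fb}$. The parasitic capacitance is not given in this appendix, so it is left as a parameter defaulting to zero, which is an assumption.

```python
import math

GM = 1.6e-3        # transconductance of the main OTA (S)
C_FB = 162e-15     # feedback capacitance (F)
C_S = 10 * C_FB    # total sampling capacitance, from C_fb = C_s / 10


def ugb_full_hz(gm=GM, c_fb=C_FB, c_s=C_S, c_par=0.0):
    """Equation (B1): load is C_par plus the series combination of C_fb and C_s."""
    c_series = c_fb * c_s / (c_fb + c_s)
    return gm / (2 * math.pi * (c_par + c_series))


def ugb_approx_hz(gm=GM, c_fb=C_FB):
    """Equation (B2): load approximated by C_fb alone."""
    return gm / (2 * math.pi * c_fb)


if __name__ == "__main__":
    print(f"(B1), C_par = 0: {ugb_full_hz() / 1e9:.2f} GHz")
    print(f"(B2)           : {ugb_approx_hz() / 1e9:.2f} GHz")
```

With $C_{par} = 0$, (B1) evaluates to about 1.73 GHz and (B2) to about 1.57 GHz; any non-zero parasitic capacitance lowers the (B1) value.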

### **Closed-Loop Gain of Charge Amplifier**

The magnitude and phase plots of the closed-loop gain of the switched-capacitor charge amplifier are depicted in Fig. B2. The DC value of the gain is 20 dB, corresponding to the ratio of  $C_s$  and  $C_{fb}$ . The -3 dB bandwidth is 49 MHz. This corresponds to the time constant of the circuit ( $\tau$ ) during the charge transfer phase, as obtained in equation (3.24).

From that equation,

$$\tau = 2.86 \, ns \, (\text{for } 7\tau \, \text{settling})$$

The -3dB bandwidth is given by

$$f_{-3dB} = \frac{1}{2\pi \times \tau} = \frac{1}{2\pi \times 2.86 \, ns} = 55.6 \, MHz \tag{B3}$$

Therefore, the value of  $f_{-3dB}$  obtained from simulation (49 MHz) is close to the calculated value (55.6 MHz).
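The relation between the time constant, the bandwidth and the residual settling error can be sketched as follows. A single-pole (first-order) response is assumed, so that the error after $n$ time constants is $e^{-n}$; the helper names are illustrative.

```python
import math

TAU_S = 2.86e-9             # time constant from equation (3.24)
F_3DB_SIMULATED_HZ = 49e6   # -3 dB bandwidth read from Fig. B2


def bandwidth_from_tau(tau_s):
    """-3 dB bandwidth of a single-pole system, equation (B3)."""
    return 1.0 / (2 * math.pi * tau_s)


def settling_error(n_tau):
    """Relative residual error after n_tau time constants (single-pole assumption)."""
    return math.exp(-n_tau)


if __name__ == "__main__":
    f_calc = bandwidth_from_tau(TAU_S)
    deviation = (f_calc - F_3DB_SIMULATED_HZ) / F_3DB_SIMULATED_HZ
    print(f"calculated bandwidth : {f_calc / 1e6:.1f} MHz")
    print(f"deviation from sim   : {100 * deviation:.1f} %")
    print(f"error after 7 tau    : {100 * settling_error(7):.3f} %")
```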



Fig. B2. Closed-loop gain of switched-capacitor charge amplifier - Magnitude and Phase plots

## **B2.** Stability Analysis

### **Charge-Transfer Phase Loop**

The Bode plots for the loop formed during the charge-transfer phase are shown in Fig. B3. The DC value of the loop gain is 40 dB and the phase margin (PM) is given by

$$PM_{Read out \ loop} = 180^{\circ} - 109.4^{\circ} = 70.6^{\circ}$$

which indicates that the loop is stable.



Fig. B3. Loop gain during charge transfer phase – Magnitude and Phase plots

### **Auto-Zeroing Loop**

The Bode plots for the AZ loop are shown in Fig. B4. The DC value of the loop gain is 103 dB. The equation for the UGB of the AZ loop is derived in Appendix A and given in equation (4.12) as:

$$f_{UGB,AZ\_loop} = \frac{g_{m,OTA1\_AZL}}{2\pi C_{AZ}}$$

In our design,  $g_{m,OTA1\_AZL} = 10\ \mu\text{S}$  and  $C_{AZ} = 1.4\ \text{pF}$ . Therefore, the above equation becomes

$$f_{UGB,AZ\_loop} = \frac{10 \ \mu S}{2\pi \times 1.4 \ pF} \approx 1.1 \ MHz$$

The calculated value therefore matches that obtained in simulation, as indicated in the magnitude plot in Fig. B4.

The phase margin (from Fig. B4) is given by

$$PM_{AZ\_loop} = 180^{\circ} - 94.1^{\circ} = 85.9^{\circ}$$

which implies that AZ loop is stable.



Fig. B4. Loop gain of AZ loop - Magnitude and Phase plots

### **Offset Calibration Loop**

The magnitude and phase plots of the loop gain of OCL are given in Fig. B5. From the magnitude plot, it can be seen that there exists a pole at DC, which results in a -20 dB per decade slope. The UGB is given by

$$f_{UGB\_OCL} = \frac{g_{m\_OTA1\_OCL}}{2\pi C_m} \tag{B4}$$

where  $g_{m\_OTA1\_OCL}$  is the trans-conductance of OTA1\_OCL and  $C_m$  is the value of the memory capacitance in the delay line in the OCL.

In our design,  $g_{m\_OTA1\_OCL} = 20\ \mu\text{S}$  and  $C_m = 1.6\ \text{pF}$ . Putting these values in the above equation, we get

$$f_{UGB\_OCL} = \frac{20 \ \mu S}{2\pi \times 1.6 \ pF} \approx 2 \ MHz$$

In the magnitude plot shown in Fig. B5, the value of  $f_{UGB\_OCL}$  is 2.5 MHz.

The phase margin of OCL can be obtained from the phase plot in Fig. B5 as

$$PM_{OCL} = 180^{\circ} - 87^{\circ} = 93^{\circ}$$

which means OCL is stable.



Fig. B5. Loop gain of OCL - Magnitude and Phase plots

In Fig. B6, Bode plots for the OCL are obtained by varying  $g_{m1}\ (= g_{m\_OTA1\_OCL})$  from 10 µS to 1 mS. As  $g_{m1}$  varies,  $f_{UGB\_OCL}$  changes proportionately. In order to ensure that noise generated in the OCL does not increase the input-referred noise at the input of the delay line, the value of  $g_{m1}$  was chosen such that  $f_{UGB\_OCL} \ll 4.5\ \text{MHz}$ .
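The proportionality between $g_{m1}$ and $f_{UGB\_OCL}$ in equation (B4) can be tabulated with a short sweep. In the sketch below, the end points (10 µS and 1 mS) and the design value (20 µS) come from the text, while the intermediate sweep point is an arbitrary illustrative choice.

```python
import math

C_M = 1.6e-12        # memory capacitance in the OCL delay line (F)
F_LIMIT_HZ = 4.5e6   # frequency that f_UGB_OCL should stay well below

# 10 uS, 20 uS and 1 mS appear in the text; 100 uS is an illustrative extra point.
GM1_SWEEP_S = [10e-6, 20e-6, 100e-6, 1e-3]


def ugb_ocl_hz(gm1_s, c_m=C_M):
    """Equation (B4): unity-gain bandwidth of the offset calibration loop."""
    return gm1_s / (2 * math.pi * c_m)


if __name__ == "__main__":
    for gm1 in GM1_SWEEP_S:
        f_ugb = ugb_ocl_hz(gm1)
        status = "below" if f_ugb < F_LIMIT_HZ else "above"
        print(f"gm1 = {gm1 * 1e6:7.1f} uS -> f_UGB = {f_ugb / 1e6:7.2f} MHz ({status} 4.5 MHz)")
```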



Fig. B6. Loop gain of OCL – Magnitude and Phase plots for different values of  $g_{m1}$ 

# Appendix C: Effect of OCL on Dynamic Range – A Case Study

In this appendix, the role of the OCL in enhancing the dynamic range, by reducing the ripple at the output of the main OTA, is illustrated through several scenarios. In all scenarios, the input to the delay line is DC for the entire simulation time window. The results are presented in Table C1, where  $\Delta V_{phase}$  is the peak-to-peak ripple at the output of the main OTA (at the end of the calibration phase). The fourth column of the table gives the factor of reduction for each case relative to case 1 (no OCL).

When no OCL is employed (case 1),  $\Delta V_{phase}$  is maximum (3.3 mV). In case 2, when an OCL is used (without the 2-phase sample-and-hold block) with an ideal delay line (no mismatch),  $\Delta V_{phase}$  reduces by a factor of 1.8. When there is mismatch in the delay line in the OCL (case 3),  $\Delta V_{phase}$  is slightly higher than in case 2. With an ideal delay line in the OCL and a 2-phase sample-and-hold (S&H) block placed between the output of the main OTA and OTA1\_OCL (as shown in Fig. 3.10),  $\Delta V_{phase}$  is minimum (50 µV, case 4). When there is mismatch in the delay line in the OCL,  $\Delta V_{phase}$  increases by an order of magnitude relative to case 4, as shown for cases 5 and 6.

Table C1. Peak-to-peak ripple values for different scenarios

| Case number | Description | $\Delta V_{phase}$ (in mV) | Factor of reduction |
|-------------|-------------|----------------------------|---------------------|
| 1 | Without OCL | 3.3 | - |
| 2 | With OCL; no S&H block in OCL; no mismatch (in switch sizes) between out-of-phase branches of delay line in OCL | 1.8 | 1.8 |
| 3 | With OCL; no S&H block in OCL; 20% mismatch (in switch sizes) between out-of-phase branches of delay line in OCL | 2 | 1.7 |
| 4 | With OCL; S&H block in OCL; PMOS buffer in each branch of delay line in OCL; no mismatch in PMOS buffers; no mismatch (in switch sizes) between out-of-phase branches of delay line in OCL | 0.050 | 66 |
| 5 | With OCL; S&H block in OCL; PMOS buffer in each branch of delay line in OCL; 20% mismatch (in switch sizes & buffer transistor) between out-of-phase branches of delay line in OCL | 0.5 | 6.6 |
| 6 | With OCL; S&H block in OCL; PMOS buffer in each branch of delay line in OCL; no mismatch in PMOS buffers; 20% mismatch (in switch sizes) between out-of-phase branches of delay line in OCL | 0.7 | 4.7 |
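The factor-of-reduction column can be recomputed from the ripple values. The sketch below does this relative to case 1 and also expresses each ratio in decibels. The decibel figure is added here only for illustration and describes the ripple reduction, not a dynamic-range number reported in the thesis.

```python
import math

# Peak-to-peak ripple at the output of the main OTA per case (in mV), from Table C1.
RIPPLE_MV = {1: 3.3, 2: 1.8, 3: 2.0, 4: 0.050, 5: 0.5, 6: 0.7}

# Factor of reduction as listed in Table C1 (case 1 is the reference).
LISTED_FACTOR = {2: 1.8, 3: 1.7, 4: 66, 5: 6.6, 6: 4.7}


def reduction_factor(case, baseline_case=1):
    """Ripple of the baseline case divided by the ripple of the given case."""
    return RIPPLE_MV[baseline_case] / RIPPLE_MV[case]


def reduction_db(case, baseline_case=1):
    """Same ratio expressed in decibels (20 * log10 of a voltage ratio)."""
    return 20 * math.log10(reduction_factor(case, baseline_case))


if __name__ == "__main__":
    for case in sorted(LISTED_FACTOR):
        print(f"case {case}: factor {reduction_factor(case):5.1f} "
              f"(listed {LISTED_FACTOR[case]}), {reduction_db(case):4.1f} dB")
```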